[ 
https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7597:
-------------------------------
    Fix Version/s:     (was: 1.19.0)

> Read selected JSON columns as JSON text
> ---------------------------------------
>
>                 Key: DRILL-7597
>                 URL: https://issues.apache.org/jira/browse/DRILL-7597
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> See DRILL-7598. The use case there is to read selected JSON columns as JSON 
> text rather than parse the JSON into a relational structure, as the JSON 
> reader does today.
> The JSON reader supports "all text mode", but, despite the name, this mode 
> only works for scalars (primitives) such as numbers. It does not work for 
> structured types such as objects or arrays: such types are always parsed into 
> Drill structures (which causes the conflict described in DRILL-7598).
> Instead, we need a feature to read an entire JSON value, including structure, 
> as a JSON string.
> This feature would work best if the user can parse some parts of a JSON 
> input file into a relational structure and read others as JSON text. (This is 
> the use case that a user on the user list faced.) So, we need a way to do that.
> Drill has a "provided schema" feature, which, at present, is used only for 
> text files (and recently with limited support in Avro.) We are working on a 
> project to add such support for JSON.
> Perhaps we can leverage this feature to allow the JSON reader to read chunks 
> of JSON as text which can be manipulated by those future JSON functions. In 
> the example, column "c" would be read as JSON text; Drill would not attempt 
> to parse it into a relational structure.
> As it turns out, the "new" JSON reader we're working on originally had a 
> feature to do just that, but we took it out because we were not sure it was 
> needed. Sounds like we should restore it as part of our "provided schema" 
> support. It could work this way: if you CREATE SCHEMA with column "c" as 
> VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the 
> entire nested structure as JSON without trying to parse it into a relational 
> structure.
> This ticket asks to build the concept:
>  * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field 
> to be read as JSON.
>  * Implement the "read column as JSON" feature in the new EVF-based JSON 
> reader.
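
The intended "read column as JSON" behavior can be sketched roughly as below. This is only an illustration in Python, not the EVF-based reader and not Drill code: the names `read_record` and `json_text_columns` are hypothetical, and the set of column names stands in for whatever the (still to be designed) `CREATE SCHEMA` option would provide. Columns named in the set keep their entire value, structure included, as a JSON string; all other columns are parsed as usual.

```python
import json


def read_record(line, json_text_columns):
    """Parse one JSON record, keeping the named top-level columns as JSON text.

    Hypothetical helper for illustration only; not part of Drill.
    """
    record = json.loads(line)
    row = {}
    for name, value in record.items():
        if name in json_text_columns:
            # Keep the whole value (object, array, or scalar) as a JSON string
            # instead of exposing it as nested structure.
            row[name] = json.dumps(value, separators=(",", ":"))
        else:
            # Default behavior: hand the parsed value on unchanged.
            row[name] = value
    return row


# Column "c" is read as JSON text; "a" and "b" are parsed normally.
row = read_record('{"a": 1, "b": "x", "c": {"d": [1, 2], "e": null}}', {"c"})
print(row["a"])  # parsed scalar
print(row["c"])  # JSON text for the nested structure
```

Note that this sketch re-serializes the parsed value, so the original whitespace and formatting of column "c" are not preserved; a real reader could instead capture the text while tokenizing.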



--
This message was sent by Atlassian Jira
(v8.3.4#803005)