[ https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Girish updated DRILL-7597:
-----------------------------------
Target Version/s: 1.19.0
> Read selected JSON columns as JSON text
> ---------------------------------------
>
> Key: DRILL-7597
> URL: https://issues.apache.org/jira/browse/DRILL-7597
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
>
>
> See DRILL-7598. The use case is to read selected JSON columns as JSON
> text rather than parsing the JSON into a relational structure, as the JSON
> reader does today.
> The JSON reader supports "all text mode", but, despite the name, this mode
> only works for scalars (primitives) such as numbers. It does not work for
> structured types such as objects or arrays: such types are always parsed into
> Drill structures (which causes the conflict described in DRILL-7598).
> Instead, we need a feature to read an entire JSON value, including structure,
> as a JSON string.
> This feature would work best when the user can parse some parts of a JSON
> input file into relational structure and read others as JSON text. (This is
> the use case that a user on the user mailing list faced.) So, we need a way
> to do that.
> Drill has a "provided schema" feature, which, at present, is used only for
> text files (and, recently, with limited support in Avro). We are working on a
> project to add such support for JSON.
> Perhaps we can leverage this feature to allow the JSON reader to read chunks
> of JSON as text which can be manipulated by those future JSON functions. In
> the example, column "c" would be read as JSON text; Drill would not attempt
> to parse it into a relational structure.
> As it turns out, the "new" JSON reader we're working on originally had a
> feature to do just that, but we took it out because we were not sure it was
> needed. Sounds like we should restore it as part of our "provided schema"
> support. It could work this way: if you CREATE SCHEMA with column "c" as
> VARCHAR (maybe with a hint to read as JSON), the JSON parser would read the
> entire nested structure as JSON without trying to parse it into a relational
> structure.
> This ticket asks to build the concept:
> * Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field
> to be read as JSON.
> * Implement the "read column as JSON" feature in the new EVF-based JSON
> reader.
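
To make the requested behavior concrete, here is a minimal Python sketch of
"read selected fields as JSON text". It is not Drill code and does not reflect
the EVF-based reader's design; the function name `read_row` and the
`json_text_columns` parameter are hypothetical, chosen only for illustration.
Fields named in `json_text_columns` come back as JSON strings, structure
included, while all other fields are parsed as usual.

```python
import json


def read_row(line, json_text_columns):
    """Parse one JSON record, keeping the named top-level fields as JSON text.

    Hypothetical helper for illustration only; not part of Drill.
    """
    record = json.loads(line)
    row = {}
    for name, value in record.items():
        if name in json_text_columns:
            # Re-serialize the parsed value back to compact JSON text.
            # A streaming reader could instead pass the tokens through
            # without ever building a structure for this field.
            row[name] = json.dumps(value, separators=(",", ":"))
        else:
            # Every other field keeps its parsed (structured) form.
            row[name] = value
    return row


# Column "c" is kept as JSON text; "a" and "b" are parsed normally.
row = read_row('{"a": 1, "b": "x", "c": {"d": [1, 2], "e": null}}', {"c"})
print(row)
```

Because "c" stays a single string here, records whose "c" values differ in
shape would still land in one text column, which appears to be the property
the description is after.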