[ https://issues.apache.org/jira/browse/DRILL-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-7597:
-------------------------------
Description:
See DRILL-7598. The use case calls for reading selected JSON columns as JSON
text rather than parsing the JSON into a relational structure, as the JSON
reader does today.
The JSON reader supports "all text mode", but, despite the name, this mode only
works for scalars (primitives) such as numbers. It does not work for structured
types such as objects or arrays: such types are always parsed into Drill
structures (which causes the conflict described in DRILL-7598).
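As a rough illustration of that limitation, here is a Python model of the behavior as described above: scalars come back as text, but objects and arrays are still walked as structures. This is a conceptual sketch, not Drill's reader; the function name `all_text_mode` is hypothetical, and leaving JSON null as null is an assumption.

```python
import json


def all_text_mode(value):
    """Hypothetical model of "all text mode" as described in this ticket:
    scalars become strings, but objects and arrays keep their structure."""
    if isinstance(value, dict):
        # Objects are still parsed into a (nested) structure
        return {key: all_text_mode(item) for key, item in value.items()}
    if isinstance(value, list):
        # Arrays are still parsed into a list structure
        return [all_text_mode(item) for item in value]
    if value is None or isinstance(value, str):
        # Assumption: null stays null; strings are already text
        return value
    # Numbers and booleans are rendered in their JSON spelling
    return json.dumps(value)


row = all_text_mode(json.loads('{"a": 10, "b": true, "c": {"d": 1.5, "e": [1, null]}}'))
print(row)
```

Note that column "c" comes back as a nested dict of strings, not as a single JSON string, which is exactly the gap this ticket describes.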
Instead, we need a feature to read an entire JSON value, including structure,
as a JSON string.
This feature would work best if the user could parse some parts of a JSON input
file into relational structure and read others as JSON. (This is the situation
the user on the user mailing list faced.) So, we need a way to do that.
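A minimal Python sketch of the requested behavior, assuming one top-level JSON object per record: fields named in a caller-supplied set come back as their exact source text, structure included, while all other fields are parsed. The names `read_record` and `json_columns` are hypothetical and are not the EVF-based reader's API. The sketch relies on the standard library's `json.JSONDecoder.raw_decode`, which reports where each value ends, so the raw text can be sliced out of the input.

```python
import json

_decoder = json.JSONDecoder()
_WHITESPACE = " \t\n\r"


def _skip_ws(text, pos):
    # raw_decode() does not skip leading whitespace, so do it here
    while pos < len(text) and text[pos] in _WHITESPACE:
        pos += 1
    return pos


def read_record(text, json_columns):
    """Parse one top-level JSON object (hypothetical helper). Fields named in
    json_columns keep their original JSON text; other fields are parsed."""
    pos = _skip_ws(text, 0)
    if pos >= len(text) or text[pos] != "{":
        raise ValueError("expected a JSON object")
    pos = _skip_ws(text, pos + 1)
    record = {}
    if pos < len(text) and text[pos] == "}":
        return record  # empty object
    while True:
        key, pos = _decoder.raw_decode(text, pos)
        if not isinstance(key, str):
            raise ValueError("expected a string key")
        pos = _skip_ws(text, pos)
        if pos >= len(text) or text[pos] != ":":
            raise ValueError("expected ':'")
        pos = _skip_ws(text, pos + 1)
        start = pos
        value, pos = _decoder.raw_decode(text, pos)
        # Slice the exact source text for JSON columns; keep parsed value otherwise
        record[key] = text[start:pos] if key in json_columns else value
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = _skip_ws(text, pos + 1)
        elif pos < len(text) and text[pos] == "}":
            return record
        else:
            raise ValueError("expected ',' or '}'")


print(read_record('{"a": 1, "c": {"x": [1, 2], "y": null}}', {"c"}))
```

Slicing the input, rather than re-serializing the parsed value, preserves the original spelling (whitespace, number formatting, key order). The sketch still fully decodes each value to find where it ends; a streaming reader would more likely copy tokens without building the value.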
Drill has a "provided schema" feature which, at present, is used only for text
files (and, recently, with limited support, for Avro). We are working on a
project to add such support for JSON.
Perhaps we can leverage this feature to allow the JSON reader to read chunks of
JSON as text, which could then be manipulated by those future JSON functions. In
the example, column "c" would be read as JSON text; Drill would not attempt to
parse it into a relational structure.
As it turns out, the "new" JSON reader we're working on originally had a
feature to do just that, but we took it out because we were not sure it was
needed. It sounds like we should restore it as part of our "provided schema"
support. It could work this way: if you CREATE SCHEMA with column "c" as
VARCHAR (perhaps with a hint to read it as JSON), the JSON parser would read the
entire nested structure as JSON text without trying to parse it into a
relational structure.
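The schema-driven variant might be sketched as follows. `ColumnSchema` and its `read_as_json` flag are hypothetical stand-ins for the `CREATE SCHEMA` option, which this ticket leaves to be designed. As a simplification, this version re-serializes the parsed value with `json.dumps` instead of copying the raw source text, so whitespace is normalized.

```python
import json
from dataclasses import dataclass


@dataclass
class ColumnSchema:
    """Hypothetical stand-in for one column of a provided schema."""
    name: str
    sql_type: str               # e.g. "VARCHAR", "BIGINT"
    read_as_json: bool = False  # hypothetical "read as JSON" hint


def apply_schema(line, schema):
    """Read one JSON record; columns declared VARCHAR with the JSON hint are
    returned as a single JSON string, structure included."""
    columns = {col.name: col for col in schema}
    record = json.loads(line)
    result = {}
    for key, value in record.items():
        col = columns.get(key)
        if col is not None and col.sql_type == "VARCHAR" and col.read_as_json:
            # Whole value becomes one string (compact re-serialization)
            result[key] = json.dumps(value, separators=(",", ":"))
        else:
            # No hint: the value is parsed into structure as today
            result[key] = value
    return result


schema = [ColumnSchema("c", "VARCHAR", read_as_json=True)]
print(apply_schema('{"a": 1, "c": {"x": [1, 2], "y": null}}', schema))
```

Keying the decision off the provided schema, rather than a session-wide option like "all text mode", lets one file mix relational columns with JSON-text columns.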
This ticket asks us to build the concept:
* Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field to
be read as JSON.
* Implement the "read column as JSON" feature in the new EVF-based JSON reader.
was:
See . The use case calls for reading selected JSON columns as JSON text rather
than parsing the JSON into a relational structure, as the JSON reader does
today.
The JSON reader supports "all text mode", but, despite the name, this mode only
works for scalars (primitives) such as numbers. It does not work for structured
types such as objects or arrays: such types are always parsed into Drill
structures (which causes the conflict described in __.)
Instead, we need a feature to read an entire JSON value, including structure,
as a JSON string.
This feature would work best if the user could parse some parts of a JSON input
file into relational structure and read others as JSON. (This is the situation
the user on the user mailing list faced.) So, we need a way to do that.
Drill has a "provided schema" feature which, at present, is used only for text
files (and, recently, with limited support, for Avro). We are working on a
project to add such support for JSON.
Perhaps we can leverage this feature to allow the JSON reader to read chunks of
JSON as text, which could then be manipulated by those future JSON functions. In
the example, column "c" would be read as JSON text; Drill would not attempt to
parse it into a relational structure.
As it turns out, the "new" JSON reader we're working on originally had a
feature to do just that, but we took it out because we were not sure it was
needed. It sounds like we should restore it as part of our "provided schema"
support. It could work this way: if you CREATE SCHEMA with column "c" as
VARCHAR (perhaps with a hint to read it as JSON), the JSON parser would read the
entire nested structure as JSON text without trying to parse it into a
relational structure.
This ticket asks us to build the concept:
* Allow a `CREATE SCHEMA` option (to be designed) to designate a JSON field to
be read as JSON.
* Implement the "read column as JSON" feature in the new EVF-based JSON reader.
> Read selected JSON columns as JSON text
> ---------------------------------------
>
> Key: DRILL-7597
> URL: https://issues.apache.org/jira/browse/DRILL-7597
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
--
This message was sent by Atlassian Jira
(v8.3.4#803005)