[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293477#comment-16293477
]
Paul Rogers commented on DRILL-6035:
------------------------------------
h4. JSON Structure
The [JSON standard|https://tools.ietf.org/html/rfc7159], as [described more
clearly here|https://www.json.org], states that a JSON document is a single
value (null, scalar, object or array). Drill supports a non-standard (but
common) extension that allows a list of top-level objects that is not enclosed
in an array.
|| Document Structure || JSON Standard || Drill Support ||
| Empty | Invalid | Empty list of records |
| null | Valid | Invalid |
| Scalar | Valid | Invalid |
| Array | Valid | Valid (in Drill 1.13) as long as the value is an array of objects |
| Object | Valid | Single record |
| List of objects (not enclosed in an array) | Invalid | List of records |
In Drill, there must be no commas between top-level objects. This differs from
the JSON standard, which requires commas to separate items in an array or
object. (The difference arises because Drill's JSON file structure is not
itself JSON. Think of it instead as a serialized set of JSON objects.)
h4. Drill JSON Document Structure
Thus, a typical JSON input file in Drill is:
{code}
{"a": 10, "b": "foo"}
{"a": 20, "b": "bar"}
{code}
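To make the "serialized set of JSON objects" idea concrete, here is a minimal sketch in Python (not Drill's actual reader, which is written in Java). It assumes each object uses standard JSON syntax with quoted keys; the function name {{read_json_sequence}} is hypothetical. It reads objects one after another and rejects a comma between them:

```python
import json

JSON_WHITESPACE = " \t\r\n"

def read_json_sequence(text):
    """Parse zero or more top-level JSON objects separated only by whitespace."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return records
        # Raises json.JSONDecodeError (a ValueError) on a stray comma.
        value, pos = decoder.raw_decode(text, pos)
        if not isinstance(value, dict):
            raise ValueError("each top-level value must be an object")
        records.append(value)

sample = '{"a": 10, "b": "foo"}\n{"a": 20, "b": "bar"}\n'
records = read_json_sequence(sample)
```

Each parsed object becomes one record. An empty input yields an empty list, which matches the "Empty" row of the table.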
As noted above, for JSON compatibility, Drill also supports a top-level array
of objects:
{code}
[
{"a": 10, "b": "foo"},
{"a": 20, "b": "bar"}
]
{code}
Note that, when the objects are in an array, a comma must separate objects.
The above applies to a JSON text file. No separator is implied (or needed) if
the data comes from a document database, a Kafka stream or other non-file
sources.
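The rules in the table and the two file layouts above can be summarized in a short Python sketch. This is an illustration only, not Drill's implementation. The name {{load_records}} is hypothetical, keys are assumed to be quoted as in standard JSON, and the rule that nothing may follow a top-level array is an assumption, since the table does not address it.

```python
import json

JSON_WHITESPACE = " \t\r\n"

def _skip_ws(text, pos):
    """Return the index of the next non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in JSON_WHITESPACE:
        pos += 1
    return pos

def load_records(text):
    """Map a JSON text file to a list of records, following the table above."""
    decoder = json.JSONDecoder()
    pos = _skip_ws(text, 0)
    if pos == len(text):
        return []  # Empty file: empty list of records
    value, pos = decoder.raw_decode(text, pos)
    if isinstance(value, list):
        # Top-level array: valid only if every element is an object.
        if not all(isinstance(item, dict) for item in value):
            raise ValueError("top-level array must contain only objects")
        # Assumption: the array must be the only value in the file.
        if _skip_ws(text, pos) != len(text):
            raise ValueError("unexpected content after top-level array")
        return value
    records = []
    while True:
        # Top-level null and scalars are not records.
        if not isinstance(value, dict):
            raise ValueError("each top-level value must be an object")
        records.append(value)
        pos = _skip_ws(text, pos)
        if pos == len(text):
            return records
        # A comma here raises json.JSONDecodeError (a ValueError).
        value, pos = decoder.raw_decode(text, pos)

from_array = load_records('[{"a": 10}, {"a": 20}]')
from_sequence = load_records('{"a": 10}\n{"a": 20}')
```

Both layouts produce the same two records. A top-level {{null}}, a scalar, an array of non-objects, or a comma between top-level objects raises {{ValueError}}.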
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that there are limitations on the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).