[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293316#comment-16293316
]
Paul Rogers edited comment on DRILL-6035 at 12/19/17 5:42 AM:
--------------------------------------------------------------
h4. JSON Arrays
Drill supports simple arrays in JSON using the following rules:
* Arrays must contain hetrogeneous elements: any of the scalars described
above, or a JSON object.
(See a later comment for nested arrays.)
For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}
h4. Schema Change in Arrays
The following will trigger errors:
{code}
{a: [10, "foo"]} // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
h4. Nulls in Arrays
Drill handles nulls in arrays using the {{LIST}} type, described in a separate
note below.
h4. Late Type Identification
As described earlier, Drill will defer picking an array type if it sees null
values. For example:
{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}
In the above example, for id=2, Drill sees column `a` but does not pick a type.
For id=3, Drill identifies that `a` is an array, but does not know the type.
Finally, for id=4, Drill identifies the array as {{BIGINT}}.
h4. Null-Only Arrays
A special case occurs if a JSON file contains only empty arrays or arrays of
nulls (such as a file that contains only the first three records above.)
In Drill 1.12 and earlier, the result is a list of {{LATE}} elements (See the
List section below.) It seems that {{SqlLine}} will correctly show the null
values.
An interesting case occurs when Drill reads two files: one with an array with
only nulls, another with real values. For example:
{noformat}
File A: {a: [null, null] }
File B: {a: [10, 20] }
{noformat}
Drill is distributed: one fragment will read File A, another will read File B.
At some point, the two arrays will come together. One fragment will have
created a list of {{LATE}}, another a list of {{BIGINT}}. Most operators will
trigger a schema change error in this case.
Interestingly, however, if the query is a simple {{SELECT *}}, then the lists
are compatible and {{SqlLine}} will display the correct results.
In Drill 1.13, if the first batch contains only nulls and/or empty arrays,
Drill guesses that the type is an array of {{VARCHAR}}. Since this is only a
guess, a schema change will result if the guess is wrong.
was (Author: paul.rogers):
h4. JSON Arrays
Drill supports simple arrays in JSON using the following rules:
* Arrays must contain hetrogeneous elements: any of the scalars described
above, or a JSON object.
(See a later comment for nested arrays.)
For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}
h4. Schema Change in Arrays
The following will trigger errors:
{code}
{a: [10, "foo"]} // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
h4. Nulls in Arrays
Drill handles nulls in arrays using the {{LIST}} type, described in a separate
note below.
h4. Late Type Identification
As described earlier, Drill will defer picking an array type if it sees null
values. For example:
{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}
In the above example, for id=2, Drill sees column `a` but does not pick a type.
For id=3, Drill identifies that `a` is an array, but does not know the type.
Finally, for id=4, Drill identifies the array as {{BIGINT}}.
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specifications that clarifies what Drill does and does not support (or what
> is should and should not support.)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)