[ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293316#comment-16293316 ]
Paul Rogers edited comment on DRILL-6035 at 12/19/17 5:25 AM:
--------------------------------------------------------------
h4. JSON Arrays
Drill supports simple arrays in JSON using the following rules:
* Arrays must contain homogeneous elements: every element is the same one of
the scalar types described above, or every element is a JSON object.
(See a later comment for nested arrays.)
For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}
h4. Schema Change in Arrays
The following will trigger errors:
{code}
{a: [10, "foo"]} // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
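The rules above can be sketched as a small check. This is only an illustration in Python, not Drill's actual reader code: the function names ({{element_kind}}, {{array_kind}}, {{check_field}}) and the kind labels are hypothetical, and the sketch assumes records with properly quoted JSON keys. Nulls and nested arrays are deliberately left out.

```python
import json


def element_kind(value):
    """Classify one JSON array element (labels are illustrative, not Drill types)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    return "unsupported"  # nulls and nested arrays are outside this sketch


def array_kind(values):
    """Return the single element kind of an array, or None if it is empty."""
    kinds = {element_kind(v) for v in values}
    if len(kinds) > 1:
        raise ValueError("mixed element types: " + ", ".join(sorted(kinds)))
    return kinds.pop() if kinds else None


def check_field(records, field):
    """Check that one array field keeps the same element kind across records."""
    seen = None
    for record in records:
        kind = array_kind(record.get(field) or [])
        if kind is None:
            continue  # empty or missing array: nothing to compare
        if seen is not None and kind != seen:
            raise ValueError("schema change: %s then %s" % (seen, kind))
        seen = kind
    return seen
```

Under these assumptions, {{array_kind(json.loads("[10, 12.5]"))}} raises an error because integers and floats count as different kinds, matching the third case above, and {{check_field}} raises on the second case because the element kind changes between records.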
h4. Nulls in Arrays
Drill handles nulls in arrays using the {{LIST}} type, described in a separate
note below.
was (Author: paul.rogers):
h4. JSON Arrays
Drill supports simple arrays in JSON using the following rules:
* Arrays must contain homogeneous elements: every element is the same one of
the scalar types described above, or every element is a JSON object.
(See a later comment for nested arrays.)
For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}
h4. Schema Change in Arrays
The following will trigger errors:
{code}
{a: [10, "foo"]} // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
h4. Nulls with Arrays
Rules for nulls are:
* Arrays may not contain nulls. (Drill does not support nulls as array
elements.)
* A null (or missing) array field is treated the same as an empty array.
The following is invalid:
{code}
[10, null, 20]
{code}
The following are all valid:
{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}
As described, Drill will defer picking an array type if it sees null values. In
the above example, for id=2, Drill sees column {{a}} but does not pick a type.
For id=3, Drill identifies that {{a}} is an array, but does not know the element
type. Finally, for id=4, Drill identifies the array as {{Repeated BIGINT}}.
(This is the behavior for Drill 1.13; earlier versions may differ and require
investigation.)
As usual, if the first file or batch contains only nulls, Drill will guess
{{Nullable VARCHAR}}, which will cause a schema change error if later records
reveal the type to be an array (of any type).
If the first batch contains only nulls and/or empty arrays, Drill guesses that
the type is {{Repeated VARCHAR}}. (Again, this is specific to Drill 1.13.) For
example:
{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{code}
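The deferral described above can be sketched for a single batch as follows. This is an illustrative Python model, not Drill code. The function names are hypothetical, only the integer and string mappings are taken from the type names mentioned in this note, and the sketch assumes one reading of the two guessing rules: a batch with at least one empty array yields {{Repeated VARCHAR}}, while a batch with only null or missing values yields {{Nullable VARCHAR}}.

```python
def scalar_type(value):
    """Map a sample element to a type name used in this note (partial mapping)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return "BIGINT"
    if isinstance(value, str):
        return "VARCHAR"
    raise NotImplementedError("type not covered by this sketch")


def infer_batch_type(records, field):
    """Guess the type of an array field from one batch of records."""
    saw_array = False
    for record in records:
        value = record.get(field)
        if value is None:
            continue  # missing or null: defer, no type picked yet
        if not isinstance(value, list):
            raise ValueError("scalar fields are outside this sketch")
        saw_array = True  # known to be an array, element type still unknown
        if value:
            # First non-empty array settles the element type.
            return "Repeated " + scalar_type(value[0])
    # Nothing settled the type: fall back to the guesses described above.
    return "Repeated VARCHAR" if saw_array else "Nullable VARCHAR"
```

With the four-record example above (ids 1 through 4), this sketch returns {{Repeated BIGINT}}; with the three-record example it returns {{Repeated VARCHAR}}.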
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).