[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293362#comment-16293362
]
Paul Rogers commented on DRILL-6035:
------------------------------------
h4. JSON Projection Pushdown
The JSON reader supports "projection push-down." The rules are simple in
concept, but complex in the details.
The project list comes from the query. In its simplest form, it is the list of
columns following the {{SELECT}} keyword:
{code}
SELECT a, b.c, d[0] FROM ...
{code}
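As a rough illustration only, the three shapes in the project list above (a plain column, a map member, an array element) can be told apart as follows. This is a minimal Python sketch, not Drill's parser: the names {{classify}} and {{parse_project_list}} and the regular expression are hypothetical, and only the three forms shown above are handled.

```python
import re

# Hypothetical pattern covering only the three shapes shown above:
# `a`, `a`.`b` and a[0]. Backticks around names are optional.
_ITEM = re.compile(r"^`?(\w+)`?(?:\.`?(\w+)`?|\[(\d+)\])?$")


def classify(item):
    """Return a tuple describing one project-list item."""
    m = _ITEM.match(item.strip())
    if m is None:
        raise ValueError(f"unsupported projection item: {item!r}")
    root, member, index = m.groups()
    if member is not None:
        return ("member", root, member)    # e.g. b.c
    if index is not None:
        return ("index", root, int(index)) # e.g. d[0]
    return ("column", root)                # e.g. a


def parse_project_list(select_clause):
    """Split the text after SELECT on commas and classify each item."""
    return [classify(part) for part in select_clause.split(",")]


print(parse_project_list("a, b.c, d[0]"))
```

The table below describes what the reader does for each of these three shapes, depending on the JSON value found for the root column.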
|| Projection || JSON Value of `a` || Drill Result ||
| `a` | Scalar | Projects `a` |
| | Array | Projects all elements of `a` |
| | Object | Projects all members of `a` |
| | Missing | Creates a {{Nullable INT}} (Drill 1.12) or {{Nullable VARCHAR}} (Drill 1.13) column |
| | {{null}} | As above |
| `a`.`b` | Scalar | Error (`a` must be an object) |
| | Scalar array | Error (`a` must be a map or an array of maps) |
| | Object that contains `b` | Projects just `b` from object `a` |
| | Object that does not contain `b` | Projects a nullable column `b` within map `a` |
| | Object array that contains `b` | Projects just `b` from the objects within array `a` |
| | Object array that does not contain `b` | Projects a nullable column `b` within the array of maps |
| | Missing | Projects a map `a` that contains a nullable column `b` |
| | {{null}} | As above |
| `a\[0]` | Scalar | Error (`a` must be an array) |
| | Scalar array | Projects just `a\[0]` as a scalar (the reader projects the entire array, a project operator pulls out the `a\[0]` element) |
| | Object | Error (`a` must be an array) |
| | Object array | Projects just object (map) `a\[0]` as described above |
| | {{null}} | The JSON reader creates an array of null values; a project operator pulls out `a\[0]` |
| | Missing | As above |
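
The table can also be read as a small decision function. The sketch below is only an illustration of the Drill 1.13 rules as listed above, not Drill code. The names {{MISSING}}, {{json_kind}} and {{describe}} are hypothetical, results are returned as descriptive strings rather than vectors, and an array is classified by its first element (an empty array is treated as a scalar array), which is an assumption the table does not cover.

```python
MISSING = object()  # hypothetical sentinel: the field is absent from the record


class ProjectionError(Exception):
    """Raised for the rows the table marks as Error."""


def json_kind(value):
    if value is MISSING:
        return "missing"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: classify by first element; empty counts as scalar array.
        return "object array" if value and isinstance(value[0], dict) else "scalar array"
    return "scalar"


def describe(projection, value):
    """Describe the result for projection 'a', 'a.b' or 'a[0]'."""
    kind = json_kind(value)
    absent = kind in ("missing", "null")
    if projection == "a":
        if absent:
            return "nullable column a"
        if kind == "scalar":
            return "a"
        return "all members of a" if kind == "object" else "all elements of a"
    if projection == "a.b":
        if kind in ("scalar", "scalar array"):
            raise ProjectionError("a must be a map or an array of maps")
        if absent:
            return "map a containing nullable column b"
        maps = [value] if kind == "object" else value
        where = "map a" if kind == "object" else "array of maps a"
        if any("b" in m for m in maps):
            return f"b from {where}"
        return f"nullable column b within {where}"
    if projection == "a[0]":
        if kind in ("scalar", "object"):
            raise ProjectionError("a must be an array")
        if absent:
            return "array of nulls; project operator extracts a[0]"
        if kind == "scalar array":
            return "entire array read; project operator extracts a[0]"
        return "map a[0]"
    raise ValueError(f"unsupported projection: {projection!r}")


print(describe("a.b", {"c": 1}))       # object that does not contain b
print(describe("a[0]", [10, 20, 30]))  # scalar array
```

Here a missing field and an explicit {{null}} take the same branch, matching the "As above" rows in the table.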
Notes:
* The rules above are for Drill 1.13. Drill 1.12 and earlier behave differently;
their behavior requires investigation.
* The rules for null values are subtle. The type of the null is inferred from
the project list in the case of a map (`a`.`b`) or an array (`a\[0]`). Previous
sections described null handling for the {{SELECT *}} and {{SELECT `a`}} cases.
* The rules for projecting map columns apply to both arrays and single maps.
(In Drill 1.12 and earlier, the two cases appear to have behaved differently.)
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)