[
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053326#comment-16053326
]
Paul Rogers commented on DRILL-4824:
------------------------------------
Thanks for the very thorough, informative proposal! I've gone through it and
added detailed comments.
The main themes are:
* Must coordinate with the work done in DRILL-5211 to avoid fragmentation. That
work reworked the "vector writers", among other changes.
* Must handle null vectors in a generic way, not in JSON-specific code.
* Need for type promotion both in assignment (assigning a smaller value to a
larger vector) and in vector promotion (replacing a smaller vector with a larger
one when presented with a larger value).
* Backward compatibility with older JDBC and ODBC clients that do not
understand the new vector layouts.
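To make the two kinds of type promotion concrete, here is a minimal sketch. It assumes a simplified, hypothetical numeric ladder (`NumType`: INT < BIGINT < FLOAT8) and a toy `Column` backed by a `List<Number>`. `NumType`, `Column`, `write`, and `convert` are illustrative names only. They are not Drill's `MinorType` handling or the DRILL-5211 writer API. Assignment promotion widens an incoming value to the column's type. Vector promotion rewrites the existing values when a wider value arrives.

```java
import java.util.ArrayList;
import java.util.List;

public class PromotionSketch {
    // Hypothetical numeric type ladder: a larger ordinal means a wider type.
    enum NumType { INT, BIGINT, FLOAT8 }

    // Toy stand-in for a value vector: a current type plus its stored values.
    static final class Column {
        NumType type;
        final List<Number> values = new ArrayList<>();

        Column(NumType type) { this.type = type; }

        void write(Number value, NumType valueType) {
            if (valueType.ordinal() > type.ordinal()) {
                // Vector promotion: convert every existing value to the wider type,
                // then switch the column to that type.
                for (int i = 0; i < values.size(); i++) {
                    values.set(i, convert(values.get(i), valueType));
                }
                type = valueType;
            }
            // Assignment promotion: store the incoming value using the column's
            // (equal or wider) type.
            values.add(convert(value, type));
        }
    }

    // Converts a value to the representation of the target type. Callers only
    // ever widen, so INT is reached only for values that are already INT.
    static Number convert(Number v, NumType target) {
        switch (target) {
            case INT:
                return v.intValue();
            case BIGINT:
                return v.longValue();
            default:
                return v.doubleValue();
        }
    }

    public static void main(String[] args) {
        Column c = new Column(NumType.INT);
        c.write(1, NumType.INT);
        c.write(2L, NumType.BIGINT);  // vector promotion INT -> BIGINT
        c.write(3, NumType.INT);      // assignment: INT value stored as BIGINT
        c.write(4.5, NumType.FLOAT8); // vector promotion BIGINT -> FLOAT8
        System.out.println(c.type + " " + c.values); // prints "FLOAT8 [1.0, 2.0, 3.0, 4.5]"
    }
}
```

A real implementation also has to decide how to handle lossy widening. For example, BIGINT to FLOAT8 loses precision for very large values, and this sketch silently accepts that.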
Also, we should probably check with the Arrow project to see whether they have
solved this problem or plan to. Moving to Arrow is a stated (but questioned)
goal of Drill, so changing the vectors in a way that Arrow does not support
would prevent us from switching to Arrow, unless we can make the same changes
in Arrow as well.
> Null maps / lists and non-provided state support for JSON fields. Numeric
> types promotion.
> ------------------------------------------------------------------------------------------
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.0.0
> Reporter: Roman
> Assignee: Volodymyr Vysotskyi
>
> Drill produces incorrect output for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> | Field1 |
> +---------------------------+
> | {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +---------------------------+
> {code}
> There is no need to output missing fields. For a deeply nested structure, the
> result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +---------------------------+
> | Field1 |
> +---------------------------+
> | {} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}} |
> | {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +---------------------------+
> {code}
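The following is a minimal sketch of the "non-provided" state named in the issue title. Each map member keeps a per-row state that separates a key that was absent from one explicitly set to null, so rendering can skip absent keys and produce the correct result above. Several assumptions apply. `State`, `Child`, `MapColumn`, `addRow`, and `render` are hypothetical names. Member values are held as pre-serialized JSON text. The layout is illustrative only, not Drill's actual map vector design. Keeping the state on the column rather than in the JSON reader is one way to honor the request above that null handling be generic, not JSON-specific.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NonProvidedSketch {
    // Hypothetical per-row state of a map member (not Drill's vector layout).
    enum State { NOT_PROVIDED, NULL, VALUE }

    // One map member: parallel per-row lists of state and pre-serialized JSON value.
    static final class Child {
        final List<State> states = new ArrayList<>();
        final List<String> values = new ArrayList<>();
    }

    // A map column: members in first-seen order, plus the number of rows written.
    static final class MapColumn {
        final Map<String, Child> children = new LinkedHashMap<>();
        int rowCount = 0;

        // Appends one row. Members absent from `row` are marked NOT_PROVIDED;
        // a key present with a null value is recorded as an explicit NULL.
        void addRow(Map<String, String> row) {
            for (String key : row.keySet()) {
                children.computeIfAbsent(key, k -> {
                    Child c = new Child();
                    // Back-fill earlier rows, which never saw this key.
                    for (int i = 0; i < rowCount; i++) {
                        c.states.add(State.NOT_PROVIDED);
                        c.values.add(null);
                    }
                    return c;
                });
            }
            for (Map.Entry<String, Child> e : children.entrySet()) {
                Child c = e.getValue();
                String key = e.getKey();
                if (!row.containsKey(key)) {
                    c.states.add(State.NOT_PROVIDED);
                    c.values.add(null);
                } else if (row.get(key) == null) {
                    c.states.add(State.NULL);
                    c.values.add(null);
                } else {
                    c.states.add(State.VALUE);
                    c.values.add(row.get(key));
                }
            }
            rowCount++;
        }

        // Renders one row as JSON, omitting members that were not provided.
        String render(int row) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Child> e : children.entrySet()) {
                State s = e.getValue().states.get(row);
                if (s == State.NOT_PROVIDED) continue;
                if (sb.length() > 1) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":");
                sb.append(s == State.NULL ? "null" : e.getValue().values.get(row));
            }
            return sb.append('}').toString();
        }
    }

    public static void main(String[] args) {
        MapColumn field1 = new MapColumn();
        field1.addRow(new LinkedHashMap<>());
        Map<String, String> r2 = new LinkedHashMap<>();
        r2.put("InnerField1", "{\"key1\":\"value1\"}");
        r2.put("InnerField2", "{\"key2\":\"value2\"}");
        field1.addRow(r2);
        Map<String, String> r3 = new LinkedHashMap<>();
        r3.put("InnerField3", "[\"value3\",\"value4\"]");
        r3.put("InnerField4", "[\"value5\",\"value6\"]");
        field1.addRow(r3);
        // Prints the three rows of the "correct result" above.
        for (int i = 0; i < field1.rowCount; i++) {
            System.out.println(field1.render(i));
        }
    }
}
```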
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)