[
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050759#comment-16050759
]
Paul Rogers commented on DRILL-4824:
------------------------------------
Wonderful! One quick comment on section 2: Numeric Type Promotion. One goal of
the new vector writers created to solve DRILL-5211 is the ability to do type
promotion. There are three kinds:
* Non-conflicting type promotion. (call {{setInt()}} on a FLOAT8 or DECIMAL
vector, for example.)
* "Transparent" type promotion (call {{setDouble()}} on an INT, which requires
replacing one vector with another, but do so in the first batch where the
change is transparent to the downstream operators.)
* "Hard" type promotion: as above, but after the first batch. Causes a hard
schema change ({{OK_NEW_SCHEMA}}).
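To make the three cases concrete, here is a rough sketch of how a writer might decide which kind of promotion a call needs. All names here ({{PromotionSketch}}, {{NumType}}, {{Promotion}}, {{classify}}) are hypothetical and only illustrate the idea; they are not the DRILL-5211 API, and only INT and FLOAT8 are modeled.

```java
// Hypothetical illustration only: none of these names exist in Drill.
final class PromotionSketch {

    // Simplified numeric types, declared from narrowest to widest.
    enum NumType { INT, FLOAT8 }

    // NONE means the value type already matches the vector type.
    enum Promotion { NONE, NON_CONFLICTING, TRANSPARENT, HARD }

    /**
     * Classifies the promotion needed to write a value of valueType
     * into a vector of vectorType.
     *
     * stillInFirstBatch is assumed to mean that no batch has been sent
     * downstream yet, so a vector can still be swapped without anyone noticing.
     */
    static Promotion classify(NumType vectorType, NumType valueType,
                              boolean stillInFirstBatch) {
        if (vectorType == valueType) {
            return Promotion.NONE;
        }
        // Value is narrower than the vector (setInt() on a FLOAT8):
        // widen the value and keep the existing vector.
        if (valueType.ordinal() < vectorType.ordinal()) {
            return Promotion.NON_CONFLICTING;
        }
        // Value is wider than the vector (setDouble() on an INT):
        // the vector must be replaced. Whether that is visible downstream
        // depends on whether the first batch has already gone out.
        return stillInFirstBatch ? Promotion.TRANSPARENT : Promotion.HARD;
    }

    public static void main(String[] args) {
        System.out.println(classify(NumType.FLOAT8, NumType.INT, false));
        System.out.println(classify(NumType.INT, NumType.FLOAT8, true));
        System.out.println(classify(NumType.INT, NumType.FLOAT8, false));
    }
}
```

In this sketch only the last case, {{HARD}}, would correspond to signalling {{OK_NEW_SCHEMA}} to downstream operators.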
The code reviews for this work move quite slowly. Once the code is in master,
we can add the above type promotion to the basic mechanism.
Also, we should coordinate on this because another goal of DRILL-5211 is to rip
out the existing vector writers from various readers (including JSON) and
replace them with the new size-aware versions. So, your work should build on
the new set of vector writers, not the current set.
More comments to come.
> Null maps / lists and non-provided state support for JSON fields. Numeric
> types promotion.
> ------------------------------------------------------------------------------------------
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.0.0
> Reporter: Roman
> Assignee: Volodymyr Vysotskyi
>
> The output is incorrect for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> | Field1 |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. For a deeply nested
> structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> | Field1 |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)