[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843040#comment-15843040 ]
ASF GitHub Bot commented on DRILL-4824:
---------------------------------------
Github user Serhii-Harnyk commented on a diff in the pull request:
https://github.com/apache/drill/pull/580#discussion_r98232198
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java ---
@@ -317,6 +317,12 @@ public Object getObject(int index) {
if (v != null && index < v.getAccessor().getValueCount()) {
Object value = v.getAccessor().getObject(index);
if (value != null) {
+ if ((v.getAccessor().getObject(index) instanceof Map
+ && ((Map) v.getAccessor().getObject(index)).size() == 0)
+ || (v.getAccessor().getObject(index) instanceof List
+ && ((List) v.getAccessor().getObject(index)).size() == 0)) {
+ continue;
+ }
--- End diff --
@paul-rogers, map fields have the required data mode and are part of the
schema, which is why there is no difference between a field that is missing
in some record and a field that exists but is empty.
With this fix, your example returns:
`{"a":{"b":10},"c":[1,2,3]}`
`{"c":[4]}`
`{"b":[5]}`
`{"a":{"b":20}}`
`{"a":{"b":20}}`
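The filtering idea in the diff can be sketched on plain `java.util` collections instead of Drill value vectors. This is only an illustration: `EmptyFieldFilter`, `isEmptyContainer` and `withoutEmptyContainers` are hypothetical names that do not exist in Drill, and the sketch filters a single level, whereas in Drill each nested map vector would apply the check in its own `getObject`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Drill code: mimics the check added to
// MapVector.getObject using ordinary maps and lists.
public class EmptyFieldFilter {

    // True when the value is a Map or List with no elements.
    public static boolean isEmptyContainer(Object value) {
        return (value instanceof Map && ((Map<?, ?>) value).isEmpty())
            || (value instanceof List && ((List<?>) value).isEmpty());
    }

    // Copies the record, skipping null values and empty maps/lists,
    // the same way the patched loop uses `continue` for such children.
    public static Map<String, Object> withoutEmptyContainers(Map<String, Object> record) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            Object value = entry.getValue();
            if (value == null || isEmptyContainer(value)) {
                continue;
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Third record of the issue's example as it looks before the fix:
        // the schema-wide fields appear as empty containers.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("InnerField1", new LinkedHashMap<String, Object>());
        record.put("InnerField2", new LinkedHashMap<String, Object>());
        record.put("InnerField3", new ArrayList<Object>(Arrays.asList("value3", "value4")));
        record.put("InnerField4", new ArrayList<Object>(Arrays.asList("value5", "value6")));

        // Only InnerField3 and InnerField4 remain after filtering.
        System.out.println(withoutEmptyContainers(record).keySet());
    }
}
```

As in the patch, a field that was present but empty in the input is dropped together with fields that were never present, since the two cases cannot be told apart at this point.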
> JSON with complex nested data produces incorrect output with missing fields
> ---------------------------------------------------------------------------
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.0.0
> Reporter: Roman
> Assignee: Serhii Harnyk
>
> Querying a JSON file with complex nested data produces incorrect output.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> | Field1 |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested
> structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> | Field1 |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}