[
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485938#comment-15485938
]
ASF GitHub Bot commented on DRILL-4824:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/580#discussion_r78484822
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java ---
@@ -317,6 +317,12 @@ public Object getObject(int index) {
if (v != null && index < v.getAccessor().getValueCount()) {
Object value = v.getAccessor().getObject(index);
if (value != null) {
+ if ((v.getAccessor().getObject(index) instanceof Map
+ && ((Map) v.getAccessor().getObject(index)).size() == 0)
+ || (v.getAccessor().getObject(index) instanceof List
+ && ((List) v.getAccessor().getObject(index)).size() == 0)) {
+ continue;
+ }
--- End diff --
Does this handle the difference between a missing field and a field that
exists but contains an empty map or list? Examples:
{ "a" : { "b" : 10 }, "c" : [ 1, 2, 3 ] } // Baseline case
{ "a" : { }, "c" : [ 4 ] } // Keep "a"?
{ "b" : [ 5 ] } // Remove a: OK
{ "a" : { "b": 20 }, "c" : [ ] } // Keep "c"?
{ "a" : { "b": 20 } } // Remove c: OK
That is, should we differentiate between an empty map/list and a non-existent
one?
The code seems to discard all empty maps & lists. Should the check instead
be done in the parser, so that empty items are never passed along? Is that
possible? (I'm on thin ice here in my detailed knowledge of value vectors...)
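To make the distinction concrete, here is a minimal sketch in plain Java (not Drill's value-vector API) that applies two candidate rules to the example row `{ "a" : { }, "c" : [ 4 ] }`. It assumes, hypothetically, that a missing field is represented as `null` while a present-but-empty field is an empty `Map` or `List`; the class and method names (`EmptyVsMissing`, `skipEmpty`, `skipMissing`, `render`) are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class EmptyVsMissing {

  // Rule in the patch under review: skip a child whose value is an empty Map or List.
  // Null is skipped too, mirroring the patch's surrounding "value != null" check.
  static boolean skipEmpty(Object value) {
    return value == null
        || (value instanceof Map && ((Map<?, ?>) value).isEmpty())
        || (value instanceof List && ((List<?>) value).isEmpty());
  }

  // Alternative rule: skip a child only when the field was absent from the input.
  // ASSUMPTION: absence is represented as null; this is a hypothetical model,
  // not how Drill's value vectors record presence.
  static boolean skipMissing(Object value) {
    return value == null;
  }

  // Copy the row's children in order, dropping those the rule says to skip.
  static Map<String, Object> render(Map<String, Object> row, Predicate<Object> skip) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : row.entrySet()) {
      if (!skip.test(e.getValue())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Example row { "a" : { }, "c" : [ 4 ] }: "a" is present but empty.
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("a", new LinkedHashMap<String, Object>());
    row.put("c", List.of(4));

    System.out.println(render(row, EmptyVsMissing::skipEmpty));   // {c=[4]}
    System.out.println(render(row, EmptyVsMissing::skipMissing)); // {a={}, c=[4]}
  }
}
```

Under this assumption only the second rule keeps `"a"`. The catch is that the second rule needs presence information that distinguishes "absent" from "empty", which is exactly what the question above asks whether the vectors (or the parser) can provide.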
> JSON with complex nested data produces incorrect output with missing fields
> ---------------------------------------------------------------------------
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.0.0
> Reporter: Roman
> Assignee: Roman
> Fix For: 1.9.0
>
>
> Drill produces incorrect output for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +--------------------------+
> | Field1 |
> +--------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested
> structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> | Field1 |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
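The gap between the two outputs can be reproduced with a simplified model in plain Java. It assumes, as the incorrect output suggests, that every row carries a value for every child field in the merged schema, with an empty map or list standing in for fields the record did not contain. The class `MissingFieldDemo` and its helpers are hypothetical and do not use Drill's value-vector API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MissingFieldDemo {

  // ASSUMPTION (simplified model): each row holds all four children of Field1;
  // children absent from the record are filled with an empty Map or List.
  static Map<String, Object> row(Map<String, String> f1, Map<String, String> f2,
                                 List<String> f3, List<String> f4) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("InnerField1", f1);
    r.put("InnerField2", f2);
    r.put("InnerField3", f3);
    r.put("InnerField4", f4);
    return r;
  }

  // The three records of example.json under this model.
  static final List<Map<String, Object>> ROWS = List.of(
      row(Map.of(), Map.of(), List.of(), List.of()),
      row(Map.of("key1", "value1"), Map.of("key2", "value2"), List.of(), List.of()),
      row(Map.of(), Map.of(), List.of("value3", "value4"), List.of("value5", "value6")));

  static boolean isEmpty(Object v) {
    return v == null
        || (v instanceof Map && ((Map<?, ?>) v).isEmpty())
        || (v instanceof List && ((List<?>) v).isEmpty());
  }

  // Minimal JSON writer for String, Map and List values (no string escaping).
  static String toJson(Object v) {
    if (v instanceof String) {
      return "\"" + v + "\"";
    }
    if (v instanceof Map) {
      List<String> parts = new ArrayList<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
        parts.add(toJson(e.getKey().toString()) + ":" + toJson(e.getValue()));
      }
      return "{" + String.join(",", parts) + "}";
    }
    if (v instanceof List) {
      List<String> parts = new ArrayList<>();
      for (Object o : (List<?>) v) {
        parts.add(toJson(o));
      }
      return "[" + String.join(",", parts) + "]";
    }
    return String.valueOf(v);
  }

  // Render Field1 for one row, optionally skipping empty children.
  static String renderRow(Map<String, Object> children, boolean skipEmpty) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : children.entrySet()) {
      if (skipEmpty && isEmpty(e.getValue())) {
        continue; // same idea as the patch's "continue" on an empty Map/List
      }
      out.put(e.getKey(), e.getValue());
    }
    return toJson(out);
  }

  public static void main(String[] args) {
    for (Map<String, Object> r : ROWS) {
      System.out.println(renderRow(r, false)); // rows of the "Incorrect result"
    }
    for (Map<String, Object> r : ROWS) {
      System.out.println(renderRow(r, true)); // rows of the "Correct result"
    }
  }
}
```

Skipping empty children yields the expected rows in this model, but the filter treats a field that was present with an empty value the same as a missing one, which is the concern raised in the review comment.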