[ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857146#comment-15857146
 ] 

ASF GitHub Bot commented on DRILL-4824:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/580#discussion_r99970247
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java ---
    @@ -317,6 +317,12 @@ public Object getObject(int index) {
             if (v != null && index < v.getAccessor().getValueCount()) {
               Object value = v.getAccessor().getObject(index);
               if (value != null) {
    +            if ((v.getAccessor().getObject(index) instanceof Map
    +                    && ((Map) v.getAccessor().getObject(index)).size() == 0)
    +                || (v.getAccessor().getObject(index) instanceof List
    +                    && ((List) v.getAccessor().getObject(index)).size() == 0)) {
    +              continue;
    +            }
    --- End diff --
    
    See the JIRA entry for more comments. The key problem is that Drill does not follow standard JSON rules, so this change simply moves the problem around.
    
    Consider this case:
    
    {code}
    { }
    { "a": { } }
    { "a": { "b": {} } }
    {code}
    
    It appears that the code here would emit:
    
    {code}
    { }
    { }
    { "a": { } }
    {code}
    
    If we are going to apply the empty-is-not-present rule, we should do so recursively.
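
As a rough illustration of what applying the rule recursively could look like, here is a minimal sketch. It works on plain `java.util` collections, not on Drill's `MapVector` or its accessors, and the names `EmptyPruner`, `prune`, and `pruneRecord` are hypothetical. It also assumes that empty elements inside a list are dropped, which is a design choice the patch does not address.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Drill: applies the
// empty-is-not-present rule bottom-up on plain Java collections.
final class EmptyPruner {

  // Returns a pruned copy of value, or null when nothing is left
  // (a null, an empty map or list, or one holding only empty children).
  static Object prune(Object value) {
    if (value instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        Object child = prune(e.getValue());
        if (child != null) {
          out.put(String.valueOf(e.getKey()), child);
        }
      }
      return out.isEmpty() ? null : out;
    }
    if (value instanceof List) {
      // Assumption: empty elements are removed from lists as well.
      List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) value) {
        Object child = prune(item);
        if (child != null) {
          out.add(child);
        }
      }
      return out.isEmpty() ? null : out;
    }
    return value; // scalars (and null) pass through unchanged
  }

  // A top-level record is always emitted, even when it ends up empty.
  static Map<String, Object> pruneRecord(Map<String, Object> record) {
    Object pruned = prune(record);
    if (pruned == null) {
      return new LinkedHashMap<>();
    }
    @SuppressWarnings("unchecked")
    Map<String, Object> result = (Map<String, Object>) pruned;
    return result;
  }
}
```

With this sketch, all three records in the example above reduce to an empty record, because `"b"` is pruned first, which in turn leaves `"a"` empty. A single-level check stops after removing `"b"`.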


> JSON with complex nested data produces incorrect output with missing fields
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Serhii Harnyk
>
> Querying a JSON file with complex nested data produces incorrect output.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>         "Field1" : {
>         }
> }
> {
>         "Field1" : {
>                 "InnerField1": {"key1":"value1"},
>                 "InnerField2": {"key2":"value2"}
>         }
> }
> {
>         "Field1" : {
>                 "InnerField3" : ["value3", "value4"],
>                 "InnerField4" : ["value5", "value6"]
>         }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> |          Field1           |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. For a deeply nested 
> structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> |         Field1           |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
