[jira] [Commented] (DRILL-4824) JSON with complex nested data produces incorrect output with missing fields

ASF GitHub Bot (JIRA) Mon, 12 Sep 2016 18:45:54 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485938#comment-15485938
 ]


ASF GitHub Bot commented on DRILL-4824:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/580#discussion_r78484822
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java ---
    @@ -317,6 +317,12 @@ public Object getObject(int index) {
             if (v != null && index < v.getAccessor().getValueCount()) {
               Object value = v.getAccessor().getObject(index);
               if (value != null) {
    +            if ((v.getAccessor().getObject(index) instanceof Map
    +                    && ((Map) v.getAccessor().getObject(index)).size() == 0)
    +                || (v.getAccessor().getObject(index) instanceof List
    +                    && ((List) v.getAccessor().getObject(index)).size() == 0)) {
    +              continue;
    +            }
    --- End diff --
    
    Does this handle the difference between a missing field and a field that exists but contains an empty map or list? Examples:
    
        { "a" : { "b" : 10 }, "c" : [ 1, 2, 3 ] }  // Baseline case
        { "a" : { }, "c" : [ 4 ] }                 // Keep "a"?
        { "b" : [ 5 ] }                            // Remove a: OK
        { "a" : { "b": 20 }, "c" : [ ] }           // Keep "c"?
        { "a" : { "b": 20 } }                      // Remove c: OK
    
    That is, should we differentiate between an empty map/list and a non-existent one?
    
    The code seems to discard all empty maps and lists. Should the check instead be done in the parser, so that empty items are never passed along? Is this possible? (I'm on thin ice here in my detailed knowledge of value vectors...)
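
The concern above can be illustrated with a minimal sketch. It is not Drill code: plain `java.util` collections stand in for value vectors, and the names `EmptyValueFilterSketch`, `isEmptyContainer`, and `filterRow` are hypothetical, introduced only for this example. The condition in `isEmptyContainer` mirrors the check added in the diff; everything else is an assumption about how such a filter behaves once applied to a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyValueFilterSketch {

  // Mirrors the condition added in the diff: true for an empty Map or an empty List.
  static boolean isEmptyContainer(Object value) {
    return (value instanceof Map && ((Map<?, ?>) value).isEmpty())
        || (value instanceof List && ((List<?>) value).isEmpty());
  }

  // Hypothetical stand-in for the loop in MapVector.getObject(): copies the
  // row while skipping null values and empty containers.
  static Map<String, Object> filterRow(Map<String, Object> row) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : row.entrySet()) {
      Object value = e.getValue();
      if (value == null || isEmptyContainer(value)) {
        continue;
      }
      out.put(e.getKey(), value);
    }
    return out;
  }

  public static void main(String[] args) {
    // { "a" : { }, "c" : [ 4 ] } : "a" exists but is empty
    Map<String, Object> emptyA = new LinkedHashMap<>();
    emptyA.put("a", new LinkedHashMap<String, Object>());
    emptyA.put("c", new ArrayList<>(Arrays.asList(4)));

    // { "c" : [ 4 ] } : "a" is missing altogether
    Map<String, Object> missingA = new LinkedHashMap<>();
    missingA.put("c", new ArrayList<>(Arrays.asList(4)));

    // After filtering, the two rows can no longer be told apart.
    System.out.println(filterRow(emptyA));
    System.out.println(filterRow(missingA));
    System.out.println(filterRow(emptyA).equals(filterRow(missingA)));
  }
}
```

In this sketch both rows come out as `{c=[4]}`, so a reader of the output cannot tell whether "a" was present but empty or absent. Whether that loss of information is acceptable is exactly the open question.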


> JSON with complex nested data produces incorrect output with missing fields
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Roman
>             Fix For: 1.9.0
>
>
> The output is incorrect for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>         "Field1" : {
>         }
> }
> {
>         "Field1" : {
>                 "InnerField1": {"key1":"value1"},
>                 "InnerField2": {"key2":"value2"}
>         }
> }
> {
>         "Field1" : {
>                 "InnerField3" : ["value3", "value4"],
>                 "InnerField4" : ["value5", "value6"]
>         }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> |          Field1           |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested 
> structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> |         Field1           |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
