[jira] [Created] (DRILL-5559) Incorrect query result when querying json files with schema change

Jinfeng Ni (JIRA) Wed, 31 May 2017 14:18:22 -0700

Jinfeng Ni created DRILL-5559:
---------------------------------

             Summary: Incorrect query result when querying json files with schema change
                 Key: DRILL-5559
                 URL: https://issues.apache.org/jira/browse/DRILL-5559
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni



Consider two JSON files with a nested structure. In the first one, `a.b` is a 
bigint, while in the second one, `a.b` is a float.

{code}
cat 1.json
{a:{b:100}}

cat 2.json
{a:{b:200.0}}
{code}

The following query returns a wrong result for the second row. Notice that 
the value of `b` changes from 200.0 to 4641240890982006784.

{code}
select a from dfs.tmp.t2;
+----------------------------+
|             a              |
+----------------------------+
| {"b":100}                  |
| {"b":4641240890982006784}  |
+----------------------------+
{code}
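
One observation that may help with triage: 4641240890982006784 is exactly the 
IEEE 754 bit pattern of the double 200.0 when those 64 bits are read as a 
signed integer. This suggests, although the report does not confirm it, that 
the float bytes from 2.json are being interpreted as bigint under the schema 
taken from 1.json. A minimal Java sketch of the arithmetic follows; the class 
and method names are illustrative assumptions and are not Drill code.

```java
// Illustrative only: names here are hypothetical and do not come from Drill.
final class Float8BitsDemo {
    // Returns the IEEE 754 bit pattern of a double, read as a signed 64-bit integer.
    static long reinterpretAsBigInt(double value) {
        return Double.doubleToLongBits(value);
    }

    // Reverses the reinterpretation, recovering the original double.
    static double reinterpretAsFloat8(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        long bits = reinterpretAsBigInt(200.0);
        System.out.println(bits);                      // 4641240890982006784
        System.out.println(Long.toHexString(bits));    // 4069000000000000
        System.out.println(reinterpretAsFloat8(bits)); // 200.0
    }
}
```

The round trip back to 200.0 shows that no information is lost in the wrong 
row; only the type used to read the value is wrong.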

Explain plan output:
{code}
explain plan for select a from dfs.tmp.t2;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(a=[$0])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/t2, 
numFiles=2, columns=[`a`], files=[file:/tmp/t2/1.json, file:/tmp/t2/2.json]]])
{code}

If the operators involved cannot handle the schema change, we should at 
minimum fail the query with a SchemaChangeException instead of returning 
wrong query results.

Another interesting observation: if we query field `a.b` instead of `a`, 
Drill returns the correct result.

{code}
select t.a.b from dfs.tmp.t2 t;
+---------+
| EXPR$0  |
+---------+
| 100     |
| 200.0   |
+---------+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
