Jinfeng Ni created DRILL-5559:
---------------------------------
Summary: Incorrect query result when querying json files with schema change
Key: DRILL-5559
URL: https://issues.apache.org/jira/browse/DRILL-5559
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Consider two JSON files with a nested structure, stored together in `/tmp/t2` (queried as `dfs.tmp.t2`). In the first file, `a.b` is a bigint; in the second, `a.b` is a float.
{code}
cat 1.json
{a:{b:100}}
cat 2.json
{a:{b:200.0}}
{code}
The following query returns a wrong result for the second row: the value 200.0 comes back as 4641240890982006784.
{code}
select a from dfs.tmp.t2;
+----------------------------+
| a                          |
+----------------------------+
| {"b":100}                  |
| {"b":4641240890982006784}  |
+----------------------------+
{code}
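The corrupted value is not arbitrary: 4641240890982006784 is 0x4069000000000000, which is exactly the IEEE-754 bit pattern of the double 200.0. One plausible reading (an assumption, not verified against Drill's code) is that the 8 bytes of the second file's floating-point value are being read back as a bigint, the type inferred from the first file. A minimal check of the bit pattern using only Python's standard `struct` module; the helper names are hypothetical:

{code:python}
import struct


def double_bits_as_bigint(value: float) -> int:
    """Reinterpret the 8 bytes of an IEEE-754 double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


def bigint_bits_as_double(bits: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


bits = double_bits_as_bigint(200.0)
print(bits)       # 4641240890982006784
print(hex(bits))  # 0x4069000000000000
print(bigint_bits_as_double(4641240890982006784))  # 200.0
{code}

Going the other way recovers 200.0 exactly, so no information is lost; the bytes are simply interpreted under the wrong type.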
Explain plan output:
{code}
explain plan for select a from dfs.tmp.t2;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(a=[$0])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/t2, numFiles=2, columns=[`a`], files=[file:/tmp/t2/1.json, file:/tmp/t2/2.json]]])
{code}
If the operators involved cannot handle the schema change, the query should at minimum fail with a SchemaChangeException instead of returning wrong results.
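The fail-fast behavior proposed above can be illustrated with a toy Python model. This is not Drill's implementation: `SchemaChangeError`, `leaf_types`, and `check_batches` are hypothetical names, records are plain dicts, and each JSON file is treated as one batch. The idea is to compare the type of every nested leaf path against the first schema seen and raise on a mismatch rather than reinterpret the data.

{code:python}
class SchemaChangeError(Exception):
    """Hypothetical stand-in for Drill's SchemaChangeException."""


def leaf_types(record: dict, prefix: str = "") -> dict:
    """Map each nested leaf path (e.g. "a.b") to the type name of its value."""
    out = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(leaf_types(value, path + "."))
        else:
            out[path] = type(value).__name__
    return out


def check_batches(batches: list) -> dict:
    """Return the first schema seen; raise if a later record changes a leaf's type."""
    expected = None
    for batch in batches:
        for record in batch:
            types = leaf_types(record)
            if expected is None:
                expected = types
                continue
            for path, type_name in types.items():
                if path in expected and expected[path] != type_name:
                    raise SchemaChangeError(f"{path}: {expected[path]} -> {type_name}")
    return expected


# Each inner list models one file: 1.json, then 2.json.
try:
    check_batches([[{"a": {"b": 100}}], [{"a": {"b": 200.0}}]])
except SchemaChangeError as err:
    print("query failed:", err)  # query failed: a.b: int -> float
{code}

Failing loudly on the type change is the minimum guarantee described in this report; a fuller fix would instead promote or union the types so the query can succeed.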
Another interesting observation: if we query the field `a.b` instead of `a`, Drill returns the correct result.
{code}
select t.a.b from dfs.tmp.t2 t;
+---------+
| EXPR$0  |
+---------+
| 100     |
| 200.0   |
+---------+
{code}