Victoria Markman created DRILL-2967:
---------------------------------------
Summary: Incompatible types error reported in a "not in" query with compatible data types
Key: DRILL-2967
URL: https://issues.apache.org/jira/browse/DRILL-2967
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.9.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Two tables, stored as parquet files (attached to this bug):
{code}
0: jdbc:drill:schema=dfs> select * from t1;
+------------+------------+------------+
| a1 | b1 | c1 |
+------------+------------+------------+
| 1 | aaaaa | 2015-01-01 |
| 2 | bbbbb | 2015-01-02 |
| 3 | ccccc | 2015-01-03 |
| 4 | null | 2015-01-04 |
| 5 | eeeee | 2015-01-05 |
| 6 | fffff | 2015-01-06 |
| 7 | ggggg | 2015-01-07 |
| null | hhhhh | 2015-01-08 |
| 9 | iiiii | null |
| 10 | jjjjj | 2015-01-10 |
+------------+------------+------------+
10 rows selected (0.119 seconds)
0: jdbc:drill:schema=dfs> select * from t2;
+------------+------------+------------+
| a2 | b2 | c2 |
+------------+------------+------------+
| 0 | zzz | 2014-12-31 |
| 1 | aaaaa | 2015-01-01 |
| 2 | bbbbb | 2015-01-02 |
| 2 | bbbbb | 2015-01-02 |
| 2 | bbbbb | 2015-01-02 |
| 3 | ccccc | 2015-01-03 |
| 4 | ddddd | 2015-01-04 |
| 5 | eeeee | 2015-01-05 |
| 6 | fffff | 2015-01-06 |
| 7 | ggggg | 2015-01-07 |
| 7 | ggggg | 2015-01-07 |
| 8 | hhhhh | 2015-01-08 |
| 9 | iiiii | 2015-01-09 |
+------------+------------+------------+
13 rows selected (0.116 seconds)
{code}
Disable hash join and set slice_target = 1:
{code}
alter session set `planner.enable_hashjoin` = false;
alter session set `planner.slice_target` = 1;
{code}
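To confirm the session actually picked up these settings, the values can be read back from sys.options (standard system table; shown only as a sanity check, not part of the original repro):
{code}
select * from sys.options
where name in ('planner.enable_hashjoin', 'planner.slice_target');
{code}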
Correct result:
{code}
0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2);
+------------+------------+------------+
| a1 | b1 | c1 |
+------------+------------+------------+
| 10 | jjjjj | 2015-01-10 |
+------------+------------+------------+
1 row selected (0.625 seconds)
{code}
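As a sanity check on the expected semantics (not part of the original repro), the single surviving row can also be obtained with an explicit anti-join rewrite. The extra IS NOT NULL filter is needed because NOT IN and this rewrite differ when NULLs are involved; the cross-check is valid here only because t2.b2 contains no NULLs:
{code}
-- Hypothetical cross-check of the expected result; valid for this data only
-- because t2.b2 contains no NULLs.
select t1.a1, t1.b1, t1.c1
from t1 left join (select distinct b2 from t2) s on t1.b1 = s.b2
where s.b2 is null
  and t1.b1 is not null;
{code}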
Swap tables and you get an error:
{code}
0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
+------------+------------+------------+
| a1 | b1 | c1 |
+------------+------------+------------+
Query failed: SYSTEM ERROR: Join only supports implicit casts between
1. Numeric data
2. Varchar, Varbinary data
Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
Fragment 1:0
[1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010]
java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)
{code}
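The error text suggests adding explicit casts. A sketch of that workaround is below, though both b1 and b2 are already varchar in this repro, so the INT vs. VARCHAR mismatch presumably comes from the planner's internal rewrite of the NOT IN rather than from the user columns, and the casts may not change anything:
{code}
-- Workaround sketch following the error message's suggestion (may be a no-op
-- here, since both columns are already varchar):
select * from t2
where cast(b2 as varchar(10)) not in (select cast(b1 as varchar(10)) from t1);
{code}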
Explain plan for the failing query:
{code}
0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 Project(*=[$0])
00-02 UnionExchange
01-01 Project(T27¦¦*=[$0])
01-02 SelectionVectorRemover
01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
01-04 MergeJoin(condition=[=($4, $5)], joinType=[left])
01-06 SelectionVectorRemover
01-08 Sort(sort0=[$4], dir0=[ASC])
01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4])
01-12 HashToRandomExchange(dist0=[[$4]])
02-01 UnorderedMuxExchange
04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))])
04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
04-03 NestedLoopJoin(condition=[true], joinType=[inner])
04-05 Project(T27¦¦*=[$0], b2=[$1])
04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
04-04 BroadcastExchange
06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
06-02 UnionExchange
07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
07-02 Project(b1=[$0], $f1=[true])
07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
01-05 Project(b1=[$0], $f10=[$1])
01-07 SelectionVectorRemover
01-09 Sort(sort0=[$0], dir0=[ASC])
01-11 HashAgg(group=[{0}], agg#0=[MIN($1)])
01-13 Project(b1=[$0], $f1=[$1])
01-14 HashToRandomExchange(dist0=[[$0]])
03-01 UnorderedMuxExchange
05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
05-03 Project(b1=[$0], $f1=[true])
05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
{code}
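If it helps with triage, the verbose form of explain (standard Drill syntax) can be used to capture the cost estimates behind the merge-join choice; output omitted here:
{code}
explain plan including all attributes for
select * from t2 where b2 not in (select b1 from t1);
{code}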
Correct result and correct plan with hash join enabled and a distributed plan (planner.slice_target = 1):
{code}
alter session set `planner.enable_hashjoin` = true;
alter session set `planner.slice_target` = 1;
{code}
{code}
0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
+------------+------------+------------+
| a2 | b2 | c2 |
+------------+------------+------------+
+------------+------------+------------+
No rows selected (0.458 seconds)
0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 Project(*=[$0])
00-02 Project(T25¦¦*=[$0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
00-05 HashJoin(condition=[=($4, $5)], joinType=[left])
00-07 Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
00-09 NestedLoopJoin(condition=[true], joinType=[inner])
00-11 Project(T25¦¦*=[$0], b2=[$1])
00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
00-10 BroadcastExchange
01-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
01-02 UnionExchange
03-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
03-02 Project(b1=[$0], $f1=[true])
03-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
00-06 Project(b1=[$0], $f10=[$1])
00-08 BroadcastExchange
02-01 HashAgg(group=[{0}], agg#0=[MIN($1)])
02-02 Project(b1=[$0], $f1=[$1])
02-03 HashToRandomExchange(dist0=[[$0]])
04-01 UnorderedMuxExchange
05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
05-03 Project(b1=[$0], $f1=[true])
05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
{code}
The same error occurs with columns of date, time, and timestamp types.
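For example, the date-column variant of the same pattern (c1/c2 are the date columns in the tables above) presumably fails the same way:
{code}
select * from t2 where c2 not in (select c1 from t1);
{code}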