[ https://issues.apache.org/jira/browse/DRILL-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974968#comment-14974968 ]
ASF GitHub Bot commented on DRILL-2967:
---------------------------------------
Github user mehant commented on the pull request:
https://github.com/apache/drill/pull/213#issuecomment-151273785
+1.
On a side note, I remember there was some discussion earlier about whether to
have an explicit nullable vector. In this particular case it might be better to
have one and let the implicit cast logic handle converting between the null
type and other types, instead of making each operator deal with this.
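(For reference, the kind of expression that yields a bare null type, as a
minimal illustration; the query below is hypothetical and not from this issue:)
{code}
-- the subquery's only column is a bare NULL with no concrete type;
-- with an explicit nullable vector, implicit-cast rules could convert it
-- to match b2 instead of each operator handling the null type itself
select * from t2 where b2 not in (select null from t1);
{code}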
> Incompatible types error reported in a "not in" query with compatible data types
> ---------------------------------------------------------------------------------
>
> Key: DRILL-2967
> URL: https://issues.apache.org/jira/browse/DRILL-2967
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.9.0
> Reporter: Victoria Markman
> Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Attachments: j2_j6_tables.tar, t1.parquet, t2.parquet
>
>
> Two tables, parquet files (attached in the bug):
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.119 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+------------+
> | a2         | b2         | c2         |
> +------------+------------+------------+
> | 0          | zzz        | 2014-12-31 |
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | ddddd      | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | 7          | ggggg      | 2015-01-07 |
> | 8          | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | 2015-01-09 |
> +------------+------------+------------+
> 13 rows selected (0.116 seconds)
> {code}
> Disable hash join and set slice_target = 1:
> {code}
> alter session set `planner.enable_hashjoin` = false;
> alter session set `planner.slice_target` = 1;
> {code}
> Correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2);
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 1 row selected (0.625 seconds)
> {code}
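> (Aside on the expected results: b2 contains no NULLs, so `b1 not in (select b2 from t2)`
> can evaluate to TRUE for the one unmatched value, while b1 does contain a NULL, so the
> swapped query below should correctly return no rows. A minimal illustration of NOT IN's
> three-valued logic, not from the original report:)
> {code}
> -- returns no rows: 2 <> 1 is TRUE but 2 <> NULL is UNKNOWN, so the
> -- conjunction is UNKNOWN and the filter rejects the row
> select * from (values(1)) t where 2 not in (1, cast(null as integer));
> {code}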
> Swap tables and you get an error:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> Query failed: SYSTEM ERROR: Join only supports implicit casts between 1. Numeric data
> 2. Varchar, Varbinary data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
> Fragment 1:0
> [1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
>         at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>         at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>         at sqlline.SqlLine.print(SqlLine.java:1809)
>         at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>         at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>         at sqlline.SqlLine.dispatch(SqlLine.java:889)
>         at sqlline.SqlLine.begin(SqlLine.java:763)
>         at sqlline.SqlLine.start(SqlLine.java:498)
>         at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
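> A workaround along the lines the error message itself suggests (hypothetical, not
> verified against the attached parquet files) is to cast the subquery column explicitly
> so both inputs of the generated join are VARCHAR:
> {code}
> -- hypothetical workaround sketch: force the subquery side of the
> -- generated join to VARCHAR instead of relying on implicit casts
> select * from t2 where b2 not in (select cast(b1 as varchar) from t1);
> {code}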
> Explain plan for the failing query:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+
> | text       | json       |
> +------------+------------+
> | 00-00 Screen
>   00-01 Project(*=[$0])
>     00-02 UnionExchange
>       01-01 Project(T27¦¦*=[$0])
>         01-02 SelectionVectorRemover
>           01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
>             01-04 MergeJoin(condition=[=($4, $5)], joinType=[left])
>               01-06 SelectionVectorRemover
>                 01-08 Sort(sort0=[$4], dir0=[ASC])
>                   01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4])
>                     01-12 HashToRandomExchange(dist0=[[$4]])
>                       02-01 UnorderedMuxExchange
>                         04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))])
>                           04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
>                             04-03 NestedLoopJoin(condition=[true], joinType=[inner])
>                               04-05 Project(T27¦¦*=[$0], b2=[$1])
>                                 04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
>                               04-04 BroadcastExchange
>                                 06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
>                                   06-02 UnionExchange
>                                     07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>                                       07-02 Project(b1=[$0], $f1=[true])
>                                         07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
>               01-05 Project(b1=[$0], $f10=[$1])
>                 01-07 SelectionVectorRemover
>                   01-09 Sort(sort0=[$0], dir0=[ASC])
>                     01-11 HashAgg(group=[{0}], agg#0=[MIN($1)])
>                       01-13 Project(b1=[$0], $f1=[$1])
>                         01-14 HashToRandomExchange(dist0=[[$0]])
>                           03-01 UnorderedMuxExchange
>                             05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
>                               05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
>                                 05-03 Project(b1=[$0], $f1=[true])
>                                   05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> {code}
> Correct result and correct plan with hash join enabled in a distributed plan (planner.slice_target = 1):
> {code}
> alter session set `planner.enable_hashjoin` = true;
> alter session set `planner.slice_target` = 1;
> {code}
> {code}
> 0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+------------+
> | a2         | b2         | c2         |
> +------------+------------+------------+
> +------------+------------+------------+
> No rows selected (0.458 seconds)
> 0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+
> | text       | json       |
> +------------+------------+
> | 00-00 Screen
>   00-01 Project(*=[$0])
>     00-02 Project(T25¦¦*=[$0])
>       00-03 SelectionVectorRemover
>         00-04 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
>           00-05 HashJoin(condition=[=($4, $5)], joinType=[left])
>             00-07 Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
>               00-09 NestedLoopJoin(condition=[true], joinType=[inner])
>                 00-11 Project(T25¦¦*=[$0], b2=[$1])
>                   00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
>                 00-10 BroadcastExchange
>                   01-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
>                     01-02 UnionExchange
>                       03-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>                         03-02 Project(b1=[$0], $f1=[true])
>                           03-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
>             00-06 Project(b1=[$0], $f10=[$1])
>               00-08 BroadcastExchange
>                 02-01 HashAgg(group=[{0}], agg#0=[MIN($1)])
>                   02-02 Project(b1=[$0], $f1=[$1])
>                     02-03 HashToRandomExchange(dist0=[[$0]])
>                       04-01 UnorderedMuxExchange
>                         05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
>                           05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
>                             05-03 Project(b1=[$0], $f1=[true])
>                               05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> {code}
> The same error occurs with columns of date, time, and timestamp types.
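> (Presumably a query of this shape, using the date columns of the same tables; a
> reconstruction, not the reporter's exact statement:)
> {code}
> select * from t2 where c2 not in (select c1 from t1);
> {code}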