[ https://issues.apache.org/jira/browse/SPARK-29274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939867#comment-16939867 ]
Yuming Wang commented on SPARK-29274:
-------------------------------------

I don't know how Vertica works, but it returns incorrect values as well:
{noformat}
create table t1 (incdata_id decimal(21,0), v VARCHAR);
create table t2 (incdata_id VARCHAR, v VARCHAR);
insert into t1 values(100000000001636981212, '1');
insert into t2 values(100000000001636981213, '2');

dbadmin=> select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
      incdata_id       | v |      incdata_id       | v
-----------------------+---+-----------------------+---
 100000000001636981212 | 1 | 100000000001636981213 | 2
(1 row)

dbadmin=> explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);

                                  QUERY PLAN
------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);

 Access Path:
 +-JOIN HASH [Cost: 4, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (t1.incdata_id = t2.incdata_id)
 |  Materialize at Output: t1.v
 | +-- Outer -> STORAGE ACCESS for t1 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.t1_super
 | |      Materialize: t1.incdata_id
 | |      Runtime Filter: (SIP1(HashJoin): t1.incdata_id)
 | +-- Inner -> STORAGE ACCESS for t2 [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.t2_super
 | |      Materialize: t2.incdata_id, t2.v

 ------------------------------
 -----------------------------------------------
 PLAN: BASE QUERY PLAN (GraphViz Format)
 -----------------------------------------------
 digraph G {
 graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);\n\nAll Nodes Vector: \n\n node[0]=v_docker_node0001 (initiator) Up\n", labelloc=t, labeljust=l ordering=out]
 0[label = "Root \nOutBlk=[UncTuple(4)]", color = "green", shape = "house"];
 1[label = "NewEENode \nOutBlk=[UncTuple(4)]", color = "green", shape = "box"];
 2[label = "StorageUnionStep: t1_super\nUnc: Numeric(21,0)\nUnc: Varchar(80)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "purple", shape = "box"];
 3[label = "Join: Hash-Join: \n(public.t1 x public.t2) using t1_super and t2_super (PATH ID: 1)\n (t1.incdata_id = t2.incdata_id)\n\nUnc: Numeric(21,0)\nUnc: Varchar(80)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "brown", shape = "box"];
 4[label = "FilterStep: \n(t1.incdata_id IS NOT NULL)\nUnc: Numeric(21,0)", color = "brown", shape = "box"];
 5[label = "ScanStep: t1_super\nSIP1(HashJoin): t1.incdata_id\nincdata_id\nUnc: Numeric(21,0)", color = "brown", shape = "box"];
 6[label = "FilterStep: \n(t2.incdata_id IS NOT NULL)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "green", shape = "box"];
 7[label = "StorageUnionStep: t2_super\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "purple", shape = "box"];
 8[label = "ScanStep: t2_super\nincdata_id\nv\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "brown", shape = "box"];
 1->0 [label = "V[0] C=4",color = "black",style="bold", arrowtail="inv"];
 2->1 [label = "0",color = "blue"];
 3->2 [label = "0",color = "blue"];
 4->3 [label = "0",color = "blue"];
 5->4 [label = "0",color = "blue"];
 6->3 [label = "1",color = "blue"];
 7->6 [label = "0",color = "blue"];
 8->7 [label = "0",color = "blue"];
 }
(43 rows)
{noformat}

> Can not coerce decimal type to double type when it's join key
> -------------------------------------------------------------
>
>                 Key: SPARK-29274
>                 URL: https://issues.apache.org/jira/browse/SPARK-29274
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.4, 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Pengfei Chang
>            Priority: Major
>         Attachments: image-2019-09-27-20-20-24-238.png
>
>
> How to reproduce this issue:
> {code:sql}
> create table t1 (incdata_id decimal(21,0), v string) using parquet;
> create table t2 (incdata_id string, v string) using parquet;
> explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
>
> == Physical Plan ==
> *(5) SortMergeJoin [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double)))], [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double)))], Inner
> :- *(2) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))) ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))), 200), true, [id=#104]
> :     +- *(1) Filter isnotnull(incdata_id#31)
> :        +- Scan hive default.t1 [incdata_id#31, v#32], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#31, v#32], Statistics(sizeInBytes=8.0 EiB)
> +- *(4) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))) ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))), 200), true, [id=#112]
>       +- *(3) Filter isnotnull(incdata_id#33)
>          +- Scan hive default.t2 [incdata_id#33, v#34], HiveTableRelation `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#33, v#34], Statistics(sizeInBytes=8.0 EiB)
> {code}
> {code:sql}
> select cast(v1 as double) as v3, cast(v2 as double) as v4,
>        cast(v1 as double) = cast(v2 as double), v1 = v2
> from (select cast('100000000001636981212' as decimal(21, 0)) as v1,
>              cast('100000000001636981213' as decimal(21, 0)) as v2) t;
>
> 1.0000000000163697E20	1.0000000000163697E20	true	false
> {code}
>
> It's a real case in our production:
> !image-2019-09-27-20-20-24-238.png|width=100%!
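The root cause can be reproduced outside of SQL. A minimal Python sketch (not part of the original report; variable names are illustrative) shows why coercing both 21-digit join keys to double makes distinct values compare equal — an IEEE-754 double has only a 53-bit significand, roughly 15–16 significant decimal digits, so at a magnitude of 1e20 adjacent representable doubles are thousands of units apart:

```python
from decimal import Decimal

# The two distinct 21-digit join keys from the report.
v1 = Decimal("100000000001636981212")
v2 = Decimal("100000000001636981213")

# Compared exactly as decimals, the keys differ.
print(v1 == v2)                # False

# Cast to double (Python float is an IEEE-754 double, like Spark's DoubleType),
# both values round to the same representable double, so the coerced join
# condition wrongly treats the rows as a match.
print(float(v1) == float(v2))  # True
print(float(v1))               # ~1.0000000000163697e+20, the value Spark shows
```

This is the same effect as the `cast(v1 as double) = cast(v2 as double)` query above: the double comparison yields true while the exact decimal comparison yields false, which is why using double as the common type for a decimal/string join key produces wrong results.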
--
This message was sent by Atlassian Jira
(v8.3.4#803005)