[
https://issues.apache.org/jira/browse/SPARK-29274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939867#comment-16939867
]
Yuming Wang commented on SPARK-29274:
-------------------------------------
I don't know how Vertica handles this internally, but it also returns an incorrect result: the join below matches two rows whose 21-digit ids differ.
{noformat}
create table t1 (incdata_id decimal(21,0), v VARCHAR);
create table t2 (incdata_id VARCHAR, v VARCHAR);
insert into t1 values(100000000001636981212, '1');
insert into t2 values(100000000001636981213, '2');
dbadmin=> select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
incdata_id | v | incdata_id | v
-----------------------+---+-----------------------+---
100000000001636981212 | 1 | 100000000001636981213 | 2
(1 row)
dbadmin=> explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
Access Path:
+-JOIN HASH [Cost: 4, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Join Cond: (t1.incdata_id = t2.incdata_id)
| Materialize at Output: t1.v
| +-- Outer -> STORAGE ACCESS for t1 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
| | Projection: public.t1_super
| | Materialize: t1.incdata_id
| | Runtime Filter: (SIP1(HashJoin): t1.incdata_id)
| +-- Inner -> STORAGE ACCESS for t2 [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 3)
| | Projection: public.t2_super
| | Materialize: t2.incdata_id, t2.v
------------------------------
-----------------------------------------------
PLAN: BASE QUERY PLAN (GraphViz Format)
-----------------------------------------------
digraph G {
graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);\n\nAll Nodes Vector: \n\n node[0]=v_docker_node0001 (initiator) Up\n", labelloc=t, labeljust=l ordering=out]
0[label = "Root \nOutBlk=[UncTuple(4)]", color = "green", shape = "house"];
1[label = "NewEENode \nOutBlk=[UncTuple(4)]", color = "green", shape = "box"];
2[label = "StorageUnionStep: t1_super\nUnc: Numeric(21,0)\nUnc: Varchar(80)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "purple", shape = "box"];
3[label = "Join: Hash-Join: \n(public.t1 x public.t2) using t1_super and t2_super (PATH ID: 1)\n (t1.incdata_id = t2.incdata_id)\n\nUnc: Numeric(21,0)\nUnc: Varchar(80)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "brown", shape = "box"];
4[label = "FilterStep: \n(t1.incdata_id IS NOT NULL)\nUnc: Numeric(21,0)", color = "brown", shape = "box"];
5[label = "ScanStep: t1_super\nSIP1(HashJoin): t1.incdata_id\nincdata_id\nUnc: Numeric(21,0)", color = "brown", shape = "box"];
6[label = "FilterStep: \n(t2.incdata_id IS NOT NULL)\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "green", shape = "box"];
7[label = "StorageUnionStep: t2_super\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "purple", shape = "box"];
8[label = "ScanStep: t2_super\nincdata_id\nv\nUnc: Varchar(80)\nUnc: Varchar(80)", color = "brown", shape = "box"];
1->0 [label = "V[0] C=4",color = "black",style="bold", arrowtail="inv"];
2->1 [label = "0",color = "blue"];
3->2 [label = "0",color = "blue"];
4->3 [label = "0",color = "blue"];
5->4 [label = "0",color = "blue"];
6->3 [label = "1",color = "blue"];
7->6 [label = "0",color = "blue"];
8->7 [label = "0",color = "blue"];
}
(43 rows)
{noformat}
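The spurious match is explained by the implicit cast of both join keys to double: an IEEE-754 double carries only ~15.95 significant decimal digits, so the two 21-digit ids above round to the same double value. A minimal sketch in plain Python (CPython floats are IEEE-754 doubles) illustrating the collision:

```python
# The two distinct 21-digit ids from the Vertica example above.
a = float("100000000001636981212")  # t1.incdata_id cast to double
b = float("100000000001636981213")  # t2.incdata_id cast to double

# Both ids map to the same double, so a comparison on the cast values
# treats the two different ids as equal -- hence the bogus join match.
print(a == b)  # True
```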
> Can not coerce decimal type to double type when it's join key
> -------------------------------------------------------------
>
> Key: SPARK-29274
> URL: https://issues.apache.org/jira/browse/SPARK-29274
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.4, 2.4.4, 3.0.0
> Reporter: Yuming Wang
> Assignee: Pengfei Chang
> Priority: Major
> Attachments: image-2019-09-27-20-20-24-238.png
>
>
> How to reproduce this issue:
> {code:sql}
> create table t1 (incdata_id decimal(21,0), v string) using parquet;
> create table t2 (incdata_id string, v string) using parquet;
> explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
> == Physical Plan ==
> *(5) SortMergeJoin [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double)))], [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double)))], Inner
> :- *(2) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))) ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))), 200), true, [id=#104]
> :     +- *(1) Filter isnotnull(incdata_id#31)
> :        +- Scan hive default.t1 [incdata_id#31, v#32], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#31, v#32], Statistics(sizeInBytes=8.0 EiB)
> +- *(4) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))) ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))), 200), true, [id=#112]
>       +- *(3) Filter isnotnull(incdata_id#33)
>          +- Scan hive default.t2 [incdata_id#33, v#34], HiveTableRelation `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#33, v#34], Statistics(sizeInBytes=8.0 EiB)
> {code}
> {code:sql}
> select cast(v1 as double) as v3, cast(v2 as double) as v4,
> cast(v1 as double) = cast(v2 as double), v1 = v2
> from (select cast('100000000001636981212' as decimal(21, 0)) as v1,
> cast('100000000001636981213' as decimal(21, 0)) as v2) t;
> 1.0000000000163697E20 1.0000000000163697E20 true false
> {code}
>
> This is a real case from our production environment:
> !image-2019-09-27-20-20-24-238.png|width=100%!
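The `true` result from the double comparison in the reproduction above follows from the spacing of representable doubles at this magnitude. A short sketch (standard library only) quantifying that spacing: near 1e20 adjacent doubles are 16384 apart, so any two decimal ids closer than that can collapse to the same double join key after the implicit cast.

```python
import math

# Unit in the last place near 1e20: since 2^66 <= 1e20 < 2^67, the
# spacing of representable doubles there is 2^(66-52) = 2^14 = 16384.
print(math.ulp(1e20))  # 16384.0

# The 1-unit difference between the two ids is far below that spacing,
# so it vanishes entirely after the cast to double.
v1 = float("100000000001636981212")
v2 = float("100000000001636981213")
print(v2 - v1)  # 0.0
```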