[ 
https://issues.apache.org/jira/browse/HIVE-26673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-26673.
----------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Incorrect row count when vectorisation is enabled
> -------------------------------------------------
>
>                 Key: HIVE-26673
>                 URL: https://issues.apache.org/jira/browse/HIVE-26673
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Simhadri Govindappa
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Repro:
> {noformat}
> select count(*) from
> (SELECT T0.plant_no,
> T0.part_chain,
> T0.part_new,
> T0.part_no
> FROM dm_ads_dims_prod.cloudera_test3 T0
> LEFT JOIN
> (SELECT T0.plant_no,
> T0.part_chain
> FROM
> (SELECT T0.plant_no,
> T0.part_chain,
> count( *) AS ct
> FROM dm_ads_dims_prod.cloudera_test3 T0
> WHERE purchase_pos = pos
> GROUP BY T0.plant_no,
> T0.part_chain) T0
> WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no
> AND T0.part_chain = T1.part_chain
> WHERE T0.purchase_pos = T0.pos
> AND (T1.part_chain IS NULL
> OR (T1.part_chain IS NOT NULL
> AND T0.fd = 1)) ) s;
> {noformat}
> Run the query with the following settings on the repro cluster a few times
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=true;
> {code}
> and the results were:
> {code:java}
> 2682424
> 2682426
> 2682425{code}
>  
> Then turn off {{hive.auto.convert.join}}
> {code:java}
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=false;
> {code}
> and the result was always *2682420*.
> Comparing the plans with {{hive.auto.convert.join}} enabled vs. disabled, the
> difference is the type of join: Map join vs. Merge join.
> Vectorization also plays a role: with it turned off, the result
> was correct:
> {code:java}
> SET hive.vectorized.execution.enabled=false;
> {code}
> This is only a workaround and has a negative impact on performance, but it
> should help us narrow down where to find the cause of the issue.
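>
> The plan comparison described above could be reproduced roughly as follows.
> This is only a sketch: it assumes {{EXPLAIN VECTORIZATION DETAIL}} is
> available on the cluster (Hive 2.3 and later) and reuses the repro query
> unchanged.
> {code:sql}
> -- Same session settings as the failing run.
> set hive.query.results.cache.enabled=false;
> set hive.compute.query.using.stats=false;
> set hive.auto.convert.join=true;
> -- Print the plan together with vectorization details for each operator.
> EXPLAIN VECTORIZATION DETAIL
> select count(*) from
> (SELECT T0.plant_no,
> T0.part_chain,
> T0.part_new,
> T0.part_no
> FROM dm_ads_dims_prod.cloudera_test3 T0
> LEFT JOIN
> (SELECT T0.plant_no,
> T0.part_chain
> FROM
> (SELECT T0.plant_no,
> T0.part_chain,
> count( *) AS ct
> FROM dm_ads_dims_prod.cloudera_test3 T0
> WHERE purchase_pos = pos
> GROUP BY T0.plant_no,
> T0.part_chain) T0
> WHERE ct = 2 ) T1 ON T0.plant_no = T1.plant_no
> AND T0.part_chain = T1.part_chain
> WHERE T0.purchase_pos = T0.pos
> AND (T1.part_chain IS NULL
> OR (T1.part_chain IS NOT NULL
> AND T0.fd = 1)) ) s;
> {code}
> Running the same statement again after {{set hive.auto.convert.join=false;}}
> gives the second plan to compare against. The join operators in the two plans
> are the place to look for differences in how vectorization was applied.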



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
