[jira] [Updated] (HIVE-28735) TPCDS queries q15, q19 are failing when hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled is set to true

Paramvir Singh (Jira) Tue, 04 Feb 2025 05:07:24 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-28735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paramvir Singh updated HIVE-28735:
----------------------------------
    Description: 
TPCDS queries q15 and q19 fail when 
hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled is set to 
true. 
The setup must include a cluster of at least 2 nodes; the queries pass when the 
cluster has only 1 node. 
The wrong results are also nondeterministic (each run returns different wrong 
values).

Small repro query on TPCDS dataset

{code:java}
select ca_zip, count(*)
from catalog_sales_small, customer_small, customer_address_small
where cs_bill_customer_sk = c_customer_sk
and c_current_addr_sk = ca_address_sk
group by ca_zip
order by ca_zip
limit 100;
{code}



If we set either of the following properties, we get correct results:


{code:java}
-- Correct results
set hive.vectorized.execution.enabled=false;
{code}

OR

{code:java}
-- Correct results
set hive.auto.convert.join=false;
{code}
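
The comparison described above can be sketched as a single session. This is only an illustrative sketch, not a verified script: it assumes the three *_small tables from the repro query already exist, that the cluster has at least 2 nodes, and that each property can be set at session level. No expected output is shown.

{code:sql}
-- Run 1: native fast hash table enabled (the configuration reported to return wrong results).
set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=true;
set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;

select ca_zip, count(*)
from catalog_sales_small, customer_small, customer_address_small
where cs_bill_customer_sk = c_customer_sk
and c_current_addr_sk = ca_address_sk
group by ca_zip
order by ca_zip
limit 100;

-- Run 2: same query with vectorization disabled (reported to return correct results).
set hive.vectorized.execution.enabled=false;

select ca_zip, count(*)
from catalog_sales_small, customer_small, customer_address_small
where cs_bill_customer_sk = c_customer_sk
and c_current_addr_sk = ca_address_sk
group by ca_zip
order by ca_zip
limit 100;
{code}

Repeating Run 1 a few times and comparing each output against Run 2 is one way to observe the run-to-run variation.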


  was:
TPCDS queries q15, q19, q23a, q45 and q58 fail when 
hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled is set to 
true. 
The setup must include a cluster of at least 2 nodes; the queries pass when the 
cluster has only 1 node. 
The wrong results are also nondeterministic (each run returns different wrong 
values).

Small repro query on TPCDS dataset

{code:java}
select ca_zip, count(*)
from catalog_sales_small, customer_small, customer_address_small
where cs_bill_customer_sk = c_customer_sk
and c_current_addr_sk = ca_address_sk
group by ca_zip
order by ca_zip
limit 100;
{code}



If we set either of the following properties, we get correct results:


{code:java}
-- Correct results
set hive.vectorized.execution.enabled=false;
{code}

OR

{code:java}
-- Correct results
set hive.auto.convert.join=false;
{code}



> TPCDS queries q15, q19 are failing when 
> hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled  is set to 
> true
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28735
>                 URL: https://issues.apache.org/jira/browse/HIVE-28735
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Vectorization
>    Affects Versions: 4.0.0, 4.0.1
>            Reporter: Paramvir Singh
>            Priority: Major
>              Labels: hive-4.1.0-must
>
> TPCDS queries q15 and q19 fail when 
> hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled is set to 
> true. 
> The setup must include a cluster of at least 2 nodes; the queries pass when 
> the cluster has only 1 node. 
> The wrong results are also nondeterministic (each run returns different wrong 
> values).
> Small repro query on TPCDS dataset
> {code:java}
> select ca_zip, count(*)
> from catalog_sales_small, customer_small, customer_address_small
> where cs_bill_customer_sk = c_customer_sk
> and c_current_addr_sk = ca_address_sk
> group by ca_zip
> order by ca_zip
> limit 100;
> {code}
> If we set either of the following properties, we get correct results:
> {code:java}
> -- Correct results
> set hive.vectorized.execution.enabled=false;
> {code}
> OR
> {code:java}
> -- Correct results
> set hive.auto.convert.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
