[ https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Turoczy updated HIVE-22636:
----------------------------------
Labels: check hive-4.0.0-must (was: )
> Data loss on skewjoin for ACID tables.
> --------------------------------------
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Aditya Shah
> Priority: Blocker
> Labels: check, hive-4.0.0-must
>
> I am running a skewjoin and writing the result into a full ACID table, and
> the results are incorrect. The issue is similar to the one seen for MM tables
> in HIVE-16051, where the fix was to skip the skewjoin for MM tables.
> Steps to reproduce:
> Run a qtest similar to the one in HIVE-16051:
> {code:sql}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result of the count was 309, but the query returned 173.
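>
> As a sanity check, the same insert can be repeated with the skewjoin
> optimization turned off and compared against the count taken from the join
> itself. This is only a sketch: it assumes the same src dataset and session
> settings as the qtest above, the table name skewjoin_acid_noskew is
> hypothetical, and it has not been verified here that disabling
> hive.optimize.skewjoin avoids the loss for full ACID tables.
> {code:sql}
> -- Baseline: turn the skewjoin optimization off for this session.
> set hive.optimize.skewjoin=false;
> -- Distinct keys produced by the join itself, with no ACID write involved.
> SELECT count(DISTINCT src1.key) FROM src src1 JOIN src src2 ON (src1.key = src2.key);
> -- Same insert as in the repro, into a separate (hypothetical) table.
> CREATE TABLE skewjoin_acid_noskew(key INT, value STRING) STORED AS ORC
> tblproperties ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT INTO TABLE
> skewjoin_acid_noskew SELECT src1.key, src2.value;
> SELECT count(DISTINCT key) FROM skewjoin_acid_noskew;
> DROP TABLE skewjoin_acid_noskew;
> {code}
> If the two counts agree with each other but differ from the skewjoin run,
> that would point at the skewjoin write path rather than at the data.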
>