[jira] [Updated] (HIVE-29166) Repeated MERGE query generates duplicates

Dmitriy Fingerman (Jira) Fri, 29 Aug 2025 11:51:08 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-29166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dmitriy Fingerman updated HIVE-29166:
-------------------------------------
    Description: 
The attached SQL script with a repeated MERGE query generates duplicates.

If either of the following two changes is made to the script, then there are 
no duplicates:
 # Change hive.auto.convert.join=true -> hive.auto.convert.join=false.
 # Make the order of columns in CLUSTER BY match the order of columns in 
CREATE TABLE. In the script the two orders differ; if the order matches then 
there are no duplicates.
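
For illustration, the first change is a session-level setting. A minimal 
sketch, assuming the setting is placed near the top of the script, before the 
MERGE statements run:
{code:sql}
-- Assumption: issued before the MERGE statements in the script.
-- Disables automatic conversion of common joins into map joins.
SET hive.auto.convert.join=false;
{code}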

A query like the one below was also found to return wrong results:
{code:sql}
select * from omsexternal_order_mapping_backup 
left outer join omsexternal_order_mapping__2025_08_26_03__transactional on 
...{code}
This is what the MERGE query does under the hood.

Reordering the columns in CLUSTER BY to match the order of the columns in 
CREATE TABLE can be used as a workaround for this issue.
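
To illustrate the relationship between MERGE and the join, here is a rough 
sketch. The column names (order_id, mapping_value), the ON condition and the 
WHEN clauses are hypothetical and are not taken from the attached script; only 
the two table names come from the query above.
{code:sql}
-- Hypothetical schema: order_id is the match key, mapping_value a payload
-- column. The real definitions are in the attached merge_duplicates.q.
MERGE INTO omsexternal_order_mapping__2025_08_26_03__transactional AS t
USING omsexternal_order_mapping_backup AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET mapping_value = s.mapping_value
WHEN NOT MATCHED THEN INSERT VALUES (s.order_id, s.mapping_value);

-- Join of the same shape: rows where matched_order_id is NULL are the ones
-- the NOT MATCHED branch would insert.
SELECT s.order_id, t.order_id AS matched_order_id
FROM omsexternal_order_mapping_backup s
LEFT OUTER JOIN omsexternal_order_mapping__2025_08_26_03__transactional t
  ON t.order_id = s.order_id;

-- Possible duplicate check after a repeated MERGE (same hypothetical key).
SELECT order_id, COUNT(*) AS copies
FROM omsexternal_order_mapping__2025_08_26_03__transactional
GROUP BY order_id
HAVING COUNT(*) > 1;
{code}
If the join fails to pair a source row with an existing target row, the NOT 
MATCHED branch would insert that row again, which would be consistent with the 
duplicates seen on a repeated run.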

  was:
The attached SQL script with a repeated MERGE query generates duplicates.

If either of the following two changes is made to the script, then there are 
no duplicates:
 # Change hive.auto.convert.join=true -> hive.auto.convert.join=false.
 # Make the order of columns in CLUSTER BY match the order of columns in 
CREATE TABLE. In the script the two orders differ; if the order matches then 
there are no duplicates.

A query like the one below was also found to return wrong results:
{code:sql}
select * from omsexternal_order_mapping_backup 
left outer join omsexternal_order_mapping__2025_08_26_03__transactional on 
...{code}
This is what the MERGE query does under the hood.


> Repeated MERGE query generates duplicates
> -----------------------------------------
>
>                 Key: HIVE-29166
>                 URL: https://issues.apache.org/jira/browse/HIVE-29166
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dmitriy Fingerman
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: merge_duplicates.q



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
