[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

Eugene Koifman (JIRA) Tue, 06 Jun 2017 16:17:32 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eugene Koifman updated HIVE-16832:
----------------------------------
    Attachment: HIVE-16832.01.patch

HIVE-16832.01.patch is an incomplete WIP.
VectorizedOrcAcidRowBatchReader assumes that ROW__ID.bucketId is the same in 
each split (and in each bucket file of a delete_delta), which is no longer the case.

SortedDynPartitionOptimizer needs to ensure that data is sorted by 
(ROW__ID.bucketId % numBuckets) before it's sorted by ROW__ID, so that
FileSinkOperator.process() sees all rows for a given bucket equivalence set 
before moving on to the next equivalence set.
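
The required sort order can be sketched roughly as follows. This is only an
illustration, not code from the patch: RowIdSketch and BucketEquivalenceSort
are hypothetical names, the ROW__ID is assumed to be a plain
(transactionId, bucketId, rowId) triple, and bucketId is assumed to be
non-negative. It does not use Hive's own classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a ROW__ID value; not Hive's own class.
final class RowIdSketch {
    final long transactionId;
    final int bucketId;
    final long rowId;

    RowIdSketch(long transactionId, int bucketId, long rowId) {
        this.transactionId = transactionId;
        this.bucketId = bucketId;
        this.rowId = rowId;
    }

    // Rows whose bucketId is congruent modulo numBuckets share an equivalence set.
    int equivalenceSet(int numBuckets) {
        return Math.floorMod(bucketId, numBuckets);
    }
}

// Hypothetical helper illustrating the ordering described above.
final class BucketEquivalenceSort {
    // Order by equivalence set first, then by the assumed ROW__ID order
    // (transactionId, bucketId, rowId).
    static Comparator<RowIdSketch> comparator(int numBuckets) {
        return Comparator
            .comparingInt((RowIdSketch r) -> r.equivalenceSet(numBuckets))
            .thenComparingLong(r -> r.transactionId)
            .thenComparingInt(r -> r.bucketId)
            .thenComparingLong(r -> r.rowId);
    }

    // Returns a sorted copy; the input list is left unchanged.
    static List<RowIdSketch> sort(List<RowIdSketch> rows, int numBuckets) {
        List<RowIdSketch> sorted = new ArrayList<>(rows);
        sorted.sort(comparator(numBuckets));
        return sorted;
    }
}
```

With numBuckets = 2 and bucketId values 0..3 in one transaction, sorting by
ROW__ID alone would interleave the two equivalence sets (0, 1, 0, 1), while
the comparator above groups them (bucketId 0, 2, then 1, 3), so a consumer
sees each set as one contiguous run.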

> duplicate ROW__ID possible in multi insert into transactional table
> -------------------------------------------------------------------
>
>                 Key: HIVE-16832
>                 URL: https://issues.apache.org/jira/browse/HIVE-16832
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-16832.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
