[
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-16832:
----------------------------------
Attachment: HIVE-16832.01.patch
HIVE-16832.01.patch is an incomplete WIP.
VectorizedOrcAcidRowBatchReader assumes that ROW__ID.bucketId is the same in
each split (and in each bucket file of a delete_delta), which is no longer the case.
SortedDynPartitionOptimizer needs to ensure that data is sorted by
(ROW__ID.bucketId % numBuckets) before it is sorted by ROW__ID, so that
FileSinkOperator.process() sees all rows for a given bucket equivalence set
before moving on to the next equivalence set.
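The ordering described above can be illustrated with a minimal, standalone sketch. This is not Hive code and not the patch: the RowId class, its field names (txnId, bucketId, rowId), the assumed natural ROW__ID order, and the sortForFileSink helper are all hypothetical stand-ins chosen for illustration. It only shows a two-level sort key, with the bucket equivalence set (bucketId % numBuckets) first and ROW__ID second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BucketSortSketch {
    // Hypothetical stand-in for the ROW__ID struct; field names are assumptions.
    static final class RowId {
        final long txnId;
        final int bucketId;
        final long rowId;

        RowId(long txnId, int bucketId, long rowId) {
            this.txnId = txnId;
            this.bucketId = bucketId;
            this.rowId = rowId;
        }
    }

    // Assumed natural ROW__ID order for this sketch: txnId, then bucketId, then rowId.
    static final Comparator<RowId> BY_ROW_ID =
        Comparator.<RowId>comparingLong(r -> r.txnId)
            .thenComparingInt(r -> r.bucketId)
            .thenComparingLong(r -> r.rowId);

    // Sort by bucket equivalence set first, then by ROW__ID within each set,
    // so a consumer sees every row of one set before any row of the next.
    static List<RowId> sortForFileSink(List<RowId> rows, int numBuckets) {
        List<RowId> sorted = new ArrayList<>(rows);
        sorted.sort(
            Comparator.<RowId>comparingInt(r -> Math.floorMod(r.bucketId, numBuckets))
                .thenComparing(BY_ROW_ID));
        return sorted;
    }

    public static void main(String[] args) {
        List<RowId> rows = Arrays.asList(
            new RowId(1, 1, 0), new RowId(1, 2, 0),
            new RowId(1, 0, 0), new RowId(1, 3, 0));
        for (RowId r : sortForFileSink(rows, 2)) {
            System.out.println("set=" + Math.floorMod(r.bucketId, 2)
                + " bucketId=" + r.bucketId);
        }
    }
}
```

With numBuckets = 2, bucketIds 0 and 2 fall into one equivalence set and 1 and 3 into the other. Sorting by ROW__ID alone would interleave the two sets (0, 1, 2, 3), which is the situation the comment says FileSinkOperator.process() must not see.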
> duplicate ROW__ID possible in multi insert into transactional table
> -------------------------------------------------------------------
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Critical
> Attachments: HIVE-16832.01.patch
>
>