[ https://issues.apache.org/jira/browse/HIVE-19890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520849#comment-16520849 ]
Gopal V commented on HIVE-19890:
--------------------------------
Thanks, I've added comments to describe how createDynamicBucket() works,
which is similar to the multi-file spray.
{code}
+ /**
+ * This method is intended for use with ACID unbucketed tables, where the DELETE ops behave as
+ * though they are bucketed, but without an explicit pre-specified bucket count. The bucketNum
+ * is read out of the middle value of the ROW__ID variable and this is written out from a single
+ * FileSink, in ways similar to the multi file spray, but without knowing the total number of
+ * buckets ahead of time.
+ *
+ * ROW__ID (1,2[0],3) => bucket_00002
+ * ROW__ID (1,3[0],4) => bucket_00003 etc
+ *
+ * A new FSP is created for each partition, so this only requires the bucket numbering and that
+ * is mapped in directly as an index.
+ */
{code}
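
To illustrate the comment above, here is a minimal standalone sketch of the "bucket number as a direct index" idea. It is not the code from the patch: the class DynamicBucketSketch and its methods are hypothetical names, the file names are plain strings standing in for real writers, and it assumes the bucket number has already been extracted from ROW__ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; this is not Hive's FileSinkOperator. */
final class DynamicBucketSketch {
    // One slot per bucket number seen so far; null means nothing opened yet.
    // Strings stand in for the real per-bucket writers (an assumption).
    private final List<String> openFiles = new ArrayList<>();

    /** Formats a bucket number as in the comment's examples, e.g. 2 => bucket_00002. */
    static String bucketFileName(int bucketNum) {
        return String.format(Locale.ROOT, "bucket_%05d", bucketNum);
    }

    /** Returns the file for bucketNum, growing the index when a new bucket shows up. */
    String fileFor(int bucketNum) {
        if (bucketNum < 0) {
            throw new IllegalArgumentException("negative bucket: " + bucketNum);
        }
        // The total bucket count is not known ahead of time, so pad on demand.
        while (openFiles.size() <= bucketNum) {
            openFiles.add(null);
        }
        String file = openFiles.get(bucketNum);
        if (file == null) {
            file = bucketFileName(bucketNum);
            openFiles.set(bucketNum, file);
        }
        return file;
    }

    /** Number of index slots currently allocated (highest bucket seen + 1). */
    int slots() {
        return openFiles.size();
    }

    public static void main(String[] args) {
        DynamicBucketSketch sink = new DynamicBucketSketch();
        // Bucket numbers as they would come from the middle value of ROW__ID.
        System.out.println(sink.fileFor(2));
        System.out.println(sink.fileFor(3));
    }
}
```

In this sketch a single sink keeps one such index, which mirrors the comment's point that a new FSP per partition leaves only the bucket numbering to be tracked.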
> ACID: Inherit bucket-id from original ROW_ID for delete deltas
> --------------------------------------------------------------
>
> Key: HIVE-19890
> URL: https://issues.apache.org/jira/browse/HIVE-19890
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: HIVE-19890.1.patch, HIVE-19890.2.patch,
> HIVE-19890.3.patch
>
>
> The ACID delete deltas for unbucketed tables are written to arbitrary files;
> they should be shuffled using the bucket-id instead of hash(ROW__ID).