[ 
https://issues.apache.org/jira/browse/HIVE-19890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520849#comment-16520849
 ] 

Gopal V commented on HIVE-19890:
--------------------------------

Thanks, I've added comments describing how createDynamicBucket() works, which 
is similar to the multi-file spray.

{code}
+    /**
+     * This method is intended for use with ACID unbucketed tables, where the DELETE ops behave as
+     * though they are bucketed, but without an explicit pre-specified bucket count. The bucketNum
+     * is read out of the middle value of the ROW__ID variable and this is written out from a single
+     * FileSink, in ways similar to the multi file spray, but without knowing the total number of
+     * buckets ahead of time.
+     *
+     * ROW__ID (1,2[0],3) => bucket_00002
+     * ROW__ID (1,3[0],4) => bucket_00003 etc
+     *
+     * A new FSP is created for each partition, so this only requires the bucket numbering and that
+     * is mapped in directly as an index.
+     */
{code}
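
For readers who have not seen the patch, the indexing idea in that comment can be sketched roughly as below. This is only an illustration, not the code in the patch: the class DynamicBucketSketch and its methods are hypothetical names, the bucket number is assumed to be already decoded from the middle value of ROW__ID, and a StringBuilder stands in for a real record writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not Hive's FileSinkOperator code.
class DynamicBucketSketch {
    // One slot per bucket number; null means no row for that bucket was seen yet.
    // A StringBuilder stands in for a real file writer.
    private final List<StringBuilder> writers = new ArrayList<>();

    // Maps a bucket number to the file name pattern used in the comment,
    // e.g. 2 -> "bucket_00002".
    static String bucketFileName(int bucketId) {
        return String.format("bucket_%05d", bucketId);
    }

    // Routes a row to the slot for its bucket. The list grows on demand,
    // so the total number of buckets need not be known ahead of time.
    void write(int bucketId, String row) {
        while (writers.size() <= bucketId) {
            writers.add(null);
        }
        StringBuilder writer = writers.get(bucketId);
        if (writer == null) {
            writer = new StringBuilder();
            writers.set(bucketId, writer);
        }
        writer.append(row).append('\n');
    }

    // Names of the files that would actually be created, in bucket order.
    List<String> openFiles() {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < writers.size(); i++) {
            if (writers.get(i) != null) {
                names.add(bucketFileName(i));
            }
        }
        return names;
    }
}
```

In this sketch the bucket number is used directly as the list index, so buckets that never receive a row cost only an empty slot and produce no file.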

> ACID: Inherit bucket-id from original ROW_ID for delete deltas
> --------------------------------------------------------------
>
>                 Key: HIVE-19890
>                 URL: https://issues.apache.org/jira/browse/HIVE-19890
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-19890.1.patch, HIVE-19890.2.patch, 
> HIVE-19890.3.patch
>
>
> The ACID delete deltas for unbucketed tables are written to arbitrary files; 
> they should be shuffled using the bucket-id rather than hash(ROW__ID).



