[GitHub] [hudi] parisni commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

via GitHub Wed, 10 May 2023 05:46:22 -0700


parisni commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1542146979


   Hardcoding Murmur is likely a good idea, but it would break existing bucketed tables. It also wouldn't support Hive 2 users.
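This minimal Python sketch shows why switching the hash function is a breaking change: the same key can land in a different bucket. It assumes a Hive-style assignment, `(hash & 0x7FFFFFFF) % numBuckets`, and compares Java's `String.hashCode()` with a standard MurmurHash3 x86_32. The function names, the Murmur3 seed of 42 and the UTF-8 key encoding are assumptions for illustration. It does not reproduce Hudi's, Hive's or Spark's exact code, whose seeds and key serialization may differ.

```python
# Sketch: one record key, two hash functions, possibly two different buckets.
# Assumptions (not taken from Hudi/Hive/Spark): UTF-8 key encoding,
# Murmur3 seed 42, Hive-style "(hash & 0x7FFFFFFF) % numBuckets".

MASK32 = 0xFFFFFFFF


def to_signed32(h: int) -> int:
    """Interpret an unsigned 32-bit value as a Java-style signed int."""
    return h - (1 << 32) if h & 0x80000000 else h


def java_string_hash(s: str) -> int:
    """Java String.hashCode() (exact for BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & MASK32
    return to_signed32(h)


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32, returned as a signed 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 0-3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return to_signed32(h)


def bucket_id(hash_value: int, num_buckets: int) -> int:
    """Clear the sign bit, then take the modulus."""
    return (hash_value & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    num_buckets = 8
    for key in ["id-1", "id-2", "id-3", "id-4"]:
        old = bucket_id(java_string_hash(key), num_buckets)
        new = bucket_id(murmur3_32(key.encode("utf-8"), seed=42), num_buckets)
        print(f"{key}: hashCode -> bucket {old}, murmur3 -> bucket {new}")
```

Once the algorithm is swapped, any key whose two bucket ids differ would be looked up in the wrong file. Existing tables would therefore have to be rewritten.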
   
   As for file naming, I suspect that also adding the bucket ID before the file extension (while keeping the prefix), i.e. `${bucketId}_${filegroupId}_${UUID}_${timestamp}_${bucketId}.parquet/log`, would allow us to support both Spark 2 and all Spark 3 releases.
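Here is a minimal sketch of the proposed naming. It assumes a five-digit zero-padded bucket id and made-up file-group, UUID and timestamp values, all of them hypothetical. The point it shows: a reader that takes the bucket id from the leading `<digits>_` and a reader that takes it from the last `_<digits>` before the extension both recover the same id. The suffix pattern is modeled on how Spark names bucketed files. Treat that as an assumption here, not a statement about any specific Spark release.

```python
import re
from typing import Optional

# Hypothetical helpers illustrating the proposed template
#   ${bucketId}_${filegroupId}_${UUID}_${timestamp}_${bucketId}.parquet/log
# The 5-digit zero padding and all example values are assumptions.

PREFIX_BUCKET = re.compile(r"^(\d+)_")             # leading "<digits>_"
SUFFIX_BUCKET = re.compile(r".*_(\d+)(?:\..*)?$")  # last "_<digits>" before the extension


def build_file_name(bucket: int, file_group_id: str, uuid: str,
                    timestamp: str, ext: str) -> str:
    """Put the bucket id both at the start and right before the extension."""
    b = f"{bucket:05d}"
    return f"{b}_{file_group_id}_{uuid}_{timestamp}_{b}.{ext}"


def bucket_from_prefix(name: str) -> Optional[int]:
    """Read the bucket id from the leading digits, if present."""
    m = PREFIX_BUCKET.match(name)
    return int(m.group(1)) if m else None


def bucket_from_suffix(name: str) -> Optional[int]:
    """Read the bucket id from the last underscore-delimited digits."""
    m = SUFFIX_BUCKET.match(name)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    name = build_file_name(3, "fg-1", "0b7c9e2a-1f4d-4c3e-9a6b-2d8f5e1c7a90",
                           "20230510054622", "parquet")
    print(name)
    print(bucket_from_prefix(name), bucket_from_suffix(name))
```

Without the trailing copy of the bucket id, the suffix pattern would pick up the timestamp instead.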
   
   On May 10, 2023 8:58:26 AM UTC, Danny Chan wrote:
   >> > I'm afraid the algorithm would need to be consistent too, in order to apply the bucket pruning optimization.
   >> 
   >> Not sure I understand. Do you mean the hashing algorithm must be the same as the target engine's? The answer is definitely yes.
   >
   >Yes, I guess so, because that is how bucket pruning works. I'm wondering whether we should make the bucketing algorithm configurable; it should be feasible if we use the Hive `murmur3hash` algorithm.
   >
   >-- 
   >Reply to this email directly or view it on GitHub:
   >https://github.com/apache/hudi/pull/8657#issuecomment-1541617605
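The quoted exchange makes the point that bucket pruning only works if the reader hashes exactly like the writer. That can be sketched with a pluggable hash function. All names below are hypothetical. `zlib.crc32` merely stands in for a second algorithm such as Murmur3; it was chosen for brevity and is not claimed to match any engine's bucketing.

```python
import zlib
from typing import Callable, Dict, List

# Hypothetical sketch: a writer lays keys out into buckets with one hash
# function; a reader prunes to a single bucket using (possibly) another.
HashFn = Callable[[str], int]


def java_string_hash(s: str) -> int:
    """Java String.hashCode() as an unsigned 32-bit value."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def crc32_hash(s: str) -> int:
    """Stand-in for a second algorithm (an assumption, not any engine's hash)."""
    return zlib.crc32(s.encode("utf-8"))


def bucket_id(key: str, num_buckets: int, hash_fn: HashFn) -> int:
    # Only the low 31 bits are used, so signed vs unsigned does not matter.
    return (hash_fn(key) & 0x7FFFFFFF) % num_buckets


def write_buckets(keys: List[str], num_buckets: int,
                  hash_fn: HashFn) -> Dict[int, List[str]]:
    """Group keys into per-bucket 'files'."""
    files: Dict[int, List[str]] = {}
    for key in keys:
        files.setdefault(bucket_id(key, num_buckets, hash_fn), []).append(key)
    return files


def pruned_lookup(files: Dict[int, List[str]], key: str,
                  num_buckets: int, hash_fn: HashFn) -> bool:
    """Bucket pruning: read only the one bucket the key should live in."""
    return key in files.get(bucket_id(key, num_buckets, hash_fn), [])


if __name__ == "__main__":
    keys = [f"id-{i}" for i in range(20)]
    files = write_buckets(keys, 8, java_string_hash)
    same = sum(pruned_lookup(files, k, 8, java_string_hash) for k in keys)
    mixed = sum(pruned_lookup(files, k, 8, crc32_hash) for k in keys)
    print(f"found with matching hash: {same}/20, with mismatched hash: {mixed}/20")
```

In this sketch, making the algorithm configurable amounts to recording which `hash_fn` a table was written with, so that readers can prune with the same one.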

