zhangyue19921010 commented on issue #12699: URL: https://github.com/apache/hudi/issues/12699#issuecomment-2642684431
> [@zhangyue19921010](https://github.com/zhangyue19921010) We have implemented dynamic partition bucketing, which supports regular expressions, similar to your idea. The only difference is that we store bucket information in the ./hoodie/.bucket directory. Since the bucket information is minimal, it's efficient to store it in a single file. This approach simplifies the process of retrieving partition-level bucket counts and performing bucket pruning. At the same time, with the help of Hudi's timeline, we can easily ensure the consistency of bucket information.

Hi @xiarixiaoyao, thanks for your reply! It seems that a dynamic partition-level bucket index is indeed a common requirement. The `./hoodie/.bucket` directory is a good idea. But how do you handle two jobs writing concurrently? In that case, multiple tasks may operate on the partition meta file at the same time, even if they write to different partitions.
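
To make the concern concrete, here is a minimal in-memory sketch of the lost-update problem with a single shared bucket-info file. It is not Hudi code: `BucketInfoFile`, `write_blind`, `write_if_unchanged`, and the partition names are hypothetical, and the version check is just one generic optimistic-concurrency approach, not necessarily what either implementation does.

```python
import json


class BucketInfoFile:
    """Hypothetical stand-in for one shared file mapping partition -> bucket count."""

    def __init__(self):
        self.version = 0
        self.content = json.dumps({})

    def read(self):
        # Return the current version together with a private copy of the mapping.
        return self.version, json.loads(self.content)

    def write_blind(self, mapping):
        # Overwrite the whole file without checking what other writers did.
        self.content = json.dumps(mapping, sort_keys=True)
        self.version += 1

    def write_if_unchanged(self, expected_version, mapping):
        # Only write if nobody else has written since we read.
        if self.version != expected_version:
            return False
        self.write_blind(mapping)
        return True


def blind_writers():
    f = BucketInfoFile()
    _, job_a = f.read()          # both jobs read the same (empty) state
    _, job_b = f.read()
    job_a["dt=2025-01-01"] = 16  # each job touches a different partition
    job_b["dt=2025-01-02"] = 32
    f.write_blind(job_a)
    f.write_blind(job_b)         # overwrites job A's entry
    return f.read()[1]


def checked_writers():
    f = BucketInfoFile()
    ver_a, job_a = f.read()
    ver_b, job_b = f.read()
    job_a["dt=2025-01-01"] = 16
    job_b["dt=2025-01-02"] = 32
    f.write_if_unchanged(ver_a, job_a)          # succeeds
    if not f.write_if_unchanged(ver_b, job_b):  # rejected: version moved on
        ver_b, job_b = f.read()                 # re-read, re-apply, retry
        job_b["dt=2025-01-02"] = 32
        f.write_if_unchanged(ver_b, job_b)
    return f.read()[1]


lost = blind_writers()
kept = checked_writers()
```

With blind overwrites, job A's entry disappears even though the two jobs wrote to different partitions. With the version check, job B's stale write is rejected and its retry preserves both entries.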
