geserdugarov commented on issue #12155: URL: https://github.com/apache/hudi/issues/12155#issuecomment-2437418537
> As a result, the current filtering by bucket hash is broken, and the two examples above show that even for ordinary jobs we can get an unexpected distribution of data. But fixing this filtering could lead to duplication of some data written before and after the fix (if part of the data was written with the current hash, and writing continued with the fixed hash after an upgrade).
>
> From my point of view, there are not many options:
>
> * break backward compatibility for some specific scenarios, possibly tying it to the 1.0 release, which already contains drastic changes (the fix is ready in the open PR [[HUDI-8403] Fixed values extraction for bucketing and optimized `KeyGenUtils::extractRecordKeysByFields` #12120](https://github.com/apache/hudi/pull/12120));
> * leave the filtering as it is, with the wrong behavior (I will change the optimization PR accordingly).

@danny0405 @yihua @codope, if you don't mind, could you please share your opinion on this situation?
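To make the compatibility concern concrete, here is a minimal sketch of why changing the key extraction changes bucket placement. It assumes a typical hash-then-modulo bucket assignment; the `bucketId` helper and the key strings are hypothetical illustrations, not Hudi's actual implementation:

```java
import java.util.List;

public class BucketSketch {
    // Hypothetical bucket assignment: hash the joined key fields and
    // take the result modulo the number of buckets (illustrative only,
    // not the real Hudi bucket index code path).
    static int bucketId(List<String> keyFields, int numBuckets) {
        int hash = String.join(",", keyFields).hashCode();
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 8;
        // Suppose the fixed extraction yields the raw key value...
        int fixed = bucketId(List.of("id1"), numBuckets);
        // ...while the current (buggy) extraction yields a "field:value" form.
        int current = bucketId(List.of("uuid:id1"), numBuckets);
        // The two hashes differ, so the same record generally maps to a
        // different bucket before and after the fix: data written with the
        // old hash and re-written with the new one can end up duplicated
        // across buckets.
        System.out.println("fixed bucket=" + fixed + ", current bucket=" + current);
    }
}
```

The sketch shows the core of the trade-off discussed above: any change to what `extractRecordKeysByFields` returns changes the hash input, and therefore the bucket layout of already-written data.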
