geserdugarov commented on issue #12155: URL: https://github.com/apache/hudi/issues/12155#issuecomment-2437418537
> As a result, the current filtering by bucket hash is broken, and the two examples above show that even for ordinary jobs we can get an unexpected distribution of data. But fixing this filtering could lead to duplication of some data written before and after the fix (if part of the data was written with the current hash, and writing continued with the fixed hash after an upgrade).
>
> From my point of view, there are not many options:
>
> * break backward compatibility for some specific scenarios, possibly tying it to the 1.0 release, which already contains drastic changes (the fix is ready in the open PR [[HUDI-8403] Fixed values extraction for bucketing and optimized `KeyGenUtils::extractRecordKeysByFields` #12120](https://github.com/apache/hudi/pull/12120));
> * leave the filtering as it is, with the wrong behavior (I will change the optimization PR accordingly).

@danny0405 @yihua @codope, if you don't mind, could you please share your opinion on this situation?
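To make the compatibility concern concrete, here is a minimal sketch of why changing the key extraction changes bucket placement. It assumes a typical hash-then-modulo bucket assignment; the `bucketId` helper and the key strings are hypothetical illustrations, not Hudi's actual implementation:

```java
import java.util.List;

public class BucketSketch {
    // Hypothetical bucket assignment: hash the joined key fields and
    // take the result modulo the number of buckets (illustrative only,
    // not the real Hudi bucket index code path).
    static int bucketId(List<String> keyFields, int numBuckets) {
        int hash = String.join(",", keyFields).hashCode();
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 8;
        // Suppose the fixed extraction yields the raw key value...
        int fixed = bucketId(List.of("id1"), numBuckets);
        // ...while the current (buggy) extraction yields a "field:value" form.
        int current = bucketId(List.of("uuid:id1"), numBuckets);
        // The two hashes differ, so the same record generally maps to a
        // different bucket before and after the fix: data written with the
        // old hash and re-written with the new one can end up duplicated
        // across buckets.
        System.out.println("fixed bucket=" + fixed + ", current bucket=" + current);
    }
}
```

The sketch shows the core of the trade-off discussed above: any change to what `extractRecordKeysByFields` returns changes the hash input, and therefore the bucket layout of already-written data.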
