RussellSpitzer commented on issue #5626:
URL: https://github.com/apache/iceberg/issues/5626#issuecomment-1641070362

   I think the distribution math there is a bit off, since it assumes that the 
skew is present only at the first bit of the hashing function. The assumption 
that increasing the number of buckets evenly divides the skew is also a 
problem: it holds only if the skew is correlated with the hashing function, 
and only at the first bit.
   
   That said, I did run some experiments of my own, and while there wasn't a 
ton of difference between composition and running the function on its own, 
there was a benefit to using the combined function. I'll try to finish up my 
test framework for running more examples.
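
   To make the comparison concrete, here is a minimal sketch of the kind of 
experiment described above. It is not the test framework mentioned in this 
comment. Everything in it is an assumption made for illustration: the 
`stable_hash` stand-in (SHA-256, not the Murmur3 hash Iceberg uses), the 
hypothetical two-argument `combined_bucket`, the bucket counts, and the 
synthetic skew (half of the rows share one hot key in the first column).

```python
import hashlib
import random
from collections import Counter


def stable_hash(*values):
    """Stand-in hash (NOT Iceberg's Murmur3): first 8 bytes of SHA-256."""
    data = "\x1f".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def bucket(n, *values):
    """Assign the given value(s) to one of n buckets."""
    return stable_hash(*values) % n


def composed_bucket(a, b, n_a, n_b):
    """Two single-column bucket transforms, flattened into one partition id."""
    return bucket(n_a, a) * n_b + bucket(n_b, b)


def combined_bucket(a, b, n_a, n_b):
    """Hypothetical multi-arg transform: one hash over both columns."""
    return bucket(n_a * n_b, a, b)


def max_over_mean(counts, num_buckets):
    """Largest bucket size relative to a perfectly even split (1.0 = even)."""
    total = sum(counts.values())
    return max(counts.values()) / (total / num_buckets)


def simulate(num_rows=20000, n_a=4, n_b=4, seed=0):
    """Count rows per partition for both schemes on synthetic skewed data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        # Assumed skew: half of the rows share one hot key in column a.
        a = 0 if rng.random() < 0.5 else rng.randrange(1, 1000)
        b = rng.randrange(1000)
        rows.append((a, b))
    composed = Counter(composed_bucket(a, b, n_a, n_b) for a, b in rows)
    combined = Counter(combined_bucket(a, b, n_a, n_b) for a, b in rows)
    return composed, combined


if __name__ == "__main__":
    composed, combined = simulate()
    print("composed max/mean:", round(max_over_mean(composed, 16), 2))
    print("combined max/mean:", round(max_over_mean(combined, 16), 2))
```

   The max/mean ratio is only one possible measure of skew, and the outcome 
depends heavily on how the skew is generated, so results from a sketch like 
this should be read as illustrative rather than conclusive.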
   
   Anyway, we probably do need to support multi-arg transforms within Iceberg 
at some point, so this may be a good time to start a design document and work 
towards adding that to the spec as a first step.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

