RussellSpitzer commented on issue #5626: URL: https://github.com/apache/iceberg/issues/5626#issuecomment-1641070362
I think the distribution math there is a bit off, since it assumes that the skew is only present at the first bit of the hashing function. The assumption that increasing the number of buckets evenly divides the skew is a problem, because it requires the skew to be generally present and correlated with the hashing function, but only at the first bit. That said, I did run some experiments of my own, and while there wasn't a ton of difference between composition and running the function on its own, there was a benefit to using the combined function. I'll try to finish up my test framework for running more examples. Anyway, we probably do need to support multi-arg transforms within Iceberg at some point, so this may be a good time to start a design document and work towards adding that to the spec as a first step.
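
For anyone who wants to poke at the composition-versus-combined question themselves, here is a minimal sketch of one way to set up the comparison. It is not the test framework mentioned above, and everything in it is an assumption made for illustration: the hash is a SHA-256 stand-in rather than the Murmur3 hash Iceberg's bucket transform uses, the helper names (`composed_bucket`, `combined_bucket`) are hypothetical, and the data set is deliberately contrived, with one hot key holding half the rows in a single column. Results on real data may look quite different.

```python
import hashlib
from collections import Counter


def h(*values):
    """Stable 32-bit hash of the arguments (stand-in, NOT Iceberg's Murmur3)."""
    data = "\x00".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def composed_bucket(a, b, n_a, n_b):
    """Composition: two single-arg bucket transforms, one per column."""
    return (h(a) % n_a, h(b) % n_b)


def combined_bucket(a, b, n):
    """Hypothetical multi-arg transform: one hash over both columns."""
    return h(a, b) % n


def max_share(counts):
    """Fraction of all rows that land in the fullest partition."""
    return max(counts.values()) / sum(counts.values())


# Contrived skew (assumption): column a has one hot key on half the rows,
# column b is unique per row.
rows = [("hot" if i % 2 == 0 else f"k{i}", i) for i in range(20000)]

# Both layouts produce 16 partitions in total.
composed = Counter(composed_bucket(a, b, 4, 4) for a, b in rows)
combined = Counter(combined_bucket(a, b, 16) for a, b in rows)

composed_share = max_share(composed)
combined_share = max_share(combined)

# All hot rows share one a-bucket, so they can only spread over 4 of the
# 16 composed partitions; the fullest one holds at least 12.5% of the rows.
print(f"composed 4x4 : fullest partition holds {composed_share:.1%}")
print(f"combined 16  : fullest partition holds {combined_share:.1%}")
```

In this setup the composed layout cannot spread the hot key across more than `n_b` partitions, while the combined hash is free to use all of them. How much that matters in practice depends on where the skew actually sits, which is the point being argued in the comment.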
