stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861850713
It was reverted because some users depend on the previous behavior of `keyBy` on all partition columns: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778

We had assumed that if there is a bucket column, users only want to shuffle by the bucketing column. The user report in the comment linked above shows that this is not the case, so we decided to roll back for backward compatibility.

@bendevera you are right that `BucketPartitioner` isn't public and can't be used at the moment.

Now we need to discuss the best way forward. We are working on a more comprehensive smart shuffling (range partition) feature: https://github.com/apache/iceberg/projects/27. I am thinking maybe we can expose this in the `range` distribution mode. Until then, you may have to copy the code and apply the bucketing shuffle manually.

```
input.partitionCustom(
    new BucketPartitioner(partitionSpec),
    new BucketPartitionKeySelector(partitionSpec, iSchema, flinkRowType));
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
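For readers unfamiliar with what a bucketing shuffle does, here is a minimal, dependency-free sketch of the routing idea. It is not Iceberg's `BucketPartitioner`. The class name `BucketShuffleSketch` and the simple modulo rule are assumptions for illustration only, and the real class may distribute buckets differently. The method mirrors the shape of Flink's `Partitioner#partition(key, numPartitions)` callback, which is what `partitionCustom` invokes for each record.

```java
// Hypothetical sketch: NOT Iceberg's BucketPartitioner, only the routing idea.
final class BucketShuffleSketch {

    private BucketShuffleSketch() {
        // static helper only
    }

    /**
     * Maps a bucket id to a writer subtask index so that every record of the
     * same bucket reaches the same writer.
     *
     * @param bucketId      bucket id extracted from the record (assumed non-negative)
     * @param numPartitions number of downstream writer subtasks
     * @return index in the range [0, numPartitions)
     */
    static int partition(int bucketId, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive: " + numPartitions);
        }
        if (bucketId < 0) {
            throw new IllegalArgumentException("bucketId must be non-negative: " + bucketId);
        }
        // Assumed rule: plain modulo. Buckets wrap around when there are more
        // buckets than writers; with more writers than buckets some writers stay idle.
        return bucketId % numPartitions;
    }
}
```

With four writers, `BucketShuffleSketch.partition(5, 4)` sends bucket 5 to writer 1 on every call, so that bucket's data files are written by a single subtask.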
