Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/6075
Hi @zhangminglei
Sorry for the late response - I thought about this solution quite a bit and
came to the conclusion that we may need to do a bit more for efficient results:
Please take a look at
[FLINK-9749](https://issues.apache.org/jira/browse/FLINK-9749) and the subtask
[FLINK-9753](https://issues.apache.org/jira/browse/FLINK-9753).
The description there outlines why I believe the simple approach suggested here
may not be enough (it will frequently result in badly compressed ORC/Parquet files).
We have already started an effort to completely redesign the
BucketingSink. The initial work-in-progress looks quite promising.
---