Please help us test this more, before RC is cut! :)

On Tue, Dec 22, 2020 at 10:23 PM Satish Kotha <[email protected]> wrote:
> Hello all,
>
> The clustering feature landed <https://github.com/apache/hudi/pull/2263> on
> the master branch and is available in beta. This feature can be used to do
> the following:
> 1) Stitch small files into larger files
> 2) Change data layout on disk by sorting data using different columns (for
> query/storage optimization)
>
> If you are interested in the above use cases, I would appreciate it if you
> could try out this feature. I have included commands to run clustering in
> this section
> <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance#RFC19Clusteringdataforspeedandqueryperformance-Commandstoscheduleandrunclustering>
> (along with caveats, as this feature is still in beta).
>
> Any feedback is welcome. I'm also in the #general room on Slack. Please feel
> free to ping me if you have any questions/comments.
>
> Thanks
> Satish
