Hello all,

The clustering feature has landed <https://github.com/apache/hudi/pull/2263>
on the master branch and is available in beta. This feature can be used to
do the following:
1) Stitch small files into larger files
2) Change the data layout on disk by sorting data on different columns (for
query/storage optimization)

If you are interested in the above use cases, I would appreciate it if you
could try out this feature. I have included commands to run clustering in
this section
<https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance#RFC19Clusteringdataforspeedandqueryperformance-Commandstoscheduleandrunclustering>
(along with caveats, as this feature is still in beta).

Any feedback is welcome. I'm also in the #general room on Slack. Please feel
free to ping me if you have any questions or comments.

Thanks,
Satish