codope commented on code in PR #7985: URL: https://github.com/apache/hudi/pull/7985#discussion_r1112956709
##########
website/docs/clustering.md:
##########
@@ -51,8 +62,147 @@ NOTE: Clustering can only be scheduled for tables / partitions not receiving any
 _Figure: Illustrating query performance improvements by clustering_
-### Setting up clustering
-Inline clustering can be setup easily using spark dataframe options. See sample below
+## Clustering Usecases
+
+### Batching small files
+
+As mentioned in the intro, streaming ingestion generally results in smaller files in your data lake. But having a lot of
+such small files could bring down your query latency. From our experience supporting community users, there are quite a

Review Comment:
   Ah yes. Good catch Sudha! Will correct it.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
