[GitHub] [hudi] codope commented on a diff in pull request #7985: [DOCS] Update clustering docs

via GitHub Tue, 21 Feb 2023 03:58:35 -0800


codope commented on code in PR #7985:
URL: https://github.com/apache/hudi/pull/7985#discussion_r1112956709



##########
website/docs/clustering.md:
##########
@@ -51,8 +62,147 @@ NOTE: Clustering can only be scheduled for tables / partitions not receiving any
 ![Clustering example](/assets/images/blog/clustering/example_perf_improvement.png)
 _Figure: Illustrating query performance improvements by clustering_
 
-### Setting up clustering
-Inline clustering can be setup easily using spark dataframe options. See sample below
+## Clustering Usecases
+
+### Batching small files
+
+As mentioned in the intro, streaming ingestion generally results in smaller files in your data lake. But having a lot of
+such small files could bring down your query latency. From our experience supporting community users, there are quite a

Review Comment:
   Ah yes. Good catch Sudha! Will correct it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
