This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e116d04cf10 [DOCS] Update run_clustering procedure docs (#9989)
e116d04cf10 is described below
commit e116d04cf1001e8a40d7faa18783aafe5a89b9b1
Author: ksmou <[email protected]>
AuthorDate: Mon Nov 6 17:48:27 2023 +0800
[DOCS] Update run_clustering procedure docs (#9989)
---
website/docs/procedures.md | 73 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 64 insertions(+), 9 deletions(-)
diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index ba2d1c06968..e30abf4e090 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1238,8 +1238,8 @@ Trigger clustering on a hoodie table. By using partition predicates, clustering
with specified partitions, and you can also specify the order columns to sort data.
:::note
-Newly clustering instant will be generated every call, and all pending clustering instants are executed.
-When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
+A new clustering instant is generated on every call, or some pending clustering instants are executed.
+When calling this procedure, at least one of the parameters ``table`` and ``path`` must be specified. If both
parameters are given, ``table`` will take effect.
:::
@@ -1247,16 +1247,30 @@ parameters are given, ``table`` will take effect.
**Input**
-| Parameter Name | Type   | Required | Default Value | Description                   |
-|----------------|--------|----------|---------------|-------------------------------|
-| table          | String | N        | None          | Name of table to be clustered |
-| path           | String | N        | None          | Path of table to be clustered |
-| predicate      | String | N        | None          | Predicate to filter partition |
-| order          | String | N        | None          | Order column split by `,`     |
+| Parameter Name          | Type    | Required | Default Value | Description                                                    |
+|-------------------------|---------|----------|---------------|----------------------------------------------------------------|
+| table                   | String  | N        | None          | Name of table to be clustered                                  |
+| path                    | String  | N        | None          | Path of table to be clustered                                  |
+| predicate               | String  | N        | None          | Predicate to filter partition                                  |
+| order                   | String  | N        | None          | Order column split by `,`                                      |
+| show_involved_partition | Boolean | N        | false         | Show involved partition in the output                          |
+| op                      | String  | N        | None          | Operation type, `EXECUTE` or `SCHEDULE`                        |
+| order_strategy          | String  | N        | None          | Records layout optimization, `linear/z-order/hilbert`          |
+| options                 | String  | N        | None          | Customize hudi configs in the format `key1=value1,key2=value2` |
+| instants                | String  | N        | None          | Specified instants by `,`                                      |
+| selected_partitions     | String  | N        | None          | Partitions to run clustering by `,`                            |
+| limit                   | Int     | N        | None          | Max number of plans to be executed                             |
**Output**
-Empty
+The output is as follows:
+
+| Parameter Name      | Type   | Required | Default Value | Description                              |
+|---------------------|--------|----------|---------------|------------------------------------------|
+| timestamp           | String | N        | None          | Instant name                             |
+| input_group_size    | Int    | N        | None          | The input group sizes for each plan      |
+| state               | String | N        | None          | The instant final state                  |
+| involved_partitions | String | N        | *             | Show involved partitions, default is `*` |
**Example**
@@ -1275,6 +1289,47 @@ Clustering test_hudi_table with table name, predicate and order column
call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts');
```
+Clustering test_hudi_table with table name, show_involved_partition
+```
+call run_clustering(table => 'test_hudi_table', show_involved_partition => true);
+```
+
+Clustering test_hudi_table with table name, op
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule');
+```
+
+Clustering test_hudi_table with table name, order_strategy
+```
+call run_clustering(table => 'test_hudi_table', order_strategy => 'z-order');
+```
+
+Clustering test_hudi_table with table name, op, options
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule', options => '
+hoodie.clustering.plan.strategy.target.file.max.bytes=1024*1024*1024,
+hoodie.clustering.plan.strategy.max.bytes.per.group=2*1024*1024*1024');
+```
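The `options` value is a single comma-separated string of `key=value` pairs. The following is a minimal Python sketch of assembling such a string, not part of Hudi: `build_options` and `GIB` are hypothetical names introduced for illustration. It writes byte sizes as plain integers, on the assumption that literal numbers are the safer input; whether the procedure evaluates expressions such as `1024*1024*1024` inside the string is not confirmed here.

```python
# Hypothetical helper, not part of Hudi: builds the value for the `options` parameter.
GIB = 1024 ** 3  # bytes in one GiB


def build_options(configs):
    """Join config pairs into the 'key1=value1,key2=value2' format."""
    parts = []
    for key, value in configs.items():
        text = str(value)
        # A comma or '=' in the wrong place would corrupt the comma-separated format.
        if "," in key or "=" in key or "," in text:
            raise ValueError(f"cannot encode {key!r}={text!r} in an options string")
        parts.append(f"{key}={text}")
    return ",".join(parts)


options = build_options({
    "hoodie.clustering.plan.strategy.target.file.max.bytes": 1 * GIB,
    "hoodie.clustering.plan.strategy.max.bytes.per.group": 2 * GIB,
})
print(options)
```

The resulting string can be passed as the `options` argument of `run_clustering`.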
+
+Clustering test_hudi_table with table name, op, instants
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', instants => 'ts1,ts2');
+```
+
+Clustering test_hudi_table with table name, op, selected_partitions
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', selected_partitions => 'par1,par2');
+```
+
+Clustering test_hudi_table with table name, op, limit
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', limit => 10);
+```
+:::note
+The ``limit`` parameter is valid only when ``op`` is ``execute``.
+
+:::
+
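The parameter rules stated in the notes (at least one of `table` and `path`, `table` wins when both are given, `limit` only with `op` set to execute) can be checked before a statement is sent to Spark. The sketch below is an illustration only: `run_clustering_call` is a hypothetical helper, not a Hudi API, and it covers just these four parameters.

```python
# Hypothetical helper, not part of Hudi: renders a run_clustering call
# after checking the parameter rules described in the notes.
def run_clustering_call(table=None, path=None, op=None, limit=None):
    if table is None and path is None:
        raise ValueError("at least one of table and path must be specified")
    if limit is not None and (op is None or op.lower() != "execute"):
        raise ValueError("limit is valid only when op is execute")

    for text in (table, path, op):
        # Keep the sketch simple: refuse values that would need SQL escaping.
        if text is not None and "'" in text:
            raise ValueError("single quotes are not supported in this sketch")

    args = []
    if table is not None:
        # When both table and path are given, table takes effect.
        args.append(f"table => '{table}'")
    else:
        args.append(f"path => '{path}'")
    if op is not None:
        args.append(f"op => '{op}'")
    if limit is not None:
        args.append(f"limit => {int(limit)}")
    return f"call run_clustering({', '.join(args)});"


print(run_clustering_call(table="test_hudi_table", op="execute", limit=10))
```

Calling it with `op="schedule"` and a `limit` raises `ValueError`, mirroring the restriction in the note.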
### show_clustering
Show pending clusterings on a hoodie table.