This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d591b08aa09 [DOCS] Add explanation for deduplicate command (#10498)
d591b08aa09 is described below
commit d591b08aa09512089ba16e640eccdc771e180fc1
Author: Santhosh Kumar M <[email protected]>
AuthorDate: Fri Mar 1 13:18:52 2024 +0530
[DOCS] Add explanation for deduplicate command (#10498)
Co-authored-by: Y Ethan Guo <[email protected]>
---
website/docs/procedures.md | 6 +++---
website/versioned_docs/version-0.13.1/procedures.md | 6 +++---
website/versioned_docs/version-0.14.0/procedures.md | 6 +++---
website/versioned_docs/version-0.14.1/procedures.md | 6 +++---
4 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index 10c3ee853ec..49e8de775a5 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1753,7 +1753,7 @@ call repair_corrupted_clean_files(table =>
'test_hudi_table');
### repair_deduplicate
-Repair deduplicate records for a hudi table.
+Repair deduplicate records for a hudi table. The job deduplicates the data in
the duplicated_partition_path and writes it into repaired_output_path. At the
end of the job, the data in repaired_output_path is copied into the original
path (duplicated_partition_path).
**Input**
@@ -1774,12 +1774,12 @@ Repair deduplicate records for a hudi table.
**Example**
```
-call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => 'dt=2021-05-04');
+call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => '/tmp/repair_path/');
```
| result |
|----------------------------------------------|
-| Reduplicated files placed in: dt=2021-05-04. |
+| Reduplicated files placed in: /tmp/repair_path/. |
### repair_migrate_partition_meta
diff --git a/website/versioned_docs/version-0.13.1/procedures.md
b/website/versioned_docs/version-0.13.1/procedures.md
index 1144efc8d52..dbaf9f36acd 100644
--- a/website/versioned_docs/version-0.13.1/procedures.md
+++ b/website/versioned_docs/version-0.13.1/procedures.md
@@ -1645,7 +1645,7 @@ call repair_corrupted_clean_files(table =>
'test_hudi_table');
### repair_deduplicate
-Repair deduplicate records for a hudi table.
+Repair deduplicate records for a hudi table. The job deduplicates the data in
the duplicated_partition_path and writes it into repaired_output_path. At the
end of the job, the data in repaired_output_path is copied into the original
path (duplicated_partition_path).
**Input**
@@ -1666,12 +1666,12 @@ Repair deduplicate records for a hudi table.
**Example**
```
-call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => 'dt=2021-05-04');
+call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => '/tmp/repair_path/');
```
| result |
|----------------------------------------------|
-| Reduplicated files placed in: dt=2021-05-04. |
+| Reduplicated files placed in: /tmp/repair_path/. |
### repair_migrate_partition_meta
diff --git a/website/versioned_docs/version-0.14.0/procedures.md
b/website/versioned_docs/version-0.14.0/procedures.md
index 21d0ab901e1..ec8dea5ae56 100644
--- a/website/versioned_docs/version-0.14.0/procedures.md
+++ b/website/versioned_docs/version-0.14.0/procedures.md
@@ -1693,7 +1693,7 @@ call repair_corrupted_clean_files(table =>
'test_hudi_table');
### repair_deduplicate
-Repair deduplicate records for a hudi table.
+Repair deduplicate records for a hudi table. The job deduplicates the data in
the duplicated_partition_path and writes it into repaired_output_path. At the
end of the job, the data in repaired_output_path is copied into the original
path (duplicated_partition_path).
**Input**
@@ -1714,12 +1714,12 @@ Repair deduplicate records for a hudi table.
**Example**
```
-call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => 'dt=2021-05-04');
+call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => '/tmp/repair_path/');
```
| result |
|----------------------------------------------|
-| Reduplicated files placed in: dt=2021-05-04. |
+| Reduplicated files placed in: /tmp/repair_path/. |
### repair_migrate_partition_meta
diff --git a/website/versioned_docs/version-0.14.1/procedures.md
b/website/versioned_docs/version-0.14.1/procedures.md
index 80bbb23a5b5..c913db17de2 100644
--- a/website/versioned_docs/version-0.14.1/procedures.md
+++ b/website/versioned_docs/version-0.14.1/procedures.md
@@ -1753,7 +1753,7 @@ call repair_corrupted_clean_files(table =>
'test_hudi_table');
### repair_deduplicate
-Repair deduplicate records for a hudi table.
+Repair deduplicate records for a hudi table. The job deduplicates the data in
the duplicated_partition_path and writes it into repaired_output_path. At the
end of the job, the data in repaired_output_path is copied into the original
path (duplicated_partition_path).
**Input**
@@ -1774,12 +1774,12 @@ Repair deduplicate records for a hudi table.
**Example**
```
-call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => 'dt=2021-05-04');
+call repair_deduplicate(table => 'test_hudi_table', duplicated_partition_path
=> 'dt=2021-05-03', repaired_output_path => '/tmp/repair_path/');
```
| result |
|----------------------------------------------|
-| Reduplicated files placed in: dt=2021-05-04. |
+| Reduplicated files placed in: /tmp/repair_path/. |
### repair_migrate_partition_meta