(hudi) branch asf-site updated: [DOCS] Add missing parallelism into snapshot_exporter.md (#11507)

danny0405 Wed, 26 Jun 2024 02:15:10 -0700

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new d921931facb [DOCS] Add missing parallelism into snapshot_exporter.md (#11507)
d921931facb is described below

commit d921931facbdeb861b599ab9e18284a328da16c1
Author: Gatsby Lee <[email protected]>
AuthorDate: Wed Jun 26 02:15:01 2024 -0700

    [DOCS] Add missing parallelism into snapshot_exporter.md (#11507)
---
 website/versioned_docs/version-0.13.0/snapshot_exporter.md | 13 +++++++------
 website/versioned_docs/version-0.13.1/snapshot_exporter.md | 13 +++++++------
 website/versioned_docs/version-0.14.0/snapshot_exporter.md | 13 +++++++------
 website/versioned_docs/version-0.14.1/snapshot_exporter.md | 13 +++++++------
 website/versioned_docs/version-0.15.0/snapshot_exporter.md | 13 +++++++------
 5 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/website/versioned_docs/version-0.13.0/snapshot_exporter.md b/website/versioned_docs/version-0.13.0/snapshot_exporter.md
index 168f3a81543..fad6c217a39 100644
--- a/website/versioned_docs/version-0.13.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.13.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
 ---
 
 ## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes. 
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data 
+HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data
 with a provided field or implement custom repartitioning by extending a class 
shown in detail below.
 
 ## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a 
+HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a
 query, perform any repartitioning if required and will write the data as Hudi, 
parquet, or json format.
 
 |Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
 |--output-format|Output format for the exported dataset; accept these values: 
json,parquet,hudi|required||
 |--output-partition-field|A field to be used by Spark 
repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is 
specified.The output dataset's default partition field will inherent from the 
source Hudi dataset.|
 |--output-partitioner|A class to facilitate custom 
repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
 
 ## Examples
 
@@ -63,7 +64,7 @@ spark-submit \
   --jars "/opt/hudi-spark-bundle_2.12-0.13.0.jar" \
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
-      /opt/hudi-utilities-bundle_2.12-0.13.0.jar \  
+      /opt/hudi-utilities-bundle_2.12-0.13.0.jar \
   --source-base-path "/tmp/" \
   --target-output-path "/tmp/exported/json/" \
   --output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
 ```
 
 ### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`. 
+`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`.
 This parameter takes higher precedence than `--output-partition-field`, which 
will be ignored if this is provided.
 
 An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
 public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
 
   private static final String PARTITION_NAME = "date";
- 
+
   @Override
   public DataFrameWriter<Row> partition(Dataset<Row> source) {
     // use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.13.1/snapshot_exporter.md b/website/versioned_docs/version-0.13.1/snapshot_exporter.md
index 4fc431fbdfd..8dbf13e8e31 100644
--- a/website/versioned_docs/version-0.13.1/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.13.1/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
 ---
 
 ## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes. 
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data 
+HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data
 with a provided field or implement custom repartitioning by extending a class 
shown in detail below.
 
 ## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a 
+HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a
 query, perform any repartitioning if required and will write the data as Hudi, 
parquet, or json format.
 
 |Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
 |--output-format|Output format for the exported dataset; accept these values: 
json,parquet,hudi|required||
 |--output-partition-field|A field to be used by Spark 
repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is 
specified.The output dataset's default partition field will inherent from the 
source Hudi dataset.|
 |--output-partitioner|A class to facilitate custom 
repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
 
 ## Examples
 
@@ -63,7 +64,7 @@ spark-submit \
   --jars 
"packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.13.1.jar" \
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
-      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.1.jar \  
+      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.1.jar \
   --source-base-path "/tmp/" \
   --target-output-path "/tmp/exported/json/" \
   --output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
 ```
 
 ### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`. 
+`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`.
 This parameter takes higher precedence than `--output-partition-field`, which 
will be ignored if this is provided.
 
 An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
 public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
 
   private static final String PARTITION_NAME = "date";
- 
+
   @Override
   public DataFrameWriter<Row> partition(Dataset<Row> source) {
     // use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.14.0/snapshot_exporter.md b/website/versioned_docs/version-0.14.0/snapshot_exporter.md
index c85849454ea..49522b5464d 100644
--- a/website/versioned_docs/version-0.14.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.14.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
 ---
 
 ## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes. 
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data 
+HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data
 with a provided field or implement custom repartitioning by extending a class 
shown in detail below.
 
 ## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a 
+HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a
 query, perform any repartitioning if required and will write the data as Hudi, 
parquet, or json format.
 
 |Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
 |--output-format|Output format for the exported dataset; accept these values: 
json,parquet,hudi|required||
 |--output-partition-field|A field to be used by Spark 
repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is 
specified.The output dataset's default partition field will inherent from the 
source Hudi dataset.|
 |--output-partitioner|A class to facilitate custom 
repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
 
 ## Examples
 
@@ -63,7 +64,7 @@ spark-submit \
   --jars 
"packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.14.0-SNAPSHOT.jar" 
\
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
-      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar
 \  
+      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar
 \
   --source-base-path "/tmp/" \
   --target-output-path "/tmp/exported/json/" \
   --output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
 ```
 
 ### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`. 
+`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`.
 This parameter takes higher precedence than `--output-partition-field`, which 
will be ignored if this is provided.
 
 An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
 public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
 
   private static final String PARTITION_NAME = "date";
- 
+
   @Override
   public DataFrameWriter<Row> partition(Dataset<Row> source) {
     // use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.14.1/snapshot_exporter.md b/website/versioned_docs/version-0.14.1/snapshot_exporter.md
index c85849454ea..49522b5464d 100644
--- a/website/versioned_docs/version-0.14.1/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.14.1/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
 ---
 
 ## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes. 
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data 
+HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data
 with a provided field or implement custom repartitioning by extending a class 
shown in detail below.
 
 ## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a 
+HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a
 query, perform any repartitioning if required and will write the data as Hudi, 
parquet, or json format.
 
 |Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
 |--output-format|Output format for the exported dataset; accept these values: 
json,parquet,hudi|required||
 |--output-partition-field|A field to be used by Spark 
repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is 
specified.The output dataset's default partition field will inherent from the 
source Hudi dataset.|
 |--output-partitioner|A class to facilitate custom 
repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
 
 ## Examples
 
@@ -63,7 +64,7 @@ spark-submit \
   --jars 
"packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.14.0-SNAPSHOT.jar" 
\
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
-      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar
 \  
+      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar
 \
   --source-base-path "/tmp/" \
   --target-output-path "/tmp/exported/json/" \
   --output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
 ```
 
 ### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`. 
+`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`.
 This parameter takes higher precedence than `--output-partition-field`, which 
will be ignored if this is provided.
 
 An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
 public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
 
   private static final String PARTITION_NAME = "date";
- 
+
   @Override
   public DataFrameWriter<Row> partition(Dataset<Row> source) {
     // use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.15.0/snapshot_exporter.md b/website/versioned_docs/version-0.15.0/snapshot_exporter.md
index aee29e3c1cc..8ff815e6f5d 100644
--- a/website/versioned_docs/version-0.15.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.15.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
 ---
 
 ## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes. 
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data 
+HoodieSnapshotExporter allows you to copy data from one location to another 
for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to 
copying data, you can also repartition data
 with a provided field or implement custom repartitioning by extending a class 
shown in detail below.
 
 ## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a 
+HoodieSnapshotExporter accepts a reference to a source path and a destination 
path. The utility will issue a
 query, perform any repartitioning if required and will write the data as Hudi, 
parquet, or json format.
 
 |Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
 |--output-format|Output format for the exported dataset; accept these values: 
json,parquet,hudi|required||
 |--output-partition-field|A field to be used by Spark 
repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is 
specified.The output dataset's default partition field will inherent from the 
source Hudi dataset.|
 |--output-partitioner|A class to facilitate custom 
repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
 
 ## Examples
 
@@ -63,7 +64,7 @@ spark-submit \
   --jars 
"packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.15.0.jar" \
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
-      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \  
+      
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \
   --source-base-path "/tmp/" \
   --target-output-path "/tmp/exported/json/" \
   --output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
 ```
 
 ### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`. 
+`--output-partitioner` parameter takes in a fully-qualified name of a class 
that implements `HoodieSnapshotExporter.Partitioner`.
 This parameter takes higher precedence than `--output-partition-field`, which 
will be ignored if this is provided.
 
 An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
 public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
 
   private static final String PARTITION_NAME = "date";
- 
+
   @Override
   public DataFrameWriter<Row> partition(Dataset<Row> source) {
     // use the current hoodie partition path as the output partition
