This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d921931facb [DOCS] Add missing parallelism into snapshot_exporter.md (#11507)
d921931facb is described below
commit d921931facbdeb861b599ab9e18284a328da16c1
Author: Gatsby Lee <[email protected]>
AuthorDate: Wed Jun 26 02:15:01 2024 -0700
[DOCS] Add missing parallelism into snapshot_exporter.md (#11507)
---
website/versioned_docs/version-0.13.0/snapshot_exporter.md | 13 +++++++------
website/versioned_docs/version-0.13.1/snapshot_exporter.md | 13 +++++++------
website/versioned_docs/version-0.14.0/snapshot_exporter.md | 13 +++++++------
website/versioned_docs/version-0.14.1/snapshot_exporter.md | 13 +++++++------
website/versioned_docs/version-0.15.0/snapshot_exporter.md | 13 +++++++------
5 files changed, 35 insertions(+), 30 deletions(-)
diff --git a/website/versioned_docs/version-0.13.0/snapshot_exporter.md b/website/versioned_docs/version-0.13.0/snapshot_exporter.md
index 168f3a81543..fad6c217a39 100644
--- a/website/versioned_docs/version-0.13.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.13.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
---
## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
+HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
with a provided field or implement custom repartitioning by extending a class shown in detail below.
## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
+HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
query, perform any repartitioning if required and will write the data as Hudi, parquet, or json format.
|Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
|--output-format|Output format for the exported dataset; accept these values: json,parquet,hudi|required||
|--output-partition-field|A field to be used by Spark repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is specified.The output dataset's default partition field will inherent from the source Hudi dataset.|
|--output-partitioner|A class to facilitate custom repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
## Examples
@@ -63,7 +64,7 @@ spark-submit \
--jars "/opt/hudi-spark-bundle_2.12-0.13.0.jar" \
--deploy-mode "client" \
--class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
- /opt/hudi-utilities-bundle_2.12-0.13.0.jar \
+ /opt/hudi-utilities-bundle_2.12-0.13.0.jar \
--source-base-path "/tmp/" \
--target-output-path "/tmp/exported/json/" \
--output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
```
### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
+`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
This parameter takes higher precedence than `--output-partition-field`, which will be ignored if this is provided.
An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
private static final String PARTITION_NAME = "date";
-
+
@Override
public DataFrameWriter<Row> partition(Dataset<Row> source) {
// use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.13.1/snapshot_exporter.md b/website/versioned_docs/version-0.13.1/snapshot_exporter.md
index 4fc431fbdfd..8dbf13e8e31 100644
--- a/website/versioned_docs/version-0.13.1/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.13.1/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
---
## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
+HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
with a provided field or implement custom repartitioning by extending a class shown in detail below.
## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
+HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
query, perform any repartitioning if required and will write the data as Hudi, parquet, or json format.
|Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
|--output-format|Output format for the exported dataset; accept these values: json,parquet,hudi|required||
|--output-partition-field|A field to be used by Spark repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is specified.The output dataset's default partition field will inherent from the source Hudi dataset.|
|--output-partitioner|A class to facilitate custom repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
## Examples
@@ -63,7 +64,7 @@ spark-submit \
--jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.13.1.jar" \
--deploy-mode "client" \
--class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
- packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.1.jar \
+ packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.1.jar \
--source-base-path "/tmp/" \
--target-output-path "/tmp/exported/json/" \
--output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
```
### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
+`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
This parameter takes higher precedence than `--output-partition-field`, which will be ignored if this is provided.
An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
private static final String PARTITION_NAME = "date";
-
+
@Override
public DataFrameWriter<Row> partition(Dataset<Row> source) {
// use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.14.0/snapshot_exporter.md b/website/versioned_docs/version-0.14.0/snapshot_exporter.md
index c85849454ea..49522b5464d 100644
--- a/website/versioned_docs/version-0.14.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.14.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
---
## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
+HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
with a provided field or implement custom repartitioning by extending a class shown in detail below.
## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
+HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
query, perform any repartitioning if required and will write the data as Hudi, parquet, or json format.
|Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
|--output-format|Output format for the exported dataset; accept these values: json,parquet,hudi|required||
|--output-partition-field|A field to be used by Spark repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is specified.The output dataset's default partition field will inherent from the source Hudi dataset.|
|--output-partitioner|A class to facilitate custom repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
## Examples
@@ -63,7 +64,7 @@ spark-submit \
--jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.14.0-SNAPSHOT.jar" \
--deploy-mode "client" \
--class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
- packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar \
+ packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar \
--source-base-path "/tmp/" \
--target-output-path "/tmp/exported/json/" \
--output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
```
### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
+`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
This parameter takes higher precedence than `--output-partition-field`, which will be ignored if this is provided.
An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
private static final String PARTITION_NAME = "date";
-
+
@Override
public DataFrameWriter<Row> partition(Dataset<Row> source) {
// use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.14.1/snapshot_exporter.md b/website/versioned_docs/version-0.14.1/snapshot_exporter.md
index c85849454ea..49522b5464d 100644
--- a/website/versioned_docs/version-0.14.1/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.14.1/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
---
## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
+HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
with a provided field or implement custom repartitioning by extending a class shown in detail below.
## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
+HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
query, perform any repartitioning if required and will write the data as Hudi, parquet, or json format.
|Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
|--output-format|Output format for the exported dataset; accept these values: json,parquet,hudi|required||
|--output-partition-field|A field to be used by Spark repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is specified.The output dataset's default partition field will inherent from the source Hudi dataset.|
|--output-partitioner|A class to facilitate custom repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
## Examples
@@ -63,7 +64,7 @@ spark-submit \
--jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.14.0-SNAPSHOT.jar" \
--deploy-mode "client" \
--class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
- packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar \
+ packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar \
--source-base-path "/tmp/" \
--target-output-path "/tmp/exported/json/" \
--output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
```
### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
+`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
This parameter takes higher precedence than `--output-partition-field`, which will be ignored if this is provided.
An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
private static final String PARTITION_NAME = "date";
-
+
@Override
public DataFrameWriter<Row> partition(Dataset<Row> source) {
// use the current hoodie partition path as the output partition
diff --git a/website/versioned_docs/version-0.15.0/snapshot_exporter.md b/website/versioned_docs/version-0.15.0/snapshot_exporter.md
index aee29e3c1cc..8ff815e6f5d 100644
--- a/website/versioned_docs/version-0.15.0/snapshot_exporter.md
+++ b/website/versioned_docs/version-0.15.0/snapshot_exporter.md
@@ -5,12 +5,12 @@ toc: true
---
## Introduction
-HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
-You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
+HoodieSnapshotExporter allows you to copy data from one location to another for backups or other purposes.
+You can write data as Hudi, Json, Orc, or Parquet file formats. In addition to copying data, you can also repartition data
with a provided field or implement custom repartitioning by extending a class shown in detail below.
## Arguments
-HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
+HoodieSnapshotExporter accepts a reference to a source path and a destination path. The utility will issue a
query, perform any repartitioning if required and will write the data as Hudi, parquet, or json format.
|Argument|Description|Required|Note|
@@ -20,6 +20,7 @@ query, perform any repartitioning if required and will write the data as Hudi, p
|--output-format|Output format for the exported dataset; accept these values: json,parquet,hudi|required||
|--output-partition-field|A field to be used by Spark repartitioning|optional|Ignored when "Hudi" or when --output-partitioner is specified.The output dataset's default partition field will inherent from the source Hudi dataset.|
|--output-partitioner|A class to facilitate custom repartitioning|optional|Ignored when using output-format "Hudi"|
+|--parallelism|Parallelism for file listing|optional||
## Examples
@@ -63,7 +64,7 @@ spark-submit \
--jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.15.0.jar" \
--deploy-mode "client" \
--class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
- packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \
+ packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \
--source-base-path "/tmp/" \
--target-output-path "/tmp/exported/json/" \
--output-format "json" \
@@ -77,7 +78,7 @@ The output directory will look like this
```
### Custom Re-partitioning
-`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
+`--output-partitioner` parameter takes in a fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`.
This parameter takes higher precedence than `--output-partition-field`, which will be ignored if this is provided.
An example implementation is shown below:
@@ -88,7 +89,7 @@ package com.foo.bar;
public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
private static final String PARTITION_NAME = "date";
-
+
@Override
public DataFrameWriter<Row> partition(Dataset<Row> source) {
// use the current hoodie partition path as the output partition