This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 9f2ea8563f [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
9f2ea8563f is described below
commit 9f2ea8563fe71ddcdfd9b10e40841948b5f0d586
Author: liuzhuang2017 <[email protected]>
AuthorDate: Wed May 25 08:30:01 2022 +0800
[MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
---
 website/docs/tuning-guide.md                          | 6 +++---
 website/versioned_docs/version-0.10.1/tuning-guide.md | 6 +++---
 website/versioned_docs/version-0.11.0/tuning-guide.md | 6 +++---
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/website/docs/tuning-guide.md b/website/docs/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/docs/tuning-guide.md
+++ b/website/docs/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
**Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
**Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
spark.task.cpus 1
spark.task.maxFailures 4
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
spark.yarn.max.executor.failures 100
```
\ No newline at end of file
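For context, `spark.driver.memoryOverhead` and `spark.executor.memoryOverhead` are the Spark 2.3+ replacements for the deprecated `spark.yarn.*` overhead options, and they can also be set programmatically. A minimal sketch, not part of this commit; the app name is hypothetical and the values simply mirror the sample config in the guide:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: supplying the renamed overhead settings via SparkSession.
// Values are interpreted as MiB when no unit is given; size them to your
// schema width rather than copying these numbers.
val spark = SparkSession.builder()
  .appName("hudi-writer") // hypothetical app name
  .config("spark.driver.memoryOverhead", "1024")   // replaces spark.yarn.driver.memoryOverhead
  .config("spark.executor.memoryOverhead", "3072") // replaces spark.yarn.executor.memoryOverhead
  .getOrCreate()
```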
diff --git a/website/versioned_docs/version-0.10.1/tuning-guide.md b/website/versioned_docs/version-0.10.1/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.10.1/tuning-guide.md
+++ b/website/versioned_docs/version-0.10.1/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
**Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
**Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
spark.task.cpus 1
spark.task.maxFailures 4
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
spark.yarn.max.executor.failures 100
```
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.0/tuning-guide.md b/website/versioned_docs/version-0.11.0/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.11.0/tuning-guide.md
+++ b/website/versioned_docs/version-0.11.0/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
**Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
**Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
spark.task.cpus 1
spark.task.maxFailures 4
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
spark.yarn.max.executor.failures 100
```
\ No newline at end of file
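The hunks above also restate the guide's sizing rule of thumb: shuffle parallelism of at least input_data_size/500MB. A worked sketch of that arithmetic, illustrative only; the 2 TB input size, `df`, and `basePath` are hypothetical, and the required Hudi table/key options are omitted:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Rule of thumb from the guide: parallelism >= input_data_size / 500MB.
val inputBytes  = 2L * 1024 * 1024 * 1024 * 1024 // hypothetical 2 TB input
val targetBytes = 500L * 1024 * 1024             // ~500 MB per Spark partition
val parallelism = math.ceil(inputBytes.toDouble / targetBytes).toInt // 4195 here

// Apply the same value to whichever Hudi write operation is in use.
def writeHudi(df: DataFrame, basePath: String): Unit =
  df.write.format("hudi")
    .option("hoodie.insert.shuffle.parallelism", parallelism.toString)
    .option("hoodie.upsert.shuffle.parallelism", parallelism.toString)
    .option("hoodie.bulkinsert.shuffle.parallelism", parallelism.toString)
    .mode(SaveMode.Append)
    .save(basePath)
```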