This is an automated email from the ASF dual-hosted git repository.
nagarwal pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a961350 [MINOR] Fix concurrency docs (#2794)
a961350 is described below
commit a961350740abf4d1637798bc287bd0b6b9800305
Author: n3nash <[email protected]>
AuthorDate: Fri Apr 9 00:48:45 2021 -0700
[MINOR] Fix concurrency docs (#2794)
---
docs/_docs/0.8.0/2_4_configurations.md | 32 ++++++++++++++++-------------
docs/_docs/0.8.0/2_9_concurrency_control.md | 6 ------
docs/_docs/2_4_configurations.md | 32 ++++++++++++++++-------------
docs/_docs/2_9_concurrency_control.md | 6 ------
4 files changed, 36 insertions(+), 40 deletions(-)
diff --git a/docs/_docs/0.8.0/2_4_configurations.md b/docs/_docs/0.8.0/2_4_configurations.md
index 0a5a4ab..207bf80 100644
--- a/docs/_docs/0.8.0/2_4_configurations.md
+++ b/docs/_docs/0.8.0/2_4_configurations.md
@@ -469,6 +469,10 @@ Configs that control compaction (merging of log files onto a new parquet base fi
Property: `hoodie.cleaner.policy` <br/>
<span style="color:grey"> Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.</span>
+#### withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
+Property: `hoodie.cleaner.policy.failed.writes` <br/>
+<span style="color:grey"> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes `eagerly` before every writer starts (only supported for single writer) or `lazily` by the cleaner (required for multi-writers)</span>
+
#### retainCommits(no_of_commits_to_retain = 24) {#retainCommits}
Property: `hoodie.cleaner.commits.retained` <br/>
<span style="color:grey">Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table</span>
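The added section contrasts eager and lazy rollback of failed writes; a minimal sketch of the two settings (assuming the enum-style values `EAGER` and `LAZY` behind the `eagerly`/`lazily` behaviors named above):

```
# Single writer (default): roll back failed writes eagerly before each write starts
hoodie.cleaner.policy.failed.writes=EAGER

# Multiple writers: leave rollback of failed writes to the cleaner
hoodie.cleaner.policy.failed.writes=LAZY
```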
@@ -831,59 +835,59 @@ Configs that control locking mechanisms if [WriteConcurrencyMode=optimistic_conc
[withLockConfig](#withLockConfig) (HoodieLockConfig) <br/>
#### withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) {#withLockProvider}
-Property: `hoodie.writer.lock.provider` <br/>
+Property: `hoodie.write.lock.provider` <br/>
<span style="color:grey">Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider</span>
#### withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url` <br/>
+Property: `hoodie.write.lock.zookeeper.url` <br/>
<span style="color:grey">Set the list of comma separated servers to connect to</span>
#### withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.base_path` [Required] <br/>
<span style="color:grey">The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table</span>
#### withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.port` [Required] <br/>
<span style="color:grey">The connection port to be used for Zookeeper</span>
#### withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required] <br/>
<span style="color:grey">Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. We recommend setting this to the table name</span>
#### withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) {#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` <br/>
<span style="color:grey">How long to wait when connecting to ZooKeeper before considering the connection a failure</span>
#### withZkSessionTimeoutInMs(sessionTimeoutInMs = 60000) {#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms` <br/>
<span style="color:grey">How long to wait after losing a connection to ZooKeeper before the session is expired</span>
#### withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries` <br/>
+Property: `hoodie.write.lock.num_retries` <br/>
<span style="color:grey">Maximum number of times to retry by lock provider client</span>
#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.wait_time_ms_between_retry` <br/>
<span style="color:grey">Initial amount of time to wait between retries by lock provider client</span>
#### withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
-Property: `hoodie.writer.lock.hivemetastore.database` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.database` [Required] <br/>
<span style="color:grey">The Hive database to acquire lock against</span>
#### withHiveTableName(hiveTableName) {#withHiveTableName}
-Property: `hoodie.writer.lock.hivemetastore.table` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.table` [Required] <br/>
<span style="color:grey">The Hive table under the hive database to acquire lock against</span>
#### withClientNumRetries(clientNumRetries = 0) {#withClientNumRetries}
-Property: `hoodie.writer.lock.client.num_retries` <br/>
+Property: `hoodie.write.lock.client.num_retries` <br/>
<span style="color:grey">Maximum number of times to retry to acquire lock additionally from the hudi client</span>
#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 10000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.client.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.client.wait_time_ms_between_retry` <br/>
<span style="color:grey">Amount of time to wait between retries from the hudi client</span>
#### withConflictResolutionStrategy(lockProvider = org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy) {#withConflictResolutionStrategy}
-Property: `hoodie.writer.lock.conflict.resolution.strategy` <br/>
+Property: `hoodie.write.lock.conflict.resolution.strategy` <br/>
<span style="color:grey">Lock provider class name, this should be subclass of org.apache.hudi.client.transaction.ConflictResolutionStrategy</span>
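Read together, the corrected `hoodie.write.lock.*` keys above compose into a ZooKeeper-based lock configuration roughly like this (a sketch; the quorum hosts, port, base path, and lock key are placeholder values):

```
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk1,zk2,zk3
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.base_path=/hudi/locks
hoodie.write.lock.zookeeper.lock_key=my_table
```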
diff --git a/docs/_docs/0.8.0/2_9_concurrency_control.md b/docs/_docs/0.8.0/2_9_concurrency_control.md
index 563da9b..1aab8e4 100644
--- a/docs/_docs/0.8.0/2_9_concurrency_control.md
+++ b/docs/_docs/0.8.0/2_9_concurrency_control.md
@@ -57,8 +57,6 @@ There are 2 different server based lock providers that require different configu
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url
hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
hoodie.write.lock.zookeeper.lock_key
hoodie.write.lock.zookeeper.base_path
```
@@ -69,8 +67,6 @@ hoodie.write.lock.zookeeper.base_path
hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
hoodie.write.lock.hivemetastore.database
hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
```
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
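With the deprecated retry keys removed, the Hive Metastore variant reduces to roughly the following (a sketch; the database and table names are placeholders):

```
hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
hoodie.write.lock.hivemetastore.database=my_db
hoodie.write.lock.hivemetastore.table=my_table
```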
@@ -89,8 +85,6 @@ inputDF.write.format("hudi")
.option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
.option("hoodie.write.lock.zookeeper.url", "zookeeper")
.option("hoodie.write.lock.zookeeper.port", "2181")
- .option("hoodie.write.lock.wait_time_ms", "12000")
- .option("hoodie.write.lock.num_retries", "2")
.option("hoodie.write.lock.zookeeper.lock_key", "test_table")
.option("hoodie.write.lock.zookeeper.base_path", "/test")
.option(RECORDKEY_FIELD_OPT_KEY, "uuid")
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index e176550..d8f0c90 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -468,6 +468,10 @@ Configs that control compaction (merging of log files onto a new parquet base fi
Property: `hoodie.cleaner.policy` <br/>
<span style="color:grey"> Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.</span>
+#### withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
+Property: `hoodie.cleaner.policy.failed.writes` <br/>
+<span style="color:grey"> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes `eagerly` before every writer starts (only supported for single writer) or `lazily` by the cleaner (required for multi-writers)</span>
+
#### retainCommits(no_of_commits_to_retain = 24) {#retainCommits}
Property: `hoodie.cleaner.commits.retained` <br/>
<span style="color:grey">Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table</span>
@@ -830,59 +834,59 @@ Configs that control locking mechanisms if [WriteConcurrencyMode=optimistic_conc
[withLockConfig](#withLockConfig) (HoodieLockConfig) <br/>
#### withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) {#withLockProvider}
-Property: `hoodie.writer.lock.provider` <br/>
+Property: `hoodie.write.lock.provider` <br/>
<span style="color:grey">Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider</span>
#### withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url` <br/>
+Property: `hoodie.write.lock.zookeeper.url` <br/>
<span style="color:grey">Set the list of comma separated servers to connect to</span>
#### withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.base_path` [Required] <br/>
<span style="color:grey">The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table</span>
#### withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.port` [Required] <br/>
<span style="color:grey">The connection port to be used for Zookeeper</span>
#### withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required] <br/>
<span style="color:grey">Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. We recommend setting this to the table name</span>
#### withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) {#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` <br/>
<span style="color:grey">How long to wait when connecting to ZooKeeper before considering the connection a failure</span>
#### withZkSessionTimeoutInMs(sessionTimeoutInMs = 60000) {#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms` <br/>
<span style="color:grey">How long to wait after losing a connection to ZooKeeper before the session is expired</span>
#### withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries` <br/>
+Property: `hoodie.write.lock.num_retries` <br/>
<span style="color:grey">Maximum number of times to retry by lock provider client</span>
#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.wait_time_ms_between_retry` <br/>
<span style="color:grey">Initial amount of time to wait between retries by lock provider client</span>
#### withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
-Property: `hoodie.writer.lock.hivemetastore.database` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.database` [Required] <br/>
<span style="color:grey">The Hive database to acquire lock against</span>
#### withHiveTableName(hiveTableName) {#withHiveTableName}
-Property: `hoodie.writer.lock.hivemetastore.table` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.table` [Required] <br/>
<span style="color:grey">The Hive table under the hive database to acquire lock against</span>
#### withClientNumRetries(clientNumRetries = 0) {#withClientNumRetries}
-Property: `hoodie.writer.lock.client.num_retries` <br/>
+Property: `hoodie.write.lock.client.num_retries` <br/>
<span style="color:grey">Maximum number of times to retry to acquire lock additionally from the hudi client</span>
#### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 10000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.client.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.client.wait_time_ms_between_retry` <br/>
<span style="color:grey">Amount of time to wait between retries from the hudi client</span>
#### withConflictResolutionStrategy(lockProvider = org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy) {#withConflictResolutionStrategy}
-Property: `hoodie.writer.lock.conflict.resolution.strategy` <br/>
+Property: `hoodie.write.lock.conflict.resolution.strategy` <br/>
<span style="color:grey">Lock provider class name, this should be subclass of org.apache.hudi.client.transaction.ConflictResolutionStrategy</span>
diff --git a/docs/_docs/2_9_concurrency_control.md b/docs/_docs/2_9_concurrency_control.md
index f4ada0a..918556e 100644
--- a/docs/_docs/2_9_concurrency_control.md
+++ b/docs/_docs/2_9_concurrency_control.md
@@ -56,8 +56,6 @@ There are 2 different server based lock providers that require different configu
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url
hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
hoodie.write.lock.zookeeper.lock_key
hoodie.write.lock.zookeeper.base_path
```
@@ -68,8 +66,6 @@ hoodie.write.lock.zookeeper.base_path
hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
hoodie.write.lock.hivemetastore.database
hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
```
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
@@ -88,8 +84,6 @@ inputDF.write.format("hudi")
.option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
.option("hoodie.write.lock.zookeeper.url", "zookeeper")
.option("hoodie.write.lock.zookeeper.port", "2181")
- .option("hoodie.write.lock.wait_time_ms", "12000")
- .option("hoodie.write.lock.num_retries", "2")
.option("hoodie.write.lock.zookeeper.lock_key", "test_table")
.option("hoodie.write.lock.zookeeper.base_path", "/test")
.option(RECORDKEY_FIELD_OPT_KEY, "uuid")