This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new a961350  [MINOR] Fix concurrency docs (#2794)
a961350 is described below

commit a961350740abf4d1637798bc287bd0b6b9800305
Author: n3nash <nagar...@uber.com>
AuthorDate: Fri Apr 9 00:48:45 2021 -0700

    [MINOR] Fix concurrency docs (#2794)
---
 docs/_docs/0.8.0/2_4_configurations.md      | 32 ++++++++++++++++-------------
 docs/_docs/0.8.0/2_9_concurrency_control.md |  6 ------
 docs/_docs/2_4_configurations.md            | 32 ++++++++++++++++-------------
 docs/_docs/2_9_concurrency_control.md       |  6 ------
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/docs/_docs/0.8.0/2_4_configurations.md b/docs/_docs/0.8.0/2_4_configurations.md
index 0a5a4ab..207bf80 100644
--- a/docs/_docs/0.8.0/2_4_configurations.md
+++ b/docs/_docs/0.8.0/2_4_configurations.md
@@ -469,6 +469,10 @@ Configs that control compaction (merging of log files onto a new parquet base fi
 Property: `hoodie.cleaner.policy` <br/>
 <span style="color:grey"> Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.</span>
 
+#### withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
+Property: `hoodie.cleaner.policy.failed.writes` <br/>
+<span style="color:grey"> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes `eagerly` before every writer starts (only supported for single writer) or `lazily` by the cleaner (required for multi-writers)</span>
+
 #### retainCommits(no_of_commits_to_retain = 24) {#retainCommits}
 Property: `hoodie.cleaner.commits.retained` <br/>
 <span style="color:grey">Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table</span>
@@ -831,59 +835,59 @@ Configs that control locking mechanisms if [WriteConcurrencyMode=optimistic_conc
 [withLockConfig](#withLockConfig) (HoodieLockConfig) <br/>
 
 #### withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) {#withLockProvider}
-Property: `hoodie.writer.lock.provider` <br/>
+Property: `hoodie.write.lock.provider` <br/>
 <span style="color:grey">Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider</span>
 
 #### withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url` <br/>
+Property: `hoodie.write.lock.zookeeper.url` <br/>
 <span style="color:grey">Set the list of comma separated servers to connect to</span>
 
 #### withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.base_path` [Required] <br/>
 <span style="color:grey">The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table</span>
 
 #### withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.port` [Required] <br/>
 <span style="color:grey">The connection port to be used for Zookeeper</span>
 
 #### withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required] <br/>
 <span style="color:grey">Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. We recommend setting this to the table name</span>
 
 #### withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) {#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` <br/>
 <span style="color:grey">How long to wait when connecting to ZooKeeper before considering the connection a failure</span>
 
 #### withZkSessionTimeoutInMs(sessionTimeoutInMs = 60000) {#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms` <br/>
 <span style="color:grey">How long to wait after losing a connection to ZooKeeper before the session is expired</span>
 
 #### withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries` <br/>
+Property: `hoodie.write.lock.num_retries` <br/>
 <span style="color:grey">Maximum number of times to retry by lock provider client</span>
 
 #### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.wait_time_ms_between_retry` <br/>
 <span style="color:grey">Initial amount of time to wait between retries by lock provider client</span>
 
 #### withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
-Property: `hoodie.writer.lock.hivemetastore.database` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.database` [Required] <br/>
 <span style="color:grey">The Hive database to acquire lock against</span>
 
 #### withHiveTableName(hiveTableName) {#withHiveTableName}
-Property: `hoodie.writer.lock.hivemetastore.table` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.table` [Required] <br/>
 <span style="color:grey">The Hive table under the hive database to acquire lock against</span>
 
 #### withClientNumRetries(clientNumRetries = 0) {#withClientNumRetries}
-Property: `hoodie.writer.lock.client.num_retries` <br/>
+Property: `hoodie.write.lock.client.num_retries` <br/>
 <span style="color:grey">Maximum number of times to retry to acquire lock additionally from the hudi client</span>
 
 #### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 10000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.client.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.client.wait_time_ms_between_retry` <br/>
 <span style="color:grey">Amount of time to wait between retries from the hudi client</span>
 
 #### withConflictResolutionStrategy(lockProvider = org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy) {#withConflictResolutionStrategy}
-Property: `hoodie.writer.lock.conflict.resolution.strategy` <br/>
+Property: `hoodie.write.lock.conflict.resolution.strategy` <br/>
 <span style="color:grey">Lock provider class name, this should be subclass of org.apache.hudi.client.transaction.ConflictResolutionStrategy</span>
diff --git a/docs/_docs/0.8.0/2_9_concurrency_control.md b/docs/_docs/0.8.0/2_9_concurrency_control.md
index 563da9b..1aab8e4 100644
--- a/docs/_docs/0.8.0/2_9_concurrency_control.md
+++ b/docs/_docs/0.8.0/2_9_concurrency_control.md
@@ -57,8 +57,6 @@ There are 2 different server based lock providers that require different configu
 hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
 hoodie.write.lock.zookeeper.url
 hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 hoodie.write.lock.zookeeper.lock_key
 hoodie.write.lock.zookeeper.base_path
 ```
@@ -69,8 +67,6 @@ hoodie.write.lock.zookeeper.base_path
 hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
 hoodie.write.lock.hivemetastore.database
 hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 ```
 
 `The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
@@ -89,8 +85,6 @@ inputDF.write.format("hudi")
   .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
   .option("hoodie.write.lock.zookeeper.url", "zookeeper")
   .option("hoodie.write.lock.zookeeper.port", "2181")
-  .option("hoodie.write.lock.wait_time_ms", "12000")
-  .option("hoodie.write.lock.num_retries", "2")
   .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
   .option("hoodie.write.lock.zookeeper.base_path", "/test")
   .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index e176550..d8f0c90 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -468,6 +468,10 @@ Configs that control compaction (merging of log files onto a new parquet base fi
 Property: `hoodie.cleaner.policy` <br/>
 <span style="color:grey"> Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.</span>
 
+#### withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
+Property: `hoodie.cleaner.policy.failed.writes` <br/>
+<span style="color:grey"> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes `eagerly` before every writer starts (only supported for single writer) or `lazily` by the cleaner (required for multi-writers)</span>
+
 #### retainCommits(no_of_commits_to_retain = 24) {#retainCommits}
 Property: `hoodie.cleaner.commits.retained` <br/>
 <span style="color:grey">Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table</span>
@@ -830,59 +834,59 @@ Configs that control locking mechanisms if [WriteConcurrencyMode=optimistic_conc
 [withLockConfig](#withLockConfig) (HoodieLockConfig) <br/>
 
 #### withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) {#withLockProvider}
-Property: `hoodie.writer.lock.provider` <br/>
+Property: `hoodie.write.lock.provider` <br/>
 <span style="color:grey">Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider</span>
 
 #### withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url` <br/>
+Property: `hoodie.write.lock.zookeeper.url` <br/>
 <span style="color:grey">Set the list of comma separated servers to connect to</span>
 
 #### withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.base_path` [Required] <br/>
 <span style="color:grey">The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table</span>
 
 #### withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.port` [Required] <br/>
 <span style="color:grey">The connection port to be used for Zookeeper</span>
 
 #### withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required] <br/>
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required] <br/>
 <span style="color:grey">Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. We recommend setting this to the table name</span>
 
 #### withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) {#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` <br/>
 <span style="color:grey">How long to wait when connecting to ZooKeeper before considering the connection a failure</span>
 
 #### withZkSessionTimeoutInMs(sessionTimeoutInMs = 60000) {#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms` <br/>
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms` <br/>
 <span style="color:grey">How long to wait after losing a connection to ZooKeeper before the session is expired</span>
 
 #### withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries` <br/>
+Property: `hoodie.write.lock.num_retries` <br/>
 <span style="color:grey">Maximum number of times to retry by lock provider client</span>
 
 #### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.wait_time_ms_between_retry` <br/>
 <span style="color:grey">Initial amount of time to wait between retries by lock provider client</span>
 
 #### withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
-Property: `hoodie.writer.lock.hivemetastore.database` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.database` [Required] <br/>
 <span style="color:grey">The Hive database to acquire lock against</span>
 
 #### withHiveTableName(hiveTableName) {#withHiveTableName}
-Property: `hoodie.writer.lock.hivemetastore.table` [Required] <br/>
+Property: `hoodie.write.lock.hivemetastore.table` [Required] <br/>
 <span style="color:grey">The Hive table under the hive database to acquire lock against</span>
 
 #### withClientNumRetries(clientNumRetries = 0) {#withClientNumRetries}
-Property: `hoodie.writer.lock.client.num_retries` <br/>
+Property: `hoodie.write.lock.client.num_retries` <br/>
 <span style="color:grey">Maximum number of times to retry to acquire lock additionally from the hudi client</span>
 
 #### withRetryWaitTimeInMillis(retryWaitTimeInMillis = 10000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.client.wait_time_ms_between_retry` <br/>
+Property: `hoodie.write.lock.client.wait_time_ms_between_retry` <br/>
 <span style="color:grey">Amount of time to wait between retries from the hudi client</span>
 
 #### withConflictResolutionStrategy(lockProvider = org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy) {#withConflictResolutionStrategy}
-Property: `hoodie.writer.lock.conflict.resolution.strategy` <br/>
+Property: `hoodie.write.lock.conflict.resolution.strategy` <br/>
 <span style="color:grey">Lock provider class name, this should be subclass of org.apache.hudi.client.transaction.ConflictResolutionStrategy</span>
diff --git a/docs/_docs/2_9_concurrency_control.md b/docs/_docs/2_9_concurrency_control.md
index f4ada0a..918556e 100644
--- a/docs/_docs/2_9_concurrency_control.md
+++ b/docs/_docs/2_9_concurrency_control.md
@@ -56,8 +56,6 @@ There are 2 different server based lock providers that require different configu
 hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
 hoodie.write.lock.zookeeper.url
 hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 hoodie.write.lock.zookeeper.lock_key
 hoodie.write.lock.zookeeper.base_path
 ```
@@ -68,8 +66,6 @@ hoodie.write.lock.zookeeper.base_path
 hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
 hoodie.write.lock.hivemetastore.database
 hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 ```
 
 `The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
@@ -88,8 +84,6 @@ inputDF.write.format("hudi")
   .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
   .option("hoodie.write.lock.zookeeper.url", "zookeeper")
   .option("hoodie.write.lock.zookeeper.port", "2181")
-  .option("hoodie.write.lock.wait_time_ms", "12000")
-  .option("hoodie.write.lock.num_retries", "2")
   .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
   .option("hoodie.write.lock.zookeeper.base_path", "/test")
   .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
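The net effect of this commit is a rename from the `hoodie.writer.lock.*` prefix to `hoodie.write.lock.*`, plus removal of the stale `wait_time_ms`/`num_retries` keys from the examples. A minimal Python sketch (not part of the commit; the helper name and placeholder endpoint values are illustrative) of assembling the corrected multi-writer options, mirroring the Spark `.option(...)` calls in the concurrency-control example:

```python
def zookeeper_lock_options(url, port, lock_key, base_path):
    """Build Hudi writer options for optimistic concurrency control with
    the ZooKeeper lock provider, using the corrected `hoodie.write.lock.*`
    key prefix (formerly `hoodie.writer.lock.*`)."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Lazy cleaning of failed writes is required for multi-writers,
        # per the withFailedWritesCleaningPolicy config added above.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": url,
        "hoodie.write.lock.zookeeper.port": port,
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }

# Values match the docs' Spark example; in a real job each pair would be
# passed via .option(key, value) on the DataFrame writer.
opts = zookeeper_lock_options("zookeeper", "2181", "test_table", "/test")
assert not any(k.startswith("hoodie.writer.lock") for k in opts)
```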