rajarshisarkar commented on a change in pull request #4216: URL: https://github.com/apache/iceberg/pull/4216#discussion_r816676230
########## File path: docs/versioned/integrations/aws.md ##########
@@ -188,21 +180,22 @@ However, if you are streaming data to Iceberg, this will easily create a lot of
 Therefore, it is recommended to turn off the archive feature in Glue by setting `glue.skip-archive` to `true`.
 For more details, please read [Glue Quotas](https://docs.aws.amazon.com/general/latest/gr/glue.html) and the [UpdateTable API](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html).

-#### DynamoDB for Commit Locking
+#### Optimistic Locking

-Glue does not have a strong guarantee over concurrent updates to a table.
-Although it throws `ConcurrentModificationException` when detecting two processes updating a table at the same time,
-there is no guarantee that one update would not clobber the other update.
-Therefore, [DynamoDB](https://aws.amazon.com/dynamodb) can be used for Glue, so that for every commit,
-`GlueCatalog` first obtains a lock using a helper DynamoDB table and then try to safely modify the Glue table.
+By default, Iceberg uses Glue's optimistic locking for concurrent updates to a table.
+With optimistic locking, each table has a version id.
+If you retrieve the table metadata, Iceberg records the version id of that table.
+You can update the table, but only if the version id on the server side has not changed.
+If there is a version mismatch, it means that someone else has modified the table before you did.
+The update attempt fails, because you have a stale version of the table.
+If this happens, Iceberg simply tries again by retrieving the table metadata and then tries to update it.
+Optimistic locking prevents you from accidentally overwriting changes that were made by others.

Review comment:
Thanks, I have made the changes.
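The read, compare-version, retry cycle described in the Optimistic Locking paragraph can be sketched in Python. This is a minimal, hypothetical model: `InMemoryCatalog`, `TableEntry`, `VersionMismatch`, and `commit_with_retry` are illustrative names, not Iceberg or AWS SDK APIs. The in-process lock only stands in for the server-side version check, and the retry limit is an assumption (real Iceberg commits have their own retry and backoff settings that are not modeled here).

```python
import threading
from dataclasses import dataclass


class VersionMismatch(Exception):
    """Hypothetical error: the caller's version id is stale."""


@dataclass(frozen=True)
class TableEntry:
    metadata_location: str
    version_id: int


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog with conditional updates.

    This is not the AWS Glue API; it only models "update the table only if
    the server-side version id still matches the one the caller read".
    """

    def __init__(self, metadata_location: str) -> None:
        self._entry = TableEntry(metadata_location, version_id=1)
        self._lock = threading.Lock()  # makes check-and-update atomic in-process

    def get_table(self) -> TableEntry:
        with self._lock:
            return self._entry

    def update_table(self, new_location: str, expected_version: int) -> None:
        with self._lock:
            if self._entry.version_id != expected_version:
                # Another writer committed after the caller read the table.
                raise VersionMismatch(
                    f"expected version {expected_version}, "
                    f"found {self._entry.version_id}"
                )
            self._entry = TableEntry(new_location, expected_version + 1)


def commit_with_retry(catalog, make_new_location, max_attempts=4):
    """Read, build new metadata, conditionally write; retry on a stale version.

    Returns the attempt number that succeeded. max_attempts is an assumption.
    """
    for attempt in range(1, max_attempts + 1):
        current = catalog.get_table()  # remember the version id we saw
        new_location = make_new_location(current.metadata_location)
        try:
            catalog.update_table(new_location, expected_version=current.version_id)
            return attempt
        except VersionMismatch:
            continue  # loop back: re-read the table metadata and try again
    raise RuntimeError(f"commit failed after {max_attempts} attempts")


# Usage: a concurrent writer commits between our read and our write.
catalog = InMemoryCatalog("meta-1")
state = {"interfered": False}


def add_my_change(base_location: str) -> str:
    if not state["interfered"]:
        state["interfered"] = True
        other = catalog.get_table()
        catalog.update_table(base_location + "|other", other.version_id)
    return base_location + "|mine"


attempts = commit_with_retry(catalog, add_my_change)
final = catalog.get_table()
print(attempts, final.version_id, final.metadata_location)
# prints: 2 3 meta-1|other|mine
```

The first attempt is rejected because the simulated concurrent writer bumped the version id; the second attempt builds on the other writer's metadata instead of overwriting it, which is the "no accidental overwrite" property the paragraph describes.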
########## File path: docs/versioned/integrations/aws.md ##########
@@ -188,21 +180,22 @@ However, if you are streaming data to Iceberg, this will easily create a lot of
 Therefore, it is recommended to turn off the archive feature in Glue by setting `glue.skip-archive` to `true`.
 For more details, please read [Glue Quotas](https://docs.aws.amazon.com/general/latest/gr/glue.html) and the [UpdateTable API](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html).

-#### DynamoDB for Commit Locking
+#### Optimistic Locking

-Glue does not have a strong guarantee over concurrent updates to a table.
-Although it throws `ConcurrentModificationException` when detecting two processes updating a table at the same time,
-there is no guarantee that one update would not clobber the other update.
-Therefore, [DynamoDB](https://aws.amazon.com/dynamodb) can be used for Glue, so that for every commit,
-`GlueCatalog` first obtains a lock using a helper DynamoDB table and then try to safely modify the Glue table.
+By default, Iceberg uses Glue's optimistic locking for concurrent updates to a table.
+With optimistic locking, each table has a version id.
+If you retrieve the table metadata, Iceberg records the version id of that table.
+You can update the table, but only if the version id on the server side has not changed.
+If there is a version mismatch, it means that someone else has modified the table before you did.
+The update attempt fails, because you have a stale version of the table.
+If this happens, Iceberg simply tries again by retrieving the table metadata and then tries to update it.
+Optimistic locking prevents you from accidentally overwriting changes that were made by others.
+It also prevents others from accidentally overwriting your changes.
-
-This feature requires the following lock related catalog properties:
-
-1. Set `lock-impl` as `org.apache.iceberg.aws.glue.DynamoLockManager`.
-2. Set `lock.table` as the DynamoDB table name you would like to use. If the lock table with the given name does not exist in DynamoDB, a new table is created with billing mode set as [pay-per-request](https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing).
-
-Other lock related catalog properties can also be used to adjust locking behaviors such as heartbeat interval.
-For more details, please refer to [Lock catalog properties](../configuration/#lock-catalog-properties).
+{{< hint info >}}
+Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
+If the AWS SDK version is below 2.17.131, then please refer the [DynamoDb Lock Manager section](#dynamodb-lock-manager).

Review comment:
I have made the changes.

-- This is an automated message from the Apache Git Service.
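The version rule in the `{{< hint info >}}` block can be expressed as a small helper that picks the extra catalog properties. This is a hypothetical sketch: `glue_lock_properties` and `parse_version` are illustrative names, the helper assumes plain numeric `major.minor.patch` version strings, and it assumes that nothing extra needs to be configured once the SDK is new enough for Glue's optimistic locking. The `lock-impl` class name and the `lock.table` key are copied from the removed lines of this diff and may differ in other Iceberg releases.

```python
from typing import Dict, Tuple

# Threshold taken from the hint in this diff: AWS SDK >= 2.17.131.
MIN_SDK_FOR_GLUE_OPTIMISTIC_LOCKING = (2, 17, 131)

# Class name copied from the removed lines of this diff (assumption for other releases).
DYNAMO_LOCK_IMPL = "org.apache.iceberg.aws.glue.DynamoLockManager"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric 'major.minor.patch' string (no '-SNAPSHOT' style suffixes)."""
    return tuple(int(part) for part in version.split("."))


def glue_lock_properties(aws_sdk_version: str, lock_table: str) -> Dict[str, str]:
    """Return extra catalog properties for commit locking (hypothetical helper)."""
    if parse_version(aws_sdk_version) >= MIN_SDK_FOR_GLUE_OPTIMISTIC_LOCKING:
        # New enough SDK: rely on Glue's optimistic locking; assume no lock properties.
        return {}
    # Older SDK: fall back to the DynamoDB-backed lock manager.
    return {"lock-impl": DYNAMO_LOCK_IMPL, "lock.table": lock_table}


print(glue_lock_properties("2.17.131", "my_lock_table"))
print(glue_lock_properties("2.17.100", "my_lock_table"))
```

Comparing integer tuples rather than raw strings avoids the pitfall where `"2.9.0" > "2.17.131"` would be true lexicographically.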