szehon-ho commented on code in PR #6570:
URL: https://github.com/apache/iceberg/pull/6570#discussion_r1171963570
##########
docs/configuration.md:
##########
@@ -178,8 +178,13 @@ The HMS table locking is a 2-step process:
| iceberg.hive.lock-heartbeat-interval-ms | 240000 (4 min) | The heartbeat
interval for the HMS locks. |
| iceberg.hive.metadata-refresh-max-retries | 2 | Maximum number
of retries when the metadata file is missing |
| iceberg.hive.table-level-lock-evict-ms | 600000 (10 min) | The timeout
for the JVM table lock is |
+| iceberg.engine.hive.lock-enabled | true | If enabled HMS
locks will be used to ensure of the atomicity of the commits |
Review Comment:
Can be shortened: Use HMS locks to ensure atomicity of commits
?
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java:
##########
@@ -48,14 +50,28 @@ public class MetastoreUtil {
private MetastoreUtil() {}
/**
- * Calls alter_table method using the metastore client. If possible, an
environmental context will
- * be used that turns off stats updates to avoid recursive listing.
+ * Calls alter_table method using the metastore client. If the HMS supports
then, environmental
+ * context with will be set in a way that turns off stats updates to avoid
recursive file listing.
*/
public static void alterTable(
IMetaStoreClient client, String databaseName, String tblName, Table
table) {
- EnvironmentContext envContext =
- new EnvironmentContext(
- ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS,
StatsSetupConst.TRUE));
- ALTER_TABLE.invoke(client, databaseName, tblName, table, envContext);
+ alterTable(client, databaseName, tblName, table, ImmutableMap.of());
+ }
+
+ /**
+ * Calls alter_table method using the metastore client. If the HMS supports
then, environmental
+ * context with will be set in a way that turns off stats updates to avoid
recursive file listing.
+ */
+ public static void alterTable(
+ IMetaStoreClient client,
+ String databaseName,
+ String tblName,
+ Table table,
+ Map<String, String> extraEnv) {
+ Map<String, String> env = Maps.newHashMapWithExpectedSize(extraEnv.size()
+ 1);
+ env.putAll(extraEnv);
+ env.put(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
+
Review Comment:
I was looking through DynMethod, there is no check and will just pass
whatever args it can fit. I think it will be nice to add a 'argLength' method
DynMethod and then assert that argLength == 5 here, so ensure the env is not
silently dropped.
##########
docs/configuration.md:
##########
@@ -178,8 +178,13 @@ The HMS table locking is a 2-step process:
| iceberg.hive.lock-heartbeat-interval-ms | 240000 (4 min) | The heartbeat
interval for the HMS locks. |
| iceberg.hive.metadata-refresh-max-retries | 2 | Maximum number
of retries when the metadata file is missing |
| iceberg.hive.table-level-lock-evict-ms | 600000 (10 min) | The timeout
for the JVM table lock is |
+| iceberg.engine.hive.lock-enabled | true | If enabled HMS
locks will be used to ensure of the atomicity of the commits |
Note: `iceberg.hive.lock-check-max-wait-ms` and
`iceberg.hive.lock-heartbeat-interval-ms` should be less than the [transaction
timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout)
of the Hive Metastore (`hive.txn.timeout` or `metastore.txn.timeout` in the
newer versions). Otherwise, the heartbeats on the lock (which happens during
the lock checks) would end up expiring in the
Hive Metastore before the lock is retried from Iceberg.
+Note: `iceberg.engine.hive.lock-enabled` should only be set to `false` if
[HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
+is available on the Hive Metastore server and every writer of a given table is
using Iceberg 1.2 or later.
Review Comment:
We should change this to warn. We also miss warning about some catalogs
setting true and others setting false.
How about this to add some details. I use HiveCatalog here, even though its
not defined in the doc, otherwise its quite lengthy to continue the language-
"catalog using Hive Metastore connector"
```
Warn: Setting `iceberg.engine.hive.lock-enabled` will cause HiveCatalog to
commit to tables without using Hive locks. This should only be set to `false`
if all the following conditions are met:
* [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
is available on the Hive Metastore server
* All other HiveCatalogs committing to tables that this HiveCatalog commits
to are also on Iceberg 1.3 or later
* All other HiveCatalogs committing to tables that this HiveCatalog commits
to have also disabled Hive locks on commit.
Failing to ensure these conditions risks corrupting the table.
Even with `iceberg.engine.hive.lock-enabled` set to `false`, a HiveCatalog
can still use locks for individual tables by setting the table property
'engine.hive.lock-enabled'='true'. This is useful in the case where other
HiveCatalogs cannot be upgraded and set to commit without using Hive locks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]