szehon-ho commented on code in PR #6570:
URL: https://github.com/apache/iceberg/pull/6570#discussion_r1171963570


##########
docs/configuration.md:
##########
@@ -178,8 +178,13 @@ The HMS table locking is a 2-step process:
 | iceberg.hive.lock-heartbeat-interval-ms   | 240000 (4 min)  | The heartbeat 
interval for the HMS locks.                                    |
 | iceberg.hive.metadata-refresh-max-retries | 2               | Maximum number 
of retries when the metadata file is missing                  |
 | iceberg.hive.table-level-lock-evict-ms    | 600000 (10 min) | The timeout 
for the JVM table lock is                                        |
+| iceberg.engine.hive.lock-enabled          | true            | If enabled HMS 
locks will be used to ensure of the atomicity of the commits  |

Review Comment:
   Can be shortened: Use HMS locks to ensure atomicity of commits
   
   ?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java:
##########
@@ -48,14 +50,28 @@ public class MetastoreUtil {
   private MetastoreUtil() {}
 
   /**
-   * Calls alter_table method using the metastore client. If possible, an 
environmental context will
-   * be used that turns off stats updates to avoid recursive listing.
+   * Calls alter_table method using the metastore client. If the HMS supports 
then, environmental
+   * context with will be set in a way that turns off stats updates to avoid 
recursive file listing.
    */
   public static void alterTable(
       IMetaStoreClient client, String databaseName, String tblName, Table 
table) {
-    EnvironmentContext envContext =
-        new EnvironmentContext(
-            ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS, 
StatsSetupConst.TRUE));
-    ALTER_TABLE.invoke(client, databaseName, tblName, table, envContext);
+    alterTable(client, databaseName, tblName, table, ImmutableMap.of());
+  }
+
+  /**
+   * Calls alter_table method using the metastore client. If the HMS supports 
then, environmental
+   * context with will be set in a way that turns off stats updates to avoid 
recursive file listing.
+   */
+  public static void alterTable(
+      IMetaStoreClient client,
+      String databaseName,
+      String tblName,
+      Table table,
+      Map<String, String> extraEnv) {
+    Map<String, String> env = Maps.newHashMapWithExpectedSize(extraEnv.size() 
+ 1);
+    env.putAll(extraEnv);
+    env.put(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
+

Review Comment:
   I was looking through DynMethod, there is no check and will just pass 
whatever args it can fit.  I think it will be nice to add a 'argLength' method 
DynMethod and then assert that argLength == 5 here, so ensure the env is not 
silently dropped.



##########
docs/configuration.md:
##########
@@ -178,8 +178,13 @@ The HMS table locking is a 2-step process:
 | iceberg.hive.lock-heartbeat-interval-ms   | 240000 (4 min)  | The heartbeat 
interval for the HMS locks.                                    |
 | iceberg.hive.metadata-refresh-max-retries | 2               | Maximum number 
of retries when the metadata file is missing                  |
 | iceberg.hive.table-level-lock-evict-ms    | 600000 (10 min) | The timeout 
for the JVM table lock is                                        |
+| iceberg.engine.hive.lock-enabled          | true            | If enabled HMS 
locks will be used to ensure of the atomicity of the commits  |
 
 Note: `iceberg.hive.lock-check-max-wait-ms` and 
`iceberg.hive.lock-heartbeat-interval-ms` should be less than the [transaction 
timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout)
 
 of the Hive Metastore (`hive.txn.timeout` or `metastore.txn.timeout` in the 
newer versions). Otherwise, the heartbeats on the lock (which happens during 
the lock checks) would end up expiring in the 
 Hive Metastore before the lock is retried from Iceberg.
 
+Note: `iceberg.engine.hive.lock-enabled` should only be set to `false` if 
[HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
+is available on the Hive Metastore server and every writer of a given table is 
using Iceberg 1.2 or later.

Review Comment:
   We should change this to warn.  We also miss warning about some catalogs 
setting true and others setting false.  
   
   How about this to add some details.  I use HiveCatalog here, even though its 
not defined in the doc, otherwise its quite lengthy to continue the language- 
"catalog using Hive Metastore connector"
   
   ```
   Warn: Setting `iceberg.engine.hive.lock-enabled` will cause HiveCatalog to 
commit to tables without using Hive locks.  This should only be set to `false` 
if all the following conditions are met: 
   * [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
   is available on the Hive Metastore server
   * All other HiveCatalogs committing to tables that this HiveCatalog commits 
to are also on Iceberg 1.3 or later
   * All other HiveCatalogs committing to tables that this HiveCatalog commits 
to have also disabled Hive locks on commit.
   
   Failing to ensure these conditions risks corrupting the table.
   
   Even with `iceberg.engine.hive.lock-enabled` set to `false`, a HiveCatalog 
can still use locks for individual tables by setting the table property 
'engine.hive.lock-enabled'='true'.  This is useful in the case where other 
HiveCatalogs cannot be upgraded and set to commit without using Hive locks.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to