szehon-ho opened a new pull request #3099:
URL: https://github.com/apache/iceberg/pull/3099


   There are two options: implement it in ClientPoolImpl, or use 
RetryingMetaStoreClient. Based on the initial discussion, I am leaning towards 
option 2 (RetryingMetaStoreClient). Some justifications are below:
   
   * RetryingMetaStoreClient is used today in Hive and Spark, and is more 
battle-tested. HiveClientPool would have to keep catching up with all the Hive 
exception types that should be retried; see 
https://github.com/apache/iceberg/pull/2844 for some missing exceptions.
   * It handles UGI impersonation logic on reconnect, which is missing from 
HiveClientPool (needed in Kerberized environments).
   * ClientPoolImpl does not support any configuration of retry count or 
retry backoff. It is hard-coded to 1 retry with no backoff (the default in 
RetryingMetaStoreClient is a 1s backoff, for instance).
   * Re-using RetryingMetaStoreClient can unify all Hive configs for the 
execution engine, instead of having the configs per catalog. For instance, in 
Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' will 
set it for all Hive connections (for both Iceberg and non-Iceberg tables).
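
   To make the retry/backoff point concrete, here is a minimal sketch in 
plain Java. It is not the code of ClientPoolImpl or RetryingMetaStoreClient; 
the class RetrySketch and the helpers runWithRetry and failingFirst are 
hypothetical names used only for illustration. It contrasts a fixed 
"1 retry, no backoff" policy with a configurable retry count and delay.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Iceberg or Hive implementation.
final class RetrySketch {
  private RetrySketch() {}

  /** Runs the action, retrying up to maxRetries times and sleeping delayMs between attempts. */
  static <T> T runWithRetry(Callable<T> action, int maxRetries, long delayMs) throws Exception {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        // Back off only if another attempt will follow.
        if (attempt < maxRetries && delayMs > 0) {
          Thread.sleep(delayMs);
        }
      }
    }
    throw last;
  }

  /** Returns an action that fails the first `failures` times it is called, then returns "ok". */
  static Callable<String> failingFirst(int failures) {
    AtomicInteger calls = new AtomicInteger();
    return () -> {
      if (calls.incrementAndGet() <= failures) {
        throw new IllegalStateException("metastore unavailable");
      }
      return "ok";
    };
  }

  public static void main(String[] args) throws Exception {
    // Fixed policy as described above: 1 retry, no backoff. Two failures exhaust it.
    try {
      runWithRetry(failingFirst(2), 1, 0L);
    } catch (IllegalStateException e) {
      System.out.println("fixed policy gave up: " + e.getMessage());
    }
    // Configurable policy: 3 retries with a short delay survives the same two failures.
    System.out.println("configurable policy: " + runWithRetry(failingFirst(2), 3, 10L));
  }
}
```

   With a fixed policy, a metastore that needs slightly longer to recover 
surfaces as a failure to the caller; a configurable count and delay lets the 
operator tune this per deployment.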
   
   
   Implementation details: adding RetryingMetaStoreClient makes the 
ClientPoolImpl retry redundant (and harmful), so this change disables it for 
this case but does not remove it, to preserve it for other client pools.
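
   One plausible reason the extra retry is harmful (an assumption, not 
stated above) is that nested retries multiply the number of attempts. The 
sketch below is not the actual ClientPoolImpl or RetryingMetaStoreClient 
code; poolRun, retryingClient and countAttempts are hypothetical names. It 
counts attempts against an always-failing call, with the pool-level retry 
enabled and disabled.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only; not the Iceberg or Hive implementation.
final class NestedRetrySketch {
  private NestedRetrySketch() {}

  /** Pool-level wrapper: one immediate retry when poolRetry is true, none otherwise. */
  static <R> R poolRun(Supplier<R> action, boolean poolRetry) {
    try {
      return action.get();
    } catch (RuntimeException e) {
      if (!poolRetry) {
        throw e;
      }
      return action.get();
    }
  }

  /** Client-level wrapper standing in for a retrying client: clientRetries extra attempts. */
  static <R> Supplier<R> retryingClient(Supplier<R> call, int clientRetries) {
    if (clientRetries < 0) {
      throw new IllegalArgumentException("clientRetries must be >= 0");
    }
    return () -> {
      RuntimeException last = null;
      for (int i = 0; i <= clientRetries; i++) {
        try {
          return call.get();
        } catch (RuntimeException e) {
          last = e;
        }
      }
      throw last;
    };
  }

  /** Counts how many times an always-failing call is attempted under the given settings. */
  static int countAttempts(int clientRetries, boolean poolRetry) {
    AtomicInteger attempts = new AtomicInteger();
    Supplier<String> alwaysFails = () -> {
      attempts.incrementAndGet();
      throw new IllegalStateException("metastore unavailable");
    };
    try {
      poolRun(retryingClient(alwaysFails, clientRetries), poolRetry);
    } catch (RuntimeException expected) {
      // The call never succeeds; only the attempt count matters here.
    }
    return attempts.get();
  }

  public static void main(String[] args) {
    System.out.println("pool retry on:  " + countAttempts(3, true));
    System.out.println("pool retry off: " + countAttempts(3, false));
  }
}
```

   In this sketch, leaving the pool retry on doubles the attempts made by 
the inner retrying client, which is why a flag that disables the outer retry 
for this one pool, while keeping it for others, is a small change.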




