szehon-ho opened a new pull request #3099: URL: https://github.com/apache/iceberg/pull/3099
The options are: implement the retry logic in ClientPoolImpl, or use RetryingMetaStoreClient. The initial discussion leans towards option 2. Some justifications:

* RetryingMetaStoreClient is used today in Hive and Spark, and is more battle-tested. HiveClientPool would have to catch up on all the Hive exception types to retry; see https://github.com/apache/iceberg/pull/2844 for some missing exceptions.
* It handles UGI impersonation logic on reconnect, which is missing in HiveClientPool (and is needed in Kerberized environments).
* ClientPoolImpl does not support any configuration of retry or retry backoff. It hard-codes 1 retry and no backoff (by contrast, the default in RetryingMetaStoreClient is a 1s backoff).
* Reusing RetryingMetaStoreClient unifies the Hive configs across the execution engine, instead of configuring them per catalog. For instance, in Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' sets it for all Hive connections (for both Iceberg and non-Iceberg tables).

Implementation details: adding RetryingMetaStoreClient makes the ClientPoolImpl retry redundant (and harmful), so this PR disables it for this case. It is not removed, so that other client pools can still use it.
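The interaction between the two retry layers can be shown with a minimal, JDK-only Java sketch. It deliberately uses no Hive or Iceberg classes. `MetaClient`, `TransientException`, `wrapWithRetries`, and `SimplePool` are hypothetical stand-ins for IMetaStoreClient, a Thrift transport error, RetryingMetaStoreClient's dynamic-proxy retry, and ClientPoolImpl's reconnect-and-retry. The sketch has a client-level retry with a configurable count and delay, plus a pool-level retry that can be switched off. It also illustrates one way stacking both layers is wasteful: the number of attempts multiplies.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class RetrySketch {
  // Hypothetical stand-in for IMetaStoreClient (not the real Hive interface).
  interface MetaClient {
    String getTable(String name);
  }

  // Hypothetical stand-in for a transient Thrift/transport failure.
  static class TransientException extends RuntimeException {
    TransientException(String msg) {
      super(msg);
    }
  }

  // Wraps a client in a dynamic proxy that retries transient failures up to
  // `retries` extra times, sleeping `delayMillis` between attempts. Same general
  // shape as a retrying InvocationHandler, but not Hive's actual code.
  static MetaClient wrapWithRetries(MetaClient delegate, int retries, long delayMillis) {
    InvocationHandler handler = (proxy, method, args) -> {
      int attempt = 0;
      while (true) {
        try {
          return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
          Throwable cause = e.getCause();
          if (!(cause instanceof TransientException) || attempt >= retries) {
            throw cause; // non-retryable, or retries exhausted
          }
          attempt++;
          Thread.sleep(delayMillis); // configurable backoff between attempts
        }
      }
    };
    return (MetaClient) Proxy.newProxyInstance(
        MetaClient.class.getClassLoader(), new Class<?>[] {MetaClient.class}, handler);
  }

  // Simplified pool with a single reconnect-and-retry that can be switched off
  // when the pooled clients already retry on their own.
  static class SimplePool {
    private final Supplier<MetaClient> factory;
    private final boolean retryOnFailure;
    private MetaClient client;

    SimplePool(Supplier<MetaClient> factory, boolean retryOnFailure) {
      this.factory = factory;
      this.retryOnFailure = retryOnFailure;
      this.client = factory.get();
    }

    <R> R run(Function<MetaClient, R> action) {
      try {
        return action.apply(client);
      } catch (TransientException e) {
        if (!retryOnFailure) {
          throw e;
        }
        client = factory.get(); // reconnect, then retry exactly once
        return action.apply(client);
      }
    }
  }

  // Counts backend calls made when every call fails with a transient error.
  static int attemptsFor(boolean poolRetry) {
    AtomicInteger calls = new AtomicInteger();
    MetaClient alwaysFails = name -> {
      calls.incrementAndGet();
      throw new TransientException("connection reset");
    };
    SimplePool pool = new SimplePool(() -> wrapWithRetries(alwaysFails, 1, 0L), poolRetry);
    try {
      pool.run(c -> c.getTable("db.tbl"));
    } catch (TransientException expected) {
      // all attempts exhausted
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("pool retry on:  " + attemptsFor(true));
    System.out.println("pool retry off: " + attemptsFor(false));
  }
}
```

With one client-level retry (2 attempts per call), leaving the pool retry on produces 4 backend calls for a single failing operation, while turning it off keeps it at 2. The retry count and delay live only in the wrapped client, which matches the idea of configuring them in one place.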
