pvary commented on a change in pull request #2119:
URL: https://github.com/apache/iceberg/pull/2119#discussion_r561690101
##########
File path:
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java
##########
@@ -83,6 +83,20 @@ protected HiveMetaStoreClient reconnect(HiveMetaStoreClient
client) {
return client;
}
+ @Override
+ protected boolean failureDetection(HiveMetaStoreClient client, Exception ex)
{
Review comment:
MetaException does not have an embedded Exception (no `ex.getCause()`
because the MetaException is declared on the Thrift interface and only has a
message field) 😢. Also it is used for signaling connection problems and input
data issues as well, and the only difference between those is the exception
message text. So we can only check the message text to decide on the cause
which is too brittle for my taste.
So I would prefer a solution where we do not try to differentiate between
the MetaExceptions parsing the text message.
Since we can not have a definite answer when the error is caused by a
connection problem or a bad parametrization of the HMS call, I think
reconnection every time when we have a MetaException should be avoided if
possible. I would use a cheap probe to check the liveliness of the connection
(for example `client.getServerVersion()`)
I see the following possibilities:
- Periodic check for the connection - Fix number of extra HMS calls, no
extra delay on other HMS calls
- Check when there is a MetaException - Double HMS calls for every call
where the response is MetaException
- Some combination of the above
The decision should depend on the use-cases we would like to cover.
- Who will use the `ClientPool`?
- How likely that we issue calls which are badly parametrized?
- How likely that we will have connection issues?
To answer the questions from the Hive side:
- I do not see places where we can have MetaException because of the wrong
parametrization of the calls (if so we have to fix it 😄)
- If there is a MetaException then that is because of the connection issues
OTOH I consider ClientPool as an API which could/should be used by other
components, and such an API should handle errors better. I would accept the
extra complexity to avoid extra `reconnect` attempts, and even maybe extra HMS
calls too, but can be convinced otherwise.
That's my 2 cents.
Peter
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]