[ 
https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289961#comment-14289961
 ] 

Sushanth Sowmyan commented on HIVE-9436:
----------------------------------------

[~thejas]/[~hsubramaniyan]: I have a few thoughts about moving JDOException 
retries solely to the metastore:

a) First, we have had cases where a JDOException invalidates the connection on 
the metastore side, and retrying from within the metastore has not helped. 
Retrying from the client side, though, triggers a fresh openTransaction() that 
clears the connection and all of its history, sometimes by hitting a different 
HMSHandler, so the client-side retry succeeds more often than a server-side 
one. Admittedly, this is most likely because our metastore code needs cleaning 
up so that the metastore-side retry handles this case properly, and that is 
something we should try to improve.
b) Second, on a loaded metastore, having a metastore thread do the retries 
uses up valuable metastore resources and time, which is more wasteful than 
having the client retry. We therefore tend to keep the number of 
metastore-side retries low. Because we also have client-side retries, we can 
fail fast on the metastore and still retry a large number of times in 
particular clients if we find the need to do so. In HA configurations in 
particular, I've seen a large number of retries and longer retry intervals on 
the client side allow a connection to go through despite metastore HUPs.
c) Third, speaking of HA, retrying on the client side also lets us hit 
alternate metastores, if configured, in scenarios where one metastore is 
getting bogged down. As you mention, the client should ideally only be retrying 
connection exceptions, but JDOExceptions are frequently the result of 
connection exceptions raised by the connection pool between the metastore and 
the db.
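
To make points (a) to (c) concrete, here is a rough sketch of the kind of 
client-side retry loop I have in mind. It is not the RetryingMetaStoreClient 
code: the class name, the method names, the URIs and the use of a plain 
Function to stand in for "open a fresh connection to this metastore and make 
the call" are all hypothetical, chosen only to show how a client can rotate 
across configured metastores while the server itself stays fail-fast.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch, not Hive code: a client-side retry loop that rotates
// across the configured metastore URIs.
class ClientSideRetrySketch {
    private static final Pattern JDO_EXCEPTION =
        Pattern.compile("JDO[a-zA-Z]*Exception");

    // Assumption: a failure is worth retrying if its message mentions a JDO
    // exception anywhere (find() does a substring search).
    static boolean isRetriable(RuntimeException e) {
        String msg = e.getMessage();
        return msg != null && JDO_EXCEPTION.matcher(msg).find();
    }

    // 'call' stands in for one metastore call over a fresh connection to the
    // given URI. Each attempt moves on to the next URI in the list.
    static <T> T callWithRetries(List<String> metastoreUris, int maxAttempts,
                                 long delayMillis, Function<String, T> call) {
        if (metastoreUris.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException(
                "need at least one URI and one attempt");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String uri = metastoreUris.get(attempt % metastoreUris.size());
            try {
                return call.apply(uri);
            } catch (RuntimeException e) {
                if (!isRetriable(e)) {
                    throw e; // not a JDO failure: do not retry
                }
                last = e;
            }
            if (attempt + 1 < maxAttempts) {
                sleepQuietly(delayMillis);
            }
        }
        throw last; // every attempt failed with a retriable error
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        // Made-up URIs; the first metastore is simulated as failing.
        List<String> uris = Arrays.asList(
            "thrift://ms1.example.com:9083", "thrift://ms2.example.com:9083");
        String result = callWithRetries(uris, 4, 100L, uri -> {
            if (uri.contains("ms1")) {
                throw new RuntimeException(
                    "javax.jdo.JDODataStoreException: pool exhausted");
            }
            return "ok via " + uri;
        });
        System.out.println(result);
    }
}
```

The attempt count and delay live entirely on the client here, which is the 
point of (b): individual clients can be tuned to retry longer without tying up 
metastore threads.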

There is definitely scope for refactoring and improvement in all of this, and I 
will look into it further. For now, though, this is a simpler bugfix that makes 
the already-existing regex work correctly.

> RetryingMetaStoreClient does not retry JDOExceptions
> ----------------------------------------------------
>
>                 Key: HIVE-9436
>                 URL: https://issues.apache.org/jira/browse/HIVE-9436
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-9436.2.patch, HIVE-9436.patch
>
>
> RetryingMetaStoreClient has a bug in the following bit of code:
> {code}
>         } else if ((e.getCause() instanceof MetaException) &&
>             e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) {
>           caughtException = (MetaException) e.getCause();
>         } else {
>           throw e.getCause();
>         }
> {code}
> The bug here is that Java's String.matches matches the entire string against 
> the regex, and thus that match will fail if the message contains anything 
> before or after JDO[a-zA-Z]*Exception. The solution, however, is very simple: 
> we should match .*JDO[a-zA-Z]*Exception.*
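
The String.matches behaviour described in the issue can be reproduced in 
isolation with a small sketch like the one below. The class name, method names 
and sample messages are made up for illustration. The Matcher.find() variant is 
an alternative I am assuming would also work, not what the attached patch does. 
One caveat worth knowing: '.' does not match line terminators by default, so 
the .* form still fails on a multi-line message, whereas find() does not.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hive.
class JdoMessageMatchDemo {
    // Regex as written in the quoted code: matches() needs the whole message
    // to match it.
    static final String ORIGINAL_REGEX = "JDO[a-zA-Z]*Exception";
    // Regex proposed in the issue description.
    static final String PROPOSED_REGEX = ".*JDO[a-zA-Z]*Exception.*";
    // Alternative (assumption): substring search with the original regex.
    static final Pattern FIND_PATTERN = Pattern.compile(ORIGINAL_REGEX);

    static boolean matchesOriginal(String message) {
        return message.matches(ORIGINAL_REGEX);
    }

    static boolean matchesProposed(String message) {
        return message.matches(PROPOSED_REGEX);
    }

    static boolean findsJdoException(String message) {
        return FIND_PATTERN.matcher(message).find();
    }

    public static void main(String[] args) {
        // Made-up sample messages.
        String singleLine = "javax.jdo.JDODataStoreException: connection reset";
        String multiLine = "JDOFatalException: lost connection\nNestedThrowables: ...";

        System.out.println(matchesOriginal(singleLine));   // false
        System.out.println(matchesProposed(singleLine));   // true
        System.out.println(findsJdoException(singleLine)); // true

        System.out.println(matchesProposed(multiLine));    // false
        System.out.println(findsJdoException(multiLine));  // true
    }
}
```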



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
