[ https://issues.apache.org/jira/browse/HIVE-18078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293044#comment-16293044 ]

Sergey Shelukhin commented on HIVE-18078:
-----------------------------------------

Looks like the cluster fraction checks in the unit test can fail, because the
session is returned to the user before the event that the test waits for is
queued. As a result, the user can end up waiting for a cycle that runs before
the init is processed. This is not related to this patch, but I will update
it here anyway.
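One way to remove this kind of race could look like the minimal Java sketch
below. It assumes the event processing runs on a single-threaded event loop.
`InitRaceSketch`, `Session`, `initProcessed` and `getClusterFraction` are
hypothetical names, not the actual WorkloadManager API. The init event is
queued before the session is handed back, and the caller waits on a future
tied to that specific event rather than on "the next cycle".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InitRaceSketch {
    // Hypothetical stand-in for the WM event loop: one thread processes events in order.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private volatile double clusterFraction = 0.0;

    // Hypothetical session handle; initProcessed completes once the init event has run.
    static final class Session {
        final CompletableFuture<Void> initProcessed = new CompletableFuture<>();
    }

    Session getSession(double fraction) {
        Session s = new Session();
        // Queue the init event BEFORE returning the session, so the caller
        // cannot observe a cycle that predates the init.
        eventLoop.execute(() -> {
            clusterFraction = fraction;
            s.initProcessed.complete(null);
        });
        return s;
    }

    double getClusterFraction() {
        return clusterFraction;
    }

    void shutdown() {
        eventLoop.shutdown();
    }

    public static void main(String[] args) throws Exception {
        InitRaceSketch wm = new InitRaceSketch();
        Session s = wm.getSession(0.5);
        // Wait for the specific init event instead of an arbitrary later cycle.
        s.initProcessed.get(5, TimeUnit.SECONDS);
        System.out.println(wm.getClusterFraction());
        wm.shutdown();
    }
}
```

Waiting on a per-event future keeps the check deterministic regardless of how
many cycles the loop runs in between.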

> WM getSession needs some retry logic
> ------------------------------------
>
>                 Key: HIVE-18078
>                 URL: https://issues.apache.org/jira/browse/HIVE-18078
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-18078.01.patch, HIVE-18078.01.patch, 
> HIVE-18078.04.patch, HIVE-18078.05.patch, HIVE-18078.05.patch, 
> HIVE-18078.only.patch, HIVE-18078.patch
>
>
> When we get a bad session (e.g. no registry info because the AM has gone
> catatonic), the failure of the timeout future fails the getSession call.
> The retry model in TezTask is that it gets a session (which in the original
> model can be completely unusable, but we still get the object), and then
> retries (reopens) it if it's a lemon. If the reopen fails, we fail.
> getSession is not covered by this retry scheme, and should thus do its own
> retries (or the retry logic needs to be changed).
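A minimal Java sketch of getSession doing its own retries, assuming each
attempt is represented by a CompletableFuture that may never complete when the
AM is catatonic. `getSessionWithRetries`, `BadSessionException` and the
attempt supplier are hypothetical names; the actual fix may instead change the
TezTask retry logic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class GetSessionRetrySketch {
    // Hypothetical exception thrown once all attempts to get a usable session fail.
    static final class BadSessionException extends Exception {
        BadSessionException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Hypothetical helper: each attempt gets a fresh future from the supplier and
    // waits at most timeoutMs; a timeout or failure triggers the next attempt.
    static <T> T getSessionWithRetries(Supplier<CompletableFuture<T>> attempt,
                                       int maxAttempts, long timeoutMs)
            throws BadSessionException, InterruptedException {
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            CompletableFuture<T> f = attempt.get();
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                f.cancel(true); // abandon this attempt before retrying
                last = e;
            }
        }
        throw new BadSessionException(
                "getSession failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        // Simulated attempts: the first two hang (catatonic AM), the third succeeds.
        int[] calls = {0};
        Supplier<CompletableFuture<String>> attempt = () -> {
            calls[0]++;
            return calls[0] < 3
                    ? new CompletableFuture<>() // never completes
                    : CompletableFuture.completedFuture("session-" + calls[0]);
        };
        String s = getSessionWithRetries(attempt, 3, 100);
        System.out.println(s + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `session-3 after 3 attempts`. Bounding the attempts keeps
the original "if the reopen fails, we fail" semantics: once retries are
exhausted, the caller still gets an error.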



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
