[ https://issues.apache.org/jira/browse/HIVE-18078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293044#comment-16293044 ]
Sergey Shelukhin commented on HIVE-18078:
-----------------------------------------

It looks like the cluster fraction checks can fail in the unit test because the session is returned to the user before the event that the test waits for is queued, so the user may end up waiting for a cycle that runs before the init is processed. This is not related to this patch, but I will update it here anyway.

> WM getSession needs some retry logic
> ------------------------------------
>
>                 Key: HIVE-18078
>                 URL: https://issues.apache.org/jira/browse/HIVE-18078
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-18078.01.patch, HIVE-18078.01.patch, HIVE-18078.04.patch, HIVE-18078.05.patch, HIVE-18078.05.patch, HIVE-18078.only.patch, HIVE-18078.patch
>
>
> When we get a bad session (e.g. no registry info because AM has gone catatonic), the failure raised by the timeout future fails the getSession call.
> The retry model in TezTask is that it would get a session (which in the original model can be completely unusable, but we still get the object), and then retry (reopen) if it's a lemon. If the reopen fails, we fail.
> getSession is not covered by this retry scheme, and should thus do its own retries (or the retry logic needs to be changed).
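
To illustrate the kind of retry the description asks for, here is a minimal sketch of a getSession wrapper that retries when the call times out. It is not the code from the attached patches: the class GetSessionRetrySketch, the Session stand-in, getSessionWithRetry, and the maxAttempts parameter are all hypothetical names, and it assumes the bad-session failure surfaces as a java.util.concurrent.TimeoutException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class GetSessionRetrySketch {
  /** Hypothetical stand-in for a session handle; not Hive's actual session class. */
  static final class Session {
    final int attempt;
    Session(int attempt) { this.attempt = attempt; }
  }

  /**
   * Calls getSession up to maxAttempts times, retrying only when the call
   * fails with a TimeoutException (assumed here to signal a bad session).
   * Any other exception propagates immediately.
   */
  static Session getSessionWithRetry(Callable<Session> getSession, int maxAttempts)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    TimeoutException last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        return getSession.call();
      } catch (TimeoutException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // every attempt timed out
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated getSession: the first two calls time out, the third succeeds.
    Session s = getSessionWithRetry(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new TimeoutException("no registry info");
      }
      return new Session(calls[0]);
    }, 3);
    System.out.println("got session on attempt " + s.attempt); // prints "got session on attempt 3"
  }
}
```

A bounded attempt count keeps the "if the reopen fails, we fail" behavior: once the attempts are exhausted, the last timeout is rethrown to the caller.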