[jira] [Comment Edited] (HBASE-19834) Signalling server-hosted-clients to abort retries

stack (JIRA) Mon, 22 Jan 2018 15:21:11 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335142#comment-16335142
 ]


stack edited comment on HBASE-19834 at 1/22/18 11:20 PM:
---------------------------------------------------------

{quote}One difficulty is that the master main thread can get hung-up by the 
client retries 
{quote}
In most cases we should be OK. The incoming handler runs on a thread distinct 
from the Server main thread, so it can do the kill; see the patch on 
HBASE-19838, "Can not shutdown backup master cleanly when it has already tried 
to become the active master". All hosted clients share the Server Connection, 
and the incoming shutdown handler calls close on that Connection. That should 
be enough to shut down ongoing client RPCs out of the Server.
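
To illustrate the idea, here is a minimal sketch in plain Java. It does not use the HBase client API: SharedConnection, retryUntilClosed and closeStopsRetries are hypothetical names, and the always-failing call() stands in for an RPC to a region that never comes online. It assumes the retry loop checks the shared connection before each attempt.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

final class SharedConnectionShutdownSketch {

  /** Hypothetical stand-in for the shared Server Connection; not the HBase API. */
  static final class SharedConnection implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
      return closed.get();
    }

    /** Simulated RPC to a region that never comes online, so it always fails. */
    void call() throws IOException {
      throw new IOException(isClosed() ? "connection closed" : "region not online");
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /**
   * Hosted-client retry loop. Checks the shared connection before every attempt
   * and gives up once it is closed. Returns the number of attempts made.
   */
  static int retryUntilClosed(SharedConnection conn, long pauseMillis) {
    int attempts = 0;
    while (!conn.isClosed()) {
      attempts++;
      try {
        conn.call();
        break; // success; never happens in this simulation
      } catch (IOException e) {
        try {
          Thread.sleep(pauseMillis); // back off before the next attempt
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return attempts;
  }

  /** Returns true if closing the connection from another thread stopped the retries. */
  static boolean closeStopsRetries() {
    SharedConnection conn = new SharedConnection();
    Thread hostedClient = new Thread(() -> retryUntilClosed(conn, 10), "hosted-client");
    hostedClient.start();
    try {
      Thread.sleep(50); // let the client get stuck retrying for a while

      // The shutdown handler is its own thread, so it can act even though
      // the client thread is busy retrying.
      Thread shutdownHandler = new Thread(conn::close, "shutdown-handler");
      shutdownHandler.start();
      shutdownHandler.join();

      hostedClient.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      conn.close(); // make sure the client thread does not outlive the demo
      return false;
    }
    return !hostedClient.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("retries stopped after close: " + closeStopsRetries());
  }
}
```

The point of the sketch is only that the close happens on a thread other than the one that is retrying, so a blocked main thread does not prevent the abort.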


was (Author: stack):
{quote}One difficulty is that the master main thread can get hung-up by the 
client retries 
{quote}
In the bulk of the cases, we should be ok. The incoming handler is a distinct 
thread apart from the Server main thread and it can do the kill; see patch on 
HBASE-19838 "Can not shutdown backup master cleanly when it has already tried 
to become the active master"

> Signalling server-hosted-clients to abort retries
> -------------------------------------------------
>
>                 Key: HBASE-19834
>                 URL: https://issues.apache.org/jira/browse/HBASE-19834
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>
> A few recent flaky tests have been variations on the server-hosted-client 
> retrying against a server or region that is never going to show up – usually 
> because the cluster is being shut down. One example is a client stuck retrying 
> an update to hbase:meta with a change in region or table state while 
> hbase:meta is down. Another is HBASE-19794, where the test hangs because the 
> backup Master is trying to become active and, as part of startup, is trying 
> to read table state from hbase:meta, but hbase:meta is not available; it has 
> been taken down as part of the cluster shutdown.
> One difficulty is that the master main thread can get hung-up by the client 
> retries (in some cases the client retries are in-lined with the main thread 
> so it is 'blocked'); it is then no longer available to receive cluster 
> shutdown or other event types (e.g. see HBASE-19794). Some of our startup 
> needs to be refactored and moved into our run method rather than done as one 
> big single-threaded startup, as happens now in Master. We need this also for 
> the HBASE-19831 work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
