[ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255772#comment-17255772
 ] 

Bo Cui commented on HBASE-25447:
--------------------------------

1. The timeoutThread rarely encounters exceptions. If it does throw one, the 
master node probably has a serious problem, for example a resource leak, so 
stopping the master is better than having the timeoutThread retry.
2. In our production environment we run two masters, one active and one 
standby. The standby is likely to be healthy, so HBase can recover quickly 
by failing over to it.

So I think aborting the master is better than retrying in the timeoutThread.
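
To make the proposal concrete, here is a minimal sketch of the "abort instead 
of retry" idea. It is not the actual RemoteProcedureDispatcher code: the 
class TimeoutLoopSketch, the Abortable interface, the Worker thread and the 
demo() helper are all hypothetical names made up for illustration, and the 
OutOfMemoryError is simulated. The only point is that the loop catches 
Throwable and hands it to an abort hook instead of dying silently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class TimeoutLoopSketch {
    /** Hypothetical stand-in for whatever hook would abort the master. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical worker: runs queued tasks and aborts on any Throwable. */
    static final class Worker extends Thread {
        private final BlockingQueue<Runnable> queue;
        private final AtomicBoolean running = new AtomicBoolean(true);
        private final Abortable abortable;

        Worker(BlockingQueue<Runnable> queue, Abortable abortable) {
            super("timeout-loop-sketch");
            this.queue = queue;
            this.abortable = abortable;
        }

        @Override
        public void run() {
            try {
                while (running.get()) {
                    Runnable task = queue.take();
                    task.run(); // may throw, e.g. OutOfMemoryError
                }
            } catch (InterruptedException e) {
                // Normal shutdown path: restore the flag and exit quietly.
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Do not retry and do not exit silently: report and abort.
                running.set(false);
                abortable.abort("timeout thread failed", t);
            }
        }
    }

    /** Runs one failing task; returns the Throwable passed to abort, or null. */
    static Throwable demo() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        Worker worker = new Worker(queue, (why, cause) -> seen.set(cause));
        worker.start();
        queue.add(() -> { throw new OutOfMemoryError("simulated"); });
        try {
            worker.join(5000); // wait for the worker to hit the abort path
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("abort cause: " + demo());
    }
}
```

In a real master the abort hook would stop the process so that the standby 
can take over; here it only records the cause so the behaviour can be 
observed.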

> remoteProc is suspended due to OOM ERROR
> ----------------------------------------
>
>                 Key: HBASE-25447
>                 URL: https://issues.apache.org/jira/browse/HBASE-25447
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 3.0.0-alpha-1, 2.2.3
>            Reporter: Bo Cui
>            Assignee: Bo Cui
>            Priority: Major
>         Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a resource leak occurs because of another component or for some other 
> reason, BufferNode#dispatch() may fail. TimeoutExecutorThread then exits 
> its while (running.get()) loop, and some procedures get stuck.
>  !image-2020-12-26-11-49-38-018.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
