[ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255841#comment-17255841
 ] 

Pankaj Kumar commented on HBASE-25447:
--------------------------------------

Discussed with [~Bo Cui] offline: the region was stuck in RIT until an HMaster 
failover, because the remoteProc was suspended after an OOM (unable to create 
new native thread) was thrown while dispatching the proc.

Other chore services such as CJ, QuotaObserverChore, etc. also failed with OOM. 
Although this is an environment problem, it is better to abort the HMaster so 
that a healthy standby master can take over HBase cluster operations after 
becoming active.
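
To illustrate the failure mode and the suggested handling, here is a minimal 
standalone sketch. It is not HBase code: DispatchLoopSketch, Abortable and 
runLoop are hypothetical names, and the OOM is simulated by throwing 
OutOfMemoryError from a task. It assumes a simplified loop in which one thread 
drains a queue. Without a handler the Error terminates the thread and the 
queued tasks are never dispatched; with a handler the loop reports the failure 
through an abort hook, which stands in for aborting the active master.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a dispatcher thread; these are not the HBase classes. */
public class DispatchLoopSketch {

  /** Hypothetical abort hook, standing in for "abort the active master". */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /**
   * Drains the tasks on a dedicated thread and returns how many were never dispatched.
   * If abortable is null, an Error thrown by a task escapes and terminates the thread.
   */
  static int runLoop(List<Runnable> tasks, Abortable abortable) throws InterruptedException {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(tasks);
    Thread t = new Thread(() -> {
      Runnable task;
      while ((task = queue.poll()) != null) {
        if (abortable == null) {
          task.run(); // an Error here ends the loop; nothing is told about it
        } else {
          try {
            task.run();
          } catch (Throwable e) {
            // Report the failure so that a healthy process can take over.
            abortable.abort("dispatch failed", e);
            return;
          }
        }
      }
    }, "dispatch-loop-sketch");
    // Swallow the uncaught Error so the demo output stays readable.
    t.setUncaughtExceptionHandler((thread, e) -> { });
    t.start();
    t.join();
    return queue.size();
  }

  public static void main(String[] args) throws InterruptedException {
    Runnable ok = () -> { };
    Runnable oom = () -> {
      throw new OutOfMemoryError("unable to create new native thread (simulated)");
    };
    List<Runnable> tasks = Arrays.asList(ok, oom, ok, ok);

    int stuck = runLoop(tasks, null);
    System.out.println("no handler: " + stuck + " tasks left, nobody notified");

    AtomicReference<String> reason = new AtomicReference<>();
    int pending = runLoop(tasks, (why, cause) -> reason.set(why + ": " + cause.getMessage()));
    System.out.println("with handler: " + pending + " tasks left, abort requested ("
        + reason.get() + ")");
  }
}
```

In both runs the tasks queued behind the failing one are left undispatched. The 
difference is that the second run surfaces the failure, so something else can 
react to it instead of the work staying stuck until a manual failover.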

> remoteProc is suspended due to OOM ERROR
> ----------------------------------------
>
>                 Key: HBASE-25447
>                 URL: https://issues.apache.org/jira/browse/HBASE-25447
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 3.0.0-alpha-1, 2.2.3
>            Reporter: Bo Cui
>            Assignee: Bo Cui
>            Priority: Major
>         Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a resource leak occurs because of other components or for other reasons, 
> BufferNode#dispatch() may fail. TimeoutExecutorThread then exits the 
> while (running.get()) loop, and some procs get stuck...
>  !image-2020-12-26-11-49-38-018.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
