[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683233#comment-13683233
 ] 

Vinay commented on BOOKKEEPER-625:
----------------------------------

Yes, I agree with both of you: we cannot handle OOME in every thread.

My actual intention in raising this Jira was the following.
Since the NIOServerFactory thread is the basic thread that must be running to keep 
the BookieServer process running, we should avoid situations where this thread 
is gone while the BookieServer process is still running.
Currently all exceptions are caught inside {{NIOServerFactory.run()}}, but 
the thread can still exit due to errors such as OOME.
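
To illustrate the point above, here is a minimal sketch of why a catch-all for exceptions does not protect the thread from an OOME. It is not BookKeeper code: the class {{ErrorEscapesCatchDemo}} and the method {{loopSurvives}} are hypothetical names, and the OOME is simulated by throwing the error directly.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; it only mimics the catch-all pattern, it is not BookKeeper code.
class ErrorEscapesCatchDemo {

    /**
     * Runs one failing step inside a try/catch (Exception) on a worker thread and
     * reports whether the worker got past the catch block.
     */
    static boolean loopSurvives(Runnable failingStep) {
        AtomicBoolean survived = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                failingStep.run();
            } catch (Exception e) {
                // Exceptions are handled here, so the thread keeps going.
            }
            survived.set(true); // never reached if an Error propagates
        }, "demo-worker");
        // Swallow the uncaught Error so the demo does not print a stack trace.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", ie);
        }
        return survived.get();
    }

    public static void main(String[] args) {
        boolean afterException = loopSurvives(() -> { throw new IllegalStateException("simulated"); });
        boolean afterError = loopSurvives(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("survived Exception: " + afterException); // true
        System.out.println("survived Error: " + afterError);         // false
    }
}
```

{{OutOfMemoryError}} extends {{Error}}, not {{Exception}}, so the catch block is skipped and the thread terminates while the rest of the JVM keeps running.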

In one of our clusters, an OOME occurred in the NIOServerFactory thread and that 
thread exited. But the user didn't notice the problem because the process was still running.

bq. It doesn't sound right to add code to deal with out of memory exceptions 
because it is really an anomalous situation.
Yes, I agree. But one simple check for thread liveness inside 
{{NIOServerFactory.isRunning()}} will solve the problem, as follows:
{code:java}
    // Running only if the server socket is open and this thread is still alive.
    public boolean isRunning() {
        return !ss.socket().isClosed() && isAlive();
    }
{code}

We can add a similar check for other critical threads, without which keeping the 
process alive is useless.
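
A rough sketch of what such a check over several critical threads might look like is shown here. Everything in it is an assumption for illustration: {{CriticalThreadCheckDemo}}, {{allAlive}} and the thread names are hypothetical, and the failing thread simulates an OOME by throwing the error directly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper; the thread names below are placeholders, not BookKeeper threads.
class CriticalThreadCheckDemo {

    /** Returns true only if every critical thread is still alive. */
    static boolean allAlive(List<Thread> criticalThreads) {
        for (Thread t : criticalThreads) {
            if (!t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    /** Returns {check over the healthy thread only, check over both threads}. */
    static boolean[] runDemo() {
        CountDownLatch stop = new CountDownLatch(1);
        Thread healthy = new Thread(() -> {
            try {
                stop.await(); // stays alive until the demo releases it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "healthy-thread");
        Thread failing = new Thread(() -> { throw new OutOfMemoryError("simulated"); }, "failing-thread");
        failing.setUncaughtExceptionHandler((t, e) -> { }); // keep the demo output quiet
        healthy.start();
        failing.start();
        try {
            failing.join(); // wait until the failing thread has terminated
            boolean healthyOnly = allAlive(Arrays.asList(healthy));
            boolean both = allAlive(Arrays.asList(healthy, failing));
            return new boolean[] { healthyOnly, both };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            stop.countDown(); // let the healthy thread finish
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("healthy thread only: " + result[0]); // true
        System.out.println("with failed thread: " + result[1]);  // false
    }
}
```

A caller that polls such a check could then decide to shut the process down once it returns false, instead of leaving a process that no longer serves.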


                
> On OutOfMemoryError in NIOServerFactory thread bookie should shutdown
> ---------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-625
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-625
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.1, 4.3.0
>            Reporter: Vinay
>            Assignee: Vinay
>             Fix For: 4.2.2, 4.3.0
>
>
> Observed OutOfMemoryError in NIOServerFactory, but it didn't bring down the 
> bookie, which continued to run without serving requests. 
> On OOME in any thread, the bookie should shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
