[ https://issues.apache.org/jira/browse/BOOKKEEPER-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537656#comment-13537656 ]

Sijie Guo commented on BOOKKEEPER-524:
--------------------------------------

Thanks Matteo. I think the NPE might be caused by a race condition between 
flushLedger and removeLedger.

When flushLedger runs, it first gets the list of first entries for the ledger's
pages, then flushes the ledger pages according to that list. If removeLedger
happens between these two steps, it removes that ledger's pages from the
mapping, which causes the NPE during the flush.
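
A simplified sketch of the interleaving described above, assuming a toy page
map in place of the real LedgerCacheImpl. The names ToyLedgerCache, Page,
firstEntries and flushPages are hypothetical and are not the BookKeeper API.
Step 1 snapshots the first-entry keys, step 2 looks the ledger up again; if
removeLedger ran in between, the second lookup returns null. Here the flush
treats a missing ledger as already deleted and skips it instead of
dereferencing null. This is one possible guard, not necessarily the fix that
was committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of a ledger page cache (not the real LedgerCacheImpl). */
public class ToyLedgerCache {
    /** A dirty page of index entries; "flushing" just clears the dirty flag here. */
    static final class Page {
        boolean dirty = true;
    }

    // ledgerId -> (firstEntry -> page)
    private final Map<Long, Map<Long, Page>> pages = new ConcurrentHashMap<>();

    public void putPage(long ledgerId, long firstEntry) {
        pages.computeIfAbsent(ledgerId, id -> new ConcurrentHashMap<>())
             .put(firstEntry, new Page());
    }

    /** Drops every page of the ledger from the mapping. */
    public void removeLedger(long ledgerId) {
        pages.remove(ledgerId);
    }

    /** Step 1 of a flush: snapshot the first-entry keys of the ledger's pages. */
    public List<Long> firstEntries(long ledgerId) {
        Map<Long, Page> ledgerPages = pages.get(ledgerId);
        return ledgerPages == null ? new ArrayList<>() : new ArrayList<>(ledgerPages.keySet());
    }

    /**
     * Step 2: flush each page named in the snapshot. The ledger is looked up again,
     * so a concurrent removeLedger() makes the lookup return null; skip instead of
     * throwing. Returns the number of pages flushed.
     */
    public int flushPages(long ledgerId, List<Long> firstEntries) {
        int flushed = 0;
        for (long firstEntry : firstEntries) {
            Map<Long, Page> ledgerPages = pages.get(ledgerId);
            if (ledgerPages == null) {
                return flushed; // ledger removed concurrently: nothing left to flush
            }
            Page page = ledgerPages.get(firstEntry);
            if (page == null) {
                continue; // this single page is gone; keep flushing the rest
            }
            page.dirty = false;
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        ToyLedgerCache cache = new ToyLedgerCache();
        cache.putPage(1L, 0L);
        cache.putPage(1L, 100L);
        List<Long> snapshot = cache.firstEntries(1L); // step 1
        cache.removeLedger(1L);                       // removeLedger sneaks in here
        System.out.println("flushed " + cache.flushPages(1L, snapshot) + " pages"); // no NPE
    }
}
```

The same guard would be needed anywhere the flush path re-reads the mapping
after taking a snapshot, which is why the rest of the flush code is worth
auditing too.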

I need to check the flush code to ensure no other NPE can happen. Besides
that, it would be better to catch Throwable in SyncThread and, when SyncThread
quits, either turn the bookie read-only or shut it down. Otherwise the
exception is silenced until something bad happens (e.g. the journal disk fills
up; in that case, a bookie might take a long time to restart because it has to
replay its journal).
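
A minimal sketch of the "catch Throwable and escalate" idea, assuming a
hypothetical GuardedSyncThread with a FlushTask standing in for the ledger
storage flush and an onFatal callback standing in for "turn read-only or shut
down". None of these names come from the BookKeeper code base; the point is
only that the loop never exits silently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sync loop that escalates failures instead of dying silently. */
public class GuardedSyncThread extends Thread {
    /** One flush pass; stands in for flushing ledger storage. */
    public interface FlushTask {
        void flush() throws Exception;
    }

    private final FlushTask task;
    private final Consumer<Throwable> onFatal; // e.g. switch bookie to read-only, or shut down
    private final long intervalMillis;
    private volatile boolean running = true;

    public GuardedSyncThread(FlushTask task, Consumer<Throwable> onFatal, long intervalMillis) {
        super("SyncThread");
        this.task = task;
        this.onFatal = onFatal;
        this.intervalMillis = intervalMillis;
    }

    /** Normal stop: clear the flag and wake the thread if it is sleeping. */
    public void shutdown() {
        running = false;
        interrupt();
    }

    @Override
    public void run() {
        try {
            while (running) {
                task.flush();
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // expected on shutdown()
        } catch (Throwable t) {
            // Journal GC depends on this loop making progress, so a failure here
            // must be surfaced to the bookie rather than swallowed.
            onFatal.accept(t);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<Throwable> fatal = new AtomicReference<>();
        CountDownLatch escalated = new CountDownLatch(1);
        GuardedSyncThread t = new GuardedSyncThread(
                () -> { throw new NullPointerException("simulated flush failure"); },
                e -> { fatal.set(e); escalated.countDown(); },
                10);
        t.start();
        escalated.await(5, TimeUnit.SECONDS);
        System.out.println("escalated: " + fatal.get());
    }
}
```

Escalating from inside the thread keeps the decision (read-only vs. shutdown)
with the bookie; Thread.setUncaughtExceptionHandler would be an alternative
way to hook the same failure.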


                
> Bookie journal filesystem gets full after SyncThread is terminated with 
> exception
> ---------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-524
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-524
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Matteo Merli
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: 0001-BOOKKEEPER-524-Bookie-journal-filesystem-gets-full-a.patch
>
>
> The SyncThread gets an NPE while the rest of the bookie keeps running. This
> stops journal GC, and the journal filesystem fills up.
> Tue Dec 18 17:01:18 2012: Exception in thread "SyncThread" java.lang.NullPointerException
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.getLedgerEntryPage(LedgerCacheImpl.java:153)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.flushLedger(LedgerCacheImpl.java:421)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.flushLedger(LedgerCacheImpl.java:363)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.flush(InterleavedLedgerStorage.java:148)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.Bookie$SyncThread.run(Bookie.java:221)
