[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778415#comment-13778415
 ] 

Rakesh R commented on BOOKKEEPER-685:
-------------------------------------

Thanks [~hustlmsp] for the detailed explanation.

bq.could you clarify it when you are saying it leads to trouble?

Could you please see the following execution sequence:

Th1 - compaction thread
Th2 - SyncThread

1) Th1: addEntry, which sets flushed.set(false); // Assume the added entry is 
the 'last entry' of the last ledger taking part in compaction. After this, 
compaction moves on to the flush.
2) Th2: onRotateEntryLog and sets flushed.set(true);
3) Th1: scannerFactory.flush(); // since it sees flushed==true, it iterates 
over the offsets and flushes them out
4) Th1: removeEntryLog
5) Server crashes

In the above sequence, I see a possible loss of the 'last entry', which has 
not been flushed to the entry log. Any thoughts?
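
To make the interleaving easier to discuss, here is a minimal single-threaded 
replay of the five steps above. It is only a sketch: the class and its fields 
are hypothetical stand-ins, not BookKeeper code, and it assumes that the 
notification in step 2 belongs to a sync that did not cover the entry added 
in step 1.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the race; none of these types are BookKeeper classes.
class CompactionRaceSketch {
    final AtomicBoolean flushed = new AtomicBoolean(false);
    final Set<String> oldLog = new HashSet<>();          // old entry log, already on disk
    final Set<String> newLogBuffered = new HashSet<>();  // written but not yet synced
    final Set<String> newLogDurable = new HashSet<>();   // synced, survives a crash
    final Map<String, String> index = new HashMap<>();   // entry -> "old" or "new"
    final Set<String> pendingOffsets = new HashSet<>();

    // Th1: compaction copies an entry into the new entry log.
    void addEntry(String entry) {
        newLogBuffered.add(entry);
        pendingOffsets.add(entry);
        flushed.set(false);
    }

    // Th2: notification for an earlier sync (assumption); nothing new becomes durable.
    void onRotateEntryLog() {
        flushed.set(true);
    }

    // Th1: point the index at the new log, but only if the flag says "flushed".
    boolean flushOffsets() {
        if (!flushed.get()) {
            return false;
        }
        for (String entry : pendingOffsets) {
            index.put(entry, "new");
        }
        pendingOffsets.clear();
        return true;
    }

    void removeEntryLog() {
        oldLog.clear();
    }

    // A crash discards everything that was only buffered.
    void crash() {
        newLogBuffered.clear();
    }

    boolean isReadable(String entry) {
        return "new".equals(index.get(entry))
                ? newLogDurable.contains(entry)
                : oldLog.contains(entry);
    }

    // Replays steps 1-5 and reports whether the last entry can still be read.
    static boolean replay() {
        CompactionRaceSketch s = new CompactionRaceSketch();
        s.oldLog.add("last-entry");
        s.index.put("last-entry", "old");
        s.addEntry("last-entry");   // 1) Th1
        s.onRotateEntryLog();       // 2) Th2
        s.flushOffsets();           // 3) Th1 sees flushed == true
        s.removeEntryLog();         // 4) Th1
        s.crash();                  // 5)
        return s.isReadable("last-entry");
    }

    public static void main(String[] args) {
        System.out.println("last entry readable after crash: " + replay());
    }
}
```

In this model the index ends up pointing at the new entry log while the entry 
itself was only buffered there, and the old entry log is already gone.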

CompactionScannerFactory.java
{code}
        try {
            // compaction finished, flush any outstanding offsets
            scannerFactory.flush();
        } catch (IOException ioe) {
            LOG.error("Cannot flush compacted entries, skip removal", ioe);
            return;
        }

        // offsets have been flushed, its now safe to remove the old entrylogs
        for (Long l : toRemove) {
            removeEntryLog(l);
        }
{code}
                
> Race in compaction algorithm from BOOKKEEPER-664
> ------------------------------------------------
>
>                 Key: BOOKKEEPER-685
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-685
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.2.2
>
>
> I discovered a race in the algorithm when I was forward porting to trunk.
> 1) Thread1: flushed.set(false)
> 2) Thread2: onRotateEntryLog() // flushed.set(true)
> 3) Thread1: entryLogger addEntry L123-E456
> 4) Thread1: offsets > max, waits for flushed, flushed is true (as set in 2), 
> L123-E456 updated in ledger cache
> 5) Thread2: L123 flushed out of ledger cache
> 6) Crash
> This will possibly lose 1 entry. I've only reasoned this, not observed it, 
> but it can happen.
> The fix is pretty easy. EntryLoggerListener should notify with the offset 
> in the entry log up to which it has synced. 
>       
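
One way to read the fix proposed in the description is that the notification 
carries a position instead of flipping a boolean. The following is a rough 
sketch under that assumption; the class, the method names, and the use of a 
single long as the "position" (standing in for an entry log id plus offset) 
are all hypothetical and not the actual EntryLoggerListener API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a position-based notification; not BookKeeper code.
class SyncedOffsetSketch {
    // Highest position the sync side has reported as durable so far.
    private final AtomicLong syncedUpTo = new AtomicLong(-1);
    // Position of the most recent entry written by compaction.
    private volatile long lastWritten = -1;

    // Sync side: report how far the entry log has been synced.
    void onSynced(long position) {
        syncedUpTo.accumulateAndGet(position, Math::max);
    }

    // Compaction side: remember where the newest entry was written.
    void addEntry(long position) {
        lastWritten = position;
    }

    // Offsets may be flushed only once the sync covers the newest entry.
    boolean safeToFlushOffsets() {
        return syncedUpTo.get() >= lastWritten;
    }

    // Replays a stale notification followed by one that covers the entry.
    static boolean[] replay() {
        SyncedOffsetSketch s = new SyncedOffsetSketch();
        s.addEntry(100);
        s.onSynced(90);   // stale: refers to data written before position 100
        boolean afterStale = s.safeToFlushOffsets();
        s.onSynced(100);  // now the newest entry is covered
        boolean afterCovering = s.safeToFlushOffsets();
        return new boolean[] {afterStale, afterCovering};
    }

    public static void main(String[] args) {
        boolean[] r = replay();
        System.out.println("safe after stale notification: " + r[0]);
        System.out.println("safe after covering notification: " + r[1]);
    }
}
```

With a position to compare against, a notification that arrives late for an 
earlier sync can no longer be mistaken for one that covers the newest entry.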

