[
https://issues.apache.org/jira/browse/BOOKKEEPER-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778359#comment-13778359
]
Sijie Guo edited comment on BOOKKEEPER-685 at 9/26/13 2:00 AM:
---------------------------------------------------------------
{quote}
as per my previous comments, I was thinking that just interchanging the
#addEntry and flushed.set(false) sequence won't actually solve the issue. It
can still enter the execution flow that flushes ledger cache entries which are
not yet flushed to the entry logger.
{quote}
I don't mean to discourage discussion, but when you say "it leads trouble",
could you explain what sequence of steps would cause this issue and what its
effect would be? Otherwise it is unclear to other readers what you are
describing.
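For concreteness, the interchange under discussion is roughly the following
(flushed is the AtomicBoolean from the BOOKKEEPER-664 change; the surrounding
names are placeholders as I understand the patch, not verbatim code):
{code:java}
// Ordering in the current patch (per the race description in the issue below):
flushed.set(false);
entryLogger.addEntry(ledgerId, entry);

// Proposed interchange being discussed:
entryLogger.addEntry(ledgerId, entry);
flushed.set(false);
// The concern quoted above is that even with this order, a ledger cache
// flush can still pick up index entries whose data has not yet been
// flushed to the entry logger.
{code}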
{quote}
BOOKKEEPER-447
{quote}
First of all, I shouldn't have mentioned BOOKKEEPER-447 in this thread, since
it is a totally different topic. But since you raised an unclear question, I
have to cite it as a background reference for all the corner cases I know of.
To make this clear for anyone who wants to join the discussion, please make
sure you understand the background first.
1) This JIRA is about data loss during compaction, which moves entries from
one entry log to another. Those entries are not protected by the journal, so
GCThread needs to do compaction in the following sequence to guarantee no data
loss: a) add the entries to the entry logger; b) add the index entries to the
ledger cache; c) remove the old entry log file only after the data added by
a) & b) has been flushed. Calling setFlushed after addEntry is what guarantees
this sequence (see the sketch after this list; if you don't think it works,
please provide a sequence of steps that breaks it).
2) Normal entry adds are already protected by the journal, so there is no data
loss in that pipeline.
3) The case in BOOKKEEPER-447 is unrelated to 1) & 2): there an index entry is
flushed before its journal entry is flushed. It is an issue of invalid adds,
not data loss, and it happens in the normal add flow (e.g. when a ledger is
evicted from the ledger cache and forced to be written back to disk).
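A minimal sketch of the a)/b)/c) ordering above; entryLogger, ledgerCache,
readEntry and removeEntryLog are placeholder names for illustration, not the
exact trunk API:
{code:java}
// Compaction must make the relocated data durable before deleting the old log.
void compact(long oldEntryLogId, Iterable<EntryKey> liveEntries) throws IOException {
    for (EntryKey key : liveEntries) {
        ByteBuffer entry = readEntry(oldEntryLogId, key);
        // a) append the entry to the current entry log
        long newOffset = entryLogger.addEntry(key.getLedgerId(), entry);
        // b) point the index at the new location
        ledgerCache.putEntryOffset(key.getLedgerId(), key.getEntryId(), newOffset);
    }
    // c) only after a) and b) are flushed is it safe to delete the old log;
    //    until then, a crash can still be recovered from the old entry log.
    entryLogger.flush();
    ledgerCache.flushAll();
    removeEntryLog(oldEntryLogId);
}
{code}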
> Race in compaction algorithm from BOOKKEEPER-664
> ------------------------------------------------
>
> Key: BOOKKEEPER-685
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-685
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Priority: Blocker
> Fix For: 4.2.2
>
>
> I discovered a race in the algorithm when I was forward porting to trunk.
> 1) Thread1: flushed.set(false)
> 2) Thread2: onRotateEntryLog() // flushed.set(true)
> 3) Thread1: entryLogger addEntry L123-E456
> 4) Thread1: offsets > max, waits for flushed, flushed is true (as set in 2),
> L123-E456 updated in ledger cache
> 5) Thread2: L123 flushed out of ledger cache
> 6) Crash
> This can possibly lose one entry. I've only reasoned about this, not
> observed it, but it can happen.
> The fix is pretty easy: EntryLoggerListener should notify with the offset in
> the entry log up to which it has synced (a sketch follows below).
>
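A self-contained model of the interleaving above and of the suggested
offset-based fix; class and method names here are placeholders for
illustration, not the actual BookKeeper API:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Racy version: a single boolean cannot say WHICH entries a flush covered,
// so a flush that completed before addEntry can satisfy the wait in step 4.
class FlushSignal {
    private final AtomicBoolean flushed = new AtomicBoolean(false);

    void beforeAddEntry()    { flushed.set(false); } // step 1, Thread1
    void onRotateEntryLog()  { flushed.set(true); }  // step 2, Thread2
    boolean mayUpdateIndex() { return flushed.get(); } // step 4: returns true,
                                                       // but the flush predates
                                                       // the entry added in step 3
}

// Fixed version: the listener reports how far the entry log has synced, and
// the compactor compares that watermark against the offset of its own add.
class SyncWatermark {
    private volatile long syncedUpTo = -1L;

    void onEntryLogSynced(long offset)       { syncedUpTo = offset; }
    boolean mayUpdateIndex(long entryOffset) { return entryOffset <= syncedUpTo; }
}
{code}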