[
https://issues.apache.org/jira/browse/BOOKKEEPER-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778359#comment-13778359
]
Sijie Guo edited comment on BOOKKEEPER-685 at 9/26/13 2:00 AM:
---------------------------------------------------------------
{quote}
as per my previous comments, I was thinking that just interchanging the
#addEntry and flushed.set(false) sequence won't actually solve the issue. It
can still enter the execution flow that flushes ledger cache entries which are
not yet flushed to the entry logger.
{quote}
I don't mean to discourage discussion, but when you say "it leads trouble",
could you explain what sequence of steps would cause this issue and what its
effect would be? Otherwise it is unclear to other readers what you are
describing.
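For concreteness, the interchange under discussion is roughly the following
(flushed is the AtomicBoolean from the BOOKKEEPER-664 change; the surrounding
names are placeholders as I understand the patch, not verbatim code):
{code:java}
// Ordering in the current patch (per the race description in the issue below):
flushed.set(false);
entryLogger.addEntry(ledgerId, entry);

// Proposed interchange being discussed:
entryLogger.addEntry(ledgerId, entry);
flushed.set(false);
// The concern quoted above is that even with this order, a ledger cache
// flush can still pick up index entries whose data has not yet been
// flushed to the entry logger.
{code}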
{quote}
BOOKKEEPER-447
{quote}
First of all, I shouldn't have mentioned BOOKKEEPER-447 in this thread, since
it is a totally different topic. But since you raised an unclear question, I
have to cite it as a background reference for all the corner cases I know of.
To make this clear for anyone who wants to join the discussion, please make
sure you understand the background first.
1) This JIRA is about data loss during compaction, which moves entries from
one entry log to another. Those entries are not protected by the journal, so
GCThread needs to do compaction in the following sequence to guarantee no data
loss: a) add the entries to the entry logger; b) add the index entries to the
ledger cache; c) remove the old entry log file only after the data added by
a) & b) has been flushed. Calling setFlushed after addEntry is what guarantees
this sequence (see the sketch after this list; if you don't think it works,
please provide a sequence of steps that breaks it).
2) Normal entry adds are already protected by the journal, so there is no data
loss in that pipeline.
3) The case in BOOKKEEPER-447 is unrelated to 1) & 2): there an index entry is
flushed before its journal entry is flushed. It is an issue of invalid adds,
not data loss, and it happens in the normal add flow (e.g. when a ledger is
evicted from the ledger cache and forced to be written back to disk).
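A minimal sketch of the a)/b)/c) ordering above; entryLogger, ledgerCache,
readEntry and removeEntryLog are placeholder names for illustration, not the
exact trunk API:
{code:java}
// Compaction must make the relocated data durable before deleting the old log.
void compact(long oldEntryLogId, Iterable<EntryKey> liveEntries) throws IOException {
    for (EntryKey key : liveEntries) {
        ByteBuffer entry = readEntry(oldEntryLogId, key);
        // a) append the entry to the current entry log
        long newOffset = entryLogger.addEntry(key.getLedgerId(), entry);
        // b) point the index at the new location
        ledgerCache.putEntryOffset(key.getLedgerId(), key.getEntryId(), newOffset);
    }
    // c) only after a) and b) are flushed is it safe to delete the old log;
    //    until then, a crash can still be recovered from the old entry log.
    entryLogger.flush();
    ledgerCache.flushAll();
    removeEntryLog(oldEntryLogId);
}
{code}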
> Race in compaction algorithm from BOOKKEEPER-664
> ------------------------------------------------
>
> Key: BOOKKEEPER-685
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-685
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Priority: Blocker
> Fix For: 4.2.2
>
>
> I discovered a race in the algorithm when I was forward porting to trunk.
> 1) Thread1: flushed.set(false)
> 2) Thread2: onRotateEntryLog() // flushed.set(true)
> 3) Thread1: entryLogger addEntry L123-E456
> 4) Thread1: offsets > max, waits for flushed, flushed is true (as set in 2),
> L123-E456 updated in ledger cache
> 5) Thread2: L123 flushed out of ledger cache
> 6) Crash
> This can possibly lose one entry. I've only reasoned about this, not
> observed it, but it can happen.
> The fix is pretty easy: EntryLoggerListener should notify with the offset in
> the entry log up to which it has synced (a sketch follows below).
>
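A self-contained model of the interleaving above and of the suggested
offset-based fix; class and method names here are placeholders for
illustration, not the actual BookKeeper API:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Racy version: a single boolean cannot say WHICH entries a flush covered,
// so a flush that completed before addEntry can satisfy the wait in step 4.
class FlushSignal {
    private final AtomicBoolean flushed = new AtomicBoolean(false);

    void beforeAddEntry()    { flushed.set(false); } // step 1, Thread1
    void onRotateEntryLog()  { flushed.set(true); }  // step 2, Thread2
    boolean mayUpdateIndex() { return flushed.get(); } // step 4: returns true,
                                                       // but the flush predates
                                                       // the entry added in step 3
}

// Fixed version: the listener reports how far the entry log has synced, and
// the compactor compares that watermark against the offset of its own add.
class SyncWatermark {
    private volatile long syncedUpTo = -1L;

    void onEntryLogSynced(long offset)       { syncedUpTo = offset; }
    boolean mayUpdateIndex(long entryOffset) { return entryOffset <= syncedUpTo; }
}
{code}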