[ https://issues.apache.org/jira/browse/BOOKKEEPER-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627125#comment-13627125 ]
Sijie Guo edited comment on BOOKKEEPER-572 at 4/10/13 12:37 AM:
----------------------------------------------------------------
When I raised the reverse add sequence idea in BOOKKEEPER-447, one problem
bothered me for a while: what happens if we add a buggy record to the journal
successfully, but fail to apply it to ledger storage? It looks like the buggy
record would stay alive in the journal forever, since journal replay would end
up throwing an exception every time it applies this record. It would be bad if
this happened. I think this problem is still not addressed in the current
patch. I am not sure whether it is a big problem now or will be in the future.
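To make the scenario concrete, here is a minimal Java sketch of a replay loop
that gets stuck on one record. The names (JournalReplaySketch, Record, Storage,
replay) are hypothetical and are not BookKeeper's actual classes, and the rule
that an empty payload cannot be applied is only an assumption standing in for
whatever makes a record buggy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not BookKeeper's real classes.
class JournalReplaySketch {
    /** One journaled add: ledger id, entry id, payload. */
    static final class Record {
        final long ledgerId;
        final long entryId;
        final byte[] payload;

        Record(long ledgerId, long entryId, byte[] payload) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.payload = payload;
        }
    }

    /** Toy ledger storage that rejects records it cannot apply. */
    static final class Storage {
        final List<Record> applied = new ArrayList<>();

        void apply(Record r) {
            // Assumption: an empty payload models a "buggy" record.
            if (r.payload == null || r.payload.length == 0) {
                throw new IllegalStateException("cannot apply entry " + r.entryId);
            }
            applied.add(r);
        }
    }

    /**
     * Replays the journal from the start into the given storage.
     * Returns the index of the first record that failed, or -1 on success.
     */
    static int replay(List<Record> journal, Storage storage) {
        for (int i = 0; i < journal.size(); i++) {
            try {
                storage.apply(journal.get(i));
            } catch (IllegalStateException e) {
                return i; // replay cannot get past this record
            }
        }
        return -1;
    }
}
```

Because the journal is unchanged between restarts, every call to replay stops
at the same index, and the records after it are never applied.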
This issue is a kind of partial update. It can happen in any update-in-place
system, whether database or filesystem. In the current bookie journal, we don't
have enough information to undo the operation during recovery and bring
storage back to a consistent state. That is why we discussed BOOKKEEPER-447 at
such length.
Another way to address this issue is COW (copy-on-write), which is what ZFS and
Btrfs use to address inconsistency problems in metadata and data. Since updates
are not applied in place, it is easy to keep consistency by rolling back
to the state at the previous checkpoint. I have started some hack work on
implementing a log-structured ledger index, which 1) writes index pages in a
more sequential way and 2) avoids updating index pages in place, which might
also address the issue in BOOKKEEPER-447 more gracefully.
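The rollback property can be sketched with a toy append-only index in Java.
This is only an illustration of the general idea, not the log-structured
ledger index mentioned above: the class AppendOnlyIndexSketch and its methods
are hypothetical, and a real index would persist its log and checkpoint marker
to disk rather than keep them in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an index that never updates in place.
class AppendOnlyIndexSketch {
    // Each element is {entryId, offset}; newer mappings are appended at the end.
    private final List<long[]> log = new ArrayList<>();
    // Number of log elements covered by the last checkpoint.
    private int checkpoint = 0;

    /** Records a mapping by appending; older mappings are left untouched. */
    void put(long entryId, long offset) {
        log.add(new long[] {entryId, offset});
    }

    /** Returns the most recent offset for entryId, or -1 if there is none. */
    long get(long entryId) {
        for (int i = log.size() - 1; i >= 0; i--) {
            if (log.get(i)[0] == entryId) {
                return log.get(i)[1];
            }
        }
        return -1L;
    }

    /** Marks everything appended so far as consistent. */
    void checkpoint() {
        checkpoint = log.size();
    }

    /** Discards everything appended after the last checkpoint. */
    void rollback() {
        log.subList(checkpoint, log.size()).clear();
    }
}
```

Since put never overwrites an earlier mapping, rollback only has to drop the
tail of the log to return to the checkpointed state; no undo information is
needed.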
was (Author: hustlmsp):
When I raised the reverse add sequence idea in BOOKKEEPER-447, one problem
bothered me for a while: what happens if we add a buggy record to the journal
successfully, but fail to apply it to ledger storage? It looks like the buggy
record would stay alive in the journal forever, since journal replay would end
up throwing an exception every time it applies this record. It would be bad if
this happened. I think this problem is still not addressed in the current
patch. I am not sure whether it is a big problem now or will be in the future.
This issue is a kind of partial update. It can happen in any update-in-place
system, whether database or filesystem. Usually a database has a redo log,
which brings partially updated pages back to a consistent state during
recovery. But the journal in the bookie server is not a redo log, so it is
difficult to handle this case.
Another way to address this issue is COW (copy-on-write), which is what ZFS and
Btrfs use to address inconsistency problems in metadata and data. Since updates
are not applied in place, it is easy to keep consistency by rolling back
to the state at the previous checkpoint. I have started some hack work on
implementing a log-structured ledger index, which 1) writes index pages in a
more sequential way and 2) avoids updating index pages in place, which might
also address the issue in BOOKKEEPER-447 more gracefully.
> Make the journal a write ahead log
> ----------------------------------
>
> Key: BOOKKEEPER-572
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-572
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 4.3.0
>
> Attachments:
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> 0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> 0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
> BookieServer-2013-02-22.snapshot
>
>
> A bookie adds to the LedgerStorage before writing to the journal. This is the
> fundamental problem behind BOOKKEEPER-447 and blocks a nice solution to
> BOOKKEEPER-530. By writing to the memory state before the journal, we expose
> ourselves to bugs if the bookie crashes before we write to the journal. The
> entry may exist in the index, but not in the entrylog, a situation which cannot
> be distinguished from an I/O error. The comments on BOOKKEEPER-447 go into
> more detail.