[ https://issues.apache.org/jira/browse/BOOKKEEPER-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627125#comment-13627125 ]

Sijie Guo edited comment on BOOKKEEPER-572 at 4/10/13 12:37 AM:
----------------------------------------------------------------

When I raised the reverse add sequence idea in BOOKKEEPER-447, one problem 
bothered me for a while: what happens if we add a buggy record to the journal 
successfully, but fail to apply it to ledger storage? It looks like the buggy 
record would stay in the journal forever, since journal replay would end up 
throwing an exception every time it tried to apply this record. That would be 
bad. I think this problem is still not addressed in the current patch, though I 
am not sure whether it is a big problem now or will become one in the future.
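
To make the concern concrete, here is a minimal Python sketch of journal replay 
under the journal-first ordering. Everything in it is hypothetical: 
`ApplyError`, `apply_to_storage`, `replay_journal` and the tuple record format 
are illustrative stand-ins, not BookKeeper APIs, and a "buggy" record is 
modeled as one whose payload is `None`. Assuming the journal is only trimmed 
after its records have been applied, every restart hits the same exception at 
the same position.

```python
class ApplyError(Exception):
    """Raised when a journal record cannot be applied to ledger storage."""


def apply_to_storage(storage, record):
    # A record is (ledger_id, entry_id, payload); a None payload stands in
    # for a "buggy" record that the storage layer rejects.
    ledger_id, entry_id, payload = record
    if payload is None:
        raise ApplyError(f"cannot apply entry {entry_id} of ledger {ledger_id}")
    storage[(ledger_id, entry_id)] = payload


def replay_journal(journal, storage):
    # Replay all records since the last checkpoint, in journal order. The
    # journal is not trimmed here, so a record that always fails to apply
    # makes every replay raise at the same position.
    for record in journal:
        apply_to_storage(storage, record)
    return len(journal)


journal = [(1, 0, b"a"), (1, 1, None), (1, 2, b"c")]
for restart in range(2):
    try:
        replay_journal(journal, {})
    except ApplyError as err:
        print(f"restart {restart}: replay failed: {err}")
```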

This issue is a kind of partial update, which can happen in any 
update-in-place system, whether a database or a filesystem. In the current 
bookie journal, we don't have enough information to undo the operation during 
recovery and bring storage back to a consistent state. That is why we 
discussed BOOKKEEPER-447 at such length.
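
For contrast, this is roughly what "enough information to undo" looks like in a 
generic undo log: before each in-place update, the before-image of the slot is 
logged, so recovery can reverse updates that did not complete. This is a 
textbook technique sketched in Python under simple assumptions (a page is a 
dict of int values), not the bookie journal format; `UndoRecord`, 
`update_in_place` and `rollback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRecord:
    key: int
    old_value: Optional[int]  # None means the key was absent before the update


def update_in_place(page, undo_log, key, value):
    # Log the before-image first, then overwrite the slot in place.
    undo_log.append(UndoRecord(key, page.get(key)))
    page[key] = value


def rollback(page, undo_log):
    # Reverse updates newest-first so each slot ends at its oldest before-image.
    while undo_log:
        rec = undo_log.pop()
        if rec.old_value is None:
            page.pop(rec.key, None)
        else:
            page[rec.key] = rec.old_value


page, undo_log = {1: 100}, []
update_in_place(page, undo_log, 1, 200)
update_in_place(page, undo_log, 2, 300)
rollback(page, undo_log)  # page is back to {1: 100}
```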

Another way to address this issue is COW (copy-on-write), which is what ZFS 
and Btrfs use to avoid inconsistency in metadata and data. Since updates are 
not applied in place, it is easy to keep things consistent by rolling back to 
the state of the previous checkpoint. I have started some hack work on 
implementing a log-structured ledger index, which would 1) write index pages 
more sequentially and 2) avoid updating index pages in place. This might also 
address the issue in BOOKKEEPER-447 more gracefully.
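
A toy Python sketch of the copy-on-write idea applied to an index: page 
versions are appended to a log instead of being overwritten, and a checkpoint 
remembers which version of each page is current. Recovery only has to drop 
what was written after the checkpoint. `LogStructuredIndex` and its methods are 
hypothetical and say nothing about how the actual log-structured ledger index 
is designed.

```python
class LogStructuredIndex:
    """Toy copy-on-write index: page versions are appended, never overwritten."""

    def __init__(self):
        self.log = []           # append-only list of (page_id, contents)
        self.current = {}       # page_id -> log offset of its newest version
        self.ckpt_current = {}  # page_id -> offset as of the last checkpoint
        self.ckpt_len = 0       # log length at the last checkpoint

    def write_page(self, page_id, contents):
        # Copy-on-write: append a new version instead of updating in place.
        self.log.append((page_id, dict(contents)))
        self.current[page_id] = len(self.log) - 1

    def read_page(self, page_id):
        return self.log[self.current[page_id]][1]

    def take_checkpoint(self):
        self.ckpt_current = dict(self.current)
        self.ckpt_len = len(self.log)

    def recover(self):
        # Roll back to the last checkpoint by discarding later versions; older
        # versions were never modified, so no half-updated page is visible.
        del self.log[self.ckpt_len:]
        self.current = dict(self.ckpt_current)


idx = LogStructuredIndex()
idx.write_page(0, {1: 10})
idx.take_checkpoint()
idx.write_page(0, {1: 10, 2: 20})  # not yet checkpointed
idx.recover()
print(idx.read_page(0))  # {1: 10}
```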


                
      was (Author: hustlmsp):

This issue is a kind of partial update, which can happen in any 
update-in-place system, whether a database or a filesystem. A database usually 
has a redo log, which brings partially updated pages back to a consistent 
state during recovery. But the journal in the bookie server is not a redo log, 
so it is difficult to handle this case.
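
A minimal Python sketch of how a database-style redo log repairs pages during 
recovery, assuming each page write is atomic and each page stores the LSN (log 
sequence number) of the last record applied to it. Records that are not newer 
than the page's LSN are skipped, so replay is idempotent. Real databases also 
need something like full-page images to cope with torn page writes, which is 
left out here. `Page` and `redo` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0  # LSN of the last redo record applied to this page
    slots: dict = field(default_factory=dict)


def redo(pages, redo_log):
    # Each record is (lsn, page_id, key, value), in increasing LSN order.
    # A record is applied only if it is newer than the page's LSN, so pages
    # flushed before the crash are skipped and replay can be repeated safely.
    for lsn, page_id, key, value in redo_log:
        page = pages.setdefault(page_id, Page())
        if lsn > page.lsn:
            page.slots[key] = value
            page.lsn = lsn


# Page 0 was flushed after record 1; page 1 was never flushed.
pages = {0: Page(lsn=1, slots={"a": 1})}
log = [(1, 0, "a", 1), (2, 0, "b", 2), (3, 1, "c", 3)]
redo(pages, log)
redo(pages, log)  # a second replay changes nothing
```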



                  
> Make the journal a write ahead log
> ----------------------------------
>
>                 Key: BOOKKEEPER-572
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-572
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.3.0
>
>         Attachments: 
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> 0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> 0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch, 
> BookieServer-2013-02-22.snapshot
>
>
> A bookie adds to the LedgerStorage before writing to the journal. This is the 
> fundamental problem behind BOOKKEEPER-447 and blocks a nice solution to 
> BOOKKEEPER-530. By writing to the in-memory state before the journal, we 
> expose ourselves to bugs if the bookie crashes before we write to the journal. 
> The entry may exist in the index but not in the entry log, a situation which 
> cannot be distinguished from an I/O error. The comments on BOOKKEEPER-447 go 
> into more detail.
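
The ordering the issue asks for can be sketched in Python as follows: append 
the entry to the journal first, then apply it to ledger storage, and replay the 
journal on recovery. The `Bookie` class, its in-memory journal, entry log and 
index, and the `crash_before_storage` flag are hypothetical stand-ins; a real 
bookie would also need the journal write to be durable (fsynced) before the add 
is acknowledged. Inside `_apply`, the payload is written to the entry log before 
the index entry that points at it, assumed here as one way to keep the index 
from referencing an entry that is missing from the entry log.

```python
class Bookie:
    """Toy bookie with write-ahead ordering: journal first, then storage."""

    def __init__(self):
        self.journal = []    # assumed durable: survives a crash
        self.entry_log = []  # ledger storage: entry payloads
        self.index = {}      # ledger storage: (ledger_id, entry_id) -> offset

    def add_entry(self, ledger_id, entry_id, data, crash_before_storage=False):
        self.journal.append((ledger_id, entry_id, data))  # 1. write ahead
        if crash_before_storage:
            raise RuntimeError("simulated crash after the journal write")
        self._apply(ledger_id, entry_id, data)            # 2. apply to storage

    def _apply(self, ledger_id, entry_id, data):
        if (ledger_id, entry_id) in self.index:
            return  # already applied, so replay is idempotent
        # Write the payload before the index entry that points at it.
        self.entry_log.append(data)
        self.index[(ledger_id, entry_id)] = len(self.entry_log) - 1

    def recover(self):
        # Replay the journal; entries already in storage are skipped.
        for ledger_id, entry_id, data in self.journal:
            self._apply(ledger_id, entry_id, data)


bookie = Bookie()
bookie.add_entry(1, 0, b"a")
try:
    bookie.add_entry(1, 1, b"b", crash_before_storage=True)
except RuntimeError:
    pass
bookie.recover()  # entry (1, 1) is restored from the journal
```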
