Entry log file is overwritten when fail to read lastLogId.
----------------------------------------------------------
Key: BOOKKEEPER-182
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
Project: Bookkeeper
Issue Type: Bug
Reporter: Sijie Guo
Assignee: Sijie Guo
we found data corruption happened on entry log files.
2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] -
Error reading 229@114724
java.io.IOException: problem found in 0@229 at position + 89030194 entry
belongs to 6373236044838956613 not 114724
at
org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
at
org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
at
org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
at
org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
at
org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
at
org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124
then we did some investigation on failed ledger:
first looked into ledger 114724's index file.
{code}
entry 75 : (log:11, pos: 100526580)
entry 76 : (log:11, pos: 101849530)
entry 77 : (log:11, pos: 103176596)
entry 78 : (log:11, pos: 104403977)
entry 79 : (log:11, pos: 105756017)
entry 80 : (log:11, pos: 106740803)
entry 81 : (log:0, pos: 73365)
entry 82 : (log:0, pos: 1366625)
entry 83 : (log:0, pos: 2719276)
entry 84 : (log:0, pos: 4065142)
{code}
from entry 80, the data is written in 0 entry log which is less than 11. (means
data is written to an older entry log file)
then we looked into ledger directory as below
{code}
2147483550 Mar 5 11:30 /var/bookkeeper/ledger/0.log
94122988 Mar 5 11:33 /var/bookkeeper/ledger/1.log
1984247565 Mar 5 11:34 /var/bookkeeper/ledger/2.log
288376 Mar 5 11:34 /var/bookkeeper/ledger/3.log
747151813 Mar 6 03:17 /var/bookkeeper/ledger/4.log
410381287 Mar 6 07:43 /var/bookkeeper/ledger/5.log
2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
1691783168 Mar 1 03:22 /var/bookkeeper/ledger/a.log
125556720 Mar 1 08:30 /var/bookkeeper/ledger/b.log
0 Mar 1 08:33 /var/bookkeeper/ledger/c.log
{code}
the 0-5 entry log files are overwritten.
looked into the code, found that when bookie server failed to read lastLogId,
it would set the lastLogId to -1. then start writing entry log files from 0.
and also there is not checking about the existen of the entry log file.
it would better to scan the directories to found the biggest log id and start
from it. and check whether the file exists or not when creating a new entry log
file.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira