Hello Mike and Robert,
I am using the stable version of Lucene (i.e. 3.6), and what actually
happens is that the checksum (a long) is written as 8 bytes: the
first 4 are 0, and the mismatched checksum value (i.e. checksum-1) is
written in the next 4 (reference:
ChecksumIndexOutput.prepareCommit()). When finishCommit() is called, the
correct checksum is written to the buffer, and on close it is flushed
to the directory.
A comment states that this is done for better testing. I've followed the
code with the debugger and printed out the bytes in the logger and I can
say that seeking back and overwriting are done as they should be.
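To make the sequence above concrete, here is a minimal stand-alone sketch of the write, seek back, overwrite pattern. It does not use Lucene's classes: the class and method names are made up for illustration, a ByteBuffer stands in for the IndexOutput, and the "append-only" writer is a hypothetical output that silently ignores the seek. It only shows why a reader sees a mismatch when the overwrite does not land on the same 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: none of these names come from Lucene.
class TwoPhaseChecksumSketch {

    /** CRC32 of the payload. The value fits in 32 bits, so as a long its top 4 bytes are 0. */
    static long crc(byte[] payload) {
        CRC32 c = new CRC32();
        c.update(payload, 0, payload.length);
        return c.getValue();
    }

    /** Seekable output: the final checksum overwrites the placeholder written first. */
    static byte[] writeSeekable(byte[] payload) {
        long checksum = crc(payload);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 8);
        buf.put(payload);
        int pos = buf.position();
        buf.putLong(checksum - 1); // "prepare": deliberately wrong value
        buf.position(pos);         // seek back to where the checksum starts
        buf.putLong(checksum);     // "finish": correct value over the same 8 bytes
        return buf.array();
    }

    /** Hypothetical output that ignores the seek, so the second long is appended. */
    static byte[] writeAppendOnly(byte[] payload) {
        long checksum = crc(payload);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 16);
        buf.put(payload);
        buf.putLong(checksum - 1); // placeholder stays in place
        buf.putLong(checksum);     // lands 8 bytes too late
        return buf.array();
    }

    /** Reader side: recompute the CRC over the payload and compare with the next 8 bytes. */
    static boolean verify(byte[] file, int payloadLength) {
        long expected = crc(Arrays.copyOf(file, payloadLength));
        long stored = ByteBuffer.wrap(file, payloadLength, 8).getLong();
        return expected == stored;
    }

    public static void main(String[] args) {
        byte[] payload = "segments file contents".getBytes(StandardCharsets.US_ASCII);
        System.out.println("seekable ok:    " + verify(writeSeekable(payload), payload.length));
        System.out.println("append-only ok: " + verify(writeAppendOnly(payload), payload.length));
    }
}
```

Dumping the 8 bytes that follow the payload in the stored file and comparing them with the recomputed value is a quick way to tell which of the two cases a given output implementation is in.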
You can run the test as 'mvn test
-Dtest=org.apache.james.mailbox.lucene.hbase.IndexingTest' but there
will be a lot of byte printing.
I am now looking at the AppendingCodec in version 4, to see whether I can
make better use of that implementation.
Thank you,
Mihai
On 26.06.2012 13:30, Michael McCandless wrote:
Hmm, the checksum is there to ensure all bits were persisted properly.
But one trickiness is we first write 4 0 bytes, then seek back and
write the checksum over those 4 bytes. Could it be that the HBase
IndexOutput impl can't handle seeking back and overwriting?
If so, you should have a look at AppendingCodec, which fixes the
places in Lucene's default codec that seek backwards on write ...
Mike McCandless
http://blog.mikemccandless.com
On Mon, Jun 25, 2012 at 11:55 AM, Mihai Soloi <[email protected]> wrote:
Hello everybody,
I'm Mihai, a GSoC student, and I'm implementing an HBaseDirectory for Lucene
[1] in order to use it for James mailbox indexing. I've implemented
HIndexOutput/Input, and they persist the segments file just fine in an
HBase table. But when I try to get an IndexWriter from my directory, it
reads the segments_N file and the check in SegmentInfos fails because the
computed checksum differs from the persisted one. I've tried to find a
solution but haven't reached one. Do you guys have any idea why this
happens? This is the stack trace:
org.apache.lucene.index.CorruptIndexException: checksum mismatch in segments file (resource: ChecksumIndexInput(anonymous IndexInput))
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:182)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1168)
    at org.apache.james.mailbox.lucene.hbase.IndexingTest.getWriter(IndexingTest.java:82)
    at org.apache.james.mailbox.lucene.hbase.IndexingTest.testIndexWriter(IndexingTest.java:123)
[1] http://code.google.com/a/apache-extras.org/p/mailbox-lucene-index-hbase/
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]