Hello Mike and Robert,

I am using the stable version of Lucene (3.6). What actually happens is that the checksum (a long) is written as 8 bytes: the first 4 are 0, and the next 4 hold a deliberately mismatched checksum value (checksum - 1); see ChecksumIndexOutput.prepareCommit(). When finishCommit() runs, the correct checksum is written to the buffer, and on close it is flushed to the directory.
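
To make the sequence above concrete, here is a minimal sketch of the two-phase checksum write. It is not Lucene's code: the class and method names are made up for illustration, a ByteBuffer stands in for the index output, and it assumes a CRC32 digest. A CRC32 value fits in the low 32 bits of a long, which would be consistent with the first 4 bytes being 0.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; names do not come from Lucene.
final class TwoPhaseChecksumSketch {

    // CRC32 of the payload, returned as a long (high 32 bits are always 0).
    public static long checksumOf(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    // Phase 1: write the payload, then a deliberately wrong checksum
    // (checksum - 1), and move the position back to where the checksum starts.
    public static ByteBuffer prepareCommit(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + Long.BYTES);
        buf.put(payload);
        int checksumPos = buf.position();
        buf.putLong(checksumOf(payload, payload.length) - 1);
        buf.position(checksumPos); // "seek back"
        return buf;
    }

    // Phase 2: overwrite the placeholder with the real checksum.
    public static void finishCommit(ByteBuffer buf, byte[] payload) {
        buf.putLong(checksumOf(payload, payload.length));
    }

    // Reader side: recompute the checksum over everything except the
    // trailing 8 bytes and compare it with the stored value.
    public static boolean verify(byte[] file) {
        int payloadLength = file.length - Long.BYTES;
        long stored = ByteBuffer.wrap(file, payloadLength, Long.BYTES).getLong();
        return stored == checksumOf(file, payloadLength);
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        ByteBuffer buf = prepareCommit(payload);
        System.out.println("after prepareCommit: " + verify(buf.array()));
        finishCommit(buf, payload);
        System.out.println("after finishCommit:  " + verify(buf.array()));
    }
}
```

If the bytes persisted after the first phase are the ones a reader sees, verification fails in the same way as the "checksum mismatch" reported for the segments file.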

A comment states that this is done for better testing. I've stepped through the code with the debugger and logged the bytes, and I can confirm that seeking back and overwriting work as they should.

You can run the test with 'mvn test -Dtest=org.apache.james.mailbox.lucene.hbase.IndexingTest', but it prints a lot of bytes.

I am now looking at the AppendingCodec in version 4 to see whether I can make better use of that implementation.

Thank you,
Mihai


On 26.06.2012 13:30, Michael McCandless wrote:
Hmm, the checksum is there to ensure all bits were persisted properly.

But one trickiness is we first write 4 0 bytes, then seek back and
write the checksum over those 4 bytes.  Could it be that the HBase
IndexOutput impl can't handle seeking back and overwriting?
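
A minimal sketch of what handling "seek back and overwrite" asks of an output implementation. Everything here is hypothetical: a plain byte array stands in for the HBase-backed storage, and the class is neither Lucene's IndexOutput nor the HBaseDirectory code. The point it illustrates is that bytes written after a seek must replace what an earlier flush already persisted.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's IndexOutput API.
final class SeekableOutputSketch {
    private byte[] store = new byte[0];          // stand-in for the persisted file
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private int pendingStart = 0;                // file offset of the first pending byte

    public void writeByte(byte b) {
        pending.write(b);
    }

    // Persist pending bytes at their file offset, overwriting any bytes
    // already stored there and growing the store if needed.
    public void flush() {
        byte[] chunk = pending.toByteArray();
        int end = pendingStart + chunk.length;
        if (end > store.length) {
            store = Arrays.copyOf(store, end);
        }
        System.arraycopy(chunk, 0, store, pendingStart, chunk.length);
        pendingStart = end;
        pending.reset();
    }

    // Flush what is buffered, then continue writing from the new offset.
    public void seek(int pos) {
        flush();
        pendingStart = pos;
    }

    public void close() {
        flush();
    }

    public byte[] persisted() {
        return store.clone();
    }

    public static void main(String[] args) {
        SeekableOutputSketch out = new SeekableOutputSketch();
        for (byte b : new byte[] {1, 2, 3, 4}) out.writeByte(b);
        out.flush();                 // bytes 1..4 are now persisted
        out.seek(2);                 // go back over the last two bytes
        out.writeByte((byte) 9);
        out.writeByte((byte) 8);
        out.close();                 // the overwrite must reach the store
        System.out.println(Arrays.toString(out.persisted()));
    }
}
```

An implementation that persists only on the first flush, or that appends instead of writing at the sought offset, would leave the earlier bytes in place.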

If so, you should have a look at AppendingCodec, which fixes the
places in Lucene's default codec that seek backwards on write ...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 25, 2012 at 11:55 AM, Mihai Soloi <[email protected]> wrote:
Hello everybody,

I'm Mihai, a GSoC student, and I'm implementing an HBaseDirectory for Lucene
[1] in order to use it for James mailbox indexing. I've implemented
HIndexOutput/Input, and they persist the segments file just fine in an
HBase table. But when I try to get an IndexWriter from my directory, it
reads the segments_N file and the check in SegmentInfos fails because the
computed checksum differs from the persisted one. I've tried to find a
solution but haven't reached one. Do you guys have any idea why this
happens? This is the stack trace:

org.apache.lucene.index.CorruptIndexException: checksum mismatch in segments file (resource: ChecksumIndexInput(anonymous IndexInput))
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:182)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1168)
    at org.apache.james.mailbox.lucene.hbase.IndexingTest.getWriter(IndexingTest.java:82)
    at org.apache.james.mailbox.lucene.hbase.IndexingTest.testIndexWriter(IndexingTest.java:123)

[1] http://code.google.com/a/apache-extras.org/p/mailbox-lucene-index-hbase/

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
