[ https://issues.apache.org/jira/browse/LUCENE-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008764#comment-13008764 ]
Uwe Schindler commented on LUCENE-2975:
---------------------------------------

Before I commit this, I want to summarize everything Robert and I found out yesterday (in fact, Robert did the debugging work). I just had the idea of how to solve it.

The issue does not really have anything to do with MMap, and it is also present in the Hotspot VM before 1.6.0_22 (I hit the bug in our server logs for an IndexReader.document() call and was not able to reproduce it; that was 1.6.0_21). So this bug also affects previous Lucene versions, *BUT*: the bug only happens if the IndexInput.readByte() method is inlined by Hotspot and we are *not* using BufferedIndexInput (which has its own VInt impl).

Lucene 3.0 and earlier had a very complicated readByte() method in MMap, which Hotspot never inlined. But since a performance update in MMapIndexInput/MultiMMapIndexInput, readByte() is a three-liner that simply delegates to the ByteBuffer's get() and catches an exception to fall back to another impl (for changing the buffer slice in MultiMMap). So most calls are simply calls to NIO's get(), which may be intrinsics or whatever (we don't know what Hotspot does there). Sun optimizes NIO a lot!

This leads to a problem with the loop inside readVInt. All other methods in IndexInput already have unrolled loops, so readLong is not a "for (i=0; i<64; i+=8)" loop; it is coded with all shifts precalculated. To solve the bug, I did the same for readVInt and readVLong.

We don't know which Hotspot bug is the real cause, but Dawid Weiss's bug looks really identical, especially as one of our tests also ran into an endless loop (Robert saw it; not confirmed). There are a lot of Hotspot bugs related to loops, so one of them hit us here.
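To illustrate the unrolling described in the comment, here is a minimal sketch of a loop-based VInt decoder next to an unrolled one with all shifts precalculated. It is not the committed patch: the class name VIntSketch, the method names, and the use of a plain java.nio.ByteBuffer as a stand-in for IndexInput are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch; a ByteBuffer stands in for Lucene's IndexInput.
final class VIntSketch {

    // Loop-based decoding: the general shape the comment says caused trouble
    // once readByte() was inlined by Hotspot.
    static int readVIntLoop(ByteBuffer in) {
        byte b = in.get();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // Unrolled decoding: no loop, every shift is a constant.
    // An int needs at most five bytes of seven payload bits each.
    static int readVIntUnrolled(ByteBuffer in) {
        byte b = in.get();
        int i = b & 0x7F;
        if ((b & 0x80) == 0) return i;
        b = in.get();
        i |= (b & 0x7F) << 7;
        if ((b & 0x80) == 0) return i;
        b = in.get();
        i |= (b & 0x7F) << 14;
        if ((b & 0x80) == 0) return i;
        b = in.get();
        i |= (b & 0x7F) << 21;
        if ((b & 0x80) == 0) return i;
        b = in.get();
        // Fifth byte: only its low four bits fit into a 32-bit int.
        return i | ((b & 0x7F) << 28);
    }

    // Helper for the sketch: encodes a value in the same variable-length format.
    static ByteBuffer encodeVInt(int value) {
        ByteBuffer out = ByteBuffer.allocate(5);
        while ((value & ~0x7F) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.put((byte) value);
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readVIntUnrolled(encodeVInt(300)));
        System.out.println(readVIntLoop(encodeVInt(300)));
    }
}
```

Both methods decode the same byte format; the unrolled variant only removes the loop, so it is a workaround that avoids the miscompiled code shape, not a fix for the underlying Hotspot bug.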

> hotspot bug in readvint gives wrong results
> -------------------------------------------
>
>                 Key: LUCENE-2975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2975
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Priority: Blocker
>             Fix For: 3.1
>
>         Attachments: LUCENE-2975.patch, LUCENE-2975.patch, LUCENE-2975.patch, LUCENE-2975.patch, perf.png
>
>
> When testing the 3.1-RC1 made by Yonik on the PANGAEA (www.pangaea.de)
> production system, I found that on a large segment (about 5 GiB) some
> stored fields suddenly produce a strange deflate decompression problem
> (CompressionTools), although the stored fields are no longer pre-3.0
> compressed. It seems that the header of the stored field is read incorrectly
> at the buffer boundary in MultiMMapDir, and FieldsReader then incorrectly
> detects a deflate-compressed field (CompressionTools).
> The error occurs reproducibly with CheckIndex on MMapDirectory, but not with
> NIODir or SimpleDir. The FDT file of that segment is 2.6 GiB; on Solaris the
> chunk size is Integer.MAX_VALUE, so we have 2 MultiMMap IndexInputs.
> Robert and I have the index ready as a tar file; we will run tests on our
> local machines and hopefully solve the bug, which may have been introduced
> by Robert's recent changes to MMap.