[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463247 ]
Michael McCandless commented on LUCENE-140:
-------------------------------------------

Doron,

> (1) the sequence of ops brought up by Jason is wrong:
> ...
>
> Problem here is that the docIDs found in (b) may be altered in step
> (d) and so step (f) would delete the wrong docs. In particular, it
> might attempt to delete ids that are out of the range. This might
> expose exactly the BitVector problem, and would explain the whole
> thing, but I too cannot see how it explains the delete-by-term case.

Right, the case I fixed only happens when Lucene's deleteDocument(int
docNum) is [slightly] mis-used, i.e. if you are "playing by the rules"
you would never have hit this bug. And this particular use case is
indeed incorrect: doc numbers are only valid for the one reader that
you got them from.

> I think however that the test Mike added does not expose the docs
> out of order bug - I tried this test without the fix and it only
> fails on the "gotException" assert - if you comment out this assert
> the test passes.

Huh, I see my test case (in IndexReader) indeed hitting the original
"docs out of order" exception. If I take the current trunk, comment
out the (one line) bounds check in BitVector.set, and run that test,
it hits the "docs out of order" exception. Are you sure you applied
the change (tightening the check from a < to a <=) to
index/SegmentMerger.java? I did indeed find that the test failed to
fail when I first wrote it (but should have). In digging into why it
didn't fail as expected, I found that the check for "docs out of
order" missed the boundary case of the same doc number appearing twice
in a row. Once I fixed that, the test failed as expected.

> (3) maxDoc() computation in SegmentReader is based (on some paths)
> on RandomAccessFile.length(). IIRC I saw cases (in a previous
> project) where File.length() or RAF.length() (not sure which of the
> two) did not always reflect the real length if the system was very
> busy IO-wise, unless FD.sync() was called (with a performance hit).

Yes, I saw this too. From the follow-on discussion it sounds like we
haven't found a specific known JVM bug here. Still, it makes me
nervous that we rely on file length to derive maxDoc. In general I
think we should rely on as little as possible from the file system
(there are so many cross-platform issues/differences) and instead
explicitly store things like maxDoc in the index. I will open a
separate Jira issue to track this. I will also record this path in the
instrumentation patch for 1.9.1, just to see whether we are actually
hitting something here (I think that is unlikely, but possible).
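To make the mis-use concrete, here is a minimal sketch of the pattern
(not Jason's actual code), assuming a Lucene 1.9/2.x-era API; the index
path, query value, and added document are made up:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class StaleDocIdDelete {
      public static void main(String[] args) throws Exception {
        String path = "/tmp/test-index";   // hypothetical index location

        // (b) gather doc numbers from one reader/searcher
        IndexSearcher searcher = new IndexSearcher(path);
        Hits hits = searcher.search(new TermQuery(new Term("id", "42")));
        int[] docIds = new int[hits.length()];
        for (int i = 0; i < hits.length(); i++)
          docIds[i] = hits.id(i);
        searcher.close();

        // (d) modify the index; merges may renumber existing documents
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(new Field("id", "new-doc", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        // (f) WRONG: these doc numbers were only valid for the reader they
        // came from; after the index changed they may now point at different
        // documents, or fall outside the new maxDoc() entirely.
        IndexReader reader = IndexReader.open(path);
        for (int i = 0; i < docIds.length; i++)
          reader.deleteDocument(docIds[i]);
        reader.close();
      }
    }

The safe variants are to delete by term (e.g.
reader.deleteDocuments(new Term("id", ...))) or to re-acquire the doc
numbers from the same reader that will perform the deletes.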
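And here is a tiny self-contained illustration of the boundary case the
tightened check catches; this is just a sketch of the kind of check, not
the actual SegmentMerger.appendPostings source, and the doc numbers are
made up:

    public class DocsOutOfOrderCheck {
      public static void main(String[] args) {
        int[] docs = { 3, 7, 7, 12 };   // the same doc number twice in a row
        int lastDoc = -1;               // sentinel: nothing appended yet

        try {
          for (int i = 0; i < docs.length; i++) {
            int doc = docs[i];
            // The old strict check (doc < lastDoc) lets the repeated 7 through;
            // tightening it to <= rejects the duplicate.
            if (doc <= lastDoc)
              throw new IllegalStateException(
                  "docs out of order (" + doc + " <= " + lastDoc + ")");
            lastDoc = doc;
          }
          System.out.println("all docs accepted");
        } catch (IllegalStateException e) {
          System.out.println("caught: " + e.getMessage());
        }
      }
    }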
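As a sketch of why deriving counts from file length makes me nervous
compared with storing them explicitly (illustrative only, not Lucene's
actual file-format code; the file name and record size are hypothetical):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class MaxDocSketch {

      // Fragile: trusts RandomAccessFile.length(); if the reported length
      // lags the real length (as described above), the derived count is wrong.
      static int maxDocFromLength(String indexFile, int bytesPerDoc) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(indexFile, "r");
        try {
          return (int) (raf.length() / bytesPerDoc);
        } finally {
          raf.close();
        }
      }

      // More robust: the writer stored the count explicitly, so the reader
      // does not depend on file-system metadata at all.
      static int maxDocFromHeader(String indexFile) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(indexFile));
        try {
          return in.readInt();   // count written by the writer as the first field
        } finally {
          in.close();
        }
      }
    }

Storing the count explicitly also gives us something to cross-check
against, instead of silently trusting file-system metadata.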
> docs out of order
> -----------------
>
>          Key: LUCENE-140
>          URL: https://issues.apache.org/jira/browse/LUCENE-140
>      Project: Lucene - Java
>   Issue Type: Bug
>   Components: Index
> Affects Versions: unspecified
>  Environment: Operating System: Linux
>               Platform: PC
>     Reporter: legez
>  Assigned To: Michael McCandless
>  Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
>
>
> Hello,
>
> I cannot figure out why (and what) is happening all the time. I got an
> exception:
>
>   java.lang.IllegalStateException: docs out of order
>     at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>     at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>     at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>     at Optimize.main(Optimize.java:29)
>
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to 1.3rc1? I
> cannot find it in this form either in the downloads or in the version
> list). Everything seems OK: I can search through the index, but I cannot
> optimize it. Even worse, after this exception, every time I add new
> documents and close the IndexWriter a new segment is created! I think it
> contains all the documents added before, because of its size.
>
> My index is quite big: 500,000 docs, about 5 GB of index directory.
>
> It is _repeatable_. I drop the index and reindex everything. Afterwards I
> add a few docs, try to optimize, and receive the above exception.
>
> My documents' structure is:
>
>   static Document indexIt(String id_strony, Reader reader, String data_wydania,
>                           String id_wydania, String id_gazety, String data_wstawienia)
>   {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
>   }
>
> Sincerely,
> legez