[ 
https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463483
 ] 

Doron Cohen commented on LUCENE-140:
------------------------------------

Jed, is it possible that when re-creating the index, while IndexWriter is 
constructed with create=true, FSDirectory is opened with create=false?
I suspect so, because otherwise, old  .del files would have been deleted. 
If indeed so, newly created segments, which have same names as segments in 
previous (bad) runs, when opened, would read the (bad) old .del file. 
This would possibly expose the bug fixed by Michael. 
I may be over speculating here, but if this is the case, it can also explain 
why changing the merge factor from 4 to 10 exposed the problem. 

In fact, let me speculate even further - if indeed when creating the index from 
scratch, the FSDirectory is (mistakenly) opened with create=false, as long as 
you always repeated the same sequencing of adding and deleting docs, you were 
likely to almost not suffer from this mistake, because segments created with 
same names as (old) .del files simply see docs as deleted before the docs are 
actually deleted by the program. The search behaves wrongly, not finding these 
docs before they are actually deleted, but no exception is thrown when adding 
docs. However once the merge factor was changed from 4 to 10, the matching 
between old .del files and new segments (with same names) was broken, and the 
out-of-order exception appeared. 

...and if this is not the case, we would need to look for something else...

> docs out of order
> -----------------
>
>                 Key: LUCENE-140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-140
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: Linux
> Platform: PC
>            Reporter: legez
>         Assigned To: Michael McCandless
>         Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar, 
> indexing-failure.log, LUCENE-140-2007-01-09-instrumentation.patch
>
>
> Hello,
>   I can not find out, why (and what) it is happening all the time. I got an
> exception:
> java.lang.IllegalStateException: docs out of order
>         at
> org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at
> org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at
> org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at 
> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at 
> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens either in 1.2 and 1.3rc1 (anyway what happened to it? I can not 
> find
> it neither in download nor in version list in this form). Everything seems 
> OK. I
> can search through index, but I can not optimize it. Even worse after this
> exception every time I add new documents and close IndexWriter new segments is
> created! I think it has all documents added before, because of its size.
> My index is quite big: 500.000 docs, about 5gb of index directory.
> It is _repeatable_. I drop index, reindex everything. Afterwards I add a few
> docs, try to optimize and receive above exception.
> My documents' structure is:
>   static Document indexIt(String id_strony, Reader reader, String 
> data_wydania,
> String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony ));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to