IndexWriter.maybeMerge() removes deleted documents from index (Lucene 3.1.0 to 
3.5.0)
-------------------------------------------------------------------------------------

                 Key: LUCENE-3838
                 URL: https://issues.apache.org/jira/browse/LUCENE-3838
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
    Affects Versions: 3.5, 3.4, 3.3, 3.2, 3.1
         Environment: Windows, Linux, OSX
            Reporter: Ivan Stojanovic
            Priority: Blocker


My company uses Lucene for high-performance, heavily loaded farms of translation 
repositories with hundreds of simultaneous add/delete/update/search/retrieve 
threads. To support this complex architecture, among other techniques, I rely 
on docId-s staying unchanged until I explicitly ask for them to change (using 
IndexWriter.optimize() / IndexWriter.forceMerge()).

For this behavior LogMergePolicy is used.

This worked fine until we upgraded Lucene from 3.0.2 to 3.5.0. Before 
version 3.1.0, a merge triggered by IndexWriter.addDocument() did not expunge 
deleted documents, so docId-s stayed unchanged and some critical jobs were 
possible without impact on index size. Only IndexWriter.optimize() actually 
removed deleted documents.

Since Lucene version 3.1.0, IndexWriter.maybeMerge() does the same thing as 
IndexWriter.forceMerge() regarding deleted documents; there is no difference. 
This leads to unpredictable changes to the internal index structure during 
simple document add (and possibly delete) operations, at an undefined point in 
time. I looked into the Lucene source code and can definitely confirm this.
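
To illustrate why this matters for code that stores docId-s, here is a toy 
model in plain Java. It is not Lucene code: the Slot class and the 
mergeDroppingDeletes() method are invented for illustration, and the only 
assumption is that a merge copies live documents into a new segment and numbers 
them by position.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Lucene code: a document id is just a list position.
final class DocIdShiftDemo {

    /** One document slot: its stored text and whether it is marked deleted. */
    static final class Slot {
        final String text;
        final boolean deleted;

        Slot(String text, boolean deleted) {
            this.text = text;
            this.deleted = deleted;
        }
    }

    /** Hypothetical merge: copies only live slots, so later ids shift down. */
    static List<Slot> mergeDroppingDeletes(List<Slot> index) {
        List<Slot> merged = new ArrayList<Slot>();
        for (Slot s : index) {
            if (!s.deleted) {
                merged.add(s);
            }
        }
        return merged;
    }

    /** Three documents, with id 1 marked deleted (mirrors the test setup). */
    static List<Slot> sampleIndex() {
        List<Slot> index = new ArrayList<Slot>();
        index.add(new Slot("text0", false));
        index.add(new Slot("text1", true));
        index.add(new Slot("text2", false));
        return index;
    }

    public static void main(String[] args) {
        List<Slot> before = sampleIndex();
        List<Slot> after = mergeDroppingDeletes(before);
        System.out.println("before: id 1 = " + before.get(1).text
                + ", deleted=" + before.get(1).deleted);
        System.out.println("after:  id 1 = " + after.get(1).text
                + ", deleted=" + after.get(1).deleted);
    }
}
```

In this model, id 1 refers to the deleted "text1" before the merge and to the 
live "text2" afterwards, which is the kind of silent renumbering described 
above.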

This issue makes our Lucene client code totally unusable.

Solution steps:

1) Add a flag somewhere that controls whether deleted documents are removed 
in maybeMerge(). Note that this is only half of what we need here.
2) Make forceMerge() always remove deleted documents, regardless of whether 
maybeMerge() removes them. Alternatively, another parameter could be added to 
forceMerge() that tells whether deleted documents should be removed from the 
index.
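
The two steps above could behave roughly as in the following sketch. 
Everything in it is hypothetical: the MergeFlagSketch class, the 
expungeOnMaybeMerge flag and the compact() helper do not exist in Lucene and 
only model the requested semantics on an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the requested semantics. None of these names are
// part of Lucene; ids are simply positions in the lists below.
final class MergeFlagSketch {
    private final List<String> texts = new ArrayList<String>();
    private final List<Boolean> deleted = new ArrayList<Boolean>();
    private final boolean expungeOnMaybeMerge; // the flag proposed in step 1

    MergeFlagSketch(boolean expungeOnMaybeMerge) {
        this.expungeOnMaybeMerge = expungeOnMaybeMerge;
    }

    /** Appends a document and returns its id. */
    int add(String text) {
        texts.add(text);
        deleted.add(Boolean.FALSE);
        return texts.size() - 1;
    }

    /** Marks a document as deleted without renumbering anything. */
    void delete(int id) {
        deleted.set(id, Boolean.TRUE);
    }

    /** Step 1: an automatic merge drops deleted documents only if the flag is set. */
    void maybeMerge() {
        if (expungeOnMaybeMerge) {
            compact();
        }
    }

    /** Step 2: an explicit merge always drops deleted documents. */
    void forceMerge() {
        compact();
    }

    /** Removes deleted slots back to front, which renumbers later documents. */
    private void compact() {
        for (int i = texts.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                texts.remove(i);
                deleted.remove(i);
            }
        }
    }

    int maxDoc() { return texts.size(); }

    boolean isDeleted(int id) { return deleted.get(id); }

    String text(int id) { return texts.get(id); }
}
```

With the flag set to false, ids in this sketch stay stable across any number 
of maybeMerge() calls and change only when forceMerge() is called explicitly.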

Sample JUnit code that reproduces this issue is included below.



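import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogDocMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class TempTest {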

    private Analyzer _analyzer = new KeywordAnalyzer();

    @Test
    public void testIndex() throws Exception {
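        File indexDir = new File("sample-index");
        // File.delete() cannot remove a non-empty directory, so clear any
        // files left by a previous run first.
        File[] oldFiles = indexDir.listFiles();
        if (oldFiles != null) {
            for (File f : oldFiles) {
                f.delete();
            }
            indexDir.delete();
        }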

        FSDirectory index = FSDirectory.open(indexDir);

        Document doc;

        IndexWriter writer = createWriter(index, true);
        try {
            doc = new Document();
            doc.add(new Field("field", "text0", Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);

            doc = new Document();
            doc.add(new Field("field", "text1", Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);

            doc = new Document();
            doc.add(new Field("field", "text2", Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);

            writer.commit();
        } finally {
            writer.close();
        }

        // Mark docId 1 ("text1") as deleted through a writable reader.
        IndexReader reader = IndexReader.open(index, false);
        try {
            reader.deleteDocument(1);
        } finally {
            reader.close();
        }

        // Add and commit documents one at a time so that automatic merges
        // are triggered along the way.
        writer = createWriter(index, false);
        try {
            for (int i = 3; i < 100; i++) {
                doc = new Document();
                doc.add(new Field("field", "text" + i, Field.Store.YES,
                        Field.Index.ANALYZED));
                writer.addDocument(doc);

                writer.commit();
            }
        } finally {
            writer.close();
        }

        boolean deleted;
        String text;

        reader = IndexReader.open(index, true);
        try {
            deleted = reader.isDeleted(1);
            text = reader.document(1).get("field");
        } finally {
            reader.close();
        }

        // Expected: docId 1 is still "text1" and is still flagged as deleted.
        assertTrue(deleted); // This line breaks
        assertEquals("text1", text);
    }

    private MergePolicy createEngineMergePolicy() {
        LogDocMergePolicy mergePolicy = new LogDocMergePolicy();

        mergePolicy.setCalibrateSizeByDeletes(false);
        mergePolicy.setUseCompoundFile(true);
        mergePolicy.setNoCFSRatio(1.0);

        return mergePolicy;
    }

    private IndexWriter createWriter(Directory index, boolean create)
            throws Exception {
        IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35,
                _analyzer);

        iwConfig.setOpenMode(create ? IndexWriterConfig.OpenMode.CREATE
                : IndexWriterConfig.OpenMode.APPEND);
        iwConfig.setMergePolicy(createEngineMergePolicy());
        iwConfig.setMergeScheduler(new ConcurrentMergeScheduler());

        return new IndexWriter(index, iwConfig);
    }

}
