Hi there,
we recently updated our application from Lucene 3.0 to 3.6, with the
effect that (despite using the SearcherManager functionality as
described at
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
calls to searcherManager.maybeRefresh() were incredibly slow, e.g.
taking about 30 seconds after adding a single document to an index of
about 9000 documents. I assumed that we had done something wrong in
the configuration, as 30 seconds can hardly be what NRT means ;-)
Thus we migrated to the latest 4.6 version, and indexing speed was
indeed very good now (with the searcherManager.maybeRefreshBlocking()
call only taking milliseconds to complete). But after some more
testing we discovered that the indexWriter.updateDocument( term,
documentToIndex ) functionality no longer works as expected - at
least sometimes. It looks like the updateDocument method does not
reliably delete the old document before adding the new one, with the
result that stale documents are returned by searches, breaking our
application.
Unfortunately I'm not able to reproduce the issue in a simple unit
test, but maybe one of the Lucene experts knows what we are doing
wrong here. Not sure if it is of any relevance, but we are running on
Windows with a 64-bit JDK 7, so MMapDirectory is being used.
Our Index Writer is configured like this:
IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
    new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
conf.setOpenMode( OpenMode.APPEND );
IndexWriter indexWriter =
    new IndexWriter( FSDirectory.open( new File( directoryPath ) ), conf );
SearcherManager is configured like this:
searcherManager = new SearcherManager(indexWriter, true, null);
// The analyzer that we are using looks like this:
public class DefaultAnalyzer extends Analyzer
{
    @Override
    protected TokenStreamComponents createComponents( final String fieldName,
            final Reader reader )
    {
        return new TokenStreamComponents(
            new WhitespaceTokenizer( LuceneSearchService.LUCENE_VERSION, reader ) );
    }
}
The update of the index looks like this:
// instead of 42 the unique business identifier is used
Long myUniqueBusinessId = 42L;
BytesRef ref = new BytesRef( NumericUtils.BUF_SIZE_LONG );
NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
// this method may be called multiple times with the same term
// and luceneDocumentToIndex parameter
indexWriter.updateDocument( term, luceneDocumentToIndex );
// After performing a couple of updates we execute
searcherManager.maybeRefreshBlocking();
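For reference, the ID field would have to be added to the document along
these lines for the prefix-coded delete term above to match anything (a
simplified sketch, not our exact code; the assumption here is a LongField,
which with its default precision step also indexes the full-precision
shift-0 term - exactly what NumericUtils.longToPrefixCoded( value, 0, ref )
encodes):

```java
// Sketch: index the business id as a LongField so that the
// full-precision (shift 0) trie term exists in the index. If the id
// were indexed as an analyzed text field instead, the prefix-coded
// BytesRef term passed to updateDocument() would never match an
// indexed term, and the old document would never be deleted.
Document luceneDocumentToIndex = new Document();
luceneDocumentToIndex.add(
    new LongField( "MY_UNIQUE_BUSINESS_ID", myUniqueBusinessId.longValue(),
        Field.Store.YES ) );
```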
// For searching we are using the following code
searcher = searcherManager.acquire();
// luceneQuery is the query, filter is some sort of filtering that we
// apply, luceneSort is some sorting query
TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
// If we perform a query for MY_UNIQUE_BUSINESS_ID it will return
// multiple results instead of just one - this was not the case with
// either Lucene 3.0 or 3.6
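For completeness, the acquire above is of course paired with a release;
the full pattern around the search looks roughly like this (simplified
sketch; releasing in a finally block is what the SearcherManager javadocs
require, and our real code does the same):

```java
// Acquire a searcher, search, and always release it again - even if
// the search throws - so that superseded readers can be closed.
IndexSearcher searcher = searcherManager.acquire();
try {
    TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
    // ... process topDocs ...
} finally {
    searcherManager.release( searcher );
}
```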
In order to fix the issue I tried a couple of things, but to no avail.
It still happens (not all the time, though) that Lucene returns two
documents when querying for MY_UNIQUE_BUSINESS_ID instead of just one:
- setting setMaxBufferedDeleteTerms to 1 in the config:
  conf.setMaxBufferedDeleteTerms( 1 );
- explicitly deleting via indexWriter.deleteDocuments( term ) instead
  of just updating
- ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index
  and not just analyzed
- trying to delete the document via indexWriter.tryDeleteDocument()
- calling indexWriter.maybeMerge() after the update
- calling indexWriter.commit() after the update
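For reference, here is the whole update path condensed into one snippet
(a sketch, not our production code: RAMDirectory stands in for the
FSDirectory/MMapDirectory setup described above, and the ID field is
assumed to be a LongField):

```java
Directory dir = new RAMDirectory();
IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
    new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
conf.setOpenMode( OpenMode.CREATE_OR_APPEND );
IndexWriter indexWriter = new IndexWriter( dir, conf );
SearcherManager searcherManager = new SearcherManager( indexWriter, true, null );

long id = 42L;
BytesRef ref = new BytesRef( NumericUtils.BUF_SIZE_LONG );
NumericUtils.longToPrefixCoded( id, 0, ref );
Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );

Document doc = new Document();
doc.add( new LongField( "MY_UNIQUE_BUSINESS_ID", id, Field.Store.YES ) );

// update twice with the same term - afterwards exactly one document
// with this id should be left in the index
indexWriter.updateDocument( term, doc );
indexWriter.updateDocument( term, doc );
searcherManager.maybeRefreshBlocking();

IndexSearcher searcher = searcherManager.acquire();
try {
    TopDocs hits = searcher.search(
        NumericRangeQuery.newLongRange( "MY_UNIQUE_BUSINESS_ID",
            id, id, true, true ), 10 );
    // expected: hits.totalHits == 1
} finally {
    searcherManager.release( searcher );
}
```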
Sorry for the lengthy post, but I wanted to include as much information
as possible. Let me know if something is missing...
Thanks in advance for any help ;-)
Kai
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org