Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6

Erick Erickson Mon, 24 Feb 2014 11:17:03 -0800

I suspect you're finding the old doc that is simply marked
as deleted. Did you check for that?
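
To make "simply marked as deleted" concrete, here is a toy model in plain
Java. It is only a sketch: the class and method names (LiveDocsSketch,
Segment, search, update) are made up for illustration and are not Lucene's
API. The assumption it encodes is that a delete only clears a per-segment
"live" bit, and that anything which matches documents without consulting
those bits (in Lucene 4.x a Filter is handed them as the acceptDocs
argument of getDocIdSet) can still see the old copy.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Lucene's API.
final class LiveDocsSketch {

    /** One segment: stored business ids plus a bitset of documents still live. */
    static final class Segment {
        final long[] businessIds;
        final BitSet liveDocs;

        Segment(long... businessIds) {
            this.businessIds = businessIds;
            this.liveDocs = new BitSet(businessIds.length);
            this.liveDocs.set(0, businessIds.length); // every doc starts out live
        }

        /** A delete only clears the live bit; the document stays in the segment. */
        void markDeleted(long id) {
            for (int doc = 0; doc < businessIds.length; doc++) {
                if (businessIds[doc] == id) {
                    liveDocs.clear(doc);
                }
            }
        }
    }

    /** Models an update: mark old copies deleted, then add the new version in a fresh segment. */
    static List<Segment> update(List<Segment> segments, long id) {
        for (Segment s : segments) {
            s.markDeleted(id);
        }
        List<Segment> result = new ArrayList<>(segments);
        result.add(new Segment(id));
        return result;
    }

    /** Matches per segment; honorLiveDocs=false mimics a filter that skips the live-docs check. */
    static List<String> search(List<Segment> segments, long id, boolean honorLiveDocs) {
        List<String> hits = new ArrayList<>();
        for (int seg = 0; seg < segments.size(); seg++) {
            Segment s = segments.get(seg);
            for (int doc = 0; doc < s.businessIds.length; doc++) {
                boolean visible = !honorLiveDocs || s.liveDocs.get(doc);
                if (s.businessIds[doc] == id && visible) {
                    hits.add("segment " + seg + "/doc " + doc);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(41L, 42L, 43L));
        index = update(index, 42L);
        System.out.println("honoring live docs: " + search(index, 42L, true));
        System.out.println("ignoring live docs: " + search(index, 42L, false));
    }
}
```

In this model the search that honors the live bits finds one document,
while the one that ignores them finds two, one per segment. Whether that is
what happens in the real application depends on how the filters are
implemented.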


One quick way to see if this is even in the right ballpark would be
to do a forceMerge. If the problem disappears, then this is
relevant, I'd guess.
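
Why a forceMerge would be a useful test can be sketched with another toy
model in plain Java. Again, the names (ForceMergeSketch, Segment, mergeAll,
rawCount) are hypothetical and not Lucene's API; the real call would be
IndexWriter.forceMerge(int). The assumption modeled here is that a merge
rewrites segments and copies only live documents, so a document that was
merely marked as deleted is physically gone afterwards.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Lucene's API.
final class ForceMergeSketch {

    /** One segment: stored business ids plus a bitset of documents still live. */
    static final class Segment {
        final List<Long> ids = new ArrayList<>();
        final BitSet liveDocs = new BitSet();

        void add(long id) {
            liveDocs.set(ids.size()); // the new doc is live
            ids.add(id);
        }

        int deletedCount() {
            return ids.size() - liveDocs.cardinality();
        }
    }

    /** Rewrites all segments into one, copying only documents whose live bit is set. */
    static Segment mergeAll(List<Segment> segments) {
        Segment merged = new Segment();
        for (Segment s : segments) {
            for (int doc = 0; doc < s.ids.size(); doc++) {
                if (s.liveDocs.get(doc)) {
                    merged.add(s.ids.get(doc));
                }
            }
        }
        return merged;
    }

    /** Counts stored occurrences of an id, deliberately ignoring the live bits. */
    static int rawCount(List<Segment> segments, long id) {
        int count = 0;
        for (Segment s : segments) {
            for (long stored : s.ids) {
                if (stored == id) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Segment oldSegment = new Segment();
        oldSegment.add(41L);
        oldSegment.add(42L);
        oldSegment.add(43L);
        oldSegment.liveDocs.clear(1); // the old version of 42 is marked deleted

        Segment newSegment = new Segment();
        newSegment.add(42L); // the updated version of 42

        List<Segment> index = new ArrayList<>();
        index.add(oldSegment);
        index.add(newSegment);
        System.out.println("stored copies before merge: " + rawCount(index, 42L));

        List<Segment> merged = Collections.singletonList(mergeAll(index));
        System.out.println("stored copies after merge:  " + rawCount(merged, 42L));
    }
}
```

If the duplicate hits go away after a merge in the real index, that would
point at something reading documents without honoring the deletion marks
rather than at updateDocument failing to delete.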

Warning: The operative word here is "guess", I haven't been
working in this layer for a long time...

Best,
Erick


On Mon, Feb 24, 2014 at 10:14 AM, <[email protected]> wrote:

> Hm, it looks like this is somehow caused by the filters we are using for
> searching.
>
> I took one of the MY_UNIQUE_BUSINESS_ID ids used in our application's
> search functionality and debugged the Lucene search a little more. If I
> specify null for the filters I only get one result (which is correct). If I
> add the two filters that we usually use in our application, I notice that
> the filters are triggered twice - for two different segments - and the
> result is contained in both segments. It looks like the first segment
> contains all documents in the index, with the second segment containing
> only one - the document that should have been deleted up front.
>
> This can be reproduced even after restarting the application and even
> after indexWriter.commit is triggered.
>
> Could this be a bug? Or is this the desired behaviour?
>
> Best Regards
>
> Kai
>
>
> On 2014-02-24 13:54, [email protected] wrote:
>
>> I'll see if I can dig a little bit deeper into the 3.6 behavior; for
>> now I'm trying to get it running on 4.6 (as the index file is also a
>> lot smaller - on 3.6 it was about 2 GB for about 9000 documents, with
>> 4.6 it's only about 200 MB).
>>
>> And yes, the business ID is indexed - otherwise I wouldn't be able to
>> find it at all. The problem is not that I can't find it but that I find
>> it twice. And to make matters worse, not consistently all the time but
>> only sometimes. Somehow it looks like the delete (before the update)
>> sometimes works and sometimes does not. Do you know any reasons why
>> this could happen? Maybe something related to the MergePolicy (which we
>> don't set, i.e. we are using the default)?
>>
>> Best Regards
>>
>> Kai
>>
>>
>> On 2014-02-24 12:10, Michael McCandless wrote:
>>
>>> The 30 second turnaround time in 3.6.x is absurd; if you turn on
>>> IndexWriter's infoStream maybe it'd give a clue.  Or, capture a few
>>> stack traces and post them.
>>>
>>> How are you creating the luceneDocumentToIndex?  You must ensure that
>>> the business ID is in fact indexed as a field in the document,
>>> otherwise the update won't find it.
>>>
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Mon, Feb 24, 2014 at 5:33 AM,  <[email protected]> wrote:
>>>
>>>> Hi there,
>>>>
>>>> we recently updated our application from Lucene 3.0 to 3.6 with the
>>>> effect that (despite using the SearcherManager functionality as
>>>> described on
>>>> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
>>>> calls to searcherManager.maybeRefresh() were incredibly slow, e.g.
>>>> taking about 30 seconds after adding one document to an index of
>>>> about 9000 documents. I assumed that we did something wrong with the
>>>> configuration, as 30 seconds could not be what is meant by NRT ;-)
>>>>
>>>> Thus we migrated to the latest 4.6 version, and indexing speed was
>>>> indeed very good now (with the searcherManager.maybeRefreshBlocking()
>>>> call only taking milliseconds to complete). But after some more testing
>>>> we discovered that somehow the
>>>> indexWriter.updateDocument( term, documentToIndex ) functionality
>>>> wasn't working as expected anymore - at least sometimes. It looks like
>>>> the updateDocument method no longer reliably deletes the old document
>>>> before adding a new one, with the result that older documents are being
>>>> returned by searches, breaking our application.
>>>>
>>>> Unfortunately I'm not able to reproduce the issue in a simple unit
>>>> test, but maybe one of the Lucene experts knows what we are doing wrong
>>>> here. Not sure if it is of any relevance, but we are running on Windows
>>>> with a 64-bit JDK 7, thus MMapDirectory is being used.
>>>>
>>>> Our Index Writer is configured like this:
>>>>
>>>>         IndexWriterConfig conf = new IndexWriterConfig(
>>>>                 Version.LUCENE_46,
>>>>                 new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
>>>>
>>>>         conf.setOpenMode( OpenMode.APPEND );
>>>>
>>>>         IndexWriter indexWriter = new IndexWriter(
>>>>                 FSDirectory.open( new File( directoryPath ) ), conf );
>>>>
>>>> SearcherManager is configured like this:
>>>>
>>>>         searcherManager = new SearcherManager(indexWriter, true, null);
>>>>
>>>> // The analyzer that we are using looks like this:
>>>>
>>>>         public class DefaultAnalyzer extends Analyzer
>>>>         {
>>>>            @Override
>>>>            protected TokenStreamComponents createComponents(final String fieldName,
>>>>                    final Reader reader) {
>>>>                  return new TokenStreamComponents(
>>>>                          new WhitespaceTokenizer(LuceneSearchService.LUCENE_VERSION, reader));
>>>>            }
>>>>         }
>>>>
>>>> The update of the index looks like this:
>>>>
>>>>         // instead of 42 the unique business identifier is used
>>>>         Long myUniqueBusinessId = 42L;
>>>>         BytesRef ref = new BytesRef(NumericUtils.BUF_SIZE_LONG);
>>>>         NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
>>>>         Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
>>>>
>>>>         // this method may be called multiple times with the same term and
>>>>         // luceneDocumentToIndex parameter
>>>>         indexWriter.updateDocument( term, luceneDocumentToIndex);
>>>>
>>>>         // After performing a couple of updates we execute
>>>>         searcherManager.maybeRefreshBlocking();
>>>>
>>>>
>>>> // For searching we are using the following code
>>>>         searcher = searcherManager.acquire();
>>>>         // luceneQuery is the query, filter is some sort of filtering that
>>>>         // we apply, luceneSort is some sorting query
>>>>         TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
>>>>
>>>> // If we perform a query for MY_UNIQUE_BUSINESS_ID it will return
>>>> multiple results instead of just one - this was the case with neither
>>>> Lucene 3.0 nor 3.6.
>>>>
>>>>
>>>> In order to fix the issue I tried a couple of things, but to no avail.
>>>> It still happens (not all the time, though) that Lucene returns two
>>>> documents when querying for MY_UNIQUE_BUSINESS_ID instead of just one.
>>>> What I tried:
>>>> - setting setMaxBufferedDeleteTerms to 1 in the config
>>>>         conf.setMaxBufferedDeleteTerms( 1 );
>>>> - explicitly deleting instead of just updating
>>>>         indexWriter.deleteDocuments( term );
>>>> - ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index
>>>>   and not just analysed
>>>> - trying to delete the document via indexWriter.tryDeleteDocument()
>>>> - calling indexWriter.maybeMerge() after the update
>>>> - calling indexWriter.commit() after the update
>>>>
>>>>
>>>> Sorry for the lengthy post, but I wanted to include as much information
>>>> as possible. Let me know if something is missing...
>>>>
>>>> Thanks in advance for helping ;-)
>>>>
>>>> Kai
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
