Hi, I am not sure how more than one client_no field ends up in a document, and I'm not sure it's related to the taxonomy at all.
However, looking at the code example you pasted above, and since you mention that you index+commit in one thread while another thread does the reopen, I wonder if that's the issue: you first commit the taxonomy, then commit the index. But what if a new document makes it into the index after you committed the taxonomy, with a new client_no? In that case, the reopening thread will discover an "older" taxonomy, while the index will have categories with ordinals larger than the taxonomy's greatest ordinal.

I also think that it's a mistake to commit and reopen in two separate threads. If possible, I suggest that you always do both in the same thread, and in this order: first commit the index, then the taxonomy. That way, if a document goes into the index (and new facets into the taxonomy) after the index.commit(), then when you reopen, the worst case is that the taxonomy is "ahead" of the index, which is fine. When you reopen, do so in the same order. Could you try that and see if it resolves your issue?

Although, I still don't understand how this can lead to more than one client_no ending up in one document, unless there's also a concurrency bug in the indexing code ... or I misunderstood the issue.

Shai

On Fri, Apr 11, 2014 at 2:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> Hi all,
>
> I have an issue using the near real-time search with the taxonomy. I could
> really use some advice on how to debug/proceed with this issue.
>
> The issue is as follows:
>
> I index 100k documents, with about 40 fields each. For each field, I also
> add a FacetField (the issue arises both with FacetField and with
> FloatAssociationFacetField). Each document has a unique number field
> (client_no).
>
> When just indexing and searching afterwards, all is fine.
>
> When searching while indexing, sometimes the number of facets associated
> with a document is too high, i.e. when collecting facets there is more
> than one client_no on one document, which of course should not be the case.
>
> Before each search, I use manager.maybeRefreshBlocking(), because I want
> the most up-to-date results.
>
> I have a taxonomy reader and index reader combined in a ReferenceManager
> (I created this before the SearcherTaxonomyManager existed, but it behaves
> exactly the same, with similar refcount logic).
>
> During indexing I commit every 5000 documents (not needed for the NRT
> search, but needed to prevent data loss should the application shut down).
> I commit as follows:
>
> public void commit() throws DocumentIndexException
> {
>     try
>     {
>         synchronized ( GlobalIndexCommitAndCloseLock.LOCK )
>         {
>             this.taxonomyWriter.commit();
>             this.luceneIndexWriter.commit();
>         }
>     }
>     catch ( final OutOfMemoryError | IOException e )
>     {
>         tryCloseWritersOnOOME( this.luceneIndexWriter, this.taxonomyWriter );
>         throw new DocumentIndexException( e );
>     }
> }
>
> I use a standard IndexWriterConfig, and both the IndexWriter and the
> TaxonomyWriter use a RAMDirectory.
>
> My test case indexes the 100k documents while another thread continuously
> calls manager.maybeRefreshBlocking(). This is enough to sometimes cause
> the taxonomy to be incorrect.
>
> The number of indexing threads does not seem to influence the issue; it
> also appears when I have only 1 indexing thread.
>
> I know it is an index problem, because when I write the index to a file
> instead of RAM and reopen it in a clean application, I see the same
> behaviour.
>
> I could really use some advice on how to debug/proceed with this issue. If
> more info is needed, just ask.
>
> Thanks in advance,
>
> -Rob
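
[Editor's note] To make the commit-ordering race described above concrete, here is a small self-contained Java sketch. It does not use the Lucene API; the Store and CommitOrderDemo classes are hypothetical stand-ins for the taxonomy writer and index writer, modeling only their "pending vs. committed" state. It shows why committing the taxonomy before the index can leave a reopened reader with index ordinals the committed taxonomy does not know about, while the reverse order leaves the taxonomy harmlessly "ahead":

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a writer with pending changes and a commit point.
// A reader sees only the committed view.
class Store {
    private final List<Integer> pending = new ArrayList<>();
    private List<Integer> committed = new ArrayList<>();

    void add(int ordinal)           { pending.add(ordinal); }
    void commit()                   { committed = new ArrayList<>(pending); }
    List<Integer> committedView()   { return committed; }
}

public class CommitOrderDemo {
    public static void main(String[] args) {
        Store taxo = new Store();   // stand-in for the taxonomy writer
        Store index = new Store();  // stand-in for the index writer

        // Document with facet ordinal 1: taxonomy entry + index posting.
        taxo.add(1);
        index.add(1);

        // Bad order: commit the taxonomy first ...
        taxo.commit();
        // ... and a new document with a new facet (ordinal 2) sneaks in
        // before the index commit:
        taxo.add(2);
        index.add(2);
        index.commit();

        // A reader opened on these commit points sees ordinal 2 in the
        // index but not in the taxonomy -- the inconsistent state the
        // reply describes.
        boolean indexAhead = index.committedView().contains(2)
                && !taxo.committedView().contains(2);
        System.out.println("index ahead of taxonomy: " + indexAhead);

        // Good order: commit the index first, then the taxonomy. The worst
        // case is now a taxonomy that is "ahead" of the index, which is
        // harmless: every ordinal the index references is resolvable.
        index.commit();
        taxo.commit();
        boolean consistent =
                taxo.committedView().containsAll(index.committedView());
        System.out.println("taxonomy covers index: " + consistent);
    }
}
```

The same reasoning explains why the quoted commit() method (taxonomy first, index second) is the risky ordering when other threads keep adding documents between the two commit calls.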