Confusingly, I can't for the life of me seem to replicate this error (it used to be quite easy). I wish I still had the logs somewhere :-(
One important thing I left out is that we're also calling delete a lot. Most
of the time when we get an update from Kafka it's an update to an existing
document, so we call deleteDocument on IW (side note: why is there no delete
for DTW?), then FacetsConfig.build, then addDocument. This, presumably,
further exacerbates the concurrency issue we've been discussing.

To summarize what I'm taking away from all this, though: it seems like having
multiple threads calling these functions is asking for trouble. To that end,
I'm thinking of redesigning our system as follows:

1. Add a read/write lock for use when indexing.
2. Acquire the read lock when calling deleteDocument, FacetsConfig.build and
addDocument. Continue calling these from multiple threads.
3. Create a separate thread that, every N seconds, acquires the write lock,
calls commit on both IW and DTW, updates our offsets in Kafka, then calls
maybeRefresh on STR.

I'll also investigate removing forceMerge, but that's relatively independent
of all this.

Thanks again!
Will

On Wed, Sep 28, 2016 at 4:19 AM Shai Erera <ser...@gmail.com> wrote:

> *> However, that should not lead to NSFE. At worst it should lead to
> > "ordinal is not known" (maybe as an AIOOBE) from the taxonomy reader.*
>
> That is correct: this interleaved-indexing case can potentially result in
> an AIOOBE-like exception during faceted search, when the facets in the
> "sneaked-in" docs are found by a search, but resolving the ordinals to
> their labels fails because the labels are unknown to the taxonomy.
>
> I wonder if committing in the opposite order solves this problem. So in
> the above use case, IW.commit() commits all the new docs with their
> facets; then if more indexing happens before TIW.commit(), the commit to
> the taxonomy index results in more facets than are known to the search
> index, but that's OK.
>
> I'm just not sure if that covers all concurrency cases, though.
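The locking scheme Will outlines above (shared read lock around the per-document delete/build/add calls, exclusive write lock around the periodic commit) could be prototyped roughly as below. This is a hypothetical sketch, not code from the actual system: the Lucene writers are stood in for by a simple counter so the pattern is runnable on its own, and every class and method name is made up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed scheme: many indexing threads share the read lock,
// while a single commit thread takes the write lock to get a consistent cut.
// In the real system, indexOne() would be deleteDocument + FacetsConfig.build
// + addDocument, and commitAll() would be IW.commit + DTW.commit + recording
// Kafka offsets + maybeRefresh, run from a scheduled thread every N seconds.
class CommitLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger indexed = new AtomicInteger(); // stands in for IW+DTW state
    private int committed;                                     // last consistent commit point

    // Called from many threads; the read lock is shared, so indexing
    // threads do not block each other.
    void indexOne() {
        lock.readLock().lock();
        try {
            indexed.incrementAndGet(); // delete + rebuild facets + re-add
        } finally {
            lock.readLock().unlock();
        }
    }

    // Called from one thread; the write lock excludes all indexing, so no
    // document can sneak in between the two commits.
    void commitAll() {
        lock.writeLock().lock();
        try {
            committed = indexed.get(); // commit both writers atomically w.r.t. indexing
        } finally {
            lock.writeLock().unlock();
        }
    }

    int committed() { return committed; }

    public static void main(String[] args) throws InterruptedException {
        CommitLockSketch s = new CommitLockSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> { for (int j = 0; j < 1000; j++) s.indexOne(); });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        s.commitAll();
        System.out.println(s.committed()); // all 4000 indexed docs are in the commit point
    }
}
```

The key property is that everything inside one read-lock section lands entirely on one side of a commit point, which is exactly the consistency the thread is after.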
> I remember this was discussed several times in the past, and we eventually
> reached a conclusion, but clearly if it was the latter, it wasn't clarified
> in the javadocs. I can't think of a use case that breaks this commit order,
> though (IW.commit() followed by TIW.commit()). This feels safe to me ...
> can you try to think of a use case that breaks it? Assuming that each
> doc-indexing does addTaxo() followed by addDoc().
>
> Maybe we should have a helper which takes an IW and TIW and exposes a
> commit() API that does it in the correct order?
>
> Now I'm thinking about SearcherTaxoManager -- it reopens the readers by
> first re-opening IR, then TIR. It does so under the assumption of first
> committing to TIW, then to IW. Now if we reverse the order, then you need
> to be more careful about when you commit changes to the two writers and
> when you re-open the readers. If you always do that from the same thread,
> then you should be fine; the order of re-opens doesn't really matter.
>
> But if you re-open from a different thread than the one you commit from, I
> am not sure that committing to IW first, then TIW, can play well with any
> re-open order. I.e., one case which breaks it: you commit to IW, then
> re-open both IR and TIR before you commit to TIW, and you have a search
> which may find ordinals that are unknown to the TIR.
>
> So I'd say that if you refresh() from the same thread that you do
> commit(), then commit to IW first, then TIW, and use SearcherTaxoManager
> as it's currently implemented. But I'd like to hear your thoughts about
> it.
>
> Shai
>
> On Wed, Sep 28, 2016 at 1:26 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > On Wed, Sep 28, 2016 at 3:05 AM, William Moss
> > <will.m...@airbnb.com.invalid> wrote:
> > > Thank you both for your quick reply!
> >
> > You're welcome!
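The helper Shai floats earlier in the thread (one object that owns both writers and always commits in the agreed order) might look something like the sketch below. All names here are invented; the writers are narrowed to a tiny interface so the sketch stands alone, where in the real system they would be IndexWriter and the taxonomy writer, and commitAll() would also be a natural place to record Kafka offsets.

```java
import java.io.IOException;

// Hypothetical helper that centralizes the commit order so callers cannot
// get it wrong. The order shown is IW first, then TIW, per the discussion
// above for the case where refresh() runs on the same thread as commit().
class OrderedCommitter {
    interface Committable { void commit() throws IOException; }

    private final Committable indexWriter;    // stands in for IW
    private final Committable taxonomyWriter; // stands in for TIW/DTW

    OrderedCommitter(Committable indexWriter, Committable taxonomyWriter) {
        this.indexWriter = indexWriter;
        this.taxonomyWriter = taxonomyWriter;
    }

    // IW first, then TIW: a doc that sneaks in between the two commits can
    // at worst leave the taxonomy with *more* facets than the committed
    // search index knows, which is harmless -- unlike the reverse order,
    // where committed search hits may carry ordinals unknown to the
    // committed taxonomy.
    synchronized void commitAll() throws IOException {
        indexWriter.commit();
        taxonomyWriter.commit();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder order = new StringBuilder();
        OrderedCommitter c = new OrderedCommitter(
            () -> order.append("IW "),
            () -> order.append("TIW"));
        c.commitAll();
        System.out.println(order); // commits always happen as IW, then TIW
    }
}
```

Making commitAll() synchronized also gives a single choke point if it later needs to coordinate with a reader-reopen thread.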
> > > * We actually tried the upgrade to 6.0 a few months back (when that
> > > was the newest) and were getting similar errors to the ones I'm seeing
> > > now. We were not able to track them down, which is part of the
> > > motivation for me asking all these questions. We'll get there though
> > > :-)
> >
> > OK, we gotta get to the root cause. Sounds like it happens in either
> > version...
> >
> > > * The last time we tested this (which I think was still post
> > > ConcurrentMergePolicy) we saw that the read speed would slowly degrade
> > > over time. My understanding was that forceMerge was very expensive,
> > > but would make reads faster once complete. Is this not correct?
> >
> > It really depends on what queries you are running. Really you should
> > test in your use case and be certain that the massive expense of force
> > merge is worthwhile / necessary. In general it's not worth it, even
> > if searches are a bit faster, except for indices that will never
> > change again.
> >
> > > Also, we never attempted to tune the MergePolicy at all, so while
> > > we're on the subject, is there good documentation on how to do that?
> > > I'd much prefer to get away from calling forceMerge. If it's useful
> > > information, we've got a relatively small corpus, only ~2+M documents.
> >
> > Just use the defaults :) Tuning those settings is dangerous unless
> > you have a very specific problem to fix.
> >
> > > * We want to be able to ensure that if a machine or JVM crashes we are
> > > in a coherent state. To that end, we need to call commit on Lucene and
> > > then commit back what we've read so far to Kafka. Calling commit is
> > > the only way to ensure this, right?
> >
> > Correct: commit in Lucene, then notify Kafka what offset you had
> > indexed just before you called IW.commit.
> >
> > But you may want to replicate the index across machines if you don't
> > want to have a single point of failure.
> > We recently added near-real-time replication to Lucene for this use
> > case ...
> >
> > > * To make sure I understand how maybeRefresh works, ignoring whether
> > > or not we commit for a second: if I add a document via IndexWriter, it
> > > will not be reflected in IndexSearchers I get by calling acquire on
> > > SearcherAndTaxonomy until I call maybeRefresh?
> >
> > Correct.
> >
> > > Now, on to the concurrency issue. I was thinking a little more about
> > > this and I think the fundamental issue is that while IndexWriter and
> > > DirectoryTaxonomyWriter are each thread-safe, together they are not.
> > > As suggested by the documentation, we use one instance each of
> > > IndexWriter, DirectoryTaxonomyWriter and SearcherTaxonomyManager.
> > > Imagine the following scenario:
> > > [Thread 1] Add document to DirectoryTaxonomyWriter
> > > [Thread 1] Add document to IndexWriter
> > > [Thread 1] Call commit on DirectoryTaxonomyWriter
> > > [Thread 2] Add document to DirectoryTaxonomyWriter
> > > [Thread 2] Add document to IndexWriter
> > > [Thread 1] Call commit on IndexWriter
> > > The on-disk representation now should contain things in the
> > > IndexWriter that are not contained in the DirectoryTaxonomyWriter,
> > > right?
> >
> > Correct, I'm also confused about the commit order for this reason.
> > Let's see what Shai says.
> >
> > However, that should not lead to NSFE. At worst it should lead to
> > "ordinal is not known" (maybe as an AIOOBE) from the taxonomy reader.
> >
> > > Assuming maybeRefresh looks at the state on disk when it's doing its
> > > update (if this is not true, I don't understand why it was throwing
> > > NoSuchFileException), then it can be out of sync as well?
> >
> > maybeRefresh (assuming new docs were indexed since you last called it)
> > will write new index files holding those indexed docs, and then open
> > them to do searching over them.
> >
> > > I apparently never made a full copy of the stack trace.
> > > I'll attempt to regenerate it and post it here once I have it.
> >
> > OK, we need to understand that. It should not be happening ;)
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org