I tried calling indexWriter.GetReader() and then using the Terms() method to check on that, but that also has a non-trivial cost.
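For concreteness, a check-before-delete loop of that kind might look roughly like the sketch below. It assumes the Lucene.Net 3.0.x API. AddOrUpdateAll and the key/document pairing are hypothetical names used only for illustration, and it uses DocFreq() instead of enumerating Terms() to test whether the id is already present.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

static class UpdateSketch
{
    // Hypothetical helper, not part of Lucene.Net.
    // Each pair is (unique id, document to index).
    public static void AddOrUpdateAll(
        IndexWriter indexWriter,
        IEnumerable<KeyValuePair<string, Document>> items)
    {
        // Near-real-time reader: a snapshot taken once, before the batch starts.
        using (IndexReader reader = indexWriter.GetReader())
        {
            foreach (var item in items)
            {
                var term = new Term("UniqueId", item.Key);

                // Buffer a delete only when the snapshot already contains this id.
                if (reader.DocFreq(term) > 0)
                    indexWriter.DeleteDocuments(term);

                indexWriter.AddDocument(item.Value);
            }
        }
    }
}
```

The reader is a snapshot, so it will not see documents added later in the same batch. If an id can appear twice in one batch, this sketch would leave duplicates. GetReader() also flushes the writer, which is part of the cost mentioned above.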
I guess that I am trying to see if there are any other alternatives to update scenarios. There is a lot of material on how to optimize lucene indexes for writes only, but I haven't seen much (or any) on update stories.

On Sat, Oct 20, 2012 at 11:25 PM, Itamar Syn-Hershko wrote:

> How would it know it doesn't need to do the delete?
>
> You provided IndexWriter with a command to delete by Term. It has to scan
> the index for all docs with that term and mark them for deletion. That's
> the calls to SegmentTermDocs.Seek() you see - 868K deletions by term
> pending. If the term does not exist no docs will be found and nothing will
> happen, but there's really no other way if looking up docs for deletion by
> term, even if it doesn't exist.
>
> Since caches aren't involved in deletions, I'd assume performing a query
> and then deleting on Term only if the query returns results would perform
> faster if you expect to have a higher rate of new entries than updates, but
> it has the risk of not being up to date (e.g. IndexWriter wasn't flushed).
>
> On Sat, Oct 20, 2012 at 10:50 PM, Oren Eini (Ayende Rahien) wrote:
>
> > And that isn't the case, if I am not calling DeleteDocuments(), I don't see
> > the cost of ApplyDeletes.
> >
> > On Sat, Oct 20, 2012 at 10:40 PM, Itamar Syn-Hershko wrote:
> >
> > > The image still didn't go through, but I believe you are hitting this:
> > > https://issues.apache.org/jira/browse/LUCENE-2275
> > >
> > > On Sat, Oct 20, 2012 at 7:23 PM, Oren Eini (Ayende Rahien) wrote:
> > >
> > > > Attached
> > > >
> > > > On Sat, Oct 20, 2012 at 7:17 PM, Simon Svensson wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I believe that your inline image did not survive the mailing list
> > > >> software. Could you publish it somewhere instead?
> > > >>
> > > >> // Simon
> > > >>
> > > >> On 2012-10-20 19:06, Oren Eini (Ayende Rahien) wrote:
> > > >>
> > > >>> To start with, I already read this:
> > > >>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
> > > >>>
> > > >>> I am profiling my Lucene code, and I noticed the following:
> > > >>>
> > > >>> Inline image 1
> > > >>>
> > > >>> As you can see, applying the deletes takes quite a bit of time.
> > > >>>
> > > >>> I am always assuming that I update the documents in Lucene, so my
> > > >>> process is:
> > > >>>
> > > >>> foreach(var item in items) // dummy code, but useful
> > > >>> {
> > > >>>     indexWriter.DeleteDocuments(new Term("UniqueId", item.Id));
> > > >>>     indexWriter.AddDocument(item.ToLuceneDocument());
> > > >>> }
> > > >>>
> > > >>> Is there a way to avoid the costly ApplyDeletes if it doesn't need to do
> > > >>> the delete?
