Using addDocument can be a significant gain compared to updateDocument. updateDocument has to do a primary-key lookup on the unique field, and that lookup has a cost that is not negligible compared to indexing a document, especially if the indexing chain is simple (no large text fields with complex analyzers). Reindexing in place also causes more merging, since every update leaves a deleted document behind that merges later have to clean up. Overall I find the 3x factor a bit high, but not too surprising if the documents and the analysis chain are simple, and/or if storage is slow.
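For what it's worth, below is a rough sketch (plain Java, no Lucene dependency) of how an incremental re-index could send each crawled file to the cheapest call. It assumes the indexer keeps its own map of path to last-modified time as of the last indexing run. ReindexPlan, Action and the "path" field mentioned in the comments are hypothetical names, not Lucene APIs. Keep in mind that addDocument does not check for an existing document with the same key, so it is only safe for files that are known to be new.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical planning step for an incremental re-index: decides, per file path,
 * which IndexWriter call (if any) is needed. All names here are illustrative.
 */
public class ReindexPlan {

    enum Action { ADD, UPDATE, DELETE, SKIP }

    /**
     * @param indexed path -> last-modified time recorded when the file was last indexed (assumed bookkeeping)
     * @param crawled path -> last-modified time seen in the current crawl
     * @return path -> action, sorted by path
     */
    static Map<String, Action> plan(Map<String, Long> indexed, Map<String, Long> crawled) {
        Map<String, Action> actions = new TreeMap<>();
        for (Map.Entry<String, Long> e : crawled.entrySet()) {
            Long previous = indexed.get(e.getKey());
            if (previous == null) {
                // Not in the index yet: a plain addDocument, no primary-key lookup needed.
                actions.put(e.getKey(), Action.ADD);
            } else if (!previous.equals(e.getValue())) {
                // Changed on disk: updateDocument(new Term("path", path), doc),
                // which pays for the term lookup and leaves a deleted doc behind.
                actions.put(e.getKey(), Action.UPDATE);
            } else {
                // Unchanged: no indexing work at all.
                actions.put(e.getKey(), Action.SKIP);
            }
        }
        for (String path : indexed.keySet()) {
            if (!crawled.containsKey(path)) {
                // Gone from disk: deleteDocuments(new Term("path", path)).
                actions.put(path, Action.DELETE);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new TreeMap<>();
        indexed.put("/docs/a.txt", 100L);
        indexed.put("/docs/b.txt", 200L);
        indexed.put("/docs/c.txt", 300L);

        Map<String, Long> crawled = new TreeMap<>();
        crawled.put("/docs/a.txt", 100L); // unchanged
        crawled.put("/docs/b.txt", 250L); // modified
        crawled.put("/docs/d.txt", 400L); // new

        // Prints one "path -> ACTION" line per file, sorted by path.
        plan(indexed, crawled).forEach((path, action) -> System.out.println(path + " -> " + action));
    }
}
```

Running main prints SKIP for a.txt, UPDATE for b.txt, DELETE for c.txt and ADD for d.txt. The idea is simply that only the UPDATE and DELETE buckets pay for the term lookup, so the closer the update run is to "mostly new files", the closer it should get to scratch-indexing speed.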
On Tue, May 9, 2017 at 16:06, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> As far as I know, the updateDocument method on the IndexWriter does a delete
> and an add. See also the javadoc:
>
> [..] Updates a document by first deleting the document(s)
> containing term and then adding the new
> document. The delete and then add are atomic as seen
> by a reader on the same index (flush may happen only after
> the add). [..]
>
> On Tue, May 9, 2017 at 3:37 PM, Kudrettin Güleryüz <kudret...@gmail.com>
> wrote:
>
> > I do update the entire document each time. Furthermore, this sometimes
> > means deleting compressed archives, which are stored as multiple documents
> > per compressed archive file, and re-adding them.
> >
> > Is there an update method, and does it perform better than remove then
> > add? I was simply removing modified files from the index (which doesn't
> > seem to take long) and re-adding them.
> >
> > On Tue, May 9, 2017 at 9:33 AM Rob Audenaerde <rob.audenae...@gmail.com>
> > wrote:
> >
> > > Do you update each entire document? (vs. updating numeric docvalues?)
> > >
> > > That is implemented as 'delete and add', so I guess that will be slower
> > > than clean-sheet indexing. Not sure if it is 3x slower, that seems a bit
> > > much?
> > >
> > > On Tue, May 9, 2017 at 3:24 PM, Kudrettin Güleryüz <kudret...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > For a 5.2.1 index that contains around 1.2 million documents, updating
> > > > the index with 1.3 million files seems to take 3x longer than indexing
> > > > from scratch. (Files are crawled over NFS; indexes are stored locally
> > > > on a mechanical disk (Btrfs).)
> > > >
> > > > Is this expected for Lucene's update index logic, or should I further
> > > > debug my part of the code for update performance?
> > > >
> > > > Thank you,
> > > > Kudret