OK I opened: https://issues.apache.org/jira/browse/LUCENE-2297
Mike

On Fri, Mar 5, 2010 at 10:25 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Currently you can't tell IW to use the pool (i.e., the pool is only enabled
> if you use NRT readers). We should probably make this an option at
> ctor time, for situations like this. (In fact, in follow-on
> discussions about further improvements to NRT we've already discussed
> having such an option to IW's ctors). I'll open an issue for this.
>
> Indeed from that profiler output it looks like most of the time is
> being spent opening the SegmentReaders (to do deletes), specifically
> loading the terms dict index (64% overall) and loading the deleted
> docs (10%).
>
> But... how long does step 2 take? Is it an option to not commit on
> every update? How many docs do you typically update?
>
> If you are committing only so that an outside reader can reopen, you
> should consider just using an NRT reader instead (assuming the reader
> is in the same JVM as IndexWriter).
>
> Roughly how much more RAM consumption do you see when you force pooling?
>
> Mike
>
> On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac <bog...@ecstend.com> wrote:
>> Hi,
>>
>> I have an index with 100 million docs that takes around 20GB on disk and
>> has an update rate of a few hundred docs per minute. The new docs are
>> grouped in batches and indexed once every few minutes. My problem is
>> that the update performance has degraded too much over time as the index
>> increased in size (distinct docs).
>>
>> My indexing flow looks like this:
>>
>> 0. create indexWriter (only once)
>> 1. get the open indexWriter
>> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
>> 3. indexWriter.commit
>> 4. indexWriter.waitForMerges
>> 5. wait for new docs and goto 1.
>>
>> I ran a profiler for several minutes and I noticed that most of the
>> time the indexer is busy applying the deletes. This takes so much time
>> because all terms are loaded for every commit (see the attached
>> profiler screenshot).
>>
>> The index writer has a pool of readers but they are not used unless
>> near real time is enabled. I changed my code to force the pool to be
>> used, but the only way I can do this is to request a reader that is
>> never used: writer.getReader(). Of course, the memory consumption is
>> higher now because I have terms in memory, but steps 3+4 complete in
>> 1-2 secs compared to 8-10 secs.
>>
>> Is it possible to enable the readers pool at the IndexWriter
>> constructor level? My current method looks like a hack ...
>> I am using Lucene 2.9.2 on Linux.
>>
>> Regards,
>> Bogdan
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
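
For readers of the archive, here is a rough sketch of why forcing the reader pool changes the cost of steps 3+4. It is a toy model only, not Lucene code: `ReaderPoolSketch`, `FakeSegmentReader` and `simulate` are hypothetical names invented for illustration, and it assumes the segment set stays fixed between commits (it ignores merges). It counts how many times a per-segment reader has to be opened when deletes are applied at each commit, with and without a pool.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT Lucene code): counts reader opens needed to apply deletes. */
class ReaderPoolSketch {

    /** Hypothetical stand-in for an expensive per-segment reader open. */
    static final class FakeSegmentReader {
        final String segment;
        FakeSegmentReader(String segment) { this.segment = segment; }
    }

    private final boolean pooling;
    private final Map<String, FakeSegmentReader> pool = new HashMap<>();
    private int opens = 0;

    ReaderPoolSketch(boolean pooling) { this.pooling = pooling; }

    /** Returns a reader for the segment, reusing a pooled one when pooling is on. */
    private FakeSegmentReader open(String segment) {
        if (pooling) {
            FakeSegmentReader cached = pool.get(segment);
            if (cached != null) {
                return cached; // no terms index or deleted docs to reload
            }
        }
        opens++; // models the expensive load seen in the profiler
        FakeSegmentReader reader = new FakeSegmentReader(segment);
        if (pooling) {
            pool.put(segment, reader);
        }
        return reader;
    }

    /** Assumption: applying deletes at commit touches a reader on every segment. */
    void commit(String[] segments) {
        for (String s : segments) {
            open(s);
        }
    }

    int opens() { return opens; }

    /** Runs the given number of commits over a fixed set of segments. */
    static int simulate(boolean pooling, int segmentCount, int commits) {
        String[] segments = new String[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = "_" + i;
        }
        ReaderPoolSketch writer = new ReaderPoolSketch(pooling);
        for (int c = 0; c < commits; c++) {
            writer.commit(segments);
        }
        return writer.opens();
    }

    public static void main(String[] args) {
        System.out.println("opens without pool: " + simulate(false, 10, 5));
        System.out.println("opens with pool:    " + simulate(true, 10, 5));
    }
}
```

In this model the unpooled case pays one open per segment per commit, while the pooled case pays one open per segment in total, at the price of keeping every reader (and, in the real index, its terms index) in memory. That matches the trade-off described in the thread, but the actual numbers depend on segment count, merge activity and index size.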