Hi,

> > > The first time my code used the 3.4 libraries with version level set
> > > to 3.4 and it tried to optimize() (still using this now-deprecated
> > > old call), the new code went wild!
> > > It took up more memory than the heap was limited to, so I believe it
> > > is taking up system resources. We have turned off optimize for now.
> >
> > Did it throw OutOfMemoryException?
> 
> As you thought, there was no OOM exception.
> 
> I assume not - but I assume you have seen more virtual memory usage in
> "top", but that's not really related to optimize/forceMerge.
> 
> I'm actually on a large Windows server box.

On Windows x64, Lucene 3.4 also automatically uses memory mapping. What you
see in Windows Task Manager includes virtual memory usage; if that figure is
much bigger than the heap you allocated with -Xmx, the difference is exactly
what is coming from memory-mapped files.

> It is only related to optimize() in that all memory is used _only_ when
> the server goes for an optimize().
> Running through our web interface (which reuses its one IndexReader)
> doesn't seem to drive up the memory (beyond use of the heap).

When you optimize, the system has the index files mapped into virtual
address space multiple times. This is not heap space being used (otherwise
your JVM would throw OutOfMemoryError).

> The problem is that when we run the optimize, the machine gets memory
> limited and everything then runs slow (nearly preventing us from viewing
> what's going on) and the optimize() takes "forever" (= an extra 1.5 hrs).
> We are not trying to share the machine, but it is a bit excessive.

How big is your index, how much physical memory does the machine have, and
how much of it is allocated to the JVM heap? Please note: never allocate too
much memory to the JVM. Most of Lucene's working set lives in the file
system cache (like the memory-mapped files); if you allocate too much -Xmx
heap, the system will hit millions of page faults and the kernel has to swap
the index files in and out all the time. This especially affects optimize,
which transfers lots of data and causes page faults all over, also stealing
mapped pages from the active IndexReader that serves your searches.
Depending on some factors (like sorting), you should allocate only very
little heap (tune by testing and watching for OOM) and keep the rest of the
physical memory available for the FS cache.
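As a rough illustration only (the sizes and jar name below are made up; the
right split depends entirely on your index size and sorting/faceting needs):

```shell
# Hypothetical 32 GB box: keep the JVM heap small and leave the rest of
# RAM to the OS file system cache for the memory-mapped index files.
java -server -Xms4g -Xmx4g -jar search-app.jar
```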

To enable memory mapping of index files in Lucene 3.0.1, use "new
MMapDirectory(...)" instead of "FSDirectory.open(...)"; 3.0.1 will then show
virtual memory usage similar to 3.4. Conversely, you can force Lucene 3.4
onto the slower SimpleFSDirectory by instantiating it directly (which
emulates Lucene 3.0.1's file access on Windows). But this will slow down
your searches!
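A minimal sketch of both choices (the index path is hypothetical, and
lucene-core 3.x is assumed to be on the classpath):

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.SimpleFSDirectory;

public class DirectoryChoice {
    public static void main(String[] args) throws Exception {
        File indexPath = new File("/path/to/index"); // hypothetical path

        // Lucene 3.0.1: opt in to memory mapping explicitly.
        // (3.4's FSDirectory.open() already picks MMapDirectory on
        // 64-bit JREs, so this is only needed on older versions.)
        Directory mmapDir = new MMapDirectory(indexPath);

        // Or, on 3.4: force the slower plain-file access (not recommended):
        Directory simpleDir = new SimpleFSDirectory(indexPath);

        IndexReader reader = IndexReader.open(mmapDir);
        try {
            System.out.println("Docs in index: " + reader.numDocs());
        } finally {
            reader.close();
        }
    }
}
```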

> Further reading suggests that maybe the right approach is to wait for
> IndexWriter to insert cleanup as the need arises.  Does that sound right?

Exactly. The simple answer: don't optimize! But you should still review your
heap parameters, as something seems to be very misconfigured in your system
(see above).

> Is my read that the system will prevent excessive segments and clean up
> deleted documents on its own a reasonable assessment of the state of
> Lucene in 3.4 or 4.0? Thus I don't really need to call
> optimize()/maybeMerge().
> If I skip optimize()/maybeMerge() am I missing anything I should be doing
> to keep the index 'tidy'? What I don't want is to find the index runs
> slow only after months of many document updates.

A "tidy" index is not necessarily one with a single segment and no
deletions. That's just a "compact" representation, and not always the
fastest one. The slowdown caused by forceMerge(1) is much worse than the
small overhead from deletions and multiple segments. If you use the new
IndexSearcher constructors that take an ExecutorService, an unoptimized
index may even be faster to search, because the per-segment searches can run
on different threads!
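A sketch of that threaded-searcher setup on 3.1+ (the reader, field name,
and query term are assumptions for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Assuming 'reader' is your long-lived IndexReader:
ExecutorService pool =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
IndexSearcher searcher = new IndexSearcher(reader, pool);

// Each segment can now be searched on its own thread:
TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);

// Shut the pool down when the searcher is retired:
// pool.shutdown();
```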

The slowdowns older Lucene versions showed with unoptimized indexes have
been solved since 2.9: searches now run on every segment separately, no
longer using a "MultiReader"-like approach that had to merge terms and
postings on the fly (which caused the slowness in pre-2.9 versions).

More info on why you should not optimize: http://goo.gl/Dl1jO

> Any discussion by those with similar indexes would be welcome.
> 
> -Paul


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
