Hi,

> > > The first time my code used the 3.4 libraries with version level set
> > > to 3.4 and it tried to optimize() (still using this now deprecated
> > > old call), the new code went wild!
> > > It took up more memory than the heap was limited to, so I believe it
> > > is taking up system resources. We have turned off optimize for now.
> >
> > Did it throw OutOfMemoryException?
>
> As you thought, there was no OOM Exception.
>
> > I assume not - but I assume you have seen more virtual memory usage
> > in "top", but that's not really related to optimize/forceMerge.
>
> I'm actually on a large Windows server box.
On Windows x64, Lucene 3.4 will automatically use memory mapping, too. What
you see in Windows Task Manager is also virtual memory usage. If the virtual
memory usage is much bigger than the heap allocated with -Xmx, the difference
is exactly what comes from memory-mapped files.

> It is only related to optimize() in that all memory is used _only_ when
> the server goes for an optimize().
> Running through our web interface (which reuses its one IndexReader)
> doesn't seem to drive up the memory (beyond use of the heap).

If you optimize, the system has the index files mapped into virtual address
space multiple times. This is not heap space being used (otherwise your
system would OOM).

> The problem is that when we run the optimize, the machine gets memory
> limited and everything then runs slow (nearly preventing us from viewing
> what is going on) and the optimize() takes "forever" (= an extra 1.5 hrs).
> We are not trying to share the machine, but it is a bit excessive.

How big is your index, how big is the system's memory, and how much of it is
allocated to the JVM heap?

Please note: never allocate too much memory to the JVM. Most parts of Lucene
run in the file system cache (like the memory-mapped files); if you allocate
too much -Xmx heap, the system will hit millions of page faults and the
kernel needs to swap the index files in and out all the time. This affects
optimize, as it transfers lots of data, causing page faults all over and
also stealing mapped pages from the active IndexReader serving your
searches. Depending on some factors (like sorting), you should allocate only
a little heap (found by testing and waiting for OOM) and keep the rest of
the physical memory available for the FS cache.

To enable memory mapping of index files in Lucene 3.0.1, you can use "new
MMapDirectory()" instead of "FSDirectory.open()"; then 3.0.1 will show
virtual memory usage similar to 3.4. You can also force Lucene 3.4 to use
the slower SimpleFSDirectory by instantiating it directly (which emulates
Lucene 3.0.1 file access on Windows), but this will slow down your searches!
(See the first sketch below.)

> Further reading suggests that maybe the right approach is to wait for
> IndexWriter to insert cleanup as the need arises. Does that sound right?

Exactly. Simple answer: don't optimize! But you should still review your
heap parameters, as something seems to be very misconfigured in your system
(see above).

> Is my read that the system will prevent excessive segments and clean up
> deleted documents on its own a reasonable assessment of the state of
> Lucene in 3.4 or 4.0? Thus I don't really need to call
> optimize()/maybeMerge().
> If I skip optimize()/maybeMerge(), am I missing anything I should be
> doing to keep the index 'tidy'? What I don't want is to find the index
> runs slow only after months of many document updates.

A "tidy" index is not necessarily one with one segment and no deletions.
That's just a "compact" representation, but not always the fastest. The
slowdown caused by forceMerge(1) is much worse than the little overhead from
deletions and multiple segments. If you use the new IndexSearcher ctors with
an ExecutorService, an unoptimized index may even be faster at searching, as
it can handle the per-segment searches in different threads (see the second
sketch below)! The slowdowns of older Lucene versions with unoptimized
indexes have been solved since 2.9, because searches run on every segment
separately and no longer use a "MultiReader"-like approach that has to merge
terms and postings on the fly (which caused the slowness pre-2.9).
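First sketch: a minimal example of the directory choice described above. The
index path is hypothetical; the constructors are the standard Lucene 3.x
ones:

    import java.io.File;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;

    public class DirectoryChoice {
      public static void main(String[] args) throws Exception {
        File indexPath = new File("/path/to/index"); // hypothetical path

        // Lucene 3.0.1: FSDirectory.open() does not pick memory mapping
        // on Windows x64, so request it explicitly:
        Directory mmapped = new MMapDirectory(indexPath);

        // Lucene 3.4: FSDirectory.open() selects MMapDirectory on
        // 64-bit platforms automatically:
        Directory auto = FSDirectory.open(indexPath);

        // Emulate the old 3.0.1 file access on Windows under 3.4
        // (slower searches!):
        Directory simple = new SimpleFSDirectory(indexPath);
      }
    }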
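Second sketch: the ExecutorService-based IndexSearcher ctor (available since
Lucene 3.1). The index path and pool size are just assumptions; tune the
pool to your CPU count:

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelSegmentSearch {
      public static void main(String[] args) throws Exception {
        IndexReader reader =
            IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // This ctor searches the segments of an unoptimized index in
        // parallel, one task per segment:
        IndexSearcher searcher = new IndexSearcher(reader, pool);

        // ... run your queries as usual ...

        searcher.close();
        reader.close();
        pool.shutdown(); // the searcher does not shut the pool down
      }
    }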
More info about not optimizing here: http://goo.gl/Dl1jO

> Any discussion by those with similar indexes would be welcome.
>
> -Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org