Hi Mike,

This is a very belated reply, but I just wanted to say that I really
appreciate your comments -- this has been a very helpful and
informative discussion! (-:
Thanks,
Chris

On Thu, Jul 23, 2009 at 10:50 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> On Thu, Jul 23, 2009 at 10:03 AM, Nigel <nigelspl...@gmail.com> wrote:
>
> > Mike, the question you raise is whether (or to what degree) the OS
> > will swap out app memory in favor of IO cache. I don't know anything
> > about how the Linux kernel makes those decisions, but I guess I had
> > hoped that (regardless of the swappiness setting) it would be less
> > likely to swap out application memory for IO than it would be to
> > replace some cached IO data with some different cached IO data.
>
> I think swappiness is exactly the configuration that tells Linux just
> how happily it should swap out application memory for IO cache vs
> other IO cache for new IO cache.
>
> > The latter case is what kills Lucene performance when you've got a
> > lot of index data in the IO cache and a file copy or some other
> > operation replaces it all with something else: the OS has no way of
> > knowing that some IO cache is more desirable long-term than other
> > IO cache.
>
> I agree that hurts Lucene, but the former also hurts Lucene. EG if
> the OS swaps out our norms, terms index, deleted docs, field cache,
> then that's gonna hurt search performance. You hit maybe 10 page
> faults and suddenly you're looking at an unacceptable increase in
> search latency.
>
> For a dedicated search box (your case) it'd be great to wire these
> pages (or set swappiness to 0 and make sure you have plenty of RAM,
> which is supposed to do the same thing, I believe).
>
> > The former case (swapping app for IO cache) makes sense, I suppose,
> > if the app memory hasn't been used in a long time, but with an LRU
> > cache you should be hitting those pages pretty frequently by
> > definition.
>
> EG if your terms index is large, I bet many pages will be seen by the
> OS as rarely used. We do a binary search through it... so the upper
> levels of that binary search tree are frequently hit, but the lower
> levels will be much less frequently hit. I can see the OS happily
> swapping out big chunks of the terms dict index. And it's quite
> costly because we don't have good locality in how we access it
> (except towards the very end of the binary search).
>
> > But if it does swap out your Java cache for something else, you're
> > probably no worse off than before, right? In this case you have to
> > hit the disk to fault in the paged-out cache; in the original case
> > you have to hit the disk to read the index data that's not in IO
> > cache.
>
> Hard to say... if it swaps out the postings, since we tend to access
> them sequentially, we have good locality and so swapping back in
> should be faster (I *think*). I guess norms, field cache and deleted
> docs also have good locality. Though... I'm actually not sure how
> effectively VM systems take advantage of locality when page faults
> are hit.
>
> > Anyway, the interaction between these things (virtual memory, IO
> > cache, disk, JVM, garbage collection, etc.) is complex, and so the
> > optimal configuration is very usage-dependent. The current Lucene
> > behavior seems to be the most flexible. When/if I get a chance to
> > try the Java caching for our situation I'll report the results.
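(Belatedly adding a bit of detail on that last point: by "the Java
caching" I mean holding hot index data on the Java heap in an
application-level LRU cache, rather than trusting the OS IO cache to
keep it resident. The sketch below is only an illustration of that
idea -- the class and names are made up, not anything in Lucene.)

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical application-level LRU cache, e.g. keyed on
    // (segment name, block number) and holding decoded index data on
    // the Java heap. None of these names are real Lucene APIs.
    public class LruBlockCache<K, V> extends LinkedHashMap<K, V> {

        private final int maxEntries;

        public LruBlockCache(int maxEntries) {
            // accessOrder=true: get() moves an entry to the tail, so
            // the head of the map is always the least-recently-used
            // entry.
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least-recently-used entry once we exceed the
            // configured cap.
            return size() > maxEntries;
        }
    }

Because get() keeps hot entries fresh, the heap pages backing them
really do get touched on every search, which is what I was hoping
would keep the OS from treating them as swap candidates.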
>
> I think the biggest low-hanging-fruit in this area would be an
> optional JNI-based extension to Lucene that'd allow merging to tell
> the OS *not* to cache the bytes that we are reading, and to optimize
> those file descriptors for sequential access (eg do aggressive
> readahead). It's a nightmare that a big segment merge can evict not
> only IO cache but also (with the default swappiness on most Linux
> distros) evict our in-RAM caches too!
>
> Mike
>
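(For anyone following along, here's a rough sketch of how the Java
side of the JNI extension Mike describes might look. As far as I know
nothing like this exists in Lucene today, and every name below is
invented; the native half would presumably wrap posix_fadvise(2),
passing POSIX_FADV_SEQUENTIAL before a merge reads a segment and
POSIX_FADV_DONTNEED once it's done with those bytes.)

    import java.io.FileDescriptor;
    import java.io.IOException;

    // Hypothetical Java half of a JNI extension that lets merging
    // advise the kernel about its IO. All names are made up for
    // illustration.
    public final class MergeIoAdvice {

        static {
            // Hypothetical native library: a small C shim that pulls
            // the raw fd out of the FileDescriptor and calls
            // posix_fadvise(2) on it.
            System.loadLibrary("luceneioadvice");
        }

        // Hint that the descriptor will be read sequentially, so the
        // kernel can do aggressive readahead for the merge.
        public static native void adviseSequential(FileDescriptor fd)
                throws IOException;

        // Tell the kernel we won't need these pages again, so a big
        // segment merge doesn't evict the hot pages (terms index,
        // norms, etc.) that searches depend on.
        public static native void adviseDontNeed(FileDescriptor fd)
                throws IOException;
    }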