Kan Deng wrote:
> 1. Performance.
> Since all the cached disk data resides outside the JVM
> heap space, access from Java objects to that cached
> data cannot be very efficient.
True, but you need to compare the relative speeds. If the data has to be
pulled from a file, then you're talking several milliseconds to fetch it
from the disk. If it's in the OS's cache (and here I'm rather assuming
Linux, since that's what I know about), you're talking microseconds
rather than milliseconds to fetch the data from the OS. Once the data
is in the JVM, but not in the CPU cache, then you're down to nanoseconds
to get the data from main memory (how many depends on the hardware: some
platforms take a while to get the data moving but are very quick once it
comes; some are fast to get going but don't have the throughput). It's
not the absolute times that are important, though: once you've got the
data in the OS's cache, things like network latency, display update
speed and scheduling overheads begin to make themselves felt, and you
won't make those any less by getting the data into the JVM's memory.
Well, not much anyway.
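As a rough way to see those orders of magnitude for yourself, here's a
minimal, unscientific sketch in plain Java (the file you point it at is
arbitrary -- one file out of a Lucene index directory would do; note the
second read is normally served from the OS page cache rather than the disk):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ReadLatency {
        public static void main(String[] args) throws IOException {
            File f = new File(args[0]);               // e.g. one file from an index directory
            byte[] buf = new byte[(int) f.length()];

            long t0 = System.nanoTime();
            readFully(f, buf);                        // first read: may have to go to the disk
            long cold = System.nanoTime() - t0;

            t0 = System.nanoTime();
            readFully(f, buf);                        // second read: usually the OS page cache
            long warm = System.nanoTime() - t0;

            long sum = 0;
            t0 = System.nanoTime();
            for (int i = 0; i < buf.length; i++) {    // data already sitting on the JVM heap
                sum += buf[i];
            }
            long heap = System.nanoTime() - t0;

            System.out.println("cold read: " + (cold / 1000) + " us");
            System.out.println("warm read: " + (warm / 1000) + " us");
            System.out.println("heap scan: " + (heap / 1000) + " us (sum=" + sum + ")");
        }

        private static void readFully(File f, byte[] buf) throws IOException {
            FileInputStream in = new FileInputStream(f);
            try {
                int off = 0;
                while (off < buf.length) {
                    int n = in.read(buf, off, buf.length - off);
                    if (n < 0) break;
                    off += n;
                }
            } finally {
                in.close();
            }
        }
    }

The exact numbers will vary wildly from machine to machine; it's the gap
between the cold read and the other two that makes the point.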
> 2. Volatility.
> The OS caches disk data in a common area shared by
> multiple processes, not just the JVM. If other
> processes are doing disk I/O at the same time, chances
> are the cached Lucene index data will be evicted.
What you can do by hanging on to a lot of memory is make the overall
machine performance worse. By denying other processes memory you'll
force up the I/O rate, and when you do need to go to the disk it'll
take much longer -- net result, things run slower. Generally speaking,
because the OS has a more holistic view of resource management, you'll
get better overall performance by leaving the job to it.
> Therefore, a more reliable and efficient cache should
> reside inside the JVM heap space. But because the JVM
> heap space is crowded, we have to manually "evict" the
> less frequently used data from the cache.
It's that last sentence that is the critical one. Yes, you can do your
own cache management, but how much better are you going to be than the
OS? Well, you _can_ be a lot better, since you know what your
application is doing. You can also be a _lot_ worse when you get it
wrong. Choosing the right point to flush data from the cache ("evict")
is not all that straightforward: the OS buffer cache was introduced
into BSD Unix in the early '80s and we're still seeing work going on to
improve the basic strategy 20-odd years later.
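For what it's worth, if you do go down the road of managing your own
cache, plain LRU is the usual starting point. A minimal sketch using
nothing but java.util (the capacity, and what you'd actually use as keys
and values for index data, are placeholders):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Tiny LRU cache: the map keeps entries in access order and drops the
    // least recently used entry once the capacity is exceeded.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruCache(int capacity) {
            super(16, 0.75f, true);   // accessOrder=true gives LRU rather than insertion order
            this.capacity = capacity;
        }

        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // called on put(); returning true evicts the eldest entry
        }
    }

Something like new LruCache<String, byte[]>(1000) then evicts for you on
every put(). The mechanism isn't the hard part; knowing whether 1000 is
the right bound, and whether LRU is even the right policy for your access
pattern, is exactly the problem the OS has been refining for those 20-odd
years.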
If you find that you're spending an inordinate amount of time waiting
for I/O on the index from the OS, then that is the time to start
looking at caching strategies. My own feeling is that you're going to
find easier things to fix before you get that far.
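A cheap way to check whether that's where the time is going is to run
the same query a few times against the on-disk index and compare the
first run with the later ones; if the later runs are already quick, the
OS cache is doing its job and an in-heap cache won't buy you much. A
rough sketch, assuming a Lucene 3.x-style API (FSDirectory/IndexReader/
IndexSearcher -- older versions spell these slightly differently; the
index path, field name and term are just command-line placeholders, and
for a genuinely cold first run you'd want a machine that hasn't touched
the index since boot):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    public class ColdVsWarm {
        public static void main(String[] args) throws Exception {
            // args: <index dir> <field> <term> -- all placeholders for your own index
            IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new TermQuery(new Term(args[1], args[2]));

            for (int run = 1; run <= 3; run++) {
                long t0 = System.nanoTime();
                int hits = searcher.search(q, 10).totalHits;
                long ms = (System.nanoTime() - t0) / 1000000;
                // Run 1 may touch the disk; later runs are normally served
                // out of the OS page cache.
                System.out.println("run " + run + ": " + hits + " hits in " + ms + " ms");
            }
            searcher.close();
            reader.close();
        }
    }

If run 1 and run 3 look much the same on a box that's been up for a
while, the index was cached before you started and the question is moot.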
> Did I misunderstand anything?
Probably not; it's just that performance is a holistic concern, and an
obvious, isolated change isn't going to have the effect that you want.
jch