Kan Deng wrote:

1. Performance.
  Since all the cached disk data resides outside JVM
heap space, the access efficiency from Java object to
those cached data cannot be too high.
True, but you need to compare the relative speeds. If data has to be pulled from a file, then you're talking several milliseconds to fetch it from the disk. If it's in the OS's cache (and here I'm assuming Linux, since that's what I know about), you're talking microseconds rather than milliseconds to fetch the data from the OS. Once the data is in the JVM, but not in the CPU cache, you're down to nanoseconds to get the data from main memory (how many depends on the hardware: some platforms take a while to get the data moving but are very quick once it comes; others are fast to get going but don't have the throughput). It's not the absolute times that are important, though: once you've got the data in the OS's cache, things like network latency, display update speed and scheduling overheads begin to make themselves felt, and you won't make these any smaller by getting data into the JVM's memory. Well, not much anyway.
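
If you want to see the relative speeds on your own machine, something like the following rough sketch would do. It is only an illustration: the class name, the temp file and the 1 MB payload are made up for the example, the file has just been written so the read is almost certainly served from the OS cache rather than the disk, and a single un-warmed timing in a JVM is noisy, so treat the numbers as orders of magnitude at best.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical example class; not part of Lucene.
final class CacheTierTiming {

    /** Touches every byte so the in-heap pass does real memory reads. */
    static long checksum(byte[] data) {
        long sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index-sample", ".bin");
        try {
            byte[] payload = new byte[1 << 20]; // 1 MB of sample data (assumed size)
            Arrays.fill(payload, (byte) 7);
            Files.write(file, payload);

            // Read through the OS: most likely an OS cache hit, not a disk seek.
            long t0 = System.nanoTime();
            byte[] fromOs = Files.readAllBytes(file);
            long osNanos = System.nanoTime() - t0;

            // Read the same bytes again, now that they live on the JVM heap.
            long t1 = System.nanoTime();
            long sum = checksum(fromOs);
            long heapNanos = System.nanoTime() - t1;

            System.out.printf("via OS: %d us, in heap: %d us (checksum %d)%n",
                    osNanos / 1_000, heapNanos / 1_000, sum);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Whatever gap you see between the two figures, compare it with the network and scheduling costs of a whole request before deciding it matters.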

2. Volatile.

  Since the OS caches the disk data in a common area
shared by multiple processes, but not only JVM. If
there are other processes doing disk IO at the same
time, chances are the cached Lucene index data from
disk may be wiped.
What you can do by hanging on to a lot of memory is make the overall machine performance worse. In fact, by denying other processes memory, you're going to force up the I/O rate, and when you do need to go to the disk it'll take much longer -- net result, things run slower. Generally speaking, because the OS has a more holistic view of resource management, you'll get better overall performance by leaving the caching to it.

Therefore, a more reliable and efficient cache should
reside inside JVM heap space. But due to the crowded
JVM heap space, we have to manually "evict" the less
frequently used data from the cache.
It's that last sentence that is the critical one. Yes, you can do your own cache management, but how much better are you going to be than the OS? Well, you _can_ be a lot better, since you know what your application is doing. You can also be a _lot_ worse when you get it wrong. Choosing the right point to flush data from the cache ("evict") is not all that straightforward: the OS buffer cache was introduced into BSD Unix in the early '80s and we're still seeing work going on to improve the basic strategy 20-odd years later.
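
To make "manual eviction" concrete, here is a minimal sketch of the simplest policy, least-recently-used, built on the access-ordered mode of java.util.LinkedHashMap. The class name, the capacity and the keys are hypothetical, it is not thread-safe, and it is not how Lucene or the OS manages its cache; LRU is just one policy among many, and picking the right one is exactly the hard part.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-heap cache; evicts the least recently used entry.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; true drops the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, byte[]> cache = new LruCache<>(2);
        cache.put("a", new byte[8]);
        cache.put("b", new byte[8]);
        cache.get("a");              // "a" is now the most recently used
        cache.put("c", new byte[8]); // over capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Note that this counts entries, not bytes, and knows nothing about what the rest of the machine needs, which is where a home-grown cache can easily do worse than the OS.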

If you find that you're spending an inordinate amount of time waiting on the OS for index I/O, then that is the time to start looking at caching strategies. My own feeling is that you're going to find easier things to fix before you get that far.
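
One cheap way to find out is to time the index reads yourself and compare them with the total time for a request. The sketch below assumes you can wrap each read in a call of your own; IoTimer and its methods are hypothetical names, and the "read" in main is only a stand-in, not a real Lucene call. A profiler will tell you more, but this is often enough to decide whether I/O is worth chasing.

```java
import java.util.function.Supplier;

// Hypothetical helper that accumulates wall-clock time spent in wrapped reads.
final class IoTimer {
    private long ioNanos;

    /** Runs the given read and adds its duration to the I/O total. */
    <T> T timed(Supplier<T> read) {
        long start = System.nanoTime();
        try {
            return read.get();
        } finally {
            ioNanos += System.nanoTime() - start;
        }
    }

    /** Adds an externally measured I/O duration, in nanoseconds. */
    void record(long nanos) {
        ioNanos += nanos;
    }

    /** Share of totalNanos spent in I/O, or 0 if totalNanos is not positive. */
    double fractionOf(long totalNanos) {
        return totalNanos <= 0 ? 0.0 : (double) ioNanos / totalNanos;
    }

    public static void main(String[] args) {
        IoTimer timer = new IoTimer();
        long start = System.nanoTime();
        String result = timer.timed(() -> "stand-in for an index read");
        // ... the rest of the request (scoring, formatting, network) goes here ...
        long total = System.nanoTime() - start;
        System.out.printf("%s: I/O share of request = %.1f%%%n",
                result, 100 * timer.fractionOf(total));
    }
}
```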

Did I mis-understand anything?
Probably not; it's just that performance needs a holistic approach, and an obvious, isolated change isn't going to have the effect that you want.

jch

