John, thanks a lot for your excellent reply. I find this sentence especially convincing:

> "Well, you _can_ be a lot better since you know what you're
> doing. You can also be a _lot_ worse when you get it wrong."

With such a high risk, I should probably try other ways to improve
system performance before rushing to implement a cache.

Thanks again,
Kan

--- John Haxby <[EMAIL PROTECTED]> wrote:

> Kan Deng wrote:
>
> >1. Performance.
> >
> >  Since all the cached disk data resides outside JVM
> >heap space, the access efficiency from Java object to
> >those cached data cannot be too high.
>
> True, but you need to compare the relative speeds. If data has to be
> pulled from a file, then you're talking several milliseconds to fetch
> from the disk. If it's in the OS's cache (and here I'm rather assuming
> Linux since that's what I know about) you're talking about microseconds
> rather than milliseconds to fetch the data from the OS. Once the data
> is in the JVM, but not in the CPU cache, then you're down to nanoseconds
> to get the data from main memory (how many depends on the hardware; some
> platforms take a while to get the data moving but when it comes, it's
> very quick; some systems are fast to get going but don't have the
> throughput). It's not the absolute times that are important though:
> once you've got the data in the OS's cache then things like network
> latency, display update speed and scheduling overheads begin to make
> themselves felt and you won't make these any less by getting data into
> the JVM's memory. Well, not much anyway.
>
> >2. Volatile.
> >
> >  Since the OS caches the disk data in a common area
> >shared by multiple processes, but not only JVM. If
> >there are other processes doing disk IO at the same
> >time, chances are the cached Lucene index data from
> >disk may be wiped.
>
> What you can do by hanging on to a lot of memory is make the overall
> machine performance worse.
> In fact by denying other processes memory,
> you're going to force up the I/O rate and when you do need to go to the
> disk then it'll take much longer -- net result, things run slower.
> Generally speaking, because the OS has a more holistic view of resource
> management, you'll get better overall performance.
>
> >Therefore, a more reliable and efficient cache should
> >reside inside JVM heap space. But due to the crowded
> >JVM heap space, we have to manually "evict" the less
> >frequently used data from the cache.
>
> It's that last sentence that is the critical one. Yes, you can do your
> own cache management, but how much better are you going to be than the
> OS? Well, you _can_ be a lot better since you know what you're
> doing. You can also be a _lot_ worse when you get it wrong. Choosing
> the right point to flush data from the cache ("evict") is not all that
> straightforward: the OS buffer cache was introduced into BSD unix in the
> early '80s and we're still seeing work going on to improve the basic
> strategy 20-odd years later.
>
> If you find that you're spending an inordinate amount of time waiting
> for I/O for the index from the OS, then that is the time to start
> looking at caching strategies. My own feeling is that you're going to
> find easier things to fix before you get that far.
>
> >Did I mis-understand anything?
>
> Probably not, it's just that performance is more of an holistic approach
> and an obvious, isolated, change isn't going to have the effect that you
> want.
>
> jch
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
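
For reference, the manual eviction discussed in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `LruCache` and its `maxEntries` parameter are hypothetical and not part of Lucene, and it evicts the least *recently* used entry (a common stand-in), not the least *frequently* used one mentioned in the original question. It is also not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded in-heap cache (not a Lucene class).
 * Evicts the least recently used entry once maxEntries is exceeded.
 */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put(); returning true removes the eldest (LRU) entry.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" drops "b". Even a policy this simple has to guess which data will be needed next, which is the risk John points out: get the guess wrong and the heap is spent on data the OS cache would have handled better.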