As a follow-up... The real performance benefit comes in a shared server environment, where the Lucene process runs alongside other processes - i.e. competes for the use of the OS file cache. Since the Lucene process can be configured with a dedicated memory pool, using facilities like NioFile allows for a large dedicated application cache - similar to how databases buffer data/index blocks and don't rely on the OS to do so.
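
To make the idea of a dedicated application cache concrete, here is a minimal sketch of a fixed-size LRU cache of file blocks held in the JVM heap. It is not the NioFile or MemoryLRUCache code discussed in this thread; the class name `BlockCache`, the 4096-byte block size, and the single-lock design are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache of file blocks kept in application memory.
// NOT the NioFile / MemoryLRUCache implementation from this thread.
class BlockCache {
    static final int BLOCK_SIZE = 4096; // assumed block size

    private final RandomAccessFile file;
    private final Map<Long, byte[]> blocks;
    long hits;
    long misses;

    BlockCache(RandomAccessFile file, final int maxBlocks) {
        this.file = file;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks; // evict the least recently used block
            }
        };
    }

    // Returns the byte at the given file position, loading its block on a miss.
    // A single lock guards the map and the file handle.
    synchronized byte read(long position) throws IOException {
        long blockNo = position / BLOCK_SIZE;
        byte[] block = blocks.get(blockNo);
        if (block == null) {
            misses++;
            block = new byte[BLOCK_SIZE];
            file.seek(blockNo * BLOCK_SIZE);
            int n = 0;
            int r;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (n < BLOCK_SIZE && (r = file.read(block, n, BLOCK_SIZE - n)) > 0) {
                n += r;
            }
            blocks.put(blockNo, block);
        } else {
            hits++;
        }
        return block[(int) (position % BLOCK_SIZE)];
    }
}
```

Because every read takes the same lock, a cache like this serializes all threads; that is the kind of synchronization cost a concurrent map could reduce.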

If the Lucene process (we wrap Lucene in a server "process") is the "only" process on the server, the OS cache will likely perform well enough for most applications.

I will attempt to get some performance numbers using/not using NioFile performing actual Lucene queries.

-----Original Message-----
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 10:37 AM
To: Lucene-Dev
Subject: NioFile cache performance

I finally got around to writing a testcase to verify the numbers I presented. The following testcase and results are for the lowest level disk operations. On my machine, reading from the cache vs. going to disk (even when the data is in the OS cache) is 30%-40% faster. Since Lucene makes extensive use of disk IO and often reads the same data (e.g. reading the terms), a localized user-level cache can provide significant performance benefits.

Using a 4 MB file (so I could "guarantee" the disk data would be in the OS cache as well), the test shows the following results.

Most of the CPU time is actually used during the synchronization with multiple threads. I hacked together a version of MemoryLRUCache that used a ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a minimum, if the ReadWriteLock class was modified to use the 1.5 facilities, some significant additional performance gains should be realized.

filesize is 4194304
non-cached time = 10578, avg = 0.010578
non-cached threaded (3 threads) time = 32094, avg = 0.010698
cached time = 6125, avg = 0.006125
cache hits 996365
cache misses 3635
cached threaded (3 threads) time = 20734, avg = 0.0069113333333333336
cache hits 3989089
cache misses 10911

When using the shared test (which is more like the Lucene usage, since a single "file" is shared by multiple threads), the difference is even more dramatic with multiple threads (since the cache size is effectively reduced by the number of threads).
This test also shows the value of using multiple file handles when using multiple threads to read a single file (rather than using a shared file handle).

filesize is 4194304
non-cached time = 10594, avg = 0.010594
non-cached threaded (3 threads) time = 42110, avg = 0.014036666666666666
cached time = 6047, avg = 0.006047
cache hits 996827
cache misses 3173
cached threaded (3 threads) time = 20079, avg = 0.006693
cache hits 3995776
cache misses 4224
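
The difference between a shared file handle and one handle per thread can be sketched as follows. This is not the testcase that produced the numbers above; `HandleDemo`, `run`, and the random read pattern are hypothetical, and the sketch assumes Java 8 or later for the lambda syntax. With a shared `RandomAccessFile`, the seek and the read must happen under one lock, so the threads take turns; with per-thread handles no lock is needed.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch comparing one shared file handle with one handle per thread.
class HandleDemo {
    // Each thread reads `reads` pseudo-random bytes; returns the sum of all bytes read.
    static long run(final File f, int threads, final int reads, final boolean shareHandle)
            throws Exception {
        final long length = f.length();
        final AtomicLong total = new AtomicLong();
        final RandomAccessFile shared = shareHandle ? new RandomAccessFile(f, "r") : null;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int seed = t;
            workers[t] = new Thread(() -> {
                Random rnd = new Random(seed);
                long sum = 0;
                // A null resource is permitted in try-with-resources and is simply not closed.
                try (RandomAccessFile own = shareHandle ? null : new RandomAccessFile(f, "r")) {
                    for (int i = 0; i < reads; i++) {
                        long pos = (long) (rnd.nextDouble() * length);
                        if (shareHandle) {
                            // seek + read must be atomic on a shared handle
                            synchronized (shared) {
                                shared.seek(pos);
                                sum += shared.read();
                            }
                        } else {
                            // private handle: no locking required
                            own.seek(pos);
                            sum += own.read();
                        }
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                total.addAndGet(sum);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        if (shared != null) {
            shared.close();
        }
        return total.get();
    }
}
```

Wrapping calls such as `HandleDemo.run(file, 3, 1000000, true)` and `HandleDemo.run(file, 3, 1000000, false)` in `System.nanoTime()` measurements would give a rough comparison of the two modes on a given machine; results will vary with the OS, the JVM, and the file system.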