As a follow-up...

The real performance benefit comes in a shared server environment, where the
Lucene process runs alongside other processes, i.e. competes for the use
of the OS file cache. Since the Lucene process can be configured with a
dedicated memory pool, using facilities like NioFile allows for a large
dedicated application cache, similar to how databases buffer data/index
blocks and don't rely on the OS to do so.

If the Lucene process (we wrap Lucene in a server "process") is the "only"
process on the server, the OS cache will likely perform well enough for most
applications.

I will attempt to get some performance numbers with and without NioFile
while running actual Lucene queries.

  -----Original Message-----
  From: Robert Engels [mailto:[EMAIL PROTECTED]
  Sent: Thursday, December 08, 2005 10:37 AM
  To: Lucene-Dev
  Subject: NioFile cache performance


  I finally got around to writing a test case to verify the numbers I
presented. The following test case and results are for the lowest-level disk
operations. On my machine, reading from the cache is 30%-40% faster than
going to disk (even when the data is in the OS cache). Since Lucene makes
extensive use of disk IO and often reads the same data (e.g. reading the
terms), a localized user-level cache can provide significant performance
benefits.
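
  To illustrate what a localized user-level cache looks like, here is a
minimal sketch of a block cache with LRU eviction. It is not the
MemoryLRUCache from this message; the class name LruBlockCache, its methods
and the block-number key are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical user-level block cache with LRU eviction (illustration only). */
class LruBlockCache {
    private final Map<Long, byte[]> blocks;
    private long hits;
    private long misses;

    LruBlockCache(final int maxBlocks) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                // Evict the least recently used block once the cache is over capacity.
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the cached block, or null on a miss (the caller then reads from disk). */
    synchronized byte[] get(long blockNo) {
        byte[] block = blocks.get(blockNo);
        if (block == null) {
            misses++;
        } else {
            hits++;
        }
        return block;
    }

    /** Stores a block that was just read from disk. */
    synchronized void put(long blockNo, byte[] data) {
        blocks.put(blockNo, data);
    }

    synchronized long hits() { return hits; }

    synchronized long misses() { return misses; }
}
```

  Every call here takes a single lock, which is the simplest way to keep
the access-ordered map consistent but also the kind of synchronization cost
that shows up with multiple threads.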

  Using a 4 MB file (so I could "guarantee" the disk data would be in the
OS cache as well), the test shows the following results.

  Most of the CPU time is actually spent on synchronization between
multiple threads. I hacked together a version of MemoryLRUCache that used a
ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a
minimum, if the ReadWriteLock class were modified to use the 1.5 facilities,
some significant additional performance gains should be realized.
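
  As a rough sketch of the ConcurrentHashMap idea (not the hacked
MemoryLRUCache itself), the following hypothetical ConcurrentBlockCache
replaces a global lock with a concurrent map and atomic counters. It uses
only JDK 1.5 facilities. It deliberately has no eviction, so it shows just
the locking change and is not a complete LRU cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical block cache without a global lock; unbounded, for illustration only. */
class ConcurrentBlockCache {
    private final ConcurrentMap<Long, byte[]> blocks = new ConcurrentHashMap<Long, byte[]>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    /** Returns the cached block, or null on a miss. Readers do not block each other. */
    byte[] get(long blockNo) {
        byte[] block = blocks.get(blockNo);
        if (block == null) {
            misses.incrementAndGet();
        } else {
            hits.incrementAndGet();
        }
        return block;
    }

    /**
     * Caches a block unless another thread cached one first.
     * Returns whichever block ended up in the cache.
     */
    byte[] putIfAbsent(long blockNo, byte[] data) {
        byte[] previous = blocks.putIfAbsent(blockNo, data);
        return previous != null ? previous : data;
    }

    long hits() { return hits.get(); }

    long misses() { return misses.get(); }
}
```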

  filesize is 4194304

  non-cached time = 10578, avg = 0.010578

  non-cached threaded (3 threads) time = 32094, avg = 0.010698

  cached time = 6125, avg = 0.006125

  cache hits 996365

  cache misses 3635

  cached threaded (3 threads) time = 20734, avg = 0.0069113333333333336

  cache hits 3989089

  cache misses 10911

  When using the shared test (which is more like the Lucene usage, since a
single "file" is shared by multiple threads), the difference is even more
dramatic with multiple threads (since the cache size is effectively reduced
by the number of threads). This test also shows the value of using multiple
file handles when using multiple threads to read a single file (rather than
using a shared file handle).
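
  One way to give each thread its own file handle is a ThreadLocal, as in
the sketch below. The class PerThreadFileReader and its methods are
hypothetical names for this example, not part of NioFile or Lucene, and the
test code behind the numbers here may do this differently.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical reader that opens one RandomAccessFile per calling thread. */
class PerThreadFileReader {
    private final File file;
    private final ThreadLocal<RandomAccessFile> handle = new ThreadLocal<RandomAccessFile>();

    PerThreadFileReader(File file) {
        this.file = file;
    }

    /** Reads len bytes at offset using the calling thread's own handle, so no lock is needed. */
    byte[] read(long offset, int len) throws IOException {
        RandomAccessFile raf = handle.get();
        if (raf == null) {
            // First read on this thread: open a private handle on the shared file.
            raf = new RandomAccessFile(file, "r");
            handle.set(raf);
        }
        byte[] buf = new byte[len];
        raf.seek(offset);
        raf.readFully(buf);
        return buf;
    }

    /** Closes the calling thread's handle; each thread must call this when done. */
    void closeCurrent() throws IOException {
        RandomAccessFile raf = handle.get();
        if (raf != null) {
            raf.close();
            handle.remove();
        }
    }
}
```

  With a single shared handle, the seek and the read have to be done under
a lock so another thread cannot move the file pointer in between; separate
handles avoid that lock at the cost of more open file descriptors.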

  filesize is 4194304

  non-cached time = 10594, avg = 0.010594

  non-cached threaded (3 threads) time = 42110, avg = 0.014036666666666666

  cached time = 6047, avg = 0.006047

  cache hits 996827

  cache misses 3173

  cached threaded (3 threads) time = 20079, avg = 0.006693

  cache hits 3995776

  cache misses 4224
