Robert Engels wrote:
Attached are files that dramatically improve the searching performance
(2x improvement on several hardware configurations!) in a multithreaded,
high concurrency environment.
This looks like some good stuff! Can you perhaps break it down into
independent, layered patches? That way it would be easier to discuss
and integrate them.
The change has 3 parts:
1) remove synchronization required in SegmentReader.document(). This
required changes to FieldsReader to handle concurrent access.
This makes good sense. Stylistically, I would prefer the cloning be
done in ThreadLocal.initialValue(). That way if another method ever
needs the input streams the cloning code need not be altered.
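
A minimal sketch of the ThreadLocal.initialValue() suggestion above. The class names CloneableStream and PerThreadFieldsReader are hypothetical stand-ins, not the actual Lucene classes or the code in the patch; the point is only that the clone happens in one place, so any method that needs a stream just calls get().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cloneable input stream (not Lucene's InputStream).
class CloneableStream implements Cloneable {
    long position;

    @Override
    public CloneableStream clone() {
        try {
            return (CloneableStream) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }
}

// Hypothetical reader: each thread lazily receives its own clone of the master stream.
class PerThreadFieldsReader {
    private final CloneableStream master = new CloneableStream();
    private final AtomicInteger clones = new AtomicInteger();

    private final ThreadLocal<CloneableStream> local = new ThreadLocal<CloneableStream>() {
        @Override
        protected CloneableStream initialValue() {
            // The only place cloning happens; runs once per thread on first get().
            clones.incrementAndGet();
            return master.clone();
        }
    };

    // No synchronization needed: the stream is private to the calling thread.
    long readDoc(long offset) {
        CloneableStream in = local.get();
        in.position = offset; // stands in for seek + read
        return in.position;
    }

    // A second method can use the stream without any extra cloning logic.
    long lastPosition() {
        return local.get().position;
    }

    int cloneCount() {
        return clones.get();
    }
}
```

With this shape, adding another method that needs the per-thread stream requires no change to the cloning code.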
2) change FSDirectory to use 'nio' to improve concurrency. Changed to
use NioFile. This class has some workaround because under Windows, the
FileChannel is not fully reentrant, and so allocates multiple handles
per physical file - this code can be removed under non-Windows
systems. This also required changes to InputStream to allow for reading
at a direct offset.
Could you please explore making this a new Directory class, extending
rather than replacing FSDirectory? That would make it easier for folks
to evaluate. Look at MMapDirectory for an example.
Also, did you compare the performance of this to MMapDirectory? That
already uses nio, and should thus avoid the thread contention of
FSDirectory. However it does not scale well on 32-bit machines whose
address space limits indexes to 4GB.
Finally, for Windows-specific code, you can check
org.apache.lucene.util.Constants.WINDOWS at runtime.
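
A rough sketch of how a runtime platform check and reads at a direct offset might fit together. PositionalReader is a hypothetical class, not the NioFile from the patch. The WINDOWS flag is derived from the os.name system property here only to keep the example free of a Lucene dependency; in Lucene itself one would use org.apache.lucene.util.Constants.WINDOWS. The multiple-handle workaround is assumed from the description above, and the way a handle is picked per thread is an illustrative choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not the NioFile class from the patch.
class PositionalReader implements AutoCloseable {
    // Assumption: stands in for org.apache.lucene.util.Constants.WINDOWS.
    static final boolean WINDOWS =
        System.getProperty("os.name").startsWith("Windows");

    private final FileChannel[] channels;

    PositionalReader(Path file, int windowsHandles) throws IOException {
        // Open several handles only on Windows; elsewhere one channel is shared.
        int n = WINDOWS ? Math.max(1, windowsHandles) : 1;
        channels = new FileChannel[n];
        for (int i = 0; i < n; i++) {
            channels[i] = FileChannel.open(file, StandardOpenOption.READ);
        }
    }

    // Reads up to dst.length bytes starting at the given file offset.
    // FileChannel.read(ByteBuffer, long) does not move the channel's own
    // position, so callers need no shared seek-then-read lock.
    int read(byte[] dst, long position) throws IOException {
        int slot = Math.floorMod(Thread.currentThread().hashCode(), channels.length);
        FileChannel ch = channels[slot];
        ByteBuffer buf = ByteBuffer.wrap(dst);
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position + total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return total;
    }

    @Override
    public void close() throws IOException {
        for (FileChannel ch : channels) {
            ch.close();
        }
    }
}
```

Keeping the platform check in one constant means the Windows-only path can live in a separate Directory subclass without affecting other systems.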
3) move disk buffering into the Java layer to avoid the overhead of OS
calls. The buffer percentage can be configured to store the entire index
in memory. Running with as little as a 10% cache, the performance is
dramatically improved. Reading larger blocks also improves the
performance in most cases, but can actually degrade performance if doing
very small reads. Using the cache implies that you have configured the
JVM to have as much heap space available as the percent of index size on
the disk. The NioFile can be easily changed to use a "soft" cache to
avoid the risk of an OutOfMemoryError.
It would be nice if this functionality could be layered on any
Directory. Did you consider making a CachingDirectory that one can wrap
around an existing Directory implementation, that keeps an LRU cache of
data? Even 10% by default will probably break a lot of applications.
At the Internet Archive I frequently search 100GB indexes on machines
with just 1GB of RAM. So I am leery of enabling this by default.
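
To illustrate the wrapping idea, here is a minimal sketch of an LRU block cache layered over an arbitrary source of blocks. BlockSource and LruCachingSource are hypothetical names; Lucene's real Directory API is richer, and a CachingDirectory would need to wrap its stream classes rather than this simplified interface. The cache is bounded by an absolute block count rather than a percentage of index size, which is one possible way to keep memory use predictable for very large indexes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal read interface standing in for a Directory's file access.
interface BlockSource {
    byte[] readBlock(long blockNumber);
}

// Hypothetical wrapper that keeps the most recently used blocks in memory.
class LruCachingSource implements BlockSource {
    private final BlockSource delegate;
    private final Map<Long, byte[]> cache;
    private int misses;

    LruCachingSource(BlockSource delegate, final int maxBlocks) {
        this.delegate = delegate;
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU block once the cap is exceeded.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    @Override
    public synchronized byte[] readBlock(long blockNumber) {
        byte[] block = cache.get(blockNumber);
        if (block == null) {
            misses++;
            block = delegate.readBlock(blockNumber); // fall through to the wrapped source
            cache.put(blockNumber, block);
        }
        return block;
    }

    synchronized int misses() {
        return misses;
    }
}
```

Note that the single lock here would reintroduce contention under heavy concurrency; a real implementation would probably want striped locks or a concurrent cache, and the default would need to stay off or small.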
Cheers,
Doug