I will look at separating it out. I wanted to get initial feedback before moving on.
1. I agree that initialValue() is the way to go. I'll make the changes.

2. I agree about creating NioFSDirectory rather than modifying FSDirectory. I originally felt that memory-mapped files would be the fastest, but they also require OS calls; the "caching" code is CONSIDERABLY faster, since it does not need to do any JNI or make OS calls.

3. I think a "simple" fix for the case you cite is to add an additional 'max size' parameter, which controls the maximum size of the cache for each 'segment file'. Using the mergeFactor and compound files, you can easily compute what this max should be, based on available memory and the expected index size (number of files). The problem with a soft cache on indices of that size is that JVM memory consumption would still grow to the limit before anything was discarded (which may be ideal in some cases).

As for creating a CachingDirectory that can cache any directory, that should be feasible as well, but I am not sure it would perform as well as the direct internal cache version.

Robert

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 25, 2005 4:20 PM
To: java-dev@lucene.apache.org
Subject: Re: major searching performance improvement

Robert Engels wrote:
> Attached are files that dramatically improve the searching performance
> (2x improvement on several hardware configurations!) in a multithreaded,
> high-concurrency environment.

This looks like some good stuff! Can you perhaps break it down into independent, layered patches? That way it would be easier to discuss and integrate them.

> The change has 3 parts:
>
> 1) remove the synchronization required in SegmentReader.document(). This
> required changes to FieldsReader to handle concurrent access.

This makes good sense. Stylistically, I would prefer the cloning be done in ThreadLocal.initialValue(). That way, if another method ever needs the input streams, the cloning code need not be altered.
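Doug's suggestion in point 1 (do the per-thread cloning in ThreadLocal.initialValue()) can be sketched as follows. This is a minimal, hypothetical illustration, not Lucene's FieldsReader: SketchInput stands in for an input stream that shares read-only data but keeps a private file pointer, and SketchFieldsReader, readAt() and the in-memory byte array are assumptions made for the example. Each thread receives its own clone the first time it calls get(), so the seek-then-read pair needs no synchronization.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Lucene code): per-thread stream clones created in
 * ThreadLocal.initialValue(), so concurrent readers need no lock.
 */
public class ThreadLocalCloneSketch {

    /** Stand-in for an input stream: shared read-only data, private position. */
    static final class SketchInput implements Cloneable {
        private final byte[] data; // shared between clones, never written
        private int pos;           // per-instance file pointer, not thread safe

        SketchInput(byte[] data) { this.data = data; }

        void seek(int p) { pos = p; }

        int readByte() { return data[pos++] & 0xFF; }

        @Override
        public SketchInput clone() {
            try {
                // Shallow copy: the clone shares data but gets its own pos.
                return (SketchInput) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }
    }

    /** Hypothetical reader that hands every thread its own clone of the template. */
    static final class SketchFieldsReader {
        private final ThreadLocal<SketchInput> perThread;

        SketchFieldsReader(final SketchInput template) {
            this.perThread = new ThreadLocal<SketchInput>() {
                @Override
                protected SketchInput initialValue() {
                    // The only place that clones: any method calling
                    // perThread.get() receives this thread's private copy.
                    return template.clone();
                }
            };
        }

        /** Unsynchronized: seek and read touch only the calling thread's clone. */
        int readAt(int offset) {
            SketchInput in = perThread.get();
            in.seek(offset);
            return in.readByte();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        final SketchFieldsReader reader = new SketchFieldsReader(new SketchInput(data));
        final AtomicInteger errors = new AtomicInteger();

        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int round = 0; round < 1000; round++) {
                    for (int off = 0; off < 256; off++) {
                        if (reader.readAt(off) != off) errors.incrementAndGet();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println("errors = " + errors.get()); // prints "errors = 0"
    }
}
```

Because the cloning lives only in initialValue(), a new method that needs the stream just calls perThread.get(); nothing else has to change, which is the maintenance benefit Doug describes.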
> 2) change FSDirectory to use nio to improve concurrency. Changed to
> use NioFile. This class has some workarounds because under Windows the
> FileChannel is not fully reentrant, so it allocates multiple handles
> per physical file; this code can be removed on non-Windows
> systems. This also required changes to InputStream to allow reading
> at a direct offset.

Could you please explore making this a new Directory class, extending rather than replacing FSDirectory? That would make it easier for folks to evaluate. Look at MMapDirectory for an example.

Also, did you compare the performance of this to MMapDirectory? That already uses nio, and should thus avoid the thread contention of FSDirectory. However, it does not scale well on 32-bit machines, whose address space limits indexes to 4GB.

Finally, for Windows-specific code, you can check org.apache.lucene.util.Constants.WINDOWS at runtime.

> 3) move disk buffering into the Java layer to avoid the overhead of OS
> calls. The buffer percentage can be configured to store the entire index
> in memory. Running with as little as a 10% cache, the performance is
> dramatically improved. Reading larger blocks also improves
> performance in most cases, but can actually degrade performance when doing
> very small reads. Using the cache implies that you have configured the
> JVM with as much heap space as the configured percentage of the index size
> on disk. The NioFile can easily be changed to use a "soft" cache to
> avoid the potential for OutOfMemoryErrors.

It would be nice if this functionality could be layered on any Directory. Did you consider making a CachingDirectory, which one can wrap around an existing Directory implementation, that keeps an LRU cache of data?

Even 10% by default will probably break a lot of applications. At the Internet Archive I frequently search 100-gigabyte indexes on machines with just 1GB of RAM. So I am leery of enabling this by default.
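Reading at a direct offset (point 2) and a bounded, LRU Java-side cache (point 3, Doug's CachingDirectory idea, and Robert's 'max size' reply) can be combined in one small sketch. It is hypothetical, not NioFile and not a Lucene Directory: CachingBlockReader, blockSize, maxBlocks, readByte() and cachedBlocks() are names invented for the example. It relies only on standard JDK calls: FileChannel.read(ByteBuffer, long), a positional read that does not move the channel's shared position, and an access-ordered LinkedHashMap whose removeEldestEntry() caps the number of cached blocks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: reads a file in fixed-size blocks with positional
 * FileChannel reads and keeps at most maxBlocks blocks in an LRU heap cache.
 */
public class CachingBlockReader implements AutoCloseable {
    private final FileChannel channel;
    private final int blockSize;
    private final Map<Long, byte[]> cache; // guarded by "this"

    public CachingBlockReader(Path file, int blockSize, final int maxBlocks) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.blockSize = blockSize;
        // accessOrder = true: iteration order is least-recently-used first.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks; // the "max size" bound
            }
        };
    }

    /** Returns block number index, reading it from disk on a cache miss. */
    private byte[] block(long index) throws IOException {
        synchronized (this) {
            byte[] hit = cache.get(index); // get() also updates LRU order
            if (hit != null) return hit;
        }
        // Miss: read outside the lock. read(dst, position) leaves the channel's
        // own position untouched, so concurrent readers cannot disturb each other.
        // Two threads missing the same block both read it; that is wasteful but harmless.
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        long start = index * blockSize;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, start + buf.position());
            if (n < 0) break; // end of file: the last block may be short
        }
        byte[] block = new byte[buf.position()];
        buf.flip();
        buf.get(block);
        synchronized (this) {
            cache.put(index, block);
        }
        return block;
    }

    /** Reads one byte at an absolute file offset; returns -1 past end of file. */
    public int readByte(long offset) throws IOException {
        byte[] block = block(offset / blockSize);
        int i = (int) (offset % blockSize);
        return i < block.length ? block[i] & 0xFF : -1;
    }

    /** Number of blocks currently held in memory (never more than maxBlocks). */
    public synchronized int cachedBlocks() {
        return cache.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Capping the cache by block count, rather than by a percentage of index size, keeps heap use bounded no matter how large the index is, which speaks to the 100-gigabyte-index concern. As Robert notes, a soft cache would instead let the heap grow toward its limit before anything is discarded.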
Cheers,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]