[PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
-------------------------------------------------------------------------

                 Key: LUCENE-2133
                 URL: https://issues.apache.org/jira/browse/LUCENE-2133
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 3.1
            Reporter: Christian Kohlschütter


Hi all,

up to the current version Lucene contains a conceptual flaw, that is the 
FieldCache. The FieldCache is a singleton which is supposed to cache certain 
information for every IndexReader that is currently open

The FieldCache is flawed because it is incorrect to assume that:
1. one IndexReader instance equals one index. In fact, there can be many clones 
(of SegmentReader) or decorators (FilterIndexReader) which all access the very 
same data.
2. the cache information remains valid for the lifetime of an IndexReader. In 
fact, some IndexReaders may be reopen()'ed and thus they may contain completely 
different information.
3. all IndexReaders need the same type of cache. In fact, because of the 
limitations imposed by the singleton construct there was no implementation 
other than FieldCacheImpl.

Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
inner-classes that could be moved to package level.

There have been a few attempts to improve FieldCache, namely LUCENE-831, 
LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There 
is a central registry for assigning Caches to IndexReader instances.

I now propose the following:
1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible 
cache instances ("IndexCache"). IndexCaches provide common caching 
functionality for all IndexReaders and may be extended (for example, 
SegmentReader would have a SegmentReaderIndexCache and store different data 
than a regular IndexCache)
2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
IndexFieldCache is an interface just like FieldCache and may support different 
implementations.
3. The IndexCache instances may be flushed/closed by the associated 
IndexReaders whenever necessary.
4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
(or at least, they do not impact the overall performance)
5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField) 

I have provided an patch which takes care of all these issues. It passes all 
JUnit tests.

The patch is quite large, admittedly, but the change required several 
modifications and some more to preserve backwards-compatibility. 
Backwards-compatibility is preserved by moving some of the updated 
functionality in the package org.apache.lucene.search.fields (field comparators 
and parsers, SortField) while adding wrapper instances and keeping old code in 
org.apache.lucene.search.

In detail and besides the above mentioned improvements, the following is 
provided:
1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
from SegmentReader to SegmentReaderIndexCache.
2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
close() method to all registered instances by calling an onClose() method with 
the threads' instances.
3. Analyzer.close now may throw an IOException (this already is covered by 
java.io.Closeable).
4. A change to Collector: allow IndexCache instead of IndexReader being passed 
to setNextReader()
5. SortField's numeric types have been replaced by direct assignments of 
FieldComparatorSource. This removes the "switch" statements and the possibility 
to throw IllegalArgumentExceptions because of unsupported type values.

The following classes have been deprecated and replaced by new classes in 
org.apache.lucene.search.fields:
- FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
- FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
- FieldCache (=> IndexFieldCache)
- FieldCacheImpl (=> IndexFieldCacheImpl)
- all classes in FieldCacheImpl (=> several package-level classes)
- all subclasses of FieldComparator (=> several package-level classes)

Final notes:
- The patch would be simpler if no backwards compatibility was necessary. The 
Lucene community has to decide which classes/methods can immediately be 
removed, which ones later, which not at all. Whenever new classes depend on the 
old ones, an appropriate notice exists in the javadocs.
- The patch introduces a new, deprecated class 
IndexFieldCacheSanityChecker.java which is just there for testing purposes, to 
show that no sanity checks are necessary any longer. This class may be removed 
at any time.
- I expect that the patch does not impact performance. On the contrary, as the 
patch removes a few unnecessary checks we might even see a slight speedup. No 
benchmarking has been done so far, though.
- I have tried to preserve the existing functionality wherever possible and to 
focus on the class/method structure only. We certainly may improve the caches' 
behavior, but this out of scope for this patch.
- The refactoring finally makes the high duplication of code visible: For all 
supported atomic types (byte, double, float, int, long, short) three classes 
each are required: *Cache, *Comparator and *Parser. I think that further 
simplification might be possible (maybe using Java generics?), but I guess the 
current patch is large enough for now.

Cheers,
Christian


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to