[ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Kohlschütter updated LUCENE-2133: ------------------------------------------- Attachment: demo.patch Patch for demo code (src/demo), may be applied after approval of main patch > [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField > ------------------------------------------------------------------------- > > Key: LUCENE-2133 > URL: https://issues.apache.org/jira/browse/LUCENE-2133 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Affects Versions: 3.1 > Reporter: Christian Kohlschütter > Attachments: contrib.patch, demo.patch, java.patch, test.patch > > > Hi all, > up to the current version Lucene contains a conceptual flaw, that is the > FieldCache. The FieldCache is a singleton which is supposed to cache certain > information for every IndexReader that is currently open > The FieldCache is flawed because it is incorrect to assume that: > 1. one IndexReader instance equals one index. In fact, there can be many > clones (of SegmentReader) or decorators (FilterIndexReader) which all access > the very same data. > 2. the cache information remains valid for the lifetime of an IndexReader. In > fact, some IndexReaders may be reopen()'ed and thus they may contain > completely different information. > 3. all IndexReaders need the same type of cache. In fact, because of the > limitations imposed by the singleton construct there was no implementation > other than FieldCacheImpl. > Furthermore, FieldCacheImpl and FieldComparator are bloated by several static > inner-classes that could be moved to package level. > There have been a few attempts to improve FieldCache, namely LUCENE-831, > LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: > There is a central registry for assigning Caches to IndexReader instances. > I now propose the following: > 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, > extensible cache instances ("IndexCache"). IndexCaches provide common caching > functionality for all IndexReaders and may be extended (for example, > SegmentReader would have a SegmentReaderIndexCache and store different data > than a regular IndexCache) > 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. > IndexFieldCache is an interface just like FieldCache and may support > different implementations. > 3. The IndexCache instances may be flushed/closed by the associated > IndexReaders whenever necessary. > 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected > (or at least, they do not impact the overall performance) > 5. Refactor FieldCacheImpl and the related classes (FieldComparator, > SortField) > I have provided an patch which takes care of all these issues. It passes all > JUnit tests. > The patch is quite large, admittedly, but the change required several > modifications and some more to preserve backwards-compatibility. > Backwards-compatibility is preserved by moving some of the updated > functionality in the package org.apache.lucene.search.fields (field > comparators and parsers, SortField) while adding wrapper instances and > keeping old code in org.apache.lucene.search. > In detail and besides the above mentioned improvements, the following is > provided: > 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved > from SegmentReader to SegmentReaderIndexCache. > 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the > close() method to all registered instances by calling an onClose() method > with the threads' instances. > 3. Analyzer.close now may throw an IOException (this already is covered by > java.io.Closeable). > 4. A change to Collector: allow IndexCache instead of IndexReader being > passed to setNextReader() > 5. SortField's numeric types have been replaced by direct assignments of > FieldComparatorSource. This removes the "switch" statements and the > possibility to throw IllegalArgumentExceptions because of unsupported type > values. > The following classes have been deprecated and replaced by new classes in > org.apache.lucene.search.fields: > - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter) > - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter) > - FieldCache (=> IndexFieldCache) > - FieldCacheImpl (=> IndexFieldCacheImpl) > - all classes in FieldCacheImpl (=> several package-level classes) > - all subclasses of FieldComparator (=> several package-level classes) > Final notes: > - The patch would be simpler if no backwards compatibility was necessary. The > Lucene community has to decide which classes/methods can immediately be > removed, which ones later, which not at all. Whenever new classes depend on > the old ones, an appropriate notice exists in the javadocs. > - The patch introduces a new, deprecated class > IndexFieldCacheSanityChecker.java which is just there for testing purposes, > to show that no sanity checks are necessary any longer. This class may be > removed at any time. > - I expect that the patch does not impact performance. On the contrary, as > the patch removes a few unnecessary checks we might even see a slight > speedup. No benchmarking has been done so far, though. > - I have tried to preserve the existing functionality wherever possible and > to focus on the class/method structure only. We certainly may improve the > caches' behavior, but this out of scope for this patch. > - The refactoring finally makes the high duplication of code visible: For all > supported atomic types (byte, double, float, int, long, short) three classes > each are required: *Cache, *Comparator and *Parser. I think that further > simplification might be possible (maybe using Java generics?), but I guess > the current patch is large enough for now. > Cheers, > Christian -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org