[
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787759#action_12787759
]
Christian Kohlschütter commented on LUCENE-2133:
------------------------------------------------
bq. And what about the doubling up insanity? It looks like you just commented
out that check? It appears to me that thats still an issue we want to check for
- we want to make sure Lucene core and users have a way to be sure they are not
using a toplevel reader and its sub readers for caches unless they really
intend to.
The new IndexFieldCacheSanityChecker can be removed out-of-the-box (as
documented in the class itself). It is just kind of a "showcase" class for this
issue.
The section I commented out is non-functional with the new change since there
is no more ReaderKey in IndexFieldCache.CacheEntry.
> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.9.1, 3.0
> Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch,
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the
> FieldCache. The FieldCache is a singleton which is supposed to cache certain
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In
> fact, some IndexReaders may be reopen()'ed and thus they may contain
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the
> limitations imposed by the singleton construct there was no implementation
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831,
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same:
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific,
> extensible cache instances ("IndexCache"). IndexCaches provide common caching
> functionality for all IndexReaders and may be extended (for example,
> SegmentReader would have a SegmentReaderIndexCache and store different data
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache.
> IndexFieldCache is an interface just like FieldCache and may support
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator,
> SortField)
> I have provided an patch which takes care of all these issues. It passes all
> JUnit tests.
> The patch is quite large, admittedly, but the change required several
> modifications and some more to preserve backwards-compatibility.
> Backwards-compatibility is preserved by moving some of the updated
> functionality in the package org.apache.lucene.search.fields (field
> comparators and parsers, SortField) while adding wrapper instances and
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the
> close() method to all registered instances by calling an onClose() method
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of
> FieldComparatorSource. This removes the "switch" statements and the
> possibility to throw IllegalArgumentExceptions because of unsupported type
> values.
> The following classes have been deprecated and replaced by new classes in
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The
> Lucene community has to decide which classes/methods can immediately be
> removed, which ones later, which not at all. Whenever new classes depend on
> the old ones, an appropriate notice exists in the javadocs.
> - The patch introduces a new, deprecated class
> IndexFieldCacheSanityChecker.java which is just there for testing purposes,
> to show that no sanity checks are necessary any longer. This class may be
> removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as
> the patch removes a few unnecessary checks we might even see a slight
> speedup. No benchmarking has been done so far, though.
> - I have tried to preserve the existing functionality wherever possible and
> to focus on the class/method structure only. We certainly may improve the
> caches' behavior, but this out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all
> supported atomic types (byte, double, float, int, long, short) three classes
> each are required: *Cache, *Comparator and *Parser. I think that further
> simplification might be possible (maybe using Java generics?), but I guess
> the current patch is large enough for now.
> Cheers,
> Christian
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]