[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

JIRA Thu, 10 Dec 2009 12:26:42 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788896#action_12788896
 ]


Christian Kohlschütter commented on LUCENE-2133:
------------------------------------------------

bq. I don't know that back compat is really a concern if we are just leaving 
the old API intact as part of that, with its own caching mechanism? 

I don't think that this is an option. FieldCache is fundamentally flawed and 
already creates a lot of trouble that has only been somehow fixed recently 
(FieldCacheKey, Insanity checks).

FieldCache, the static "registry" of caches sort of violates fundamental OOP 
concepts -- I mean, almost *all* methods require IndexReader as the first 
parameter. Since we allow IndexReader composition and decoration this 
apparently was not the right way to go because it exactly caused the problems 
that have later been addressed by the aformentioned FieldCacheKey and insanity 
checks.

bq. Just deprecate the old API, and make a new one. This is a big pain, because 
you have to be sure you don't straddle the two apis on upgrading, but thats the 
boat we will be in anyway.

That is always an option, but as you see Uwe's initial reaction I think it is 
not so easy to convince everybody. Which is fine, because we now have a 
solution that is somehow intermediate between the two extremes (keeping as it 
is vs. complete rework) and still is completely bw-compatible.
With a "real" complete overhaul of FieldCache, I imagine a lot of additional 
work to be then done for other projects such as SOLR, Nutch etc., which I would 
rather see not to happen abruptly.

The IndexCache abstraction allows us to separate the two issues 1: "make caches 
non-static" and 2: "make the fieldcache snappier" very easily. We start with 
issue 1 here in LUCENE-2133, and then develop a new FancyFieldCache in 
LUCENE-xyz (which can be used in parallel to the then-old IndexFieldCache).

In parallel we can integrate Michael's, Earwin's and Yonik's suggestions from 
LUCENE-2135 without starting the discussion from the beginning.

I am not suggesting to take the current solution as a "temporary workaround", 
but as a base for future work. Anything else would make no sense indeed.

bq. Since I don't see that this provides anything over 831, I don't see how its 
not in the same boat.

LUCENE-831 still requires a static FieldCache, the root of all evil :) It 
addresses orthogonal problems (ValueSource, Tries etc.).
Finally, to cite yourself from LUCENE-831: "It probably makes sense to start 
from one of Hoss's original patches or even from scratch."

bq. I'm not sure we should target a specific release with this - we don't even 
know when 3.1 is going to happen. 2.9 took a year. Its anybodies guess - we 
should prob just do what makes sense and commit it when its ready.

The more complex the patches are, the longer it will take to integrate them 
into a new version.
The more such patches you have, the longer it will take to get to a new release.

Let's make it simple, submit what we have and build upon that.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2133
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2133
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9.1, 3.0
>            Reporter: Christian Kohlschütter
>         Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at all. Whenever new classes depend on 
> the old ones, an appropriate notice exists in the javadocs.
> - The patch introduces a new, deprecated class 
> IndexFieldCacheSanityChecker.java which is just there for testing purposes, 
> to show that no sanity checks are necessary any longer. This class may be 
> removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as 
> the patch removes a few unnecessary checks we might even see a slight 
> speedup. No benchmarking has been done so far, though.
> - I have tried to preserve the existing functionality wherever possible and 
> to focus on the class/method structure only. We certainly may improve the 
> caches' behavior, but this out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all 
> supported atomic types (byte, double, float, int, long, short) three classes 
> each are required: *Cache, *Comparator and *Parser. I think that further 
> simplification might be possible (maybe using Java generics?), but I guess 
> the current patch is large enough for now.
> Cheers,
> Christian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

Reply via email to