[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788863#action_12788863
 ] 

Christian Kohlschütter commented on LUCENE-2133:
------------------------------------------------

bq. This patch is a good step forward - it associates the cache directly
with IndexReader, where it belongs; it cleanly decouples cache from
reader (vs the hack we have today with IndexReader.getFieldCacheKey),
so that eg cloned readers can share the same cache; it also preserves
back compat, which is quite a stunning accomplishment

Yes, backwards compatibility was actually the driving factor for this patch.
I actually have not addressed major changes in functionality. This would 
definitely require rework that breaks backwards compatibility.
I just wanted to get rid of the static FieldCache, which caused me a lot of 
OOME headaches.

As I wrote above, I see this patch as a good starting point for further 
development. Imagine I had submitted a real, complete rework of the FieldCache 
like in LUCENE-831 -- it would be stuck as an open issue forever just like its 
predecessor.

There is nothing wrong with the current patch (that's why I suggest comitting 
it to trunk soon-ish) -- it does not break anything (it could even go into 3.1 
I guess).
Starting from a common codebase we can then later on extend it and address all 
the points you mentioned in Michael's most recent post:

bq. But... there are many more things I don't like about FieldCache, that I'm 
not sure the patch addresses:
bq. (snip) 

None of these (very important) issues have been addressed by the patch. 
Intentionally (as described).

bq. Some other questions about the patch:
bq.
bq.    * Consumers of the cache API (the sort comparator,
bq.      FieldCacheTerms/RangeFilter, and any other future users of "the
bq.      field cache") shouldn't have to move down into fields sub-package?

As you wish. I don't have problems with keep it in o.a.l.search. I was just a 
little scared about the plethora of classes in that package. Since we do not 
need to utilize package-level protection, it was actually possible to 
"encapsulate" that field-related functionality in another namespace. I just 
personally prefer to use many nested packages, I do not have problems of moving 
classes back to its parent package.

bq.    * It's a little strange that the term vectors & fields reader also
bq.      got pulled into the cache?

They are not in the FieldCache. Caching fields is only one functionality 
provided by IndexCache. I took the opportunity to demonstrate caching something 
other than fields. You now save a few bytes per SegmentReader instance because 
of that change. Maybe it would make sense to move SegmentReader.CoreReaders to 
SegmentReaderIndexCache completely. However, if I had included that as well 
into this issue, it would again be too large to be discussed in time for the 
next release.

If you have strong objections against moving these SegmentReader-specific 
things to the SegmentReaderIndexCache *now*, I can remove that part from the 
patch. However, I should probably then file another issue. Unfortunately, this 
will then depend on the outcome of this LUCENE-2133.

To summarize, I suggest:

1. Complete the additions to src/test (i.e. do not remove/modify the existing 
test cases but rather add new ones, as discussed previously)
2. Commit the patch to svn, target release for Lucene 3.1.
3. File another issue and discuss functional improvements for the 
IndexFieldCache part. Use Michael's wishlist as a starting point. Agree on 
breaking backwards compatibility, target for Lucene 4.0, 3.5 or whatever fancy 
version number. Incorporate things from LUCENE-831, LUCENE-1231 and also 
LUCENE-2135.
4. File a new issue for the improvements in SegmentReader (move things that are 
shared between the original reader and its clones to a common 
SegmentReaderIndexCache, such as CoreReader and ThreadLocals), keep backwards 
compatibility and target for Lucene 3.1.

What do you think?


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2133
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2133
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9.1, 3.0
>            Reporter: Christian Kohlschütter
>         Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at all. Whenever new classes depend on 
> the old ones, an appropriate notice exists in the javadocs.
> - The patch introduces a new, deprecated class 
> IndexFieldCacheSanityChecker.java which is just there for testing purposes, 
> to show that no sanity checks are necessary any longer. This class may be 
> removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as 
> the patch removes a few unnecessary checks we might even see a slight 
> speedup. No benchmarking has been done so far, though.
> - I have tried to preserve the existing functionality wherever possible and 
> to focus on the class/method structure only. We certainly may improve the 
> caches' behavior, but this out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all 
> supported atomic types (byte, double, float, int, long, short) three classes 
> each are required: *Cache, *Comparator and *Parser. I think that further 
> simplification might be possible (maybe using Java generics?), but I guess 
> the current patch is large enough for now.
> Cheers,
> Christian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to