[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699880#action_12699880
]
Mark Miller commented on LUCENE-831:
------------------------------------
Okay, now that I halfway understand this issue, I think I have to go back to
the basic motivations. The original big win was taken away by 1483, so let's see
if we really need a new API for the wins we have left.
h3. Advantages of new API (roughly as it is in the patch)
FieldCache is an interface and it would be nice to move to an abstract class;
ExtendedFieldCache is ugly.
Avoids the global sync, keyed by IndexReader, needed to access the cache.
It's easier/cleaner to block caching by multireaders (though I am almost
thinking I would prefer warnings/advice about performance and encouragement to
move to per segment).
It becomes easier to share a ValueSource instance across readers.
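To make the global sync point concrete, here is a minimal sketch of the two styles. SegReader and GlobalCache are hypothetical stand-ins written for this note, not Lucene classes or anything from the patch, and the "uninversion" is faked by cloning an array.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a segment-level reader (not Lucene's IndexReader).
class SegReader {
    final int[] rawValues; // pretend this is what uninverting a field yields

    // Per-reader cache: only threads using this reader ever contend on it.
    private final Map<String, int[]> cache = new ConcurrentHashMap<>();

    SegReader(int[] rawValues) {
        this.rawValues = rawValues;
    }

    int[] getInts(String field) {
        // Built once per field, then the same array is handed back.
        return cache.computeIfAbsent(field, f -> rawValues.clone());
    }
}

// The style to move away from: one static map keyed on reader, one lock.
class GlobalCache {
    private static final Map<SegReader, Map<String, int[]>> CACHE = new WeakHashMap<>();

    // Every lookup, for every reader, goes through the same monitor.
    static synchronized int[] getInts(SegReader reader, String field) {
        return CACHE.computeIfAbsent(reader, r -> new HashMap<>())
                    .computeIfAbsent(field, f -> reader.rawValues.clone());
    }
}
```

With the per-reader version, two completely independent readers never touch the same lock; with GlobalCache they always do.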
h3. Disadvantages of new API
If we want only SegmentReaders to have a ValueSource, you can't efficiently
back the old API with the new, causing RAM requirements to jump if you straddle
the two APIs and ask for the same array data from each.
It's probably a higher barrier for someone with a custom Parser to implement a
ValueSource and init a Reader with it (presumably that works per field) than to
simply pass the Parser on a SortField. However, Parser stops making sense if we
end up being able to back ValueSource with column stride fields. We could allow
a ValueSource to be passed on the SortField (the current incarnation of this
patch), but then you have to go back to a global cache, keyed by reader, for the
ValueSources passed that way (you would also still have the per segment reader,
settable ValueSource).
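A rough sketch of the barrier difference, under assumed shapes: IntParser, ValueSource and ParserValueSource below are hypothetical types invented for illustration, not the interfaces in the patch, and the per-document terms are faked with a map.

```java
import java.util.Map;

// Hypothetical: turns one indexed term into an int. A custom parser is
// just this one method.
interface IntParser {
    int parseInt(String term);
}

// Hypothetical: hands back per-document values for a field, however obtained
// (uninversion today, possibly column stride fields later).
interface ValueSource {
    int[] getInts(String field);
}

// What a custom-parser user would have to write on top of the parser itself.
class ParserValueSource implements ValueSource {
    // termsByField.get(field)[doc] stands in for walking the real index.
    private final Map<String, String[]> termsByField;
    private final IntParser parser;

    ParserValueSource(Map<String, String[]> termsByField, IntParser parser) {
        this.termsByField = termsByField;
        this.parser = parser;
    }

    @Override
    public int[] getInts(String field) {
        String[] terms = termsByField.get(field);
        int[] values = new int[terms.length];
        for (int doc = 0; doc < terms.length; doc++) {
            values[doc] = parser.parseInt(terms[doc]);
        }
        return values;
    }
}
```

The parser alone is one line; the ValueSource route also needs the wrapper plus the wiring onto the reader, which is the extra effort in question.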
h3. Advantages of staying with old API
Avoid forcing a large migration on users, with possible RAM requirement
penalties if they don't switch from deprecated code (we are doing something
similar with 1483 even without deprecated code though - if you were using an
external multireader FieldCache that matched a sort FieldCache key, you'd double
your RAM requirements).
h3. Thoughts
If we stayed with the old API, we could still allow a custom FieldCache to be
supplied. We could still back FieldCacheImpl with Uninverter to reduce code. We
could still have CachingFieldCache, though CachingValueSource is a much better
name :) FieldCache already implies caching, so CachingFieldCache would be
confusing. We could also avoid CachingFieldCache altogether, as just making
FieldCache pluggable would allow alternate caching implementations (with a bit
more effort).
We could deprecate the Parser methods and force supplying a new FieldCache impl
for custom uninversion to get to an API suitable to be backed by CSF.
Or:
We could also move to ValueSource, but allow a ValueSource on multi-readers.
That would probably make straddling the APIs much more possible (and
efficient) in the default case. We could advise that it's best to work per
segment, but leave the option to the user.
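As a sketch of what a ValueSource on a multi-reader might look like, assuming it simply stitches together the per-segment arrays: IntValues and MultiReaderValues are hypothetical names made up for this note, not part of the patch.

```java
import java.util.List;

// Hypothetical per-reader source of int values for a field.
interface IntValues {
    int[] getInts(String field);
}

// Hypothetical multi-reader view built from its segments' sources.
class MultiReaderValues implements IntValues {
    private final List<IntValues> segments;

    MultiReaderValues(List<IntValues> segments) {
        this.segments = segments;
    }

    @Override
    public int[] getInts(String field) {
        // Collect each segment's array and total up the document count.
        int[][] parts = new int[segments.size()][];
        int total = 0;
        for (int i = 0; i < parts.length; i++) {
            parts[i] = segments.get(i).getInts(field);
            total += parts[i].length;
        }
        // Copy into one top-level array. This second copy is the RAM cost
        // that working per segment avoids.
        int[] merged = new int[total];
        int offset = 0;
        for (int[] part : parts) {
            System.arraycopy(part, 0, merged, offset, part.length);
            offset += part.length;
        }
        return merged;
    }
}
```

This keeps the top-level array available for callers of the old API, at the price of holding the data twice if the segment arrays are also cached, hence the advice to work per segment.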
h3. Conclusion
I am not sure. I thought I was convinced we might as well not even move from
FieldCache at all, but now that I've written a bit out, I'm thinking it would
be worth going to ValueSource. I'm just not positive on what we should support.
SortField ValueSource override keyed by reader? ValueSources on MultiReaders?
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Hoss Man
> Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff,
> fieldcache-overhaul.diff, fieldcache-overhaul.diff,
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff,
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed.
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to new API.