[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Mark Miller (JIRA) Thu, 16 Apr 2009 14:36:40 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699880#action_12699880
 ]


Mark Miller commented on LUCENE-831:
------------------------------------

Okay, now that I halfway understand this issue, I think I have to go back to 
the basic motivations. The original big win was taken away by LUCENE-1483, so 
let's see if we really need a new API for the wins we have left.

h3. Advantages of new API (roughly as it is in the patch)
* FieldCache is an interface, and it would be nice to move to an abstract 
class; ExtendedFieldCache is ugly.
* Avoids the global sync across IndexReaders when accessing the cache.
* It's easier/cleaner to block caching by multireaders (though I am almost 
thinking I would prefer warnings/advice about performance and encouragement to 
move to per segment).
* It becomes easier to share a ValueSource instance across readers.
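
To make the sync and sharing points concrete, here is a minimal sketch in modern Java. It is not code from the patch: SegmentStub, IntValueSource and CountingSource are hypothetical stand-ins, and the only idea taken from the discussion is that each reader owns its cache while one source instance can serve several readers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ValueSource: uninverts one field into an int[]. */
interface IntValueSource {
    int[] load(SegmentStub segment, String field);
}

/** Hypothetical stand-in for a segment reader that owns its own cache. */
final class SegmentStub {
    private final Map<String, int[]> rawValues;   // field -> per-document values
    private final IntValueSource source;          // may be shared across readers
    private final Map<String, int[]> cache = new ConcurrentHashMap<>();

    SegmentStub(Map<String, int[]> rawValues, IntValueSource source) {
        this.rawValues = rawValues;
        this.source = source;
    }

    int[] raw(String field) {
        int[] values = rawValues.get(field);
        if (values == null) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return values;
    }

    /** Cached lookup; any contention is limited to this reader's own map. */
    int[] getInts(String field) {
        return cache.computeIfAbsent(field, f -> source.load(this, f));
    }
}

/** A stateless-enough source that can be handed to any number of readers. */
final class CountingSource implements IntValueSource {
    final AtomicInteger loads = new AtomicInteger();

    @Override
    public int[] load(SegmentStub segment, String field) {
        loads.incrementAndGet();
        return segment.raw(field).clone();
    }
}
```

Two independent readers never touch a common lock here, which is the point of dropping the global static map.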

h3. Disadvantages of new API
If we want only SegmentReaders to have a ValueSource, you can't efficiently 
back the old API with the new one, causing a jump in RAM requirements if you 
straddle the two APIs and ask for the same array data from each.
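
A rough illustration of that RAM jump, using hypothetical classes (RawSegment, PerSegmentCache, TopLevelCache) rather than anything from Lucene or the patch: the per-segment cache and the top-level cache each end up holding their own copy of the same values.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical segment holding the raw per-document values of one field. */
final class RawSegment {
    final int[] values;

    RawSegment(int[] values) {
        this.values = values;
    }
}

/** New-style cache (sketch): one array per segment. Not thread-safe. */
final class PerSegmentCache {
    private final Map<RawSegment, int[]> arrays = new IdentityHashMap<>();

    int[] get(RawSegment segment) {
        return arrays.computeIfAbsent(segment, s -> s.values.clone());
    }

    long cachedInts() {
        long total = 0;
        for (int[] a : arrays.values()) {
            total += a.length;
        }
        return total;
    }
}

/** Old-style cache (sketch): one array spanning all segments. Not thread-safe. */
final class TopLevelCache {
    private int[] merged;

    int[] get(List<RawSegment> segments) {
        if (merged == null) {
            int size = 0;
            for (RawSegment s : segments) {
                size += s.values.length;
            }
            merged = new int[size];
            int docBase = 0;  // offset of the current segment's first document
            for (RawSegment s : segments) {
                System.arraycopy(s.values, 0, merged, docBase, s.values.length);
                docBase += s.values.length;
            }
        }
        return merged;
    }

    long cachedInts() {
        return merged == null ? 0 : merged.length;
    }
}
```

Asking both caches for the same field leaves two full sets of values in memory, since neither can hand out the other's arrays.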

It's probably a higher barrier for someone with a custom Parser to implement a 
ValueSource and init a Reader with it (presumably one that works per field) 
than to simply pass the Parser on a SortField. However, Parser stops making 
sense if we end up being able to back ValueSource with column stride fields. We 
could allow a ValueSource to be passed on the SortField (the current 
incarnation of this patch), but then you have to go back to a global cache, 
keyed by reader, for the ValueSources passed that way (you would also still 
have the per segment reader, settable ValueSource).

h3. Advantages of staying with old API
Avoid forcing a large migration on users, with possible RAM requirement 
penalties if they don't switch from deprecated code (we are doing something 
similar with LUCENE-1483 even without deprecated code though: if you were using 
an external multireader FieldCache that matched a sort FieldCache key, you'd 
double your RAM requirements).

h3. Thoughts
If we stayed with the old API, we could still allow a custom FieldCache to be 
supplied. We could still back FieldCacheImpl with Uninverter to reduce code. We 
could still have CachingFieldCache, though CachingValueSource is a much better 
name :) since FieldCache already implies caching, and so the name would be 
confusing. We could also avoid CachingFieldCache entirely, though, as just 
making FieldCache pluggable would allow alternate caching implementations (with 
a bit more effort).
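
As a sketch of what "pluggable" could mean here (all names below are hypothetical and nothing in it is the existing FieldCache API): an abstract class with a replaceable default instance, plus an LRU variant as an example of an alternate caching implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical abstract-class replacement for the FieldCache interface. */
abstract class PluggableFieldCache {
    /** Returns the values for key, calling loader only on a cache miss. */
    abstract int[] getInts(String key, Function<String, int[]> loader);
}

/** Holds the instance in use; users may swap in their own implementation. */
final class FieldCacheHolder {
    private static volatile PluggableFieldCache current = new UnboundedCache();

    private FieldCacheHolder() {}

    static PluggableFieldCache get() { return current; }

    static void set(PluggableFieldCache custom) { current = custom; }
}

/** Default behaviour in this sketch: keep everything forever. */
final class UnboundedCache extends PluggableFieldCache {
    private final Map<String, int[]> entries = new HashMap<>();

    @Override
    synchronized int[] getInts(String key, Function<String, int[]> loader) {
        return entries.computeIfAbsent(key, loader);
    }
}

/** Alternate implementation: evicts the least recently used entry. */
final class LruCache extends PluggableFieldCache {
    private final Map<String, int[]> entries;

    LruCache(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first
        this.entries = new LinkedHashMap<String, int[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, int[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    synchronized int[] getInts(String key, Function<String, int[]> loader) {
        int[] values = entries.get(key);
        if (values == null) {
            values = loader.apply(key);
            entries.put(key, values);
        }
        return values;
    }
}
```

Calling FieldCacheHolder.set(new LruCache(100)) would then change the caching policy without touching callers, which is the "bit more effort" route that avoids a separate CachingFieldCache.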

We could deprecate the Parser methods and force supplying a new FieldCache impl 
for custom uninversion to get to an API suitable to be backed by CSF.

Or:

We could also move to ValueSource, but allow a ValueSource on multi-readers. 
That would probably make straddling the APIs much more possible (and 
efficient) in the default case. We could advise that it's best to work per 
segment, but leave the option to the user.
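
One way to picture why that helps, again with hypothetical names only (TopReaderStub, LegacyFieldCacheFacade) and a single implicit field for brevity: if the top-level reader can own cached values, the old static entry point can shrink to a facade that returns the very same array.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical multi-reader that is allowed to cache values itself. */
final class TopReaderStub {
    private final int[][] segmentValues;  // one array per segment, one field only
    private final Map<String, int[]> cache = new ConcurrentHashMap<>();

    TopReaderStub(int[]... segmentValues) {
        this.segmentValues = segmentValues;
    }

    /** New-API path: values for the whole reader, built once per field. */
    int[] getInts(String field) {
        return cache.computeIfAbsent(field, f -> concat());
    }

    private int[] concat() {
        int size = 0;
        for (int[] segment : segmentValues) {
            size += segment.length;
        }
        int[] all = new int[size];
        int docBase = 0;  // offset of the current segment's first document
        for (int[] segment : segmentValues) {
            System.arraycopy(segment, 0, all, docBase, segment.length);
            docBase += segment.length;
        }
        return all;
    }
}

/** Old-API style entry point, reduced to delegation (sketch). */
final class LegacyFieldCacheFacade {
    private LegacyFieldCacheFacade() {}

    static int[] getInts(TopReaderStub reader, String field) {
        return reader.getInts(field);
    }
}
```

Both paths hand back the same array instance, so straddling the two APIs on a multi-reader costs no extra RAM in this sketch. Per-segment access would still be the advised route.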

h3. Conclusion
I am not sure. I thought I was convinced we might as well not even move from 
FieldCache at all, but now that I've written a bit out, I'm thinking it would 
be worth going to ValueSource. I'm just not positive on what we should support. 
SortField ValueSource override keyed by reader? ValueSources on MultiReaders?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, 
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
