[
https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914407#action_12914407
]
Michael McCandless commented on LUCENE-2665:
--------------------------------------------
Here are some of the problems w/ FC that I'd love to see fixed here:
* The source of values should be fully flexible/pluggable -- FC does
uninversion, CSF pulls the array image from the index, an app can
plug in its own source. All three of these "sources" should be
consumed via the same API (eg ByteValues w/ getValue(int docID)).
* Value lookup needs to be a method call, w/ optional
getBackingArray(), for (manual today, automatic tomorrow) code
specialization.
* Uninversion is dangerous -- eg if you accidentally have multiple
values per field, they "silently" overwrite one another.
* When you do legitimately have multiple values (eg numeric fields),
the Parser interface is also too inflexible -- eg the exception to
stop visiting terms, the inability to specify which (users have
requested "first only" and "last only") of multiple values should
be kept, etc.
* Cache should be stored/accessible via the reader, not in a separate
external WeakHashMap. The eviction policy should be fully
visible/controllable by the app (or maybe app optionally hands us
a cache impl/factory). There should be no static FC.DEFAULT like
we have today.
* Insanity shouldn't be allowed/possible -- it's just too dangerous
today that we allow this. We should at least make it really hard
to do, by accident (eg, like you must use SlowMultiReader to prove
your insanity). EG caching values @ the MultiReader level. Or,
LUCENE-2527 (fasterButMoreRAM true/false causing a double entry).
* The entries are too strongly tied to field names. I may want
virtual entries, not backed by a "real" field. EG, say I want to
do a "blended" sort, say mixing in recency with relevance... I
should be able to name this "RelevanceAndRecency" (say), which is not
a real field. I back this w/ my own FloatValues impl, which
under-the-hood somehow combines the two "sources" and presents a
FloatValues interface. Then I should be able to pass a SortField
somehow referencing my dynamic/virtual field.
* Cannot support multiple values per doc (this is a future
nice-to-have-but-don't-preclude sort of thing)
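
To make the first two points concrete, here is a rough sketch of what a
single lookup API could look like. Only the names ByteValues,
getValue(int docID) and getBackingArray() come from the list above;
ArrayByteValues, ComputedByteValues, the Optional return type and the
sum() consumer are made up for illustration and are not part of any
patch or of Lucene itself.

```java
import java.util.Optional;

public class ValueSourceSketch {
    /** Hypothetical per-document lookup shared by every kind of source. */
    interface ByteValues {
        byte getValue(int docID);

        /** Optional fast path; empty when no array backs this source. */
        default Optional<byte[]> getBackingArray() {
            return Optional.empty();
        }
    }

    /** Array-backed source, standing in for an uninverted or CSF-loaded image. */
    static final class ArrayByteValues implements ByteValues {
        private final byte[] values;

        ArrayByteValues(byte[] values) {
            this.values = values;
        }

        @Override
        public byte getValue(int docID) {
            return values[docID];
        }

        @Override
        public Optional<byte[]> getBackingArray() {
            return Optional.of(values);
        }
    }

    /** App-supplied source that computes values on the fly, with no array. */
    static final class ComputedByteValues implements ByteValues {
        @Override
        public byte getValue(int docID) {
            return (byte) (docID % 3);
        }
    }

    /** Consumer that specializes by hand when a backing array is available. */
    static long sum(ByteValues source, int maxDoc) {
        long total = 0;
        Optional<byte[]> backing = source.getBackingArray();
        if (backing.isPresent()) {
            byte[] arr = backing.get();
            for (int doc = 0; doc < maxDoc; doc++) {
                total += arr[doc];
            }
        } else {
            for (int doc = 0; doc < maxDoc; doc++) {
                total += source.getValue(doc);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayByteValues(new byte[] {1, 2, 3, 4}), 4));
        System.out.println(sum(new ComputedByteValues(), 4));
    }
}
```

The consumer never needs to know where the values came from; the array
branch is the "manual today" specialization, and the method-call branch
is the general path every source has to support.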
With these fixes, flex scoring (LUCENE-2392), the per-doc stats
(unique term count, total term count, boost, etc.) should all become
pluggable value sources.
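
As an illustration of the virtual-entry idea, a blended
"RelevanceAndRecency" source might be put together roughly like this.
FloatValues and the entry name are from the list above; the blend()
helper, the linear weighting and the ValueSourceRegistry class are
assumptions made up for this sketch, and how a SortField would actually
reference such an entry is left open here.

```java
import java.util.HashMap;
import java.util.Map;

public class VirtualFieldSketch {
    /** Hypothetical per-document float lookup. */
    interface FloatValues {
        float getValue(int docID);
    }

    /** Combines two sources linearly; recencyWeight is an assumed knob in [0, 1]. */
    static FloatValues blend(FloatValues relevance, FloatValues recency, float recencyWeight) {
        return docID -> (1f - recencyWeight) * relevance.getValue(docID)
                + recencyWeight * recency.getValue(docID);
    }

    /** Hypothetical registry keyed by entry name rather than by indexed field. */
    static final class ValueSourceRegistry {
        private final Map<String, FloatValues> sources = new HashMap<>();

        void register(String name, FloatValues values) {
            sources.put(name, values);
        }

        FloatValues get(String name) {
            FloatValues values = sources.get(name);
            if (values == null) {
                throw new IllegalArgumentException("no value source named " + name);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        // Toy per-document inputs; a real app would back these with its own sources.
        float[] relevance = {0.5f, 1.0f, 0.25f};
        float[] recency = {1.0f, 0.0f, 0.5f};

        ValueSourceRegistry registry = new ValueSourceRegistry();
        registry.register("RelevanceAndRecency",
                blend(d -> relevance[d], d -> recency[d], 0.5f));

        FloatValues blended = registry.get("RelevanceAndRecency");
        for (int doc = 0; doc < relevance.length; doc++) {
            System.out.println(doc + " -> " + blended.getValue(doc));
        }
    }
}
```

The point is only that the entry name does not have to match any real
field: anything that presents the FloatValues interface can sit behind
it.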
> Rework FieldCache to be more flexible/general
> ---------------------------------------------
>
> Key: LUCENE-2665
> URL: https://issues.apache.org/jira/browse/LUCENE-2665
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Ryan McKinley
> Attachments: LUCENE-2665-FieldCacheOverhaul.patch
>
>
> The existing FieldCache implementation is very rigid and does not allow much
> flexibility. Trying to implement even simple features exposes much
> larger structural problems.
> This patch aims to take a fresh approach to how we work with the FieldCache.