[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Michael McCandless (JIRA) Sat, 18 Apr 2009 11:41:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700523#action_12700523
 ]


Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Some of your comments seem to indicate you think we will need to end up 
with an object rather than raw arrays?

Well, really I threw out all these future items to stir up the pot and
see if some clarity comes out of it ;) This is what I try to do
whenever I'm stuck on how to design something... some sort of defense
mechanism.

That said, what requires an object instead of an array? E.g., for binary
fields (deleted docs) we'd have something like "BitVector getBits(...)".

For multi-valued fields, I'm not sure what's best.  I think Yonik did
something neat with Solr for holding multi-valued fields but I can't
find it now.  But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones?  It's not
either/or?
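Here is a minimal sketch of "arrays for simple cases, something else for interesting ones". It assumes a multi-valued int field is held as one flat values array plus a per-doc offsets array. I picked that common layout for illustration; it is not necessarily what Solr does. `MultiValuedInts` and its methods are hypothetical names. The single-valued case would simply stay a plain `int[]`.

```java
import java.util.Arrays;

// Hypothetical holder for a multi-valued int field: all values live in one flat
// array, and offsets[doc]..offsets[doc + 1] delimits the values of each doc.
// Illustrative layout only, not an actual Lucene or Solr class.
final class MultiValuedInts {
  private final int[] offsets; // length maxDoc + 1
  private final int[] values;  // all values, grouped by doc

  MultiValuedInts(int[][] perDoc) {
    offsets = new int[perDoc.length + 1];
    for (int doc = 0; doc < perDoc.length; doc++) {
      offsets[doc + 1] = offsets[doc] + perDoc[doc].length;
    }
    values = new int[offsets[perDoc.length]];
    for (int doc = 0; doc < perDoc.length; doc++) {
      System.arraycopy(perDoc[doc], 0, values, offsets[doc], perDoc[doc].length);
    }
  }

  // Number of values stored for this doc (0 if it has none).
  int count(int doc) {
    return offsets[doc + 1] - offsets[doc];
  }

  // Copy of this doc's values.
  int[] values(int doc) {
    return Arrays.copyOfRange(values, offsets[doc], offsets[doc + 1]);
  }
}

class MultiValuedDemo {
  public static void main(String[] args) {
    MultiValuedInts mv = new MultiValuedInts(new int[][] {{3, 7}, {}, {42}});
    System.out.println(mv.count(0) + " " + Arrays.toString(mv.values(0))); // 2 [3, 7]
    System.out.println(mv.count(1) + " " + Arrays.toString(mv.values(2))); // 0 [42]
  }
}
```

The memory cost stays at two arrays regardless of how many values each doc has, which is roughly why this kind of layout tends to be preferred over an `int[][]`.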

bq. And we would want to lose exposing Parser so that CFS can be a seamless 
backing. 

I see the CFS/CSF confusion has already struck!

But yes, a cleaner API would be a nice step forward...

bq. We have it? Just pass the CSFValueSource at IndexReader creation?

Yes I think we have this one.

Though... I feel like ValueSource should represent a single field's
values, and something else (FieldType?) returns the ValueSource for
that field.  Ie, I think we are overloading ValueSource now?
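Here is a rough sketch of that split. A `ValueSource` covers exactly one field, and a per-field `FieldType` decides how to produce it. `ValueSource`, `FieldType`, `ArrayIntFieldType` and the `Segment` stand-in for a reader are all hypothetical here, not existing Lucene classes.

```java
import java.util.Map;

// Hypothetical stand-in for a segment reader: raw per-field values keyed by field name.
final class Segment {
  final Map<String, int[]> columns;
  Segment(Map<String, int[]> columns) { this.columns = columns; }
}

// Hypothetical: a ValueSource represents the values of exactly one field.
interface ValueSource {
  int getInt(int doc);
}

// Hypothetical: per-field type info decides how that field's values are exposed.
interface FieldType {
  ValueSource getValueSource(Segment segment);
}

// Simple array-backed field type for the single-valued case.
final class ArrayIntFieldType implements FieldType {
  private final String field;
  ArrayIntFieldType(String field) { this.field = field; }

  @Override
  public ValueSource getValueSource(Segment segment) {
    int[] values = segment.columns.get(field);
    if (values == null) {
      throw new IllegalArgumentException("no values for field " + field);
    }
    return doc -> values[doc];
  }
}

class FieldTypeDemo {
  public static void main(String[] args) {
    Segment seg = new Segment(Map.of("price", new int[] {10, 20, 30}));
    ValueSource prices = new ArrayIntFieldType("price").getValueSource(seg);
    System.out.println(prices.getInt(1)); // 20
  }
}
```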

bq. Good point. We need a way to update, that can throw UnsupportedOperationException?

Maybe... or we can defer for future.  We don't need full answers nor
impls for all of these now...
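If we did add updates, one cheap option is to make them optional. In this sketch, a hypothetical `setInt` throws `UnsupportedOperationException` unless a particular source opts in. All names here are illustrative, not Lucene API.

```java
// Hypothetical: read-only by default; sources that support updates override setInt.
interface ValueSource {
  int getInt(int doc);

  default void setInt(int doc, int value) {
    throw new UnsupportedOperationException("this ValueSource is read-only");
  }
}

// Keeps the default, so updates are rejected.
final class ReadOnlyInts implements ValueSource {
  private final int[] values;
  ReadOnlyInts(int[] values) { this.values = values.clone(); }
  public int getInt(int doc) { return values[doc]; }
}

// Opts in to in-place updates.
final class UpdatableInts implements ValueSource {
  private final int[] values;
  UpdatableInts(int[] values) { this.values = values.clone(); }
  public int getInt(int doc) { return values[doc]; }
  @Override public void setInt(int doc, int value) { values[doc] = value; }
}

class UpdateDemo {
  public static void main(String[] args) {
    ValueSource rw = new UpdatableInts(new int[] {1, 2});
    rw.setInt(0, 5);
    System.out.println(rw.getInt(0)); // 5
    try {
      new ReadOnlyInts(new int[] {1, 2}).setInt(0, 5);
    } catch (UnsupportedOperationException e) {
      System.out.println("read-only"); // printed
    }
  }
}
```

This keeps the door open without forcing every implementation to support updates now.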

{quote}
> Possible future when Lucene computes sort cache (for text fields)
> and stores in the index

I'm not familiar with that idea, so not sure what effect this has...
{quote}

Sort cache is just getStringIndex()... all other types just use the
values directly (no need for separate ords).  If it's costly to
compute per-reopen we may want to store it in the index.  But
honestly, since we load the full thing into RAM, I wonder how
different the time would really be between loading it and recomputing it.
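For reference, the structure behind getStringIndex() boils down to a sorted array of unique terms plus one ordinal per doc, so sorting only has to compare ints. The sketch below builds that from a plain per-doc String array, which is an assumption for illustration and is why it sorts explicitly. `SortCache` is a hypothetical name. Reserving ordinal 0 for docs with no value is a choice I made for this sketch.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical string sort cache: unique terms in sorted order plus one ordinal
// per doc pointing into that array. Slot 0 is reserved for docs without a value.
final class SortCache {
  final String[] lookup; // lookup[0] == null, lookup[1..] are the sorted unique terms
  final int[] order;     // order[doc] is an index into lookup

  SortCache(String[] valuePerDoc) {
    TreeSet<String> unique = new TreeSet<>();
    for (String v : valuePerDoc) {
      if (v != null) unique.add(v);
    }
    lookup = new String[unique.size() + 1];
    int i = 1;
    for (String term : unique) lookup[i++] = term;

    order = new int[valuePerDoc.length];
    for (int doc = 0; doc < valuePerDoc.length; doc++) {
      String v = valuePerDoc[doc];
      // Binary search over lookup[1..] finds the ordinal; missing values get 0.
      order[doc] = v == null ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
    }
  }

  // Comparing two docs only compares their ordinals, never the strings.
  int compare(int docA, int docB) {
    return Integer.compare(order[docA], order[docB]);
  }
}

class SortCacheDemo {
  public static void main(String[] args) {
    SortCache sc = new SortCache(new String[] {"pear", null, "apple", "pear"});
    System.out.println(Arrays.toString(sc.lookup)); // [null, apple, pear]
    System.out.println(Arrays.toString(sc.order));  // [2, 0, 1, 2]
  }
}
```

Whether this is worth persisting really comes down to whether reading `lookup` + `order` back from disk beats rebuilding them, which is exactly the open question above.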

bq. Good point again. Getting norms under this API will add a bit more meat to 
this issue.

Yeah I'm not sure whether norms/deleted docs "fit"; certainly we'd
need updatability first.  It's just that, from a distance, they are
clearly a "value per doc" for every doc in the index.  If we had norms
& deletions under this API then suddenly, [almost] for free, we'd get
pluggability of deleted docs & norms.

bq. I am kind of liking Uwe's idea of assigning ValueSources per field, though 
that could probably get messy. Perhaps a default, and then per field overrides?

I'm also more liking "per field" to be somehow handled.  Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.
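Here is one way "a default, then per-field overrides" could look, assuming each field's `ValueSource` is produced by a factory. The sketch uses a hypothetical registry that falls back to a default factory unless the field has its own. None of these names exist in Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical: the per-document values of a single field.
interface ValueSource {
  long getLong(int doc);
}

// Hypothetical registry: a default way to build a field's ValueSource,
// plus per-field overrides that take precedence.
final class PerFieldValueSources {
  private final Function<String, ValueSource> defaultFactory;
  private final Map<String, Function<String, ValueSource>> overrides = new HashMap<>();

  PerFieldValueSources(Function<String, ValueSource> defaultFactory) {
    this.defaultFactory = defaultFactory;
  }

  PerFieldValueSources override(String field, Function<String, ValueSource> factory) {
    overrides.put(field, factory);
    return this;
  }

  ValueSource get(String field) {
    return overrides.getOrDefault(field, defaultFactory).apply(field);
  }
}

class PerFieldDemo {
  public static void main(String[] args) {
    PerFieldValueSources sources = new PerFieldValueSources(field -> doc -> 0L)
        .override("timestamp", field -> doc -> 1000L + doc);
    System.out.println(sources.get("timestamp").getLong(5)); // 1005
    System.out.println(sources.get("price").getLong(5));     // 0
  }
}
```

Whether this lives on IndexReader or on a FieldType-like object is the open question; the fallback logic is the same either way.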

bq. Is anybody updating norms on a regular basis in a serious project?

This is a good question -- I'd love to know too.

But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance.  Of course we'd have to figure out (or rule out) updating
the postings for such changes...


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, 
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.


