[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700523#action_12700523 ]
Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Some of your comments seem to indicate you think we will need to end up with an object rather than raw arrays?

Well, really I threw out all these future items to stir up the pot and see if some clarity comes out of it ;) This is what I try to do whenever I'm stuck on how to design something... some sort of defense mechanism.

That said, what requires object instead of array? EG for binary fields (deleted docs) we'd have eg "BitVector getBits(...)". For multi-valued fields, I'm not sure what's best. I think Yonik did something neat with Solr for holding multi-valued fields but I can't find it now. But, with ValueSource, we have the freedom to use arrays for simple cases and something else for interesting ones? It's not either/or?

bq. And we would want to lose exposing Parser so that CFS can be a seamless backing.

I see the CFS/CSF confusion has already struck! But yes cleaner API would be a nice step forward...

bq. We have it? Just pass the CSFValueSource at IndexReader creation?

Yes I think we have this one. Though... I feel like ValueSource should represent a single field's values, and something else (FieldType?) returns the ValueSource for that field. Ie, I think we are overloading ValueSource now?

bq. Good point. We need a way to update, that can throw USO Exception?

Maybe... or we can defer for future. We don't need full answers nor impls for all of these now...

{quote}
> Possible future when Lucene computes sort cache (for text fields)
> and stores in the index

I'm not familiar with that idea, so not sure what effect this has...
{quote}

Sort cache is just getStringIndex()... all other types just use the values directly (no need for separate ords). If it's costly to compute per-reopen we may want to store it in the index. But honestly, since we load the full thing into RAM, I wonder how different the time'd really be loading it vs recomputing it.

bq. Good point again.
Getting norms under this API will add a bit more meat to this issue.

Yeah I'm not sure whether norms/deleted docs "fit"; certainly we'd need updatability first. It's just that, from a distance, they are clearly a "value per doc" for every doc in the index. If we had norms & deletions under this API then suddenly, [almost] for free, we'd get pluggability of deleted docs & norms.

bq. I am kind of liking Uwe's idea of assigning ValueSources per field, though that could probably get messy. Perhaps a default, and then per field overrides?

I'm also more liking "per field" to be somehow handled. Whether IndexReader exposes that vs a FieldType (that also holds other per-field stuff), I'm not sure.

bq. Anybody is updating norms on a regular basis on a serious project?

This is a good question -- I'd love to know too. But I think updating CSFs would be compelling; having to reindex the entire doc because only 1 or 2 metadata fields had changed is a common annoyance. Of course we'd have to figure out (or rule out) updating the postings for such changes...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Hoss Man
> Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff,
> fieldcache-overhaul.diff, fieldcache-overhaul.diff,
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff,
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>   a) eliminate global static map keyed on IndexReader (thus eliminating synch block between completely independent IndexReaders)
>   b) allow more customization of cache management (ie: use expiration/replacement strategies, disk backed caches, etc)
>   c) allow people to define custom cache data logic (ie: custom parsers, complex datatypes, etc... anything tied to a reader)
>   d) allow people to inspect what's in a cache (list of CacheKeys) for an IndexReader so a new IndexReader can be likewise warmed.
>   e) Lend support for smarter cache management if/when IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with the new implementation, so there is no redundant caching as client code migrates to new API.
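
As an illustration of motivation items 1a and 1d above, here is a minimal Java sketch of a cache that is owned by each reader instead of living in a global static map. Every name in it (PerReaderCacheSketch, FakeReader, ReaderCache, load, keys) is hypothetical and chosen only for illustration; it is not Lucene's API and not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class PerReaderCacheSketch {

    /** Hypothetical per-reader cache; one instance per reader, no static state. */
    static final class ReaderCache {
        private final Map<String, Object> entries = new LinkedHashMap<>();

        /** Returns the cached value for key, computing it with loader on first use. */
        @SuppressWarnings("unchecked")
        synchronized <T> T get(String key, Supplier<T> loader) {
            Object value = entries.get(key);
            if (value == null) {
                value = loader.get();
                entries.put(key, value);
            }
            return (T) value;
        }

        /** Lists the cached keys so that another reader can be warmed the same way (item 1d). */
        synchronized List<String> keys() {
            return new ArrayList<>(entries.keySet());
        }
    }

    /** Hypothetical stand-in for an index reader; NOT Lucene's IndexReader. */
    static final class FakeReader {
        // The lock inside ReaderCache is per reader, so independent readers
        // never contend on a shared synchronized block (item 1a).
        final ReaderCache cache = new ReaderCache();
        private final int[] values;

        FakeReader(int[] values) {
            this.values = values.clone();
        }

        /** A real reader would un-invert the named field; this fake returns the same values for any field. */
        int[] load(String field) {
            return values.clone();
        }
    }

    public static void main(String[] args) {
        FakeReader oldReader = new FakeReader(new int[] {5, 3, 9});
        int[] first = oldReader.cache.get("price", () -> oldReader.load("price"));
        int[] second = oldReader.cache.get("price", () -> oldReader.load("price"));
        System.out.println(first == second); // true: the second call reuses the cached array

        // Warm a new reader with whatever the old reader had cached.
        FakeReader newReader = new FakeReader(new int[] {5, 3, 9, 7});
        for (String key : oldReader.cache.keys()) {
            newReader.cache.get(key, () -> newReader.load(key));
        }
        System.out.println(newReader.cache.keys()); // [price]
    }
}
```

The sketch deliberately ignores expiration, disk backing, and merging of sub-reader data (items 1b and 1e); those would be policies layered on top of the per-reader cache object.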