Thanks Mark, this gets me moving. I will look into it sometime soon.
Geert-Jan

markrmiller wrote:
>
> Like I said, it's pretty easy to add this, but it's also going to suck.
> It kind of exposes the fact that the right extensibility is missing at the
> moment. Things are still a bit ugly overall.
>
> You're going to need new CacheKeys for the data types you want to support.
> A CacheKey builds and provides access to the field data and is simply
> (method bodies omitted):
>
>   public abstract class CacheKey {
>     public abstract CacheData buildData(IndexReader r) throws IOException;
>     public abstract boolean equals(Object o);
>     public abstract int hashCode();
>     public boolean isMergable() { ... }
>     public CacheData mergeData(int[] starts, CacheData[] data) { ... }
>     public boolean usesObjectArray() { ... }
>   }
>
> For a sparse storage implementation you would use an object array, so
> have usesObjectArray() return true; isMergable() can then return false and
> you don't have to support the mergeData() method.
>
> In buildData() you load your object array and return it. Here is the
> build method of the array-backed IntObjectArrayCacheKey:
>
>   public CacheData buildData(IndexReader reader) throws IOException {
>     final int[] retArray = getIntArray(reader);
>     // Expose the dense int[] through the ObjectArray interface.
>     ObjectArray fieldValues = new ObjectArray() {
>       public Object get(int index) {
>         return new Integer(retArray[index]);
>       }
>     };
>     return new CacheData(fieldValues);
>   }
>
>   protected int[] getIntArray(IndexReader reader) throws IOException {
>     // One slot per document in the index, whether or not it has the field.
>     final int[] retArray = new int[reader.maxDoc()];
>     TermDocs termDocs = reader.termDocs();
>     TermEnum termEnum = reader.terms(new Term(field, ""));
>     try {
>       do {
>         Term term = termEnum.term();
>         // Identity comparison: assumes 'field' has been interned,
>         // as Lucene's own FieldCacheImpl does.
>         if (term == null || term.field() != field)
>           break;
>         int termval = parser.parseInt(term.text());
>         termDocs.seek(termEnum);
>         while (termDocs.next()) {
>           retArray[termDocs.doc()] = termval;
>         }
>       } while (termEnum.next());
>     } finally {
>       termDocs.close();
>       termEnum.close();
>     }
>     return retArray;
>   }
>
> So it should be fairly straightforward to return an object array backed
> by a sparse implementation from your new CacheKey
> (SparseIntObjectArrayCacheKey or something similar).
>
> Now some more ugliness: you can turn on the ObjectArray CacheKeys by
> setting the system property 'use.object.array.sort' to true. This will
> cause FieldSortedHitQueue to return ScoreDocComparators that use the
> standard ObjectArray CacheKeys: IntObjectArrayCacheKey,
> FloatObjectArrayCacheKey, etc. The method that builds each comparator
> type knows what type to build for and whether to use primitive arrays or
> ObjectArrays, i.e. (from FieldSortedHitQueue):
>
>   static ScoreDocComparator comparatorDoubleOA(final IndexReader reader,
>                                                final String fieldname)
>
> does this (it has to provide the CacheKey and know the return type):
>
>   final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
>       new DoubleObjectArrayCacheKey(field)).getCachePayload();
>
> So you have to either change all of the ObjectArray comparator builders
> to use your CacheKeys:
>
>   final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
>       new SparseIntObjectArrayCacheKey(field)).getCachePayload();
>
> or add more options in
> FieldSortedHitQueue.CacheEntry.buildData(IndexReader reader) and more
> static comparator builders in FieldSortedHitQueue that use the right
> CacheKeys. Obviously this is not very extensibility-friendly at the
> moment. I'm sure that with some thought, things could be much better. If
> you decide to jump into any of this, let me know if you have any
> suggestions or feedback.
>
> - Mark
>
> Britske wrote:
>> That ArrayObject suggestion makes sense to me. It almost seemed as if
>> you were saying this option (or at least the interfaces needed to
>> implement it) is already available as one of the two options in
>> LUCENE-831?
>>
>> Could you give me a hint as to where I should be looking to extend what
>> you're suggesting? A new Cache, CacheFactory and CacheKey implementation
>> for all types of CacheKeys? This may sound a bit ignorant, but it would
>> be my first time getting my head around the internals of an API instead
>> of merely embedding it in a client application, so any help is highly
>> appreciated.
>>
>> Thanks for your help,
>>
>> Geert-Jan
>>
>> markrmiller wrote:
>>
>>> It's hard to predict the future of LUCENE-831. I would bet that it will
>>> end up in Lucene at some point in one form or another, but it's hard to
>>> say if that form will be what's in the available patches (I'm a contrib
>>> committer so I won't have any real say in that, so take that prediction
>>> with a grain of salt). It has strong ties to other issues and a
>>> committer hasn't really had their whack at it yet.
>>>
>>> Having said that, LUCENE-831 allows for two ways of dealing with field
>>> values: either the old-style int/String/long/etc. arrays, or, for a
>>> small speed hit and faster reopens, an ArrayObject type that is
>>> basically an Object that can provide access to one or two real or
>>> virtual arrays. So technically you could use an ArrayObject that had a
>>> sparse implementation behind it. Unfortunately, you would have to
>>> implement new CacheKeys to do this. That's trivial to do, but it reveals
>>> our LUCENE-831 problem: the number of CacheKeys grows exponentially with
>>> every new little option/idea, along with the juggling of which one to
>>> use. I haven't thought about it much, but I'm hoping an API tweak can
>>> alleviate some of this.
>>>
>>> - Mark
>>>
>>> Britske wrote:
>>>
>>>> Hi,
>>>>
>>>> I recently saw activity on LUCENE-831 (Complete overhaul of FieldCache
>>>> API/Implementation), which interests me.
>>>> I posted previously on this with my concern that, given the current
>>>> default cache, I sometimes get OOM errors because I have a lot of
>>>> fields that are sorted on, which ultimately causes the FieldCache to
>>>> grow larger than the available RAM.
>>>>
>>>> Ultimately I want to subclass the new pluggable FieldCache of
>>>> LUCENE-831 to offload to disk (using Ehcache or MemcacheDB or
>>>> something) but haven't found the time yet.
>>>>
>>>> What I would like to know for now is whether the newly implemented
>>>> standard cache in LUCENE-831 uses a different caching strategy than
>>>> the standard FieldCache in Lucene.
>>>>
>>>> I.e.: the normal cache allocates memory for every document in the
>>>> index when it builds a FieldCache entry, even if the document doesn't
>>>> have that field set.
>>>>
>>>> Since my documents are very sparse in the fields I want to sort on, it
>>>> would make a lot of difference if documents that don't have the field
>>>> in question set didn't add to the memory used.
>>>>
>>>> So am I lucky? Or would I indeed have to cook something up myself?
>>>> Thanks and best regards,
>>>>
>>>> Geert-Jan
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/LUCENE-831-%28complete-cache-overhaul%29--%3E-mem-use-tp20505283p20516307.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
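Mark's suggestion of returning a sparse, object-array-backed CacheData could look roughly like the sketch below. It is not code from the LUCENE-831 patch: `ObjectArray` here is a local stand-in that only has the `Object get(int index)` method used in Mark's snippet, and `SparseIntObjectArray` and its `Builder` are hypothetical names. Only documents that actually have the field are stored, as parallel sorted arrays of doc ids and values; `get` binary-searches the doc ids and returns `null` when the document has no value. In a real `SparseIntObjectArrayCacheKey.buildData`, the `add(doc, value)` calls would come from the same `TermEnum`/`TermDocs` walk as `getIntArray`, instead of filling an `int[maxDoc]`.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public final class SparseIntObjectArray {

  // Local stand-in for the patch's ObjectArray type (assumption: only get(int) is needed).
  public interface ObjectArray {
    Object get(int index);
  }

  // Collects (doc, value) pairs in any order, e.g. grouped by term as in getIntArray.
  // A later add() for the same doc overwrites the earlier one, like retArray[doc] = termval.
  public static final class Builder {
    private final TreeMap<Integer, Integer> pairs = new TreeMap<Integer, Integer>();

    public Builder add(int doc, int value) {
      pairs.put(doc, value);
      return this;
    }

    public ObjectArray build() {
      final int[] docIds = new int[pairs.size()]; // ascending doc ids that have the field
      final int[] values = new int[pairs.size()]; // values[i] belongs to docIds[i]
      int i = 0;
      for (Map.Entry<Integer, Integer> e : pairs.entrySet()) { // TreeMap iterates in key order
        docIds[i] = e.getKey();
        values[i] = e.getValue();
        i++;
      }
      return new ObjectArray() {
        public Object get(int index) {
          int pos = Arrays.binarySearch(docIds, index);
          return pos >= 0 ? Integer.valueOf(values[pos]) : null; // null: field not set
        }
      };
    }
  }

  public static void main(String[] args) {
    ObjectArray arr = new Builder().add(7, 42).add(2, 5).add(7, 43).build();
    System.out.println(arr.get(2)); // prints 5
    System.out.println(arr.get(7)); // prints 43 (later value for doc 7 wins)
    System.out.println(arr.get(3)); // prints null
  }
}
```

The trade-off is the "small speed hit" Mark mentions: each lookup is O(log n) in the number of documents that have the field, instead of a direct array index, in exchange for not allocating a slot for every document in the index. The boxed TreeMap is only used while building and is discarded afterwards.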
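Once the cached payload can return `null` for documents without the field, any comparator built on top of it (Mark mentions the ObjectArray-based ScoreDocComparator builders in FieldSortedHitQueue) has to decide where those documents sort. The sketch below shows only that comparison logic, using a plain `java.util.Comparator` over doc ids rather than Lucene's ScoreDocComparator interface; `java.util.function.IntFunction` stands in for the patch's `ObjectArray.get(int)`, and `SparseFieldComparator`/`byIntField` are hypothetical names. Putting missing values last is a design assumption, not something the thread specifies.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public final class SparseFieldComparator {

  // Orders doc ids by an int field value; docs whose value is null (field not set)
  // sort after all docs that have a value. Ties fall back to doc id order.
  // IntFunction stands in for ObjectArray.get(int) (assumption).
  public static Comparator<Integer> byIntField(final IntFunction<Object> fieldOrder) {
    return (docA, docB) -> {
      Integer a = (Integer) fieldOrder.apply(docA);
      Integer b = (Integer) fieldOrder.apply(docB);
      if (a == null && b == null) return Integer.compare(docA, docB);
      if (a == null) return 1;  // missing value goes after any present value
      if (b == null) return -1;
      int cmp = Integer.compare(a, b);
      return cmp != 0 ? cmp : Integer.compare(docA, docB);
    };
  }

  public static void main(String[] args) {
    final Map<Integer, Integer> values = new HashMap<Integer, Integer>();
    values.put(4, 10);
    values.put(1, 30);
    values.put(3, 10);
    Integer[] docs = {0, 1, 2, 3, 4};
    Arrays.sort(docs, byIntField(doc -> values.get(doc)));
    System.out.println(Arrays.toString(docs)); // prints [3, 4, 1, 0, 2]
  }
}
```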
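Geert-Jan's memory concern can be put in rough numbers. A dense int cache like `getIntArray` above allocates 4 bytes per document in the index for every sorted field, while a sparse layout that stores a 4-byte doc id plus a 4-byte value only for documents that have the field costs about 8 bytes per such document. The figures in `main` (1,000,000 documents, 100 sorted int fields, 1% of documents setting each field) are hypothetical, and the estimate ignores array and object headers as well as the larger String-based entries used for string sorts.

```java
public final class FieldCacheMemoryEstimate {

  // Dense: one 4-byte int slot per document in the index, per field.
  public static long denseIntBytes(long maxDoc, int fields) {
    return 4L * maxDoc * fields;
  }

  // Sparse: a 4-byte doc id plus a 4-byte value per document that has the field.
  public static long sparseIntBytes(long docsWithField, int fields) {
    return 8L * docsWithField * fields;
  }

  public static void main(String[] args) {
    long maxDoc = 1_000_000L;     // hypothetical index size
    int fields = 100;             // hypothetical number of sorted int fields
    long docsWithField = 10_000L; // hypothetical: 1% of docs set each field
    System.out.println("dense:  " + denseIntBytes(maxDoc, fields) + " bytes");        // 400000000
    System.out.println("sparse: " + sparseIntBytes(docsWithField, fields) + " bytes"); // 8000000
  }
}
```

Since 8 * k < 4 * maxDoc exactly when k < maxDoc / 2, this kind of sparse layout only saves memory for fields set on fewer than half of the documents, which fits the very sparse fields described in the original question.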