Thanks Mark,
this gets me moving. I'll look into it sometime soon.
Geert-Jan
markrmiller wrote:
>
> Like I said, it's pretty easy to add this, but it's also going to suck.
> It kind of exposes the fact that it's missing the right extensibility at the
> moment. Things are still a bit ugly overall.
>
>
> You're going to need new CacheKeys for the data types you want to support.
> A CacheKey builds and provides access to the field data, and is simply:
>
>
> public abstract class CacheKey {
>   public abstract CacheData buildData(IndexReader r);
>   public abstract boolean equals(Object o);
>   public abstract int hashCode();
>   public boolean isMergable();
>   public CacheData mergeData(int[] starts, CacheData[] data);
>   public boolean usesObjectArray();
> }
>
>
> For a sparse storage implementation you would use an object array: have
> usesObjectArray return true; isMergable can then return false, and
> you don't have to support the mergeData method.
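A sparse object array along those lines can be sketched with a map behind the get method. The ObjectArray interface below is a stand-in written to mirror the one in the LUCENE-831 patch, and SparseIntObjectArray is a hypothetical name, not a class from the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in mirroring the patch's ObjectArray: random access to per-doc values.
interface ObjectArray {
    Object get(int index);
}

// Sparse implementation: only documents that actually have the field
// consume memory; every other docid simply reads back as null.
class SparseIntObjectArray implements ObjectArray {
    private final Map<Integer, Integer> values = new HashMap<Integer, Integer>();

    void put(int doc, int value) {
        values.put(Integer.valueOf(doc), Integer.valueOf(value));
    }

    public Object get(int index) {
        return values.get(Integer.valueOf(index)); // null for docs without the field
    }
}
```

buildData would then fill the map while walking TermDocs instead of allocating an int[maxDoc] up front.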
>
>
> In buildData you will load your object array and return it. Here is an
> array-backed IntObjectArrayCacheKey build method:
>
> public CacheData buildData(IndexReader reader) throws IOException {
>   final int[] retArray = getIntArray(reader);
>   ObjectArray fieldValues = new ObjectArray() {
>     public Object get(int index) {
>       return new Integer(retArray[index]);
>     }
>   };
>   return new CacheData(fieldValues);
> }
>
>
> protected int[] getIntArray(IndexReader reader) throws IOException {
>   final int[] retArray = new int[reader.maxDoc()];
>   TermDocs termDocs = reader.termDocs();
>   TermEnum termEnum = reader.terms(new Term(field, ""));
>   try {
>     do {
>       Term term = termEnum.term();
>       // field strings are interned, so != is a valid comparison here
>       if (term == null || term.field() != field)
>         break;
>       int termval = parser.parseInt(term.text());
>       termDocs.seek(termEnum);
>       while (termDocs.next()) {
>         retArray[termDocs.doc()] = termval;
>       }
>     } while (termEnum.next());
>   } finally {
>     termDocs.close();
>     termEnum.close();
>   }
>   return retArray;
> }
>
>
> So it should be fairly straightforward to return an object array backed by
> a sparse implementation from your new CacheKey (SparseIntObjectArrayCacheKey
> or something).
>
> Now some more ugliness: you can turn on the ObjectArray CacheKeys by
> setting the system property 'use.object.array.sort' to true. This will
> cause FieldSortedHitQueue to return ScoreDocComparators that use the
> standard ObjectArray CacheKeys: IntObjectArrayCacheKey,
> FloatObjectArrayCacheKey, etc. The method that builds each comparator
> type knows what type to build for and whether to use primitive arrays or
> ObjectArrays, e.g. (from FieldSortedHitQueue):
>
>
> static ScoreDocComparator comparatorDoubleOA(final IndexReader reader,
>     final String fieldname)
>
>
> does this (it has to provide the CacheKey and know the return type):
>
>
> final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
>     new DoubleObjectArrayCacheKey(field)).getCachePayload();
>
>
> So you have to either change all of the ObjectArray comparator builders
> to use your CacheKeys:
>
>
> final ObjectArray fieldOrder = (ObjectArray) reader.getCachedData(
>     new SparseIntObjectArrayCacheKey(field)).getCachePayload();
>
>
> Or you have to add more options in
> FieldSortedHitQueue.CacheEntry.buildData(IndexReader reader) and more
> static comparator builders in FieldSortedHitQueue that use the right
> CacheKeys. Obviously not very extensibility-friendly at the moment. I'm
> sure with some thought, things could be much better. If you decide to
> jump into any of this, let me know if you have any suggestions or feedback.
>
>
> - Mark
>
>
>
> Britske wrote:
>> That ObjectArray suggestion makes sense to me. It almost seemed as if
>> you were saying this option (or at least the interfaces needed to
>> implement it) was already available as one of the two options in
>> LUCENE-831?
>>
>> Could you give me a hint at where I have to look to extend what you're
>> suggesting? A new Cache, CacheFactory, and CacheKey implementation for all
>> types of cache keys? This may sound a bit ignorant, but it would be my
>> first time getting my head around the internals of an API instead of
>> merely using it to embed in a client application, so any help is highly
>> appreciated.
>>
>> Thanks for your help,
>>
>> Geert-Jan
>>
>>
>>
>> markrmiller wrote:
>>
>>> It's hard to predict the future of LUCENE-831. I would bet that it will
>>> end up in Lucene at some point in one form or another, but it's hard to
>>> say if that form will be what's in the available patches (I'm a contrib
>>> committer so I won't have any real say in that, so take that prediction
>>> with a grain of salt). It has strong ties to other issues, and a
>>> committer hasn't really had their whack at it yet.
>>>
>>> Having said that though, LUCENE-831 allows for two types for dealing
>>> with field values: either the old-style int/string/long/etc. arrays, or,
>>> for a small speed hit and faster reopens, an ObjectArray type that is
>>> basically an Object that can provide access to one or two real or
>>> virtual arrays. So technically you could use an ObjectArray that had a
>>> sparse implementation behind it. Unfortunately, you would have to
>>> implement new CacheKeys to do this. Trivial to do, but it reveals our
>>> LUCENE-831 problem of exponential CacheKey growth with every new
>>> little option/idea, and the juggling of which to use. I haven't thought
>>> about it much, but I'm hoping an API tweak can alleviate some of this.
>>>
>>> - Mark
>>>
>>> Britske wrote:
>>>
>>>> Hi,
>>>>
>>>> I recently saw activity on LUCENE-831 (Complete overhaul of FieldCache
>>>> API/Implementation), which I'm interested in.
>>>> I posted previously on this with my concern that, given the current
>>>> default cache, I sometimes get OOM errors because I have a lot of
>>>> fields that are sorted on, which ultimately causes the FieldCache to
>>>> grow larger than available RAM.
>>>>
>>>> Ultimately I want to subclass the new pluggable FieldCache of
>>>> LUCENE-831 to offload to disk (using ehcache or memcachedb or
>>>> something) but haven't found the time yet.
>>>>
>>>> What I would like to know for now is whether the newly implemented
>>>> standard cache in LUCENE-831 uses a different caching strategy than the
>>>> standard FieldCache in Lucene.
>>>>
>>>> i.e., the normal cache consumes memory by generating a field cache
>>>> entry for every document in the index, even when a document doesn't
>>>> have that field set.
>>>>
>>>> Since my documents are very sparse in the fields I want to sort on, it
>>>> would make a big difference if documents that don't have the field in
>>>> question set didn't add to the memory used.
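To make the memory argument concrete, here is a rough back-of-the-envelope comparison. The per-entry overhead figure for the sparse case is an illustrative assumption, not a measured number: a dense int[] FieldCache entry costs 4 bytes per document in the index, whether or not the document has the field, while a map-backed sparse entry costs memory only for populated documents.

```java
public class SparseCacheEstimate {
    // Dense FieldCache entry: one 4-byte int slot per document in the index.
    static long denseBytes(int maxDoc) {
        return 4L * maxDoc;
    }

    // Sparse entry: assume ~40 bytes per populated document for a boxed
    // HashMap entry (key, value, entry object) -- a rough, illustrative figure.
    static long sparseBytes(int docsWithField) {
        return 40L * docsWithField;
    }

    public static void main(String[] args) {
        System.out.println("dense:  " + denseBytes(10000000) + " bytes");
        System.out.println("sparse: " + sparseBytes(50000) + " bytes");
    }
}
```

With 10M documents and only 50K of them carrying the field, that is roughly 40 MB dense versus about 2 MB sparse, which is exactly the gap sparse storage would close.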
>>>>
>>>> So am I lucky? Or would I indeed have to cook up something myself?
>>>> Thanks and best regards,
>>>>
>>>> Geert-Jan
>>>>
>>>>
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>
--
View this message in context:
http://www.nabble.com/LUCENE-831-%28complete-cache-overhaul%29--%3E-mem-use-tp20505283p20516307.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.