On Thu, Jun 18, 2009 at 23:20, Ard Schrijvers <a.schrijv...@onehippo.com> wrote:
> As much as I like this solution, it seems to me to be suitable only for
> dates, right?

yeah, it probably works best with fixed length values. see also the
wiki page I created about this:
http://wiki.apache.org/jackrabbit/ReduceMemOfSharedFieldCache

> How do we know that we are sorting on a date...by checking
> whether it has length 9..or that it starts with msq?

as of IndexFormatVersion V3 (jackrabbit 1.5) the property type is
stored as a payload on the indexed term.

> Furthermore, I am
> quite curious how you implemented this below. If you just used
> substrings, we could gain quite a bit more, but I am not sure
> whether you already do this:
>
> Suppose
>
> String s = "msqyw2shb";
>
> If you are having
>
> String[0] = s.substring(0,3);
>
> we reduce memory usage quite a bit more with
>
> String[0] = new String(s.substring(0,3))
>
> Also see [1]. But perhaps you are already doing this.

yes, I already did. I've put the test code on the wiki:

> A direct small improvement we could make is replacing:
>
> retArray[termDocs.doc()] = term.text().substring(prefix.length());
>
> with
>
> retArray[termDocs.doc()] = new String(term.text().substring(prefix.length()));

hmm, you're right. it was actually my intention to reduce memory usage
by only keeping the significant part of the term. we should fix that.

regards
 marcel

> It is a bit strange, but as for dates I think the prefix is
> something like "lastModified" and a delimiter, suppose 13 chars..this
> would bring the char array retained in memory down from 22 to
> 9...(for dates)
>
> Furthermore, it follows that using short property names saves you
> memory. This could be avoided in the end if we index each property in
> its own lucene field, instead of all in :_PROPERTIES and prefix the
> value with the propertyname..this though requires quite some rewrite
> for indexing I think.
>
> [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622
>
>
> On Thu, Jun 18, 2009 at 1:25 PM, Marcel
> Reutegger<marcel.reuteg...@day.com> wrote:
> > On Thu, Jun 18, 2009 at 09:37, Ard Schrijvers <a.schrijv...@onehippo.com>
> > wrote:
> >> If you happen to find the holy grail solution, I suppose you'll let us know
> >> :-) Also if you would have some memory usage numbers with and without the
> >> suggestion of mine regarding reducing the precision of your Date field, this
> >> would be very valuable.
> >
> > hmm, I've been thinking about a solution that I would call
> > flyweight-substring-collation-key. it assumes that there is usually a
> > major overlap of substrings of the values to sort on. i.e. a
> > lastModified value. so instead of always keeping the entire value we'd
> > have a collation key that references multiple reusable substrings.
> >
> > assume we have the following values:
> >
> > - msqyw2shb
> > - msqyw2t93
> > - msqyw2u0v
> > - msqyw2usn
> > - msqyw2vkf
> > - msqyw2wc7
> > - msqyw2x3z
> > - msqyw2xvr
> > - msqyw2ynj
> > - msqyw2zfb
> >
> > (those are date property values each 1 second after the previous one)
> >
> > we could create collation keys for use as comparable in the field
> > cache like this:
> >
> > substring cache:
> > [0] msq
> > [1] shb
> > [2] t93
> > [3] u0v
> > [4] usn
> > [5] vkf
> > [6] wc7
> > [7] x3z
> > [8] xvr
> > [9] ynj
> > [10] yw2
> > [11] zfb
> >
> > and then the actual comparables that reference the substrings in the cache:
> >
> > - {0, 10, 1}
> > - {0, 10, 2}
> > - {0, 10, 3}
> > - {0, 10, 4}
> > - {0, 10, 5}
> > - {0, 10, 6}
> > - {0, 10, 7}
> > - {0, 10, 8}
> > - {0, 10, 9}
> > - {0, 10, 11}
> >
> > this will result in a lower memory consumption and using the reference
> > indexes could even speed up the comparison.
> >
> > a quick test with 1 million date values showed that the memory
> > consumption drops to 50% with this approach.
> >
> > regards
> >  marcel
> >
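
For readers following the thread, the flyweight-substring-collation-key idea quoted above could be sketched roughly as follows. This is only an illustration, not the test code from the wiki and not Jackrabbit's implementation: the class name `FlyweightKeys`, its methods, and the fixed chunk length of 3 are all assumptions made for the example. It builds a sorted substring cache, turns each value into an array of cache indexes, and compares two keys by index only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; names and chunk length are assumptions.
class FlyweightKeys {

    // Assumed fixed chunk length, matching the 3-character pieces in the mail.
    static final int CHUNK = 3;

    // Split a value into fixed-length chunks. Each chunk is copied with
    // new String(...) so that it never keeps the full term's char array
    // alive (relevant on JVMs where substring() shares the backing array).
    static List<String> chunks(String value) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < value.length(); i += CHUNK) {
            int end = Math.min(i + CHUNK, value.length());
            parts.add(new String(value.substring(i, end)));
        }
        return parts;
    }

    // Build the shared substring cache: every distinct chunk, sorted.
    static String[] buildCache(List<String> values) {
        TreeSet<String> unique = new TreeSet<>();
        for (String v : values) {
            unique.addAll(chunks(v));
        }
        return unique.toArray(new String[0]);
    }

    // Turn a value into a key of cache indexes. The value must have been
    // part of the input used to build the cache.
    static int[] toKey(String value, String[] cache) {
        List<String> parts = chunks(value);
        int[] key = new int[parts.size()];
        for (int i = 0; i < key.length; i++) {
            key[i] = Arrays.binarySearch(cache, parts.get(i));
        }
        return key;
    }

    // Compare two keys by index only. Because the cache is sorted, the
    // first differing index decides the order, as it would for the strings.
    static int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With the ten sample values from the mail, `buildCache` yields the twelve-entry cache listed there, and `toKey("msqyw2shb", cache)` gives `{0, 10, 1}`. The sketch rebuilds the cache from a complete list of values; an index that receives new terms over time would need a strategy for inserting chunks without invalidating existing keys, which is not addressed here.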