[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: ba.zip

I made some small tweaks to improve performance and added long[]-backed versions of Packed (optimal space) and Aligned (no values span underlying blocks), then ran the performance tests on 5 different computers. It seems very clear that level 2 cache (and presumably RAM speed, but I do not know how to determine that without root access on a Linux box) plays a bigger role for access speed than mere CPU speed. One 3GHz machine with 1MB of level 2 cache was about half as fast as a 1.8GHz laptop with 2MB of level 2 cache.

There are a lot of measurements and they are getting hard to digest. I've attached logs from the 5 computers, should anyone want to have a look. Some observations are:

1. The penalty of using long[] instead of int[] on my 32 bit laptop depends on the number of values in the array. For less than a million values, it is severe: The long[] version is 30-60% slower, depending on whether packed or aligned values are used. Above that, it was 10% slower for Aligned, 25% slower for Packed. On the other hand, 64 bit machines do not seem to care that much whether int[] or long[] is used: There was a 10% win for arrays below 1M on one machine, a 50% win for arrays below 100K on another (8% for 1M, 6% for 10M), and a small loss of below 1% for all lengths above 10K on a third.

2. There's a fast drop-off in speed when the array reaches a certain size that is correlated to level 2 cache size. After that, the speed does not decrease much when the array grows. This also affects direct writes to an int[] and has the interesting implication that a packed array out-performs the direct access approach for writes in a number of cases. For reads, there's no contest: Direct access to int[] is blazingly fast.

3. The access speed of the different implementations converges when the number of values in the array rises (think 10M+ values): The slow round-trip to main memory dwarfs the logic used for value extraction.

Observation #3 supports Mike McCandless' choice of going for the packed approach and #1 suggests using int[] as the internal structure for now. Using int[] as the internal structure makes it infeasible to accept longs as input (or rather: longs with more than 32 significant bits). I don't know if this is acceptable?

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: ba.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
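
For anyone following the thread without the attachment at hand, the difference between the Packed and Aligned layouts mentioned in the comment can be sketched roughly as below. This is only an illustration under my own assumptions and is not the code in ba.zip: the class names PackedSketch and AlignedSketch are hypothetical, indices are int rather than long, and the Aligned variant simply leaves the leftover bits of each 64 bit block unused, which may differ from the attached implementation.

```java
/** Hypothetical sketch: values are stored back to back and may straddle two longs. */
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(int count, long maxValue) {
        // Number of bits needed to represent maxValue (treated as unsigned), at least 1.
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue;
        blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // bitPos / 64
        int shift = (int) (bitPos & 63);    // bitPos % 64
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value continues in the low bits of the next block.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // Write the spilled high bits into the low bits of the next block.
            long highMask = mask >>> bitsInFirst;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> bitsInFirst);
        }
    }

    int blockCount() { return blocks.length; }
}

/** Hypothetical sketch: no value spans two longs; leftover bits per block are wasted. */
final class AlignedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valuesPerBlock;
    private final long mask;

    AlignedSketch(int count, long maxValue) {
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        valuesPerBlock = 64 / bitsPerValue;
        mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        blocks = new long[(count + valuesPerBlock - 1) / valuesPerBlock];
    }

    long get(int index) {
        int shift = (index % valuesPerBlock) * bitsPerValue;
        return (blocks[index / valuesPerBlock] >>> shift) & mask;
    }

    void set(int index, long value) {
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bitsPerValue;
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
    }

    int blockCount() { return blocks.length; }
}
```

With 100 values and a maxValue of 1000 (10 bits per value), the packed sketch needs 16 longs while the aligned one needs 17, since only 6 values fit in each block. The extra branch in PackedSketch.get and set is the value-extraction logic that, per observation 3, stops mattering once the array is large enough for memory latency to dominate.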