[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: ba.zip

I made some small tweaks to improve performance and added long[]-backed 
versions of Packed (optimal space) and Aligned (no values span underlying 
blocks), then ran the performance tests on 5 different computers. It seems very 
clear that level 2 cache (and presumably RAM speed, but I do not know how to 
determine that without root access on a Linux box) plays a bigger role for 
access speed than mere CPU speed. One 3GHz machine with 1MB of level 2 cache 
was about half as fast as a 1.8GHz laptop with 2MB of level 2 cache.
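
To make the Packed/Aligned distinction concrete, here is a minimal sketch of 
the two long[]-backed layouts. It is not the code in ba.zip: the class and 
method names (LongBackedSketch, blockCount) are made up for illustration, and 
the limit of 63 bits per value is an assumption that keeps the mask arithmetic 
simple.

```java
/** Hypothetical sketch: two ways to store fixed-width unsigned values in a long[]. */
final class LongBackedSketch {

    /** Packed: values sit back to back, so a value may span two 64-bit blocks. */
    static final class Packed {
        private final long[] blocks;
        private final int bitsPerValue;
        private final long mask;

        Packed(int valueCount, int bitsPerValue) {
            if (bitsPerValue < 1 || bitsPerValue > 63) {
                throw new IllegalArgumentException("bitsPerValue must be 1..63");
            }
            this.bitsPerValue = bitsPerValue;
            this.mask = (1L << bitsPerValue) - 1;
            long totalBits = (long) valueCount * bitsPerValue;
            this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
        }

        long get(int index) {
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long value = blocks[block] >>> offset;
            int bitsRead = 64 - offset;
            if (bitsRead < bitsPerValue) {
                // The remaining high bits live at the start of the next block.
                value |= blocks[block + 1] << bitsRead;
            }
            return value & mask;
        }

        void set(int index, long value) {
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            value &= mask;
            blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
            int bitsWritten = 64 - offset;
            if (bitsWritten < bitsPerValue) {
                // Spill the high bits of the value into the next block.
                blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsWritten))
                        | (value >>> bitsWritten);
            }
        }

        int blockCount() { return blocks.length; }
    }

    /** Aligned: each block holds a whole number of values, so no value spans blocks. */
    static final class Aligned {
        private final long[] blocks;
        private final int bitsPerValue;
        private final int valuesPerBlock;
        private final long mask;

        Aligned(int valueCount, int bitsPerValue) {
            if (bitsPerValue < 1 || bitsPerValue > 63) {
                throw new IllegalArgumentException("bitsPerValue must be 1..63");
            }
            this.bitsPerValue = bitsPerValue;
            this.valuesPerBlock = 64 / bitsPerValue;
            this.mask = (1L << bitsPerValue) - 1;
            this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
        }

        long get(int index) {
            int shift = (index % valuesPerBlock) * bitsPerValue;
            return (blocks[index / valuesPerBlock] >>> shift) & mask;
        }

        void set(int index, long value) {
            int block = index / valuesPerBlock;
            int shift = (index % valuesPerBlock) * bitsPerValue;
            blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        }

        int blockCount() { return blocks.length; }
    }
}
```

In this sketch Aligned trades some unused bits at the top of each block for a 
get() with no branch, while Packed wastes nothing but has to handle the case 
where a value crosses a block boundary.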

There are a whole lot of measurements and they are getting hard to digest. I've 
attached logs from the 5 computers, should anyone want to have a look. Some 
observations are:

1. The penalty of using long[] instead of int[] on my 32 bit laptop depends on 
the number of values in the array. For fewer than a million values, it is 
severe: The long[]-version is 30-60% slower, depending on whether packed or 
aligned values are used. Above that, it was 10% slower for Aligned, 25% slower 
for Packed.
On the other hand, 64 bit machines do not seem to care that much whether int[] 
or long[] is used: There was a 10% win for arrays below 1M for one machine, 50% 
for arrays below 100K (8% for 1M, 6% for 10M) for another and a small loss of 
below 1% for all lengths above 10K for a third.

2. There's a fast drop-off in speed when the array reaches a certain size that 
is correlated to level 2 cache size. After that, the speed does not decrease 
much when the array grows. This also affects direct writes to an int[] and has 
the interesting implication that a packed array out-performs the direct access 
approach for writes in a number of cases. For reads, there's no contest: Direct 
access to int[] is blazingly fast.

3. The access-speed of the different implementations converges when the number 
of values in the array rises (think 10M+ values): The slow round-trip to main 
memory dwarfs the logic used for value-extraction. 

Observation #3 supports Mike McCandless' choice of going for the packed 
approach and #1 suggests using int[] as the internal structure for now. Using 
int[] as the internal structure makes it infeasible to accept longs as input 
(or rather: longs with more than 32 significant bits). I don't know if this is 
acceptable?
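
To illustrate where the 32-bit limit comes from, here is a rough sketch of an 
int[]-backed packed array. It is only one possible shape, not the attached 
code: the class name IntBackedPackedSketch and the helper bitsRequired are 
hypothetical, and the spare trailing block is an assumption that lets get() 
always read two adjacent blocks. With at most 32 bits per value, a value never 
touches more than two int blocks, so a single 64-bit window suffices; wider 
values would need a third block, which this sketch simply rejects.

```java
/** Hypothetical sketch of a packed array backed by int[] (at most 32 bits per value). */
final class IntBackedPackedSketch {
    private final int[] blocks;
    private final int bitsPerValue;
    private final long mask;

    /** Number of bits needed to represent every value in 0..maxValue (at least 1). */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    IntBackedPackedSketch(int valueCount, long maxValue) {
        int bits = bitsRequired(maxValue);
        if (bits > 32) {
            // Assumption of this sketch: values wider than one int block are not supported.
            throw new IllegalArgumentException("maxValue needs " + bits + " bits, limit is 32");
        }
        this.bitsPerValue = bits;
        this.mask = (1L << bits) - 1;
        long totalBits = (long) valueCount * bits;
        // One spare block so that blocks[block + 1] is always a valid read.
        this.blocks = new int[(int) ((totalBits + 31) >>> 5) + 1];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 5);
        int offset = (int) (bitPos & 31);
        // Join two adjacent 32-bit blocks into one 64-bit window, then shift and mask.
        long window = (blocks[block] & 0xFFFFFFFFL) | ((long) blocks[block + 1] << 32);
        return (window >>> offset) & mask;
    }

    void set(int index, long value) {
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 5);
        int offset = (int) (bitPos & 31);
        long window = (blocks[block] & 0xFFFFFFFFL) | ((long) blocks[block + 1] << 32);
        window = (window & ~(mask << offset)) | (value << offset);
        // Write both halves of the window back to their blocks.
        blocks[block] = (int) window;
        blocks[block + 1] = (int) (window >>> 32);
    }
}
```

A factory along the lines of create(count, maxValue) from the issue 
description could use a check like bitsRequired to decide up front whether an 
int[]-backed implementation is applicable at all.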

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: ba.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
