[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: LUCENE-1990-te20100122.patch

I've uploaded a preliminary patch with packed32, packed64, directByte, 
directShort, directInt and directLong implementations. I've used Michael 
McCandless' patch as the foundation, but the new patch is generated to be 
independent of the old one. It uses maxValue instead of bitsPerValue for the 
header, there's no test of packed32 yet, and the code needs a general cleanup. 
The main missing components are aligned32 and aligned64.
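
To illustrate what a packed64-style implementation has to do, here is a 
minimal sketch. It is not the code from the patch: the class name 
Packed64Sketch, the int index and the LSB-first bit layout are all assumptions 
made for the example. It shows the core problem, which is that a value may 
straddle two 64-bit blocks.

```java
/**
 * Illustrative sketch only, not the patch's packed64 implementation.
 * Assumed layout: values are stored back to back, least significant bit
 * first, in an array of 64-bit blocks.
 */
public final class Packed64Sketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;
    private final int size;

    public Packed64Sketch(int size, int bitsPerValue) {
        if (size < 0 || bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bad size or bitsPerValue");
        }
        this.size = size;
        this.bitsPerValue = bitsPerValue;
        // A shift by 64 would wrap around in Java, so 64 bits is special-cased.
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    public long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long value = blocks[block] >>> offset;
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bitsPerValue) {
            // The value continues in the next block.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    public void set(int index, long value) {
        checkIndex(index);
        value &= mask; // silently truncates; a real impl might reject instead
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bitsPerValue) {
            // Write the remaining high bits into the next block.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (value >>> bitsInFirst);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

With 13 bits/value, the value at index 4 starts at bit 52 of the first block, 
so its last bit lands in the second block; that is the case the two 
bitsInFirst branches handle.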

I've done quite a bit of refactoring and (cheater that I am) added setters to 
all implementations of Reader, although not to the interface. Besides the 
nitty-gritty details of the implementation, I suspect that the code for 
selecting which implementation to use is a prime candidate for discussion. It 
is located in PackedInts and tries to select the best implementation based on 
a priority (packed, aligned or direct) paired with a block preference (32 bit 
or 64 bit).

{code}
  private static IMPLEMENTATION getImplementation(
          long maxValue, PRIORITY priority, BLOCK_PREFERENCE block) {
    int bits = bitsRequired(maxValue);
    switch (priority) {
      case direct: {
        bits = getNextFixedSize(bits);
        break;
      }
      case aligned: {
        if (block == BLOCK_PREFERENCE.bit32) {
          if (bits == 7 || bits >= 11) {
            bits = getNextFixedSize(bits); // Align to byte, short, int or long
          }
        } else {
          if ((bits >= 13 && bits <= 15) || (bits >= 22)) {
            bits = getNextFixedSize(bits); // Align to short, int or long
          }
        }
      }
    }
    switch (bits) { // The safe choices
      case 8: return IMPLEMENTATION.directByte;
      case 16: return IMPLEMENTATION.directShort;
      case 32: return IMPLEMENTATION.directInt;
      case 63:
      case 64: return IMPLEMENTATION.directLong;
    }

    if (priority == PRIORITY.aligned || bits == 1 || bits == 2 || bits == 4) {
      return block == BLOCK_PREFERENCE.bit32 && bits < 32 ?
              IMPLEMENTATION.aligned32 : IMPLEMENTATION.aligned64;
    }
    return block == BLOCK_PREFERENCE.bit32 && bits < 32 ?
            IMPLEMENTATION.packed32 : IMPLEMENTATION.packed64;
  }
{code}
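
The selection code relies on the helpers bitsRequired and getNextFixedSize, 
which are not shown in this message. A minimal sketch of what they might look 
like follows. The method names come from the code above, but the bodies and 
the wrapper class PackedSizing are assumptions, not the code from the patch.

```java
/** Hypothetical holder class; only the two method names come from the message. */
public final class PackedSizing {
    private PackedSizing() {}

    /**
     * Number of bits needed to store any value in [0, maxValue].
     * Assumption: at least 1 bit is reported, even for maxValue == 0.
     */
    public static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Rounds a bit count up to the width of a Java primitive:
     * byte (8), short (16), int (32) or long (64).
     */
    public static int getNextFixedSize(int bits) {
        if (bits <= 8) {
            return 8;
        }
        if (bits <= 16) {
            return 16;
        }
        if (bits <= 32) {
            return 32;
        }
        return 64;
    }
}
```

Since maxValue is a non-negative long, bitsRequired never reports more than 63 
bits under these assumptions, which matches the selection code mapping 63 bits 
to directLong.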

I think that an "auto" value for priority is worth considering: For 9, 17 and 
33 bits/value, packed is often faster than aligned because it uses only half 
the memory and thus has a lower risk of level 2 cache misses. For high 
bits/value, such as 30, 31, 62, 63 and 64 (guesstimating here), direct seems 
to be the best choice for most situations. Users of PackedInts should not be 
expected to know this.
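
A rough sketch of how such an "auto" choice could be expressed is below. The 
class AutoPriority, its enum and the fallback for bit counts not mentioned 
above are assumptions for illustration. The listed bit counts are the 
guesstimates from this message, not measured cut-offs.

```java
/** Hypothetical sketch of an "auto" priority; not part of the patch. */
public final class AutoPriority {
    public enum Priority { PACKED, ALIGNED, DIRECT }

    private AutoPriority() {}

    public static Priority choose(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be 1..64");
        }
        switch (bitsPerValue) {
            // Exact primitive widths: direct wastes no space.
            case 8: case 16: case 32: case 64:
            // Nearly full primitive widths (guesstimated in the message).
            case 30: case 31: case 62: case 63:
                return Priority.DIRECT;
            // Just above a primitive width: packed saves about half the memory.
            case 9: case 17: case 33:
                return Priority.PACKED;
            default:
                // Assumption: the message gives no guidance for other sizes,
                // so this sketch falls back to the most compact option.
                return Priority.PACKED;
        }
    }
}
```

A caller would then pass the result on to the selection logic instead of 
having to pick a priority itself.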

I'll start work on aligned32 and aligned64, but I will leave the rest of the 
patch alone for now, as I suspect that there'll be some changes to the current 
draft.

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch, LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
