[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1990:
---------------------------------------

    Attachment: LUCENE-1990.patch

Attached patch with my current rough approach for packed ints
(from LUCENE-2186).

Let's try to standardize the API, then merge the two approaches, then
I'll cut over with LUCENE-2186.

It includes gen.py, which autogens dedicated decoders for each of the
nbits cases, excluding 8, 16, 32, 64 bits, since these are done with
dedicated array reader impls.
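
For reference, a minimal sketch of the generic (non-specialized) logic
that a per-nbits decoder would presumably unroll.  This is not code
from the patch: the class name PackedLongsSketch and the most
significant bit first layout inside each long are assumptions made
for illustration.

```java
/** Hypothetical sketch: values of bitsPerValue bits packed MSB-first into long[]. */
final class PackedLongsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedLongsSketch(int valueCount, int bitsPerValue) {
        // 64 bits is excluded here, as it would be served by a plain long[] reader.
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        // endBits > 0 means the value spills that many bits into the next block.
        int endBits = (int) (bitPos & 63) + bitsPerValue - 64;
        if (endBits <= 0) {
            return (blocks[block] >>> -endBits) & mask;
        }
        return ((blocks[block] << endBits) | (blocks[block + 1] >>> (64 - endBits))) & mask;
    }

    void set(int index, long value) {
        if (value < 0 || value > mask) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int endBits = (int) (bitPos & 63) + bitsPerValue - 64;
        if (endBits <= 0) {
            int shift = -endBits;
            blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        } else {
            // High part goes into the low bits of this block ...
            blocks[block] = (blocks[block] & ~(mask >>> endBits)) | (value >>> endBits);
            // ... and the remaining low part into the top bits of the next one.
            int shift = 64 - endBits;
            blocks[block + 1] = (blocks[block + 1] & ~(mask << shift)) | (value << shift);
        }
    }
}
```

A generated decoder for a fixed nbits could replace the shifts and
mask above with constants, which is the main thing specialization
would buy.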

It uses a single writer (I don't think we need specialized writers),
but the writer encodes in the same byte order as
IndexOutput.writeLong, so that the byte order matches the dedicated
array reader impls.
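
To illustrate why the byte order matters, here is a small sketch.  It
uses java.io.DataOutputStream.writeLong (high byte first) as a
stand-in for IndexOutput.writeLong, and assumes values are packed
most significant bit first within each long.  The class and method
names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch: packed long blocks written high byte first. */
final class ByteOrderSketch {
    /** Serializes the blocks the way DataOutput.writeLong does (big-endian). */
    static byte[] writeBlocks(long[] blocks) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long block : blocks) {
                out.writeLong(block);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** 8-bit case: value i is simply byte i of the serialized form. */
    static int get8(byte[] data, int index) {
        return data[index] & 0xFF;
    }

    /** 16-bit case: value i is the big-endian short at byte offset 2*i. */
    static int get16(byte[] data, int index) {
        int p = index << 1;
        return ((data[p] & 0xFF) << 8) | (data[p + 1] & 0xFF);
    }
}
```

Under those assumptions, data written by the single writer at 8 or 16
bits per value can be read back by a plain byte[] or short[] style
reader with no bit shuffling.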

It only encodes into long[] -- we should create cases for int[]
(selected by the factory depending on 32 vs 64 bit jre).

We should also explore just reading in a full byte[] and using
Int/Short/LongBuffer to decode.  The API should also allow for a
future mmap impl.
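
A rough sketch of the byte[] plus LongBuffer idea, assuming the same
most significant bit first packing and big-endian bytes (the default
order of java.nio.ByteBuffer).  BufferDecodeSketch and its methods
are made-up names, not part of the patch.

```java
import java.nio.ByteBuffer;
import java.nio.LongBuffer;

/** Hypothetical sketch: decode packed values through a LongBuffer view. */
final class BufferDecodeSketch {
    /** Views the raw bytes as big-endian longs without copying. */
    static LongBuffer wrap(byte[] data) {
        return ByteBuffer.wrap(data).asLongBuffer();
    }

    /** Reads value number index, each value being bitsPerValue (1..63) bits wide. */
    static long get(LongBuffer blocks, int bitsPerValue, int index) {
        long mask = (1L << bitsPerValue) - 1;
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        // endBits > 0 means the value continues into the next long.
        int endBits = (int) (bitPos & 63) + bitsPerValue - 64;
        if (endBits <= 0) {
            return (blocks.get(block) >>> -endBits) & mask;
        }
        return ((blocks.get(block) << endBits)
                | (blocks.get(block + 1) >>> (64 - endBits))) & mask;
    }
}
```

Since FileChannel.map returns a MappedByteBuffer, which is itself a
ByteBuffer, the same asLongBuffer() view would apply to a mapped
file, so a reader written against LongBuffer would not need to care
where the bytes live.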

Probably we should name all of these UnsignedPackedInts, since they
require values >= 0.  (Hmm, though, the 64 bit case is tricky -- I
guess we make an exception for that case).
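
To make the 64 bit wrinkle concrete, here is one possible way a
factory could pick the bit width from maxValue and validate values.
This is only a sketch under my own assumptions: bitsRequired and
checkValue are hypothetical helpers, and treating a negative
maxValue as "needs all 64 bits" is just one way to handle the
exception.

```java
/** Hypothetical sketch: choosing a bit width and enforcing values >= 0. */
final class BitsRequiredSketch {
    /**
     * Number of bits needed to store values in 0..maxValue.  A maxValue
     * with the sign bit set has no leading zeros, so it yields 64.
     */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Below 64 bits a value must be non-negative and fit in the width.
     * At 64 bits every long bit pattern is storable, so nothing is rejected.
     */
    static void checkValue(long value, int bitsPerValue) {
        if (bitsPerValue < 64 && (value < 0 || (value >>> bitsPerValue) != 0)) {
            throw new IllegalArgumentException(
                "value " + value + " does not fit in " + bitsPerValue + " bits");
        }
    }
}
```

With this approach the "unsigned" contract holds for 1 to 63 bits,
while the 64 bit case degenerates to storing raw longs.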


> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
