[
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799707#action_12799707
]
Michael McCandless commented on LUCENE-1990:
--------------------------------------------
How about something like this API, for writing packed ints:
{code}
abstract class Writer {
public abstract void add(long v) throws IOException;
public abstract void finish() throws IOException;
}
{code}
then a factory:
{code}
enum Mode {Packed, Aligned, FixedArray};
public static Writer getWriter(IndexOutput out, int valueCount, long maxValue,
Mode mode);
{code}
(we can iterate on the names... always the hardest part).
Packed means full bit packing (most space efficient, but slowest
decode time). Aligned might waste some bits (eg nbits=4 is
naturally aligned, but for nbits=7 we'd waste 1 bit per long).
FixedArray (which'd use byte[], short[], int[], long[]) would
potentially waste the most bits but have the fastest decode.
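To make the Aligned arithmetic concrete, here is a minimal sketch. The class and method names (AlignedWaste, bitsRequired, valuesPerLong, wastedBitsPerLong) are hypothetical and not part of any patch; it assumes values never straddle a 64-bit block:

```java
// Hypothetical helper, for illustration only: how many bits an "Aligned"
// layout would leave unused in each 64-bit block.
final class AlignedWaste {

    /** Bits needed to represent maxValue (treated as unsigned); at least 1. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Number of whole nbits-wide values that fit in one long. */
    static int valuesPerLong(int nbits) {
        return 64 / nbits;
    }

    /** Bits left over in each long when values never cross a long boundary. */
    static int wastedBitsPerLong(int nbits) {
        return 64 - valuesPerLong(nbits) * nbits;
    }
}
```

With nbits=4 this gives 16 values per long and no waste; with nbits=7 it gives 9 values per long and 1 wasted bit.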
If nbits happens to be 8, 16, 32, 64, the factory should just always
use FixedArray I think? And of course powers of two will automatically be
Aligned (with the per-nbits specialized code).
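A rough sketch of that selection rule, assuming the factory already knows nbits. ModeChooser and choose are hypothetical names, and the rule itself is only the suggestion above, not settled behavior:

```java
// Hypothetical sketch of the factory's mode selection; not a real Lucene API.
final class ModeChooser {

    enum Mode { Packed, Aligned, FixedArray }

    /**
     * Returns the mode to use for nbits-wide values. Widths that match a
     * primitive array element (8, 16, 32, 64) always map to FixedArray,
     * since a byte[], short[], int[] or long[] wastes nothing there.
     * Any other width keeps the caller's requested mode.
     */
    static Mode choose(int nbits, Mode requested) {
        if (nbits < 1 || nbits > 64) {
            throw new IllegalArgumentException("nbits must be in 1..64, got " + nbits);
        }
        if (nbits == 8 || nbits == 16 || nbits == 32 || nbits == 64) {
            return Mode.FixedArray;
        }
        return requested;
    }
}
```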
We can also default impls to an underlying int[] vs long[] backing store
depending on 64/32 bit jre, and nbits. If the jre is 32 bit but nbits is
greater than 32, I think we just use long[] backing.
For reading, a similar API:
{code}
abstract class Reader {
public abstract long get(int index);
}
public static Reader getReader(IndexInput in);
{code}
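To show how the Packed mode could fit behind this add/finish/get shape, here is a minimal in-memory sketch. It is an illustration under assumptions, not the proposed implementation: PackedSketch is a hypothetical name, the backing store is always a long[], nothing is written to an IndexOutput, and finish() returns a Reader directly instead of returning void.

```java
// Hypothetical in-memory sketch of full bit packing over a long[].
final class PackedSketch {

    static final class Reader {
        private final long[] blocks;
        private final int bitsPerValue;
        private final long mask;

        Reader(long[] blocks, int bitsPerValue) {
            this.blocks = blocks;
            this.bitsPerValue = bitsPerValue;
            this.mask = (1L << bitsPerValue) - 1;
        }

        long get(int index) {
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);   // which long holds the first bit
            int shift = (int) (bitPos & 63);    // bit offset inside that long
            long v = blocks[block] >>> shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two longs: pull the high bits from the next one.
                v |= blocks[block + 1] << (64 - shift);
            }
            return v & mask;
        }
    }

    static final class Writer {
        private final long[] blocks;
        private final int bitsPerValue;
        private final int valueCount;
        private int count;

        Writer(int valueCount, long maxValue) {
            if (valueCount < 0 || maxValue < 0) {
                throw new IllegalArgumentException("valueCount and maxValue must be >= 0");
            }
            this.valueCount = valueCount;
            // maxValue >= 0, so this is at most 63 bits.
            this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
            long totalBits = (long) valueCount * bitsPerValue;
            this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
        }

        void add(long v) {
            if (count >= valueCount) {
                throw new IllegalStateException("already added " + valueCount + " values");
            }
            if (v < 0 || (v >>> bitsPerValue) != 0) {
                throw new IllegalArgumentException("value out of range: " + v);
            }
            long bitPos = (long) count * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= v << shift;        // low bits (blocks start zeroed)
            if (shift + bitsPerValue > 64) {
                blocks[block + 1] |= v >>> (64 - shift); // spill the high bits
            }
            count++;
        }

        Reader finish() {
            if (count != valueCount) {
                throw new IllegalStateException("expected " + valueCount + " values, got " + count);
            }
            return new Reader(blocks, bitsPerValue);
        }
    }
}
```

For example, a Writer created with valueCount=20 and maxValue=100 packs each value into 7 bits, so the 20 values fit in 3 longs instead of 20.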
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
> Key: LUCENE-1990
> URL: https://issues.apache.org/jira/browse/LUCENE-1990
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
> long get(long index);
> void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!