[
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803628#action_12803628
]
Toke Eskildsen commented on LUCENE-1990:
----------------------------------------
Looking at bit patterns and persistence, I see three different layouts: packed,
aligned32 and aligned64. Regardless of whether 32-bit or 64-bit blocks are used
when a packed structure is created, it can be read back as either 32-bit or
64-bit packed. For the special cases of 8, 16, 32 and 64 bits/value, the bit
patterns are identical for packed and aligned. This leads me to propose a
header designating one of the three structures mentioned.
The current draft from Michael McCandless states both bitsPerValue and maxValue
in the persistent format. It seems redundant to have both, but I might be
missing something here? Either way, bitsPerValue is ambiguous, as it does not
translate to memory usage the same way for packed, aligned32 and aligned64.
If I had to choose, I'd go for maxValue.
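To illustrate why bitsPerValue alone does not pin down memory usage, here is a rough sketch that derives the bit width from maxValue and estimates the size of each layout. The class and method names are hypothetical and not taken from the patch. It assumes that "aligned" means a value never spans a block boundary, so each block holds floor(blockBits / bitsPerValue) values, and that the packed layout is rounded up to whole 64-bit blocks.

```java
/** Hypothetical helper for illustration; not part of the LUCENE-1990 patch. */
final class PackedSizeSketch {

    /** Bits needed to represent every value in [0, maxValue]; at least 1. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Packed: values are laid end to end and may span block boundaries. */
    static long packedBytes(long valueCount, long maxValue) {
        long totalBits = valueCount * bitsRequired(maxValue);
        // Assumption: storage is rounded up to whole 64-bit blocks.
        return ((totalBits + 63) / 64) * 8;
    }

    /**
     * Aligned: each block of blockBits (32 or 64) holds only whole values,
     * so any leftover bits in a block are unused. This is an assumed layout.
     */
    static long alignedBytes(long valueCount, long maxValue, int blockBits) {
        if (blockBits != 32 && blockBits != 64) {
            throw new IllegalArgumentException("blockBits must be 32 or 64");
        }
        int bits = bitsRequired(maxValue);
        if (bits > blockBits) {
            throw new IllegalArgumentException("value does not fit in one block");
        }
        int valuesPerBlock = blockBits / bits;
        long blocks = (valueCount + valuesPerBlock - 1) / valuesPerBlock;
        return blocks * (blockBits / 8);
    }
}
```

Under these assumptions, 1000 values with maxValue 2047 (11 bits/value) come to 1376 bytes packed, 2000 bytes for aligned32 and 1600 bytes for aligned64, so the same bitsPerValue yields three different footprints.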
What about a header stating
{code}
format (String "packed", "aligned32" or "aligned64")
valueCount (vInt)
maxValue (vLong)
{code}
?
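A minimal sketch of how such a header could be serialized, for discussion only. The class and method names are hypothetical. The variable-length encoding follows the usual 7-bits-per-byte scheme with a continuation bit; writing the format string as a length prefix followed by UTF-8 bytes is an assumption here, not something the patch specifies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical header writer for illustration; not part of the patch. */
final class PackedHeaderSketch {

    /** 7 payload bits per byte, low bits first; high bit set on all but the last byte. */
    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be >= 0");
        }
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Assumed string encoding: variable-length byte count, then UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Serializes format, valueCount and maxValue in the proposed order. */
    static byte[] header(String format, int valueCount, long maxValue) {
        if (!format.equals("packed") && !format.equals("aligned32")
                && !format.equals("aligned64")) {
            throw new IllegalArgumentException("unknown format: " + format);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, format);
        writeVLong(out, valueCount); // a vInt uses the same encoding as a vLong
        writeVLong(out, maxValue);
        return out.toByteArray();
    }
}
```

For example, `PackedHeaderSketch.header("packed", 300, 127)` produces 10 bytes: one length byte, six bytes for the string, two for valueCount and one for maxValue. A reader can recover bitsPerValue from maxValue, which is why storing both seems unnecessary.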
I have working code for packed32 and packed64 and am currently fitting it into
Michael's patch. I hope to finish it this weekend.
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
> Key: LUCENE-1990
> URL: https://issues.apache.org/jira/browse/LUCENE-1990
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1990.patch,
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
> long get(long index);
> void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!