[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803628#action_12803628
 ] 

Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

Looking at bit patterns and persistence, I see 3 different ones: packed, 
aligned32 and aligned64. Regardless of whether 32bit or 64bit is used when a 
packed structure is created, it can be read as both 32bit and 64bit packed. As 
for the special cases of 8, 16, 32 and 64 bits/value, the bit patterns are 
identical for both packed and aligned. This leads me to propose a header 
designating one of the three structures mentioned.
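
To illustrate the point about reading a packed structure with either block 
size, here is a minimal sketch. It assumes values are written most significant 
bit first and that blocks are persisted big-endian; the class and method names 
(PackedLayoutDemo, pack, get32, get64) are hypothetical and not taken from the 
patch.

```java
import java.nio.ByteBuffer;

final class PackedLayoutDemo {
    /** Packs values end to end, MSB first, padded to a whole number of 64-bit blocks. */
    static byte[] pack(long[] values, int bits) {
        long totalBits = (long) values.length * bits;
        byte[] out = new byte[(int) ((totalBits + 63) / 64) * 8];
        long pos = 0;
        for (long v : values) {
            for (int b = bits - 1; b >= 0; b--, pos++) {
                if (((v >>> b) & 1L) != 0) {
                    out[(int) (pos >>> 3)] |= (byte) (0x80 >>> (int) (pos & 7));
                }
            }
        }
        return out;
    }

    /** Views the persisted bytes as big-endian 64-bit blocks. */
    static long[] asLongs(byte[] bytes) {
        long[] blocks = new long[bytes.length / 8];
        ByteBuffer.wrap(bytes).asLongBuffer().get(blocks);
        return blocks;
    }

    /** Views the same persisted bytes as big-endian 32-bit blocks. */
    static int[] asInts(byte[] bytes) {
        int[] blocks = new int[bytes.length / 4];
        ByteBuffer.wrap(bytes).asIntBuffer().get(blocks);
        return blocks;
    }

    /** Reads value number index from 64-bit blocks; a value may span two blocks. */
    static long get64(long[] blocks, int bits, int index) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 6);
        int end = (int) (bitPos & 63) + bits;  // exclusive end bit, may exceed 64
        long mask = bits == 64 ? -1L : (1L << bits) - 1;
        if (end <= 64) {
            return (blocks[block] >>> (64 - end)) & mask;
        }
        int spill = end - 64;                  // bits stored in the next block
        return ((blocks[block] << spill) | (blocks[block + 1] >>> (64 - spill))) & mask;
    }

    /** Reads value number index from 32-bit blocks; a value may span up to three blocks. */
    static long get32(int[] blocks, int bits, int index) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 5);
        int offset = (int) (bitPos & 31);
        long value = 0;
        int remaining = bits;
        while (remaining > 0) {
            int take = Math.min(remaining, 32 - offset);
            long chunk = (blocks[block] >>> (32 - offset - take)) & ((1L << take) - 1);
            value = (value << take) | chunk;
            remaining -= take;
            block++;
            offset = 0;
        }
        return value;
    }
}
```

Under those assumptions, get64(asLongs(bytes), ...) and get32(asInts(bytes), 
...) return the same values for the same persisted bytes. A different bit or 
byte order would need matching changes in both readers.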

The current draft from Michael McCandless states both bitsPerValue and maxValue 
in the persistent format. It seems redundant to have both, but I might be 
missing something here? Either way, the bitsPerValue is ambiguous as it does 
not translate to memory usage the same way for packed, aligned32 or aligned64. 
If I had to choose, I'd go for maxValue.
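
A rough sketch of why bitsPerValue alone does not pin down memory usage. It 
assumes that "aligned" means a value never crosses a block boundary, so each 
block holds floor(blockBits / bitsPerValue) values, and that packed storage is 
rounded up to whole 64-bit blocks. The names (PackedSizing, bitsRequired, 
packedBytes, alignedBytes) are made up for illustration.

```java
final class PackedSizing {
    /** Bits needed to represent every value in [0, maxValue]; never less than 1. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Packed: values are laid end to end and may straddle block boundaries. */
    static long packedBytes(long valueCount, int bitsPerValue) {
        long totalBits = valueCount * bitsPerValue;
        return (totalBits + 63) / 64 * 8;  // rounded up to whole 64-bit blocks
    }

    /** Aligned: each block of blockBits (32 or 64) holds only whole values. */
    static long alignedBytes(long valueCount, int bitsPerValue, int blockBits) {
        int valuesPerBlock = blockBits / bitsPerValue;
        if (valuesPerBlock == 0) {
            throw new IllegalArgumentException("bitsPerValue exceeds block size");
        }
        long blocks = (valueCount + valuesPerBlock - 1) / valuesPerBlock;
        return blocks * (blockBits / 8);
    }
}
```

With these assumptions, 1000 values at maxValue 511 (9 bits) take 1128 bytes 
packed, 1336 bytes as aligned32 and 1144 bytes as aligned64, while at 8 
bits/value all three come to 1000 bytes.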

What about a header stating
{code}
format (String "packed", "aligned32" or "aligned64")
valueCount (vInt)
maxValue (vLong)
{code}
?
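
For concreteness, the header above could be written and read roughly as 
follows. This is only a sketch using plain java.io rather than Lucene's 
IndexOutput/IndexInput; it assumes the string is stored as a vInt byte length 
followed by UTF-8 bytes, and that vInt/vLong use 7 data bits per byte with the 
high bit as a continuation flag, low-order bits first. The class PackedHeader 
and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class PackedHeader {
    static final List<String> FORMATS = Arrays.asList("packed", "aligned32", "aligned64");

    final String format;
    final int valueCount;
    final long maxValue;

    PackedHeader(String format, int valueCount, long maxValue) {
        if (!FORMATS.contains(format) || valueCount < 0 || maxValue < 0) {
            throw new IllegalArgumentException("invalid header");
        }
        this.format = format;
        this.valueCount = valueCount;
        this.maxValue = maxValue;
    }

    /** Variable-length encoding of a non-negative long, 7 bits per byte. */
    private static void writeVLong(DataOutputStream out, long v) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    private static long readVLong(DataInputStream in) throws IOException {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.readByte();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Serializes format, valueCount and maxValue in that order. */
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] name = format.getBytes(StandardCharsets.UTF_8);
            writeVLong(out, name.length);
            out.write(name);
            writeVLong(out, valueCount);
            writeVLong(out, maxValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Parses a header produced by toBytes(). */
    static PackedHeader fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] name = new byte[(int) readVLong(in)];
            in.readFully(name);
            String format = new String(name, StandardCharsets.UTF_8);
            int valueCount = (int) readVLong(in);
            long maxValue = readVLong(in);
            return new PackedHeader(format, valueCount, maxValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader would pick the implementation from the format string and derive the 
bits per value from maxValue, so bitsPerValue would not need to be stored.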

I have working code for packed32 and packed64 and am currently fitting it into 
Michael's patch. I hope to finish it this weekend.

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
