[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802542#action_12802542
 ] 

Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

Introducing yet another level of indirection and making a 
byte/short/int/long provider detached from the implementation of the packed 
values is tempting. I'm fairly wary of the overhead of the extra 
method calls, but I'll try it and see what happens.
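
To make the indirection idea concrete, here is a rough sketch of what such a detached provider could look like. Every name in it (BlockProvider, IntBlocks, LongBlocks, PackedValues) is hypothetical and not taken from the attached patch, and the bit layout (least significant bits first) is just an assumption for the example. The extra getBlock/setBlock calls are exactly the overhead in question.

```java
// Hypothetical sketch only: names and bit layout are assumptions, not the LUCENE-1990 patch.
interface BlockProvider {
  int blockBits();                       // width of one block: 32 or 64 here
  long getBlock(int index);              // block content, unsigned, in the low bits
  void setBlock(int index, long block);  // stores the low blockBits() bits
}

final class IntBlocks implements BlockProvider {
  private final int[] data;
  IntBlocks(int size) { data = new int[size]; }
  public int blockBits() { return 32; }
  public long getBlock(int index) { return data[index] & 0xFFFFFFFFL; }
  public void setBlock(int index, long block) { data[index] = (int) block; }
}

final class LongBlocks implements BlockProvider {
  private final long[] data;
  LongBlocks(int size) { data = new long[size]; }
  public int blockBits() { return 64; }
  public long getBlock(int index) { return data[index]; }
  public void setBlock(int index, long block) { data[index] = block; }
}

// The packing logic is written once and only talks to the provider.
final class PackedValues {
  private final BlockProvider blocks;
  private final int bitsPerValue; // 1..64

  PackedValues(BlockProvider blocks, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be 1..64");
    }
    this.blocks = blocks;
    this.bitsPerValue = bitsPerValue;
  }

  static int blocksNeeded(int valueCount, int bitsPerValue, int blockBits) {
    return (int) (((long) valueCount * bitsPerValue + blockBits - 1) / blockBits);
  }

  private static long lowMask(int bits) {
    return bits == 64 ? -1L : (1L << bits) - 1;
  }

  long get(int index) {
    int w = blocks.blockBits();
    long bitPos = (long) index * bitsPerValue;
    long result = 0;
    int got = 0;
    while (got < bitsPerValue) { // a value may span more than one block
      int offset = (int) (bitPos % w);
      int take = Math.min(w - offset, bitsPerValue - got);
      long chunk = (blocks.getBlock((int) (bitPos / w)) >>> offset) & lowMask(take);
      result |= chunk << got;
      got += take;
      bitPos += take;
    }
    return result;
  }

  void set(int index, long value) {
    int w = blocks.blockBits();
    long bitPos = (long) index * bitsPerValue;
    int done = 0;
    while (done < bitsPerValue) {
      int block = (int) (bitPos / w);
      int offset = (int) (bitPos % w);
      int take = Math.min(w - offset, bitsPerValue - done);
      long chunk = (value >>> done) & lowMask(take);
      long cleared = blocks.getBlock(block) & ~(lowMask(take) << offset);
      blocks.setBlock(block, cleared | (chunk << offset));
      done += take;
      bitPos += take;
    }
  }
}
```

Whether the JIT inlines the provider calls well enough to make this competitive with a hand-specialized implementation is something only measurement can tell.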

I've read your (Michael McCandless) code and I can see that the tiny interfaces 
for Reader and Writer work well for your scenario. However, as the Reader must 
have (fast) random access, wouldn't it make sense to make it possible to update 
values? That way the same code can be used to hold ords for sorting and similar 
structures.

Instead of Reader, we could use

{code}
abstract class Mutator {
  public abstract long get(int index);
  public abstract long set(int index, long value);
}
{code}

...should the index also be a long? No need to be bound by Java's 31-bit limit 
on array-length, although I might very well be over-engineering here.
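
As an illustration of how small an updatable implementation can be, here is a minimal sketch of a Mutator backed directly by a long[]. The class name Packed64Mutator, the constructor arguments and the least-significant-bits-first layout are assumptions made for the example, and so is the choice to let set return the previous value, since the abstract class above does not say what the returned long means.

```java
// Hypothetical sketch only: not the implementation from the LUCENE-1990 patch.
abstract class Mutator {
  public abstract long get(int index);
  public abstract long set(int index, long value);
}

final class Packed64Mutator extends Mutator {
  private final long[] blocks;     // backing array, 64 bits per block
  private final int bitsPerValue;  // 1..64
  private final long mask;         // the low bitsPerValue bits set

  Packed64Mutator(int valueCount, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be 1..64");
    }
    this.bitsPerValue = bitsPerValue;
    this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    long totalBits = (long) valueCount * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
  }

  @Override
  public long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      // the value continues in the low bits of the next block
      value |= blocks[block + 1] << bitsInFirst;
    }
    return value & mask;
  }

  // Assumption: returns the value previously stored at index.
  @Override
  public long set(int index, long value) {
    long previous = get(index);
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    value &= mask;
    blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      blocks[block + 1] =
          (blocks[block + 1] & ~(mask >>> bitsInFirst)) | (value >>> bitsInFirst);
    }
    return previous;
  }
}
```

With an int index and at most 64 bits per value, the block count never exceeds the value count, so a single array is enough. A long index would need several backing arrays (paging) once the packed data outgrows one array.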

The choice between a 32-bit and a 64-bit backing array does present a bit of a 
problem with persistence. We'll be in a situation where the index will be 
optimized for the architecture used for building, not the one used for 
searching. Leaving the option of a future mmap open means that it is not 
possible to do a conversion when retrieving the bits, so I have no solution for 
this (other than doing memory-only).

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

