[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804809#action_12804809
 ] 

Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

{quote}
I think we should remove getMaxValue() from the Reader interface?
{quote}

Yes. I only left maxValue in the code because I ran out of time.

{quote}
Why create the IMPLEMENTATION enum? Why not simply return an
[anonymous] instance of Writer?
{quote}

The IMPLEMENTATION enum is only used internally and is package private. It was 
introduced to separate decision-making from specific implementations - e.g. the 
packed writer is the same for packed32 and packed64, although the reader 
differs. But it could very well be that it confuses more than it helps.

{quote}
Why not store bitsPerValue in the header instead of maxValue?
{quote}

As above - I did not have the time to fix it and wanted to push the patch in 
order to move the discussion along.

{quote}
Also, the maxValue at write time should not
have to be known - eg the factory API should let me ask for a
direct short writer without declaring the maxValue I will store.
{quote}

Since Packed and Aligned needs maxValue (or bitsPerValue), this would require 
two distinct methods in the factory, each returning a subset of the possible 
implementations. I find that rather confusing.

{quote}
I wonder if we should add an optional Object
getDirectBackingArray(). The packed/aligned impls would return
null, but the direct byte/short/int/long impls would return their
array. [...] 
{quote}

Speaking of API additions, I find that
{code}
public int getBitsPerValue();
public int size();
public void set(long value);
public void clear();
{code}
are trivial to implement for the known implementations. They open up for things 
like auto-growing to fit higher values by using a delegating wrapper, re-using 
the structure for counting purposes and sorting in-place.

{quote}
I think we shouldn't put a getWriter on every Reader
impl... because it's a one to many mapping? Eg the format written
by PackedWriter can be read by direct byte/short/int/long,
Packed32/64.
{quote}

Quite right.

{quote}
For starters I don't think we should make reader impls that can
read nbits > 31 bits with an int[] backing array. I think long[]
backing array is fine.
{quote}
The current patch limits nbits to 32 for Packed32. I am confident that an 
int-backed reader with nbits > 32 will be slower than a long-backed reader on a 
32 bit machine.

{quote}
I don't think we need separate PRIORITY and BLOCK_PREFERENCE?
Can't we have a single enum (STORAGE?) with: packed, aligned32,
aligned64? "Direct" is really just packed with nbits rounded up
to 8,16,32,64.
{quote}
I agree that it does complicate matters somewhat to have them separated. When 
calling getReader the BLOCK_PREFERENCE should also be removed, as the block 
preference will always be the same as that architecture. Removing the "direct" 
option would require the caller to do some of the logic in some cases: If low 
processing requirements is a priority, direct is preferably and when the 
bitsPerValue is calculated, the caller would have to do the if (bitsPerValue > 
32) bitsPerValue = 64 and so on.

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch, LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could subsantially reduce it's RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is an good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to