About the encoders package - there are several encoders there besides
VInt, so I wouldn't dispose of it so quickly. That said, I think we
should definitely explore consolidating VInt with the core classes,
and maybe write an encoder which delegate to them.

Or, come up w/ a different approach for allowing to plug in different
Encoders. I don't rule out anything, as long as we preserve
functionality and capabilities.

Shai

On Friday, July 1, 2011, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>> Hi,
>>
>> I don't understand the whole discussion here, so please compare these two 
>> implementations and tell me which one is faster. Please don't hurt me, if 
>> you don't want to see src.jar code from OpenJDK Java6 - just delete this 
>> mail if you don’t want to (the code here is licensed under GPL):
>
> This is the source code for a specific version of one specific Java
> impl.  If we knew all Java impls simply implemented the primitive case
> using System.arraycopy (admittedly it's hard to imagine that they
> wouldn't!) then we are fine.
>
>> This is our implementation, simon replaced and Robert reverted 
>> (UnsafeByteArrayOutputStream):
>>
>>  private void grow(int newLength) {
>>    // It actually should be: (Java 1.7, when its intrinsic on all machines)
>>    // buffer = Arrays.copyOf(buffer, newLength);
>>    byte[] newBuffer = new byte[newLength];
>>    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
>>    buffer = newBuffer;
>>  }
>>
>> So please look at the code, where is a difference that could slow down, 
>> except the Math.min() call which is an intrinsic in almost every JDK on 
>> earth?
>
> Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
> sure about other cases...
>
>> The problem we are talking about here is only about the generic Object[] 
>> copyOf method and also affects e.g. *all* Collection.toArray() methods - 
>> they all use this code, so whenever you use ArrayList.toArray() or similar, 
>> the slow code is executed. This is why we replaced Collections.sort() by 
>> CollectionUtil.sort, that does no array copy. Simon & me were not willing to 
>> replace the reallocations in FST code (Mike you remember, we reverted that 
>> on your GIT repo when we did perf tests) and other parts in Lucene (there 
>> are only few of them). The idea was only to replace primitive type code to 
>> make it easier readable. And with later JDK code it could even get faster 
>> (not slower), if Oracle starts to add intrinsics for those new methods (and 
>> that’s Dawid and mine reason to change to copyTo for primitive types). In 
>> general, if you use Java SDK methods, that are as fast as ours, they always 
>> have a chance to get faster in later JDKs. So we should always prefer Java 
>> SDK methods, unless they are slower because their default impl is too 
>> generic or has too much safety checks or uses reflection.
>
> OK I'm convinced (I think!) that for primitive types only, let's use
> Arrays.copyOf!
>
>> To come back to UnsafeByteArrayOutputStream:
>>
>> I would change the whole code, as I don’t like the allocation strategy in it 
>> (it's exponential, on every grow it doubles its size). We should change that 
>> to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar 
>> allocation strategy like in trunk. Then we can discuss about this problem 
>> again when Simon & me wants to change ArrayUtils.grow methods to use 
>> Arrays.copyOf... *g* [just joking, I will never ask again, because this 
>> discussion here is endless and does not bring us forward].
>
> Well, it sounds like for primitive types, we can cutover
> ArrayUtils.grow methods.  Then we can look @ the nightly bench the
> next day ;)
>
> But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
> (almost) a dup of ByteArrayDataOutput?
>
>> The other thing I don’t like in the new faceting module is duplication of 
>> vint code. Why don’t we change it to use DataInput/DataOutput and use 
>> Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be 
>> much more streamlined with all the code we currently have. Then we can 
>> encode the payloads (or later docvalues) using the new 
>> UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? 
>> Or maybe add a ByteArrayDataOutput class.
>
> That sounds good!
>
> Uwe can you commit TODOs to the code w/ these ideas?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to