About the encoders package - there are several encoders there besides VInt, so I wouldn't dispose of it so quickly. That said, I think we should definitely explore consolidating VInt with the core classes, and maybe write an encoder which delegate to them.
Or, come up w/ a different approach for allowing to plug in different Encoders. I don't rule out anything, as long as we preserve functionality and capabilities. Shai On Friday, July 1, 2011, Michael McCandless <luc...@mikemccandless.com> wrote: > On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <u...@thetaphi.de> wrote: >> Hi, >> >> I don't understand the whole discussion here, so please compare these two >> implementations and tell me which one is faster. Please don't hurt me, if >> you don't want to see src.jar code from OpenJDK Java6 - just delete this >> mail if you don’t want to (the code here is licensed under GPL): > > This is the source code for a specific version of one specific Java > impl. If we knew all Java impls simply implemented the primitive case > using System.arraycopy (admittedly it's hard to imagine that they > wouldn't!) then we are fine. > >> This is our implementation, simon replaced and Robert reverted >> (UnsafeByteArrayOutputStream): >> >> private void grow(int newLength) { >> // It actually should be: (Java 1.7, when its intrinsic on all machines) >> // buffer = Arrays.copyOf(buffer, newLength); >> byte[] newBuffer = new byte[newLength]; >> System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); >> buffer = newBuffer; >> } >> >> So please look at the code, where is a difference that could slow down, >> except the Math.min() call which is an intrinsic in almost every JDK on >> earth? > > Right, in this case (if you used OpenJDK 6) we are obviously OK. Not > sure about other cases... > >> The problem we are talking about here is only about the generic Object[] >> copyOf method and also affects e.g. *all* Collection.toArray() methods - >> they all use this code, so whenever you use ArrayList.toArray() or similar, >> the slow code is executed. This is why we replaced Collections.sort() by >> CollectionUtil.sort, that does no array copy. Simon & me were not willing to >> replace the reallocations in FST code (Mike you remember, we reverted that >> on your GIT repo when we did perf tests) and other parts in Lucene (there >> are only few of them). The idea was only to replace primitive type code to >> make it easier readable. And with later JDK code it could even get faster >> (not slower), if Oracle starts to add intrinsics for those new methods (and >> that’s Dawid and mine reason to change to copyTo for primitive types). In >> general, if you use Java SDK methods, that are as fast as ours, they always >> have a chance to get faster in later JDKs. So we should always prefer Java >> SDK methods, unless they are slower because their default impl is too >> generic or has too much safety checks or uses reflection. > > OK I'm convinced (I think!) that for primitive types only, let's use > Arrays.copyOf! > >> To come back to UnsafeByteArrayOutputStream: >> >> I would change the whole code, as I don’t like the allocation strategy in it >> (it's exponential, on every grow it doubles its size). We should change that >> to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar >> allocation strategy like in trunk. Then we can discuss about this problem >> again when Simon & me wants to change ArrayUtils.grow methods to use >> Arrays.copyOf... *g* [just joking, I will never ask again, because this >> discussion here is endless and does not bring us forward]. > > Well, it sounds like for primitive types, we can cutover > ArrayUtils.grow methods. Then we can look @ the nightly bench the > next day ;) > > But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it > (almost) a dup of ByteArrayDataOutput? > >> The other thing I don’t like in the new faceting module is duplication of >> vint code. Why don’t we change it to use DataInput/DataOutput and use >> Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be >> much more streamlined with all the code we currently have. Then we can >> encode the payloads (or later docvalues) using the new >> UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? >> Or maybe add a ByteArrayDataOutput class. > > That sounds good! > > Uwe can you commit TODOs to the code w/ these ideas? > > Mike > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org