This is a pretty massive change to the apis. I wonder how nasty it would be to just support both paths. Have you evaluated how complex that would be?
On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > After more investigation, it looks like Float8Benchmarks at least on my > machine are within the range of noise. > > For BitVectorHelper I pushed a new commit [1], seems to bring the > BitVectorHelper benchmarks back inline (and even with some improvement for > getNullCountBenchmark). > > Benchmark Mode Cnt Score Error > Units > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 3.821 ± 0.031 > ns/op > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 14.884 ± 0.141 > ns/op > > I applied the same pattern to other loops that I could find, and for any > "for (long" loop on the critical path, I broke it up into two loops. the > first loop does iteration by integer, the second finishes off for any long > values. As a side note it seems like optimization for loops using long > counters at least have a semi-recent open bug for the JVM [2] > > Thanks, > Micah > > [1] > > https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 > [2] https://bugs.openjdk.java.net/browse/JDK-8223051 > > On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > Indeed, the BoundChecking and CheckNullForGet variables can make a big > > difference. I didn't initially run the benchmarks with these turned on > > (you can see the result from above with Float8Benchmarks). Here are new > > numbers including with the flags enabled. It looks like using longs > might > > be a little bit slower, I'll see what I can do to mitigate this. > > > > Ravindra also volunteered to try to benchmark the changes with Dremio's > > code on today's sync call. > > > > New > > > > Benchmark Mode Cnt Score > Error > > Units > > > > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 4.176 ± > 1.292 > > ns/op > > > > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 26.102 ± > 0.700 > > ns/op > > > > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± 0.084 us/op > > > > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± 0.057 us/op > > > > > > > > old > > > > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 3.828 ± > 0.030 > > ns/op > > > > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 20.611 ± > 0.188 > > ns/op > > > > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± 0.462 us/op > > > > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± 0.027 us/op > > > > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com> wrote: > > > >> Hi Gonzalo, > >> > >> Thanks for sharing the performance results. > >> I am wondering if you have turned off the flag > >> BoundsChecking#BOUNDS_CHECKING_ENABLED. > >> If not, the lower throughput should be expected. > >> > >> Best, > >> Liya Fan > >> > >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <emkornfi...@gmail.com> > >> wrote: > >> > >>> Hi Gonzalo, > >>> Thank you for the feedback. I wasn't aware of the JIT implications. > At > >>> least on the benchmark run they don't seem to have an impact. > >>> > >>> If there are other benchmarks that people have that can validate if > this > >>> change will be problematic I would appreciate trying to run them with > the > >>> PR. I will try to run the ones for zeroing/popcnt tonight to see if > >>> there > >>> is a change in those. > >>> > >>> -Micah > >>> > >>> > >>> > >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar < > >>> golthir...@gmail.com> wrote: > >>> > >>> > I would recommend to take care with this kind of changes. > >>> > > >>> > I didn't try Arrow in more than one year, but by then the performance > >>> was > >>> > quite bad in comparison with plain byte buffer access > >>> > (see http://git.net/apache-arrow-development/msg02353.html *) and > >>> > there are several optimizations that the JVM (specifically, C2) does > >>> not > >>> > apply when dealing with int instead of longs. One of the > >>> > most commons is the loop unrolling and vectorization. > >>> > > >>> > * It doesn't seem the best way to reference an old email on the list, > >>> but > >>> > it is the only result shown by Google > >>> > > >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<liya.fa...@gmail.com>) > >>> > escribió: > >>> > > >>> >> Hi Micah, > >>> >> > >>> >> Thanks for your effort. The performance result looks good. > >>> >> > >>> >> As you indicated, ArrowBuf will take additional 12 bytes (4 bytes > for > >>> each > >>> >> of length, write index, and read index). > >>> >> Similar overheads also exist for vectors like BaseFixedWidthVector, > >>> >> BaseVariableWidthVector, etc. > >>> >> > >>> >> IMO, such overheads are small enough to justify the change. > >>> >> Let's check if there are other overheads. > >>> >> > >>> >> Best, > >>> >> Liya Fan > >>> >> > >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < > emkornfi...@gmail.com > >>> > > >>> >> wrote: > >>> >> > >>> >> > Hi Liya Fan, > >>> >> > Based on the Float8Benchmark there does not seem to be any > >>> meaningful > >>> >> > performance difference on my machine. At least for me, the > >>> benchmarks > >>> >> are > >>> >> > not stable enough to say one is faster than the other (I've pasted > >>> >> results > >>> >> > below). That being said my machine isn't necessarily the most > >>> reliable > >>> >> for > >>> >> > benchmarking. > >>> >> > > >>> >> > On an intuitive level, this makes sense to me, for the most part > it > >>> >> seems > >>> >> > like the change just moves casting from "int" to "long" further up > >>> the > >>> >> > stack for "PlatformDepdendent" operations. If there are other > >>> >> benchmarks > >>> >> > that you think are worth running let me know. > >>> >> > > >>> >> > One downside performance wise I think for his change is it > >>> increases the > >>> >> > size of ArrowBuf objects, which I suppose could influence cache > >>> misses > >>> >> at > >>> >> > some level or increase the size of call-stacks, but this doesn't > >>> seem to > >>> >> > show up in the benchmark.. > >>> >> > > >>> >> > Thanks, > >>> >> > Micah > >>> >> > > >>> >> > Sample benchmark numbers: > >>> >> > > >>> >> > [New Code] > >>> >> > Benchmark Mode Cnt Score Error > >>> Units > >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 15.441 ± 0.469 > >>> us/op > >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.057 ± 0.115 > >>> us/op > >>> >> > > >>> >> > > >>> >> > [Old code] > >>> >> > Benchmark Mode Cnt Score Error > >>> Units > >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 16.248 ± 1.409 > >>> us/op > >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.150 ± 0.084 > >>> us/op > >>> >> > > >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com> > >>> wrote: > >>> >> > > >>> >> >> Hi Micah, > >>> >> >> > >>> >> >> Thanks a lot for doing this. > >>> >> >> > >>> >> >> I am a little concerned about if there is any negative > performance > >>> >> impact > >>> >> >> on the current 32-bit-length based applications. > >>> >> >> Can we do some performance comparison on our existing benchmarks? > >>> >> >> > >>> >> >> Best, > >>> >> >> Liya Fan > >>> >> >> > >>> >> >> > >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < > >>> emkornfi...@gmail.com> > >>> >> >> wrote: > >>> >> >> > >>> >> >>> There have been some previous discussions on the mailing about > >>> >> supporting > >>> >> >>> 64-bit lengths for Java ValueVectors (this is what the IPC > >>> >> specification > >>> >> >>> and C++ support). I created a PR [1] that changes all APIs > that I > >>> >> could > >>> >> >>> find that take an index to take an "long" instead of an "int" > (and > >>> >> >>> similarly change "size/rowcount" APIs). > >>> >> >>> > >>> >> >>> It is a big change, so I think it is worth discussing if it is > >>> >> something > >>> >> >>> we > >>> >> >>> still want to move forward with. It would be nice to come to a > >>> >> >>> conclusion > >>> >> >>> quickly, ideally in the next few days, to avoid a lot of merge > >>> >> conflicts. > >>> >> >>> > >>> >> >>> The reason I did this work now is the C++ implementation has > added > >>> >> >>> support > >>> >> >>> for LargeList, LargeBinary and LargeString arrays and based on > >>> prior > >>> >> >>> discussions we need to have similar support in Java before our > >>> next > >>> >> >>> release. Support 64-bit indexes means we can have full > >>> compatibility > >>> >> and > >>> >> >>> make the most use of the types in Java. > >>> >> >>> > >>> >> >>> Look forward to hearing feedback. > >>> >> >>> > >>> >> >>> Thanks, > >>> >> >>> Micah > >>> >> >>> > >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 > >>> >> >>> > >>> >> >> > >>> >> > >>> > > >>> > >> >