Indeed, the BoundChecking and CheckNullForGet variables can make a big difference. I didn't initially run the benchmarks with these turned on (you can see the result from above with Float8Benchmarks). Here are new numbers including with the flags enabled. It looks like using longs might be a little bit slower, I'll see what I can do to mitigate this.
Ravindra also volunteered to try to benchmark the changes with Dremio's code on today's sync call. New Benchmark Mode Cnt Score Error Units BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 4.176 ± 1.292 ns/op BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 26.102 ± 0.700 ns/op Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± 0.084 us/op Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± 0.057 us/op old BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 3.828 ± 0.030 ns/op BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 20.611 ± 0.188 ns/op Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± 0.462 us/op Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± 0.027 us/op On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com> wrote: > Hi Gonzalo, > > Thanks for sharing the performance results. > I am wondering if you have turned off the flag > BoundsChecking#BOUNDS_CHECKING_ENABLED. > If not, the lower throughput should be expected. > > Best, > Liya Fan > > On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >> Hi Gonzalo, >> Thank you for the feedback. I wasn't aware of the JIT implications. At >> least on the benchmark run they don't seem to have an impact. >> >> If there are other benchmarks that people have that can validate if this >> change will be problematic I would appreciate trying to run them with the >> PR. I will try to run the ones for zeroing/popcnt tonight to see if there >> is a change in those. >> >> -Micah >> >> >> >> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar < >> golthir...@gmail.com> wrote: >> >> > I would recommend to take care with this kind of changes. >> > >> > I didn't try Arrow in more than one year, but by then the performance >> was >> > quite bad in comparison with plain byte buffer access >> > (see http://git.net/apache-arrow-development/msg02353.html *) and >> > there are several optimizations that the JVM (specifically, C2) does not >> > apply when dealing with int instead of longs. One of the >> > most commons is the loop unrolling and vectorization. >> > >> > * It doesn't seem the best way to reference an old email on the list, >> but >> > it is the only result shown by Google >> > >> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<liya.fa...@gmail.com>) >> > escribió: >> > >> >> Hi Micah, >> >> >> >> Thanks for your effort. The performance result looks good. >> >> >> >> As you indicated, ArrowBuf will take additional 12 bytes (4 bytes for >> each >> >> of length, write index, and read index). >> >> Similar overheads also exist for vectors like BaseFixedWidthVector, >> >> BaseVariableWidthVector, etc. >> >> >> >> IMO, such overheads are small enough to justify the change. >> >> Let's check if there are other overheads. >> >> >> >> Best, >> >> Liya Fan >> >> >> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <emkornfi...@gmail.com> >> >> wrote: >> >> >> >> > Hi Liya Fan, >> >> > Based on the Float8Benchmark there does not seem to be any meaningful >> >> > performance difference on my machine. At least for me, the >> benchmarks >> >> are >> >> > not stable enough to say one is faster than the other (I've pasted >> >> results >> >> > below). That being said my machine isn't necessarily the most >> reliable >> >> for >> >> > benchmarking. >> >> > >> >> > On an intuitive level, this makes sense to me, for the most part it >> >> seems >> >> > like the change just moves casting from "int" to "long" further up >> the >> >> > stack for "PlatformDepdendent" operations. If there are other >> >> benchmarks >> >> > that you think are worth running let me know. >> >> > >> >> > One downside performance wise I think for his change is it increases >> the >> >> > size of ArrowBuf objects, which I suppose could influence cache >> misses >> >> at >> >> > some level or increase the size of call-stacks, but this doesn't >> seem to >> >> > show up in the benchmark.. >> >> > >> >> > Thanks, >> >> > Micah >> >> > >> >> > Sample benchmark numbers: >> >> > >> >> > [New Code] >> >> > Benchmark Mode Cnt Score Error Units >> >> > Float8Benchmarks.copyFromBenchmark avgt 5 15.441 ± 0.469 us/op >> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.057 ± 0.115 us/op >> >> > >> >> > >> >> > [Old code] >> >> > Benchmark Mode Cnt Score Error Units >> >> > Float8Benchmarks.copyFromBenchmark avgt 5 16.248 ± 1.409 us/op >> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.150 ± 0.084 us/op >> >> > >> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com> >> wrote: >> >> > >> >> >> Hi Micah, >> >> >> >> >> >> Thanks a lot for doing this. >> >> >> >> >> >> I am a little concerned about if there is any negative performance >> >> impact >> >> >> on the current 32-bit-length based applications. >> >> >> Can we do some performance comparison on our existing benchmarks? >> >> >> >> >> >> Best, >> >> >> Liya Fan >> >> >> >> >> >> >> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < >> emkornfi...@gmail.com> >> >> >> wrote: >> >> >> >> >> >>> There have been some previous discussions on the mailing about >> >> supporting >> >> >>> 64-bit lengths for Java ValueVectors (this is what the IPC >> >> specification >> >> >>> and C++ support). I created a PR [1] that changes all APIs that I >> >> could >> >> >>> find that take an index to take an "long" instead of an "int" (and >> >> >>> similarly change "size/rowcount" APIs). >> >> >>> >> >> >>> It is a big change, so I think it is worth discussing if it is >> >> something >> >> >>> we >> >> >>> still want to move forward with. It would be nice to come to a >> >> >>> conclusion >> >> >>> quickly, ideally in the next few days, to avoid a lot of merge >> >> conflicts. >> >> >>> >> >> >>> The reason I did this work now is the C++ implementation has added >> >> >>> support >> >> >>> for LargeList, LargeBinary and LargeString arrays and based on >> prior >> >> >>> discussions we need to have similar support in Java before our next >> >> >>> release. Support 64-bit indexes means we can have full >> compatibility >> >> and >> >> >>> make the most use of the types in Java. >> >> >>> >> >> >>> Look forward to hearing feedback. >> >> >>> >> >> >>> Thanks, >> >> >>> Micah >> >> >>> >> >> >>> [1] https://github.com/apache/arrow/pull/5020 >> >> >>> >> >> >> >> >> >> > >> >