After more investigation, it looks like Float8Benchmarks at least on my
machine are within the range of noise.

For BitVectorHelper I pushed a new commit [1], seems to bring the
BitVectorHelper benchmarks back inline (and even with some improvement for
getNullCountBenchmark).

Benchmark                                        Mode  Cnt   Score   Error
 Units
BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031
 ns/op
BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141
 ns/op

I applied the same pattern to other loops that I could find, and for any
"for (long" loop on the critical path, I broke it up into two loops.  the
first loop does iteration by integer, the second finishes off for any long
values.  As a side note it seems like optimization for loops using long
counters at least have a semi-recent open bug for the JVM [2]

Thanks,
Micah

[1]
https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
[2] https://bugs.openjdk.java.net/browse/JDK-8223051

On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Indeed, the BoundChecking and CheckNullForGet variables can make a big
> difference.  I didn't initially run the benchmarks with these turned on
> (you can see the result from above with Float8Benchmarks).  Here are new
> numbers including with the flags enabled.  It looks like using longs might
> be a little bit slower, I'll see what I can do to mitigate this.
>
> Ravindra also volunteered to try to benchmark the changes with Dremio's
> code on today's sync call.
>
> New
>
> Benchmark                                        Mode  Cnt   Score   Error
> Units
>
> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ± 1.292
> ns/op
>
> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ± 0.700
> ns/op
>
> Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ± 0.084  us/op
>
> Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ± 0.057  us/op
>
>
>
> old
>
> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ± 0.030
> ns/op
>
> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ± 0.188
> ns/op
>
> Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ± 0.462  us/op
>
> Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ± 0.027  us/op
>
> On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
>> Hi Gonzalo,
>>
>> Thanks for sharing the performance results.
>> I am wondering if you have turned off the flag
>> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>> If not, the lower throughput should be expected.
>>
>> Best,
>> Liya Fan
>>
>> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Hi Gonzalo,
>>> Thank you for the feedback.  I wasn't aware of the JIT implications.   At
>>> least on the benchmark run they don't seem to have an impact.
>>>
>>> If there are other benchmarks that people have that can validate if this
>>> change will be problematic I would appreciate trying to run them with the
>>> PR.  I will try to run the ones for zeroing/popcnt tonight to see if
>>> there
>>> is a change in those.
>>>
>>> -Micah
>>>
>>>
>>>
>>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <
>>> golthir...@gmail.com> wrote:
>>>
>>> > I would recommend to take care with this kind of changes.
>>> >
>>> > I didn't try Arrow in more than one year, but by then the performance
>>> was
>>> > quite bad in comparison with plain byte buffer access
>>> > (see http://git.net/apache-arrow-development/msg02353.html *) and
>>> > there are several optimizations that the JVM (specifically, C2) does
>>> not
>>> > apply when dealing with int instead of longs. One of the
>>> > most commons is the loop unrolling and vectorization.
>>> >
>>> > * It doesn't seem the best way to reference an old email on the list,
>>> but
>>> > it is the only result shown by Google
>>> >
>>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<liya.fa...@gmail.com>)
>>> > escribió:
>>> >
>>> >> Hi Micah,
>>> >>
>>> >> Thanks for your effort. The performance result looks good.
>>> >>
>>> >> As you indicated, ArrowBuf will take additional 12 bytes (4 bytes for
>>> each
>>> >> of length, write index, and read index).
>>> >> Similar overheads also exist for vectors like BaseFixedWidthVector,
>>> >> BaseVariableWidthVector, etc.
>>> >>
>>> >> IMO, such overheads are small enough to justify the change.
>>> >> Let's check if there are other overheads.
>>> >>
>>> >> Best,
>>> >> Liya Fan
>>> >>
>>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <emkornfi...@gmail.com
>>> >
>>> >> wrote:
>>> >>
>>> >> > Hi Liya Fan,
>>> >> > Based on the Float8Benchmark there does not seem to be any
>>> meaningful
>>> >> > performance difference on my machine.  At least for me, the
>>> benchmarks
>>> >> are
>>> >> > not stable enough to say one is faster than the other (I've pasted
>>> >> results
>>> >> > below).  That being said my machine isn't necessarily the most
>>> reliable
>>> >> for
>>> >> > benchmarking.
>>> >> >
>>> >> > On an intuitive level, this makes sense to me,  for the most part it
>>> >> seems
>>> >> > like the change just moves casting from "int" to "long" further up
>>> the
>>> >> > stack  for  "PlatformDepdendent" operations.  If there are other
>>> >> benchmarks
>>> >> > that you think are worth running let me know.
>>> >> >
>>> >> > One downside performance wise I think for his change is it
>>> increases the
>>> >> > size of ArrowBuf objects, which I suppose could influence cache
>>> misses
>>> >> at
>>> >> > some level or increase the size of call-stacks, but this doesn't
>>> seem to
>>> >> > show up in the benchmark..
>>> >> >
>>> >> > Thanks,
>>> >> > Micah
>>> >> >
>>> >> > Sample benchmark numbers:
>>> >> >
>>> >> > [New Code]
>>> >> > Benchmark                            Mode  Cnt   Score   Error
>>> Units
>>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469
>>> us/op
>>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115
>>> us/op
>>> >> >
>>> >> >
>>> >> > [Old code]
>>> >> > Benchmark                            Mode  Cnt   Score   Error
>>> Units
>>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409
>>> us/op
>>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084
>>> us/op
>>> >> >
>>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com>
>>> wrote:
>>> >> >
>>> >> >> Hi Micah,
>>> >> >>
>>> >> >> Thanks a lot for doing this.
>>> >> >>
>>> >> >> I am a little concerned about if there is any negative performance
>>> >> impact
>>> >> >> on the current 32-bit-length based applications.
>>> >> >> Can we do some performance comparison on our existing benchmarks?
>>> >> >>
>>> >> >> Best,
>>> >> >> Liya Fan
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>> emkornfi...@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> There have been some previous discussions on the mailing about
>>> >> supporting
>>> >> >>> 64-bit lengths for  Java ValueVectors (this is what the IPC
>>> >> specification
>>> >> >>> and C++ support).  I created a PR [1] that changes all APIs that I
>>> >> could
>>> >> >>> find that take an index to take an "long" instead of an "int" (and
>>> >> >>> similarly change "size/rowcount" APIs).
>>> >> >>>
>>> >> >>> It is a big change, so I think it is worth discussing if it is
>>> >> something
>>> >> >>> we
>>> >> >>> still want to move forward with.  It would be nice to come to a
>>> >> >>> conclusion
>>> >> >>> quickly, ideally in the next few days, to avoid a lot of merge
>>> >> conflicts.
>>> >> >>>
>>> >> >>> The reason I did this work now is the C++ implementation has added
>>> >> >>> support
>>> >> >>> for LargeList, LargeBinary and LargeString arrays and based on
>>> prior
>>> >> >>> discussions we need to have similar support in Java before our
>>> next
>>> >> >>> release. Support 64-bit indexes means we can have full
>>> compatibility
>>> >> and
>>> >> >>> make the most use of the types in Java.
>>> >> >>>
>>> >> >>> Look forward to hearing feedback.
>>> >> >>>
>>> >> >>> Thanks,
>>> >> >>> Micah
>>> >> >>>
>>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>> >> >>>
>>> >> >>
>>> >>
>>> >
>>>
>>

Reply via email to