This is a pretty massive change to the apis. I wonder how nasty it would be
to just support both paths. Have you evaluated how complex that would be?

On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> After more investigation, it looks like Float8Benchmarks at least on my
> machine are within the range of noise.
>
> For BitVectorHelper I pushed a new commit [1], seems to bring the
> BitVectorHelper benchmarks back inline (and even with some improvement for
> getNullCountBenchmark).
>
> Benchmark                                        Mode  Cnt   Score   Error
>  Units
> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031
>  ns/op
> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141
>  ns/op
>
> I applied the same pattern to other loops that I could find, and for any
> "for (long" loop on the critical path, I broke it up into two loops.  the
> first loop does iteration by integer, the second finishes off for any long
> values.  As a side note it seems like optimization for loops using long
> counters at least have a semi-recent open bug for the JVM [2]
>
> Thanks,
> Micah
>
> [1]
>
> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>
> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Indeed, the BoundChecking and CheckNullForGet variables can make a big
> > difference.  I didn't initially run the benchmarks with these turned on
> > (you can see the result from above with Float8Benchmarks).  Here are new
> > numbers including with the flags enabled.  It looks like using longs
> might
> > be a little bit slower, I'll see what I can do to mitigate this.
> >
> > Ravindra also volunteered to try to benchmark the changes with Dremio's
> > code on today's sync call.
> >
> > New
> >
> > Benchmark                                        Mode  Cnt   Score
>  Error
> > Units
> >
> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ±
> 1.292
> > ns/op
> >
> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ±
> 0.700
> > ns/op
> >
> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ± 0.084  us/op
> >
> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ± 0.057  us/op
> >
> >
> >
> > old
> >
> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ±
> 0.030
> > ns/op
> >
> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ±
> 0.188
> > ns/op
> >
> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ± 0.462  us/op
> >
> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ± 0.027  us/op
> >
> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> >> Hi Gonzalo,
> >>
> >> Thanks for sharing the performance results.
> >> I am wondering if you have turned off the flag
> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
> >> If not, the lower throughput should be expected.
> >>
> >> Best,
> >> Liya Fan
> >>
> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <emkornfi...@gmail.com>
> >> wrote:
> >>
> >>> Hi Gonzalo,
> >>> Thank you for the feedback.  I wasn't aware of the JIT implications.
>  At
> >>> least on the benchmark run they don't seem to have an impact.
> >>>
> >>> If there are other benchmarks that people have that can validate if
> this
> >>> change will be problematic I would appreciate trying to run them with
> the
> >>> PR.  I will try to run the ones for zeroing/popcnt tonight to see if
> >>> there
> >>> is a change in those.
> >>>
> >>> -Micah
> >>>
> >>>
> >>>
> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <
> >>> golthir...@gmail.com> wrote:
> >>>
> >>> > I would recommend to take care with this kind of changes.
> >>> >
> >>> > I didn't try Arrow in more than one year, but by then the performance
> >>> was
> >>> > quite bad in comparison with plain byte buffer access
> >>> > (see http://git.net/apache-arrow-development/msg02353.html *) and
> >>> > there are several optimizations that the JVM (specifically, C2) does
> >>> not
> >>> > apply when dealing with int instead of longs. One of the
> >>> > most commons is the loop unrolling and vectorization.
> >>> >
> >>> > * It doesn't seem the best way to reference an old email on the list,
> >>> but
> >>> > it is the only result shown by Google
> >>> >
> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<liya.fa...@gmail.com>)
> >>> > escribió:
> >>> >
> >>> >> Hi Micah,
> >>> >>
> >>> >> Thanks for your effort. The performance result looks good.
> >>> >>
> >>> >> As you indicated, ArrowBuf will take additional 12 bytes (4 bytes
> for
> >>> each
> >>> >> of length, write index, and read index).
> >>> >> Similar overheads also exist for vectors like BaseFixedWidthVector,
> >>> >> BaseVariableWidthVector, etc.
> >>> >>
> >>> >> IMO, such overheads are small enough to justify the change.
> >>> >> Let's check if there are other overheads.
> >>> >>
> >>> >> Best,
> >>> >> Liya Fan
> >>> >>
> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
> emkornfi...@gmail.com
> >>> >
> >>> >> wrote:
> >>> >>
> >>> >> > Hi Liya Fan,
> >>> >> > Based on the Float8Benchmark there does not seem to be any
> >>> meaningful
> >>> >> > performance difference on my machine.  At least for me, the
> >>> benchmarks
> >>> >> are
> >>> >> > not stable enough to say one is faster than the other (I've pasted
> >>> >> results
> >>> >> > below).  That being said my machine isn't necessarily the most
> >>> reliable
> >>> >> for
> >>> >> > benchmarking.
> >>> >> >
> >>> >> > On an intuitive level, this makes sense to me,  for the most part
> it
> >>> >> seems
> >>> >> > like the change just moves casting from "int" to "long" further up
> >>> the
> >>> >> > stack  for  "PlatformDepdendent" operations.  If there are other
> >>> >> benchmarks
> >>> >> > that you think are worth running let me know.
> >>> >> >
> >>> >> > One downside performance wise I think for his change is it
> >>> increases the
> >>> >> > size of ArrowBuf objects, which I suppose could influence cache
> >>> misses
> >>> >> at
> >>> >> > some level or increase the size of call-stacks, but this doesn't
> >>> seem to
> >>> >> > show up in the benchmark..
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Micah
> >>> >> >
> >>> >> > Sample benchmark numbers:
> >>> >> >
> >>> >> > [New Code]
> >>> >> > Benchmark                            Mode  Cnt   Score   Error
> >>> Units
> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469
> >>> us/op
> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115
> >>> us/op
> >>> >> >
> >>> >> >
> >>> >> > [Old code]
> >>> >> > Benchmark                            Mode  Cnt   Score   Error
> >>> Units
> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409
> >>> us/op
> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084
> >>> us/op
> >>> >> >
> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com>
> >>> wrote:
> >>> >> >
> >>> >> >> Hi Micah,
> >>> >> >>
> >>> >> >> Thanks a lot for doing this.
> >>> >> >>
> >>> >> >> I am a little concerned about if there is any negative
> performance
> >>> >> impact
> >>> >> >> on the current 32-bit-length based applications.
> >>> >> >> Can we do some performance comparison on our existing benchmarks?
> >>> >> >>
> >>> >> >> Best,
> >>> >> >> Liya Fan
> >>> >> >>
> >>> >> >>
> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
> >>> emkornfi...@gmail.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> There have been some previous discussions on the mailing about
> >>> >> supporting
> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is what the IPC
> >>> >> specification
> >>> >> >>> and C++ support).  I created a PR [1] that changes all APIs
> that I
> >>> >> could
> >>> >> >>> find that take an index to take an "long" instead of an "int"
> (and
> >>> >> >>> similarly change "size/rowcount" APIs).
> >>> >> >>>
> >>> >> >>> It is a big change, so I think it is worth discussing if it is
> >>> >> something
> >>> >> >>> we
> >>> >> >>> still want to move forward with.  It would be nice to come to a
> >>> >> >>> conclusion
> >>> >> >>> quickly, ideally in the next few days, to avoid a lot of merge
> >>> >> conflicts.
> >>> >> >>>
> >>> >> >>> The reason I did this work now is the C++ implementation has
> added
> >>> >> >>> support
> >>> >> >>> for LargeList, LargeBinary and LargeString arrays and based on
> >>> prior
> >>> >> >>> discussions we need to have similar support in Java before our
> >>> next
> >>> >> >>> release. Support 64-bit indexes means we can have full
> >>> compatibility
> >>> >> and
> >>> >> >>> make the most use of the types in Java.
> >>> >> >>>
> >>> >> >>> Look forward to hearing feedback.
> >>> >> >>>
> >>> >> >>> Thanks,
> >>> >> >>> Micah
> >>> >> >>>
> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
> >>> >> >>>
> >>> >> >>
> >>> >>
> >>> >
> >>>
> >>
>

Reply via email to