Re: [Discuss][Java] 64-bit lengths for ValueVectors

Micah Kornfield Sat, 10 Aug 2019 17:39:48 -0700

Hi Jacques,
I definitely understand these concerns and this change is risky because it
is so large.  Perhaps, creating a new hierarchy, might be the cleanest way
of dealing with this.  This could have other benefits like cleaning up some
cruft around dictionary encode and "orphaned" method.   Per past e-mail
threads I agree it is beneficial to have 2 separate reference
implementations that can communicate fully, and my intent here was to close
that gap.


Trying to
> determine the ramifications of these changes would be challenging and time
> consuming against all the different ways we interact with the Arrow Java
> library.


Understood.  I took a quick look at Dremio-OSS it seems like it has a
simple java build system?  If it is helpful, I can try to get a fork
running that at least compiles against this PR.  My plan would be to cast
any place that was changed to return a long back to an int, so in essence
the Dremio algorithms would reman 32-bit implementations.

I don't  have the infrastructure to test this change properly from a
distributed systems perspective, so it would still take some time from
Dremio to validate for regressions.

I'm not saying I'm against this but want to make sure we've
> explored all less disruptive options before considering changing something
> this fundamental (especially when I generally hold the view that large cell
> counts against massive contiguous memory is an anti pattern to scalable
> analytical processing--purely subjective of course).


I'm open to other ideas here, as well. I don't think it is out of the
question to leave the Java implementation as 32-bit, but if we do, then I
think we should consider a different strategy for reference implementations.

Thanks,
Micah

On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <jacq...@apache.org> wrote:

> Hey Micah, I didn't have a particular path in mind. Was thinking more along
> the lines of extra methods as opposed to separate classes.
>
> Arrow hasn't historically been a place where we're writing algorithms in
> Java so the fact that they aren't there doesn't mean they don't exist. We
> have a large amount of code that depends on the current behavior that is
> deployed in hundreds of customer clusters (you can peruse our dremio repo
> to see how extensively we leverage Arrow if interested). Trying to
> determine the ramifications of these changes would be challenging and time
> consuming against all the different ways we interact with the Arrow Java
> library. I'm not saying I'm against this but want to make sure we've
> explored all less disruptive options before considering changing something
> this fundamental (especially when I generally hold the view that large cell
> counts against massive contiguous memory is an anti pattern to scalable
> analytical processing--purely subjective of course).
>
> On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Hi Jacques,
> > What avenue were you thinking for supporting both paths?   I didn't want
> > to pursue a different class hierarchy, because I felt like that would
> > effectively fork the code base, but that is potentially an option that
> > would allow us to have a complete reference implementation in Java that
> can
> > fully interact with C++, without major changes to this code.
> >
> > For supporting both APIs on the same classes/interfaces, I think they
> > roughly fall into three categories, changes to input parameters, changes
> to
> > output parameters and algorithm changes.
> >
> > For inputs, changing from int to long is essentially a no-op from the
> > compiler perspective.  From the limited micro-benchmarking this also
> > doesn't seem to have a performance impact.  So we could keep two versions
> > of the methods that only differ on inputs, but it is not clear what the
> > value of that would be.
> >
> > For outputs, we can't support methods "long getLength()" and "int
> > getLength()" in the same class, so we would be forced into something like
> > "long getLength(boolean dummy)" which I think is a less desirable.
> >
> > For algorithm changes, there did not appear to be too many places where
> we
> > actually loop over all elements (it is quite possible I missed something
> > here), the ones that I did find I was able to mitigate performance
> > penalties as noted above.  Some of the current implementation will get a
> > lot slower for "large arrays", but we can likely fix those later or in
> this
> > PR with a nested while loop instead of 2 for loops.
> >
> > Thanks,
> > Micah
> >
> >
> > On Saturday, August 10, 2019, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> >> This is a pretty massive change to the apis. I wonder how nasty it would
> >> be to just support both paths. Have you evaluated how complex that
> would be?
> >>
> >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <emkornfi...@gmail.com>
> >> wrote:
> >>
> >>> After more investigation, it looks like Float8Benchmarks at least on my
> >>> machine are within the range of noise.
> >>>
> >>> For BitVectorHelper I pushed a new commit [1], seems to bring the
> >>> BitVectorHelper benchmarks back inline (and even with some improvement
> >>> for
> >>> getNullCountBenchmark).
> >>>
> >>> Benchmark                                        Mode  Cnt   Score
> >>>  Error
> >>>  Units
> >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ±
> >>> 0.031
> >>>  ns/op
> >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ±
> >>> 0.141
> >>>  ns/op
> >>>
> >>> I applied the same pattern to other loops that I could find, and for
> any
> >>> "for (long" loop on the critical path, I broke it up into two loops.
> the
> >>> first loop does iteration by integer, the second finishes off for any
> >>> long
> >>> values.  As a side note it seems like optimization for loops using long
> >>> counters at least have a semi-recent open bug for the JVM [2]
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> [1]
> >>>
> >>>
> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
> >>>
> >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <emkornfi...@gmail.com>
> >>> wrote:
> >>>
> >>> > Indeed, the BoundChecking and CheckNullForGet variables can make a
> big
> >>> > difference.  I didn't initially run the benchmarks with these turned
> on
> >>> > (you can see the result from above with Float8Benchmarks).  Here are
> >>> new
> >>> > numbers including with the flags enabled.  It looks like using longs
> >>> might
> >>> > be a little bit slower, I'll see what I can do to mitigate this.
> >>> >
> >>> > Ravindra also volunteered to try to benchmark the changes with
> Dremio's
> >>> > code on today's sync call.
> >>> >
> >>> > New
> >>> >
> >>> > Benchmark                                        Mode  Cnt   Score
> >>>  Error
> >>> > Units
> >>> >
> >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ±
> >>> 1.292
> >>> > ns/op
> >>> >
> >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ±
> >>> 0.700
> >>> > ns/op
> >>> >
> >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ± 0.084  us/op
> >>> >
> >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ± 0.057  us/op
> >>> >
> >>> >
> >>> >
> >>> > old
> >>> >
> >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ±
> >>> 0.030
> >>> > ns/op
> >>> >
> >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ±
> >>> 0.188
> >>> > ns/op
> >>> >
> >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ± 0.462  us/op
> >>> >
> >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ± 0.027  us/op
> >>> >
> >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com>
> wrote:
> >>> >
> >>> >> Hi Gonzalo,
> >>> >>
> >>> >> Thanks for sharing the performance results.
> >>> >> I am wondering if you have turned off the flag
> >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
> >>> >> If not, the lower throughput should be expected.
> >>> >>
> >>> >> Best,
> >>> >> Liya Fan
> >>> >>
> >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
> >>> emkornfi...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Hi Gonzalo,
> >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
> >>> implications.   At
> >>> >>> least on the benchmark run they don't seem to have an impact.
> >>> >>>
> >>> >>> If there are other benchmarks that people have that can validate if
> >>> this
> >>> >>> change will be problematic I would appreciate trying to run them
> >>> with the
> >>> >>> PR.  I will try to run the ones for zeroing/popcnt tonight to see
> if
> >>> >>> there
> >>> >>> is a change in those.
> >>> >>>
> >>> >>> -Micah
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <
> >>> >>> golthir...@gmail.com> wrote:
> >>> >>>
> >>> >>> > I would recommend to take care with this kind of changes.
> >>> >>> >
> >>> >>> > I didn't try Arrow in more than one year, but by then the
> >>> performance
> >>> >>> was
> >>> >>> > quite bad in comparison with plain byte buffer access
> >>> >>> > (see http://git.net/apache-arrow-development/msg02353.html *)
> and
> >>> >>> > there are several optimizations that the JVM (specifically, C2)
> >>> does
> >>> >>> not
> >>> >>> > apply when dealing with int instead of longs. One of the
> >>> >>> > most commons is the loop unrolling and vectorization.
> >>> >>> >
> >>> >>> > * It doesn't seem the best way to reference an old email on the
> >>> list,
> >>> >>> but
> >>> >>> > it is the only result shown by Google
> >>> >>> >
> >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
> liya.fa...@gmail.com
> >>> >)
> >>> >>> > escribió:
> >>> >>> >
> >>> >>> >> Hi Micah,
> >>> >>> >>
> >>> >>> >> Thanks for your effort. The performance result looks good.
> >>> >>> >>
> >>> >>> >> As you indicated, ArrowBuf will take additional 12 bytes (4
> bytes
> >>> for
> >>> >>> each
> >>> >>> >> of length, write index, and read index).
> >>> >>> >> Similar overheads also exist for vectors like
> >>> BaseFixedWidthVector,
> >>> >>> >> BaseVariableWidthVector, etc.
> >>> >>> >>
> >>> >>> >> IMO, such overheads are small enough to justify the change.
> >>> >>> >> Let's check if there are other overheads.
> >>> >>> >>
> >>> >>> >> Best,
> >>> >>> >> Liya Fan
> >>> >>> >>
> >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
> >>> emkornfi...@gmail.com
> >>> >>> >
> >>> >>> >> wrote:
> >>> >>> >>
> >>> >>> >> > Hi Liya Fan,
> >>> >>> >> > Based on the Float8Benchmark there does not seem to be any
> >>> >>> meaningful
> >>> >>> >> > performance difference on my machine.  At least for me, the
> >>> >>> benchmarks
> >>> >>> >> are
> >>> >>> >> > not stable enough to say one is faster than the other (I've
> >>> pasted
> >>> >>> >> results
> >>> >>> >> > below).  That being said my machine isn't necessarily the most
> >>> >>> reliable
> >>> >>> >> for
> >>> >>> >> > benchmarking.
> >>> >>> >> >
> >>> >>> >> > On an intuitive level, this makes sense to me,  for the most
> >>> part it
> >>> >>> >> seems
> >>> >>> >> > like the change just moves casting from "int" to "long"
> further
> >>> up
> >>> >>> the
> >>> >>> >> > stack  for  "PlatformDepdendent" operations.  If there are
> other
> >>> >>> >> benchmarks
> >>> >>> >> > that you think are worth running let me know.
> >>> >>> >> >
> >>> >>> >> > One downside performance wise I think for his change is it
> >>> >>> increases the
> >>> >>> >> > size of ArrowBuf objects, which I suppose could influence
> cache
> >>> >>> misses
> >>> >>> >> at
> >>> >>> >> > some level or increase the size of call-stacks, but this
> doesn't
> >>> >>> seem to
> >>> >>> >> > show up in the benchmark..
> >>> >>> >> >
> >>> >>> >> > Thanks,
> >>> >>> >> > Micah
> >>> >>> >> >
> >>> >>> >> > Sample benchmark numbers:
> >>> >>> >> >
> >>> >>> >> > [New Code]
> >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error
> >>> >>> Units
> >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469
> >>> >>> us/op
> >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115
> >>> >>> us/op
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> > [Old code]
> >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error
> >>> >>> Units
> >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409
> >>> >>> us/op
> >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084
> >>> >>> us/op
> >>> >>> >> >
> >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com
> >
> >>> >>> wrote:
> >>> >>> >> >
> >>> >>> >> >> Hi Micah,
> >>> >>> >> >>
> >>> >>> >> >> Thanks a lot for doing this.
> >>> >>> >> >>
> >>> >>> >> >> I am a little concerned about if there is any negative
> >>> performance
> >>> >>> >> impact
> >>> >>> >> >> on the current 32-bit-length based applications.
> >>> >>> >> >> Can we do some performance comparison on our existing
> >>> benchmarks?
> >>> >>> >> >>
> >>> >>> >> >> Best,
> >>> >>> >> >> Liya Fan
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
> >>> >>> emkornfi...@gmail.com>
> >>> >>> >> >> wrote:
> >>> >>> >> >>
> >>> >>> >> >>> There have been some previous discussions on the mailing
> about
> >>> >>> >> supporting
> >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is what the IPC
> >>> >>> >> specification
> >>> >>> >> >>> and C++ support).  I created a PR [1] that changes all APIs
> >>> that I
> >>> >>> >> could
> >>> >>> >> >>> find that take an index to take an "long" instead of an
> "int"
> >>> (and
> >>> >>> >> >>> similarly change "size/rowcount" APIs).
> >>> >>> >> >>>
> >>> >>> >> >>> It is a big change, so I think it is worth discussing if it
> is
> >>> >>> >> something
> >>> >>> >> >>> we
> >>> >>> >> >>> still want to move forward with.  It would be nice to come
> to
> >>> a
> >>> >>> >> >>> conclusion
> >>> >>> >> >>> quickly, ideally in the next few days, to avoid a lot of
> merge
> >>> >>> >> conflicts.
> >>> >>> >> >>>
> >>> >>> >> >>> The reason I did this work now is the C++ implementation has
> >>> added
> >>> >>> >> >>> support
> >>> >>> >> >>> for LargeList, LargeBinary and LargeString arrays and based
> on
> >>> >>> prior
> >>> >>> >> >>> discussions we need to have similar support in Java before
> our
> >>> >>> next
> >>> >>> >> >>> release. Support 64-bit indexes means we can have full
> >>> >>> compatibility
> >>> >>> >> and
> >>> >>> >> >>> make the most use of the types in Java.
> >>> >>> >> >>>
> >>> >>> >> >>> Look forward to hearing feedback.
> >>> >>> >> >>>
> >>> >>> >> >>> Thanks,
> >>> >>> >> >>> Micah
> >>> >>> >> >>>
> >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
> >>> >>> >> >>>
> >>> >>> >> >>
> >>> >>> >>
> >>> >>> >
> >>> >>>
> >>> >>
> >>>
> >>
>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Reply via email to