Hi Jacques, I definitely understand these concerns and this change is risky because it is so large. Perhaps, creating a new hierarchy, might be the cleanest way of dealing with this. This could have other benefits like cleaning up some cruft around dictionary encode and "orphaned" method. Per past e-mail threads I agree it is beneficial to have 2 separate reference implementations that can communicate fully, and my intent here was to close that gap.
Trying to > determine the ramifications of these changes would be challenging and time > consuming against all the different ways we interact with the Arrow Java > library. Understood. I took a quick look at Dremio-OSS it seems like it has a simple java build system? If it is helpful, I can try to get a fork running that at least compiles against this PR. My plan would be to cast any place that was changed to return a long back to an int, so in essence the Dremio algorithms would reman 32-bit implementations. I don't have the infrastructure to test this change properly from a distributed systems perspective, so it would still take some time from Dremio to validate for regressions. I'm not saying I'm against this but want to make sure we've > explored all less disruptive options before considering changing something > this fundamental (especially when I generally hold the view that large cell > counts against massive contiguous memory is an anti pattern to scalable > analytical processing--purely subjective of course). I'm open to other ideas here, as well. I don't think it is out of the question to leave the Java implementation as 32-bit, but if we do, then I think we should consider a different strategy for reference implementations. Thanks, Micah On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <jacq...@apache.org> wrote: > Hey Micah, I didn't have a particular path in mind. Was thinking more along > the lines of extra methods as opposed to separate classes. > > Arrow hasn't historically been a place where we're writing algorithms in > Java so the fact that they aren't there doesn't mean they don't exist. We > have a large amount of code that depends on the current behavior that is > deployed in hundreds of customer clusters (you can peruse our dremio repo > to see how extensively we leverage Arrow if interested). Trying to > determine the ramifications of these changes would be challenging and time > consuming against all the different ways we interact with the Arrow Java > library. I'm not saying I'm against this but want to make sure we've > explored all less disruptive options before considering changing something > this fundamental (especially when I generally hold the view that large cell > counts against massive contiguous memory is an anti pattern to scalable > analytical processing--purely subjective of course). > > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > Hi Jacques, > > What avenue were you thinking for supporting both paths? I didn't want > > to pursue a different class hierarchy, because I felt like that would > > effectively fork the code base, but that is potentially an option that > > would allow us to have a complete reference implementation in Java that > can > > fully interact with C++, without major changes to this code. > > > > For supporting both APIs on the same classes/interfaces, I think they > > roughly fall into three categories, changes to input parameters, changes > to > > output parameters and algorithm changes. > > > > For inputs, changing from int to long is essentially a no-op from the > > compiler perspective. From the limited micro-benchmarking this also > > doesn't seem to have a performance impact. So we could keep two versions > > of the methods that only differ on inputs, but it is not clear what the > > value of that would be. > > > > For outputs, we can't support methods "long getLength()" and "int > > getLength()" in the same class, so we would be forced into something like > > "long getLength(boolean dummy)" which I think is a less desirable. > > > > For algorithm changes, there did not appear to be too many places where > we > > actually loop over all elements (it is quite possible I missed something > > here), the ones that I did find I was able to mitigate performance > > penalties as noted above. Some of the current implementation will get a > > lot slower for "large arrays", but we can likely fix those later or in > this > > PR with a nested while loop instead of 2 for loops. > > > > Thanks, > > Micah > > > > > > On Saturday, August 10, 2019, Jacques Nadeau <jacq...@apache.org> wrote: > > > >> This is a pretty massive change to the apis. I wonder how nasty it would > >> be to just support both paths. Have you evaluated how complex that > would be? > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <emkornfi...@gmail.com> > >> wrote: > >> > >>> After more investigation, it looks like Float8Benchmarks at least on my > >>> machine are within the range of noise. > >>> > >>> For BitVectorHelper I pushed a new commit [1], seems to bring the > >>> BitVectorHelper benchmarks back inline (and even with some improvement > >>> for > >>> getNullCountBenchmark). > >>> > >>> Benchmark Mode Cnt Score > >>> Error > >>> Units > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 3.821 ± > >>> 0.031 > >>> ns/op > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 14.884 ± > >>> 0.141 > >>> ns/op > >>> > >>> I applied the same pattern to other loops that I could find, and for > any > >>> "for (long" loop on the critical path, I broke it up into two loops. > the > >>> first loop does iteration by integer, the second finishes off for any > >>> long > >>> values. As a side note it seems like optimization for loops using long > >>> counters at least have a semi-recent open bug for the JVM [2] > >>> > >>> Thanks, > >>> Micah > >>> > >>> [1] > >>> > >>> > https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 > >>> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <emkornfi...@gmail.com> > >>> wrote: > >>> > >>> > Indeed, the BoundChecking and CheckNullForGet variables can make a > big > >>> > difference. I didn't initially run the benchmarks with these turned > on > >>> > (you can see the result from above with Float8Benchmarks). Here are > >>> new > >>> > numbers including with the flags enabled. It looks like using longs > >>> might > >>> > be a little bit slower, I'll see what I can do to mitigate this. > >>> > > >>> > Ravindra also volunteered to try to benchmark the changes with > Dremio's > >>> > code on today's sync call. > >>> > > >>> > New > >>> > > >>> > Benchmark Mode Cnt Score > >>> Error > >>> > Units > >>> > > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 4.176 ± > >>> 1.292 > >>> > ns/op > >>> > > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 26.102 ± > >>> 0.700 > >>> > ns/op > >>> > > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± 0.084 us/op > >>> > > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± 0.057 us/op > >>> > > >>> > > >>> > > >>> > old > >>> > > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 3.828 ± > >>> 0.030 > >>> > ns/op > >>> > > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 20.611 ± > >>> 0.188 > >>> > ns/op > >>> > > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± 0.462 us/op > >>> > > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± 0.027 us/op > >>> > > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com> > wrote: > >>> > > >>> >> Hi Gonzalo, > >>> >> > >>> >> Thanks for sharing the performance results. > >>> >> I am wondering if you have turned off the flag > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. > >>> >> If not, the lower throughput should be expected. > >>> >> > >>> >> Best, > >>> >> Liya Fan > >>> >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < > >>> emkornfi...@gmail.com> > >>> >> wrote: > >>> >> > >>> >>> Hi Gonzalo, > >>> >>> Thank you for the feedback. I wasn't aware of the JIT > >>> implications. At > >>> >>> least on the benchmark run they don't seem to have an impact. > >>> >>> > >>> >>> If there are other benchmarks that people have that can validate if > >>> this > >>> >>> change will be problematic I would appreciate trying to run them > >>> with the > >>> >>> PR. I will try to run the ones for zeroing/popcnt tonight to see > if > >>> >>> there > >>> >>> is a change in those. > >>> >>> > >>> >>> -Micah > >>> >>> > >>> >>> > >>> >>> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar < > >>> >>> golthir...@gmail.com> wrote: > >>> >>> > >>> >>> > I would recommend to take care with this kind of changes. > >>> >>> > > >>> >>> > I didn't try Arrow in more than one year, but by then the > >>> performance > >>> >>> was > >>> >>> > quite bad in comparison with plain byte buffer access > >>> >>> > (see http://git.net/apache-arrow-development/msg02353.html *) > and > >>> >>> > there are several optimizations that the JVM (specifically, C2) > >>> does > >>> >>> not > >>> >>> > apply when dealing with int instead of longs. One of the > >>> >>> > most commons is the loop unrolling and vectorization. > >>> >>> > > >>> >>> > * It doesn't seem the best way to reference an old email on the > >>> list, > >>> >>> but > >>> >>> > it is the only result shown by Google > >>> >>> > > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< > liya.fa...@gmail.com > >>> >) > >>> >>> > escribió: > >>> >>> > > >>> >>> >> Hi Micah, > >>> >>> >> > >>> >>> >> Thanks for your effort. The performance result looks good. > >>> >>> >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 bytes (4 > bytes > >>> for > >>> >>> each > >>> >>> >> of length, write index, and read index). > >>> >>> >> Similar overheads also exist for vectors like > >>> BaseFixedWidthVector, > >>> >>> >> BaseVariableWidthVector, etc. > >>> >>> >> > >>> >>> >> IMO, such overheads are small enough to justify the change. > >>> >>> >> Let's check if there are other overheads. > >>> >>> >> > >>> >>> >> Best, > >>> >>> >> Liya Fan > >>> >>> >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < > >>> emkornfi...@gmail.com > >>> >>> > > >>> >>> >> wrote: > >>> >>> >> > >>> >>> >> > Hi Liya Fan, > >>> >>> >> > Based on the Float8Benchmark there does not seem to be any > >>> >>> meaningful > >>> >>> >> > performance difference on my machine. At least for me, the > >>> >>> benchmarks > >>> >>> >> are > >>> >>> >> > not stable enough to say one is faster than the other (I've > >>> pasted > >>> >>> >> results > >>> >>> >> > below). That being said my machine isn't necessarily the most > >>> >>> reliable > >>> >>> >> for > >>> >>> >> > benchmarking. > >>> >>> >> > > >>> >>> >> > On an intuitive level, this makes sense to me, for the most > >>> part it > >>> >>> >> seems > >>> >>> >> > like the change just moves casting from "int" to "long" > further > >>> up > >>> >>> the > >>> >>> >> > stack for "PlatformDepdendent" operations. If there are > other > >>> >>> >> benchmarks > >>> >>> >> > that you think are worth running let me know. > >>> >>> >> > > >>> >>> >> > One downside performance wise I think for his change is it > >>> >>> increases the > >>> >>> >> > size of ArrowBuf objects, which I suppose could influence > cache > >>> >>> misses > >>> >>> >> at > >>> >>> >> > some level or increase the size of call-stacks, but this > doesn't > >>> >>> seem to > >>> >>> >> > show up in the benchmark.. > >>> >>> >> > > >>> >>> >> > Thanks, > >>> >>> >> > Micah > >>> >>> >> > > >>> >>> >> > Sample benchmark numbers: > >>> >>> >> > > >>> >>> >> > [New Code] > >>> >>> >> > Benchmark Mode Cnt Score Error > >>> >>> Units > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 15.441 ± 0.469 > >>> >>> us/op > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.057 ± 0.115 > >>> >>> us/op > >>> >>> >> > > >>> >>> >> > > >>> >>> >> > [Old code] > >>> >>> >> > Benchmark Mode Cnt Score Error > >>> >>> Units > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 16.248 ± 1.409 > >>> >>> us/op > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.150 ± 0.084 > >>> >>> us/op > >>> >>> >> > > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fa...@gmail.com > > > >>> >>> wrote: > >>> >>> >> > > >>> >>> >> >> Hi Micah, > >>> >>> >> >> > >>> >>> >> >> Thanks a lot for doing this. > >>> >>> >> >> > >>> >>> >> >> I am a little concerned about if there is any negative > >>> performance > >>> >>> >> impact > >>> >>> >> >> on the current 32-bit-length based applications. > >>> >>> >> >> Can we do some performance comparison on our existing > >>> benchmarks? > >>> >>> >> >> > >>> >>> >> >> Best, > >>> >>> >> >> Liya Fan > >>> >>> >> >> > >>> >>> >> >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < > >>> >>> emkornfi...@gmail.com> > >>> >>> >> >> wrote: > >>> >>> >> >> > >>> >>> >> >>> There have been some previous discussions on the mailing > about > >>> >>> >> supporting > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this is what the IPC > >>> >>> >> specification > >>> >>> >> >>> and C++ support). I created a PR [1] that changes all APIs > >>> that I > >>> >>> >> could > >>> >>> >> >>> find that take an index to take an "long" instead of an > "int" > >>> (and > >>> >>> >> >>> similarly change "size/rowcount" APIs). > >>> >>> >> >>> > >>> >>> >> >>> It is a big change, so I think it is worth discussing if it > is > >>> >>> >> something > >>> >>> >> >>> we > >>> >>> >> >>> still want to move forward with. It would be nice to come > to > >>> a > >>> >>> >> >>> conclusion > >>> >>> >> >>> quickly, ideally in the next few days, to avoid a lot of > merge > >>> >>> >> conflicts. > >>> >>> >> >>> > >>> >>> >> >>> The reason I did this work now is the C++ implementation has > >>> added > >>> >>> >> >>> support > >>> >>> >> >>> for LargeList, LargeBinary and LargeString arrays and based > on > >>> >>> prior > >>> >>> >> >>> discussions we need to have similar support in Java before > our > >>> >>> next > >>> >>> >> >>> release. Support 64-bit indexes means we can have full > >>> >>> compatibility > >>> >>> >> and > >>> >>> >> >>> make the most use of the types in Java. > >>> >>> >> >>> > >>> >>> >> >>> Look forward to hearing feedback. > >>> >>> >> >>> > >>> >>> >> >>> Thanks, > >>> >>> >> >>> Micah > >>> >>> >> >>> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 > >>> >>> >> >>> > >>> >>> >> >> > >>> >>> >> > >>> >>> > > >>> >>> > >>> >> > >>> > >> >