> > > With regards to the reference implementation point. It is a good point. > I'm on vacation this week. Unless you're pushing hard on this, can we pick > this up and discuss more next week?
Hi Jacques, I hope you had a good rest. Any more thoughts on the reference implementation aspect of this? > To copy the sentiments from the 0.15.0 release thread, I think it > would be best to decouple this discussion from the release timeline > given how many people we have relying on regular releases coming out. > We can keep continue making major 0.x releases until we're ready to > release 1.0.0. I'm OK with it as long as other stakeholders are. Timed releases are the way to go. As stated on the release thread [1] we need a better mechanism to avoid this type of issue arising again. The release thread also had some more discussion on compatibility. Thanks, Micah [1] https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com> wrote: > On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > > Hi Wes and Jacques, > > See responses below. > > > > With regards to the reference implementation point. It is a good point. > I'm > > > on vacation this week. Unless you're pushing hard on this, can we pick > this > > > up and discuss more next week? > > > > > > Sure thing, enjoy your vacation. I think the only practical implications > > are it delays choices around implementing LargeList, LargeBinary, > > LargeString in Java, which in turn might push out the 0.15.0 release. > > > > To copy the sentiments from the 0.15.0 release thread, I think it > would be best to decouple this discussion from the release timeline > given how many people we have relying on regular releases coming out. > We can keep continue making major 0.x releases until we're ready to > release 1.0.0. > > > My stance on this is that I don't know how important it is for Java to > > > support vectors over INT32_MAX elements. The use cases enabled by > > > having very large arrays seem to be concentrated in the native code > > > world (e.g. C/C++/Rust) -- that could just be implementation-centrism > > > on my part, though. > > > > > > A data point against this view is Spark has done work to eliminate 2GB > > memory limits on its block sizes [1]. I don't claim to understand the > > implications of this. Bryan might you have any thoughts here? I'm OK > with > > INT32_MAX, as well, I think we should think about what this means for > > adding Large types to Java and implications for reference > implementations. > > > > Thanks, > > Micah > > > > [1] https://issues.apache.org/jira/browse/SPARK-6235 > > > > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org> > wrote: > > > > > Hey Micah, > > > > > > Appreciate the offer on the compiling. The reality is I'm more > concerned > > > about the unknowns than the compiling issue itself. Any time you've > been > > > tuning for a while, changing something like this could be totally fine > or > > > cause a couple of major issues. For example, we've done a very large > amount > > > of work reducing heap memory footprint of the vectors. Are target is to > > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap per > vector > > > (not including arrow bufs). > > > > > > With regards to the reference implementation point. It is a good point. > > > I'm on vacation this week. Unless you're pushing hard on this, can we > pick > > > this up and discuss more next week? > > > > > > thanks, > > > Jacques > > > > > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <emkornfi...@gmail.com > > > > > wrote: > > > > > >> Hi Jacques, > > >> I definitely understand these concerns and this change is risky > because it > > >> is so large. Perhaps, creating a new hierarchy, might be the > cleanest way > > >> of dealing with this. This could have other benefits like cleaning up > > >> some > > >> cruft around dictionary encode and "orphaned" method. Per past > e-mail > > >> threads I agree it is beneficial to have 2 separate reference > > >> implementations that can communicate fully, and my intent here was to > > >> close > > >> that gap. > > >> > > >> Trying to > > >> > determine the ramifications of these changes would be challenging > and > > >> time > > >> > consuming against all the different ways we interact with the Arrow > Java > > >> > library. > > >> > > >> > > >> Understood. I took a quick look at Dremio-OSS it seems like it has a > > >> simple java build system? If it is helpful, I can try to get a fork > > >> running that at least compiles against this PR. My plan would be to > cast > > >> any place that was changed to return a long back to an int, so in > essence > > >> the Dremio algorithms would reman 32-bit implementations. > > >> > > >> I don't have the infrastructure to test this change properly from a > > >> distributed systems perspective, so it would still take some time from > > >> Dremio to validate for regressions. > > >> > > >> I'm not saying I'm against this but want to make sure we've > > >> > explored all less disruptive options before considering changing > > >> something > > >> > this fundamental (especially when I generally hold the view that > large > > >> cell > > >> > counts against massive contiguous memory is an anti pattern to > scalable > > >> > analytical processing--purely subjective of course). > > >> > > >> > > >> I'm open to other ideas here, as well. I don't think it is out of the > > >> question to leave the Java implementation as 32-bit, but if we do, > then I > > >> think we should consider a different strategy for reference > > >> implementations. > > >> > > >> Thanks, > > >> Micah > > >> > > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <jacq...@apache.org> > > >> wrote: > > >> > > >> > Hey Micah, I didn't have a particular path in mind. Was thinking > more > > >> along > > >> > the lines of extra methods as opposed to separate classes. > > >> > > > >> > Arrow hasn't historically been a place where we're writing > algorithms in > > >> > Java so the fact that they aren't there doesn't mean they don't > exist. > > >> We > > >> > have a large amount of code that depends on the current behavior > that is > > >> > deployed in hundreds of customer clusters (you can peruse our dremio > > >> repo > > >> > to see how extensively we leverage Arrow if interested). Trying to > > >> > determine the ramifications of these changes would be challenging > and > > >> time > > >> > consuming against all the different ways we interact with the Arrow > Java > > >> > library. I'm not saying I'm against this but want to make sure we've > > >> > explored all less disruptive options before considering changing > > >> something > > >> > this fundamental (especially when I generally hold the view that > large > > >> cell > > >> > counts against massive contiguous memory is an anti pattern to > scalable > > >> > analytical processing--purely subjective of course). > > >> > > > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield < > emkornfi...@gmail.com> > > >> > wrote: > > >> > > > >> > > Hi Jacques, > > >> > > What avenue were you thinking for supporting both paths? I > didn't > > >> want > > >> > > to pursue a different class hierarchy, because I felt like that > would > > >> > > effectively fork the code base, but that is potentially an option > that > > >> > > would allow us to have a complete reference implementation in Java > > >> that > > >> > can > > >> > > fully interact with C++, without major changes to this code. > > >> > > > > >> > > For supporting both APIs on the same classes/interfaces, I think > they > > >> > > roughly fall into three categories, changes to input parameters, > > >> changes > > >> > to > > >> > > output parameters and algorithm changes. > > >> > > > > >> > > For inputs, changing from int to long is essentially a no-op from > the > > >> > > compiler perspective. From the limited micro-benchmarking this > also > > >> > > doesn't seem to have a performance impact. So we could keep two > > >> versions > > >> > > of the methods that only differ on inputs, but it is not clear > what > > >> the > > >> > > value of that would be. > > >> > > > > >> > > For outputs, we can't support methods "long getLength()" and "int > > >> > > getLength()" in the same class, so we would be forced into > something > > >> like > > >> > > "long getLength(boolean dummy)" which I think is a less desirable. > > >> > > > > >> > > For algorithm changes, there did not appear to be too many places > > >> where > > >> > we > > >> > > actually loop over all elements (it is quite possible I missed > > >> something > > >> > > here), the ones that I did find I was able to mitigate performance > > >> > > penalties as noted above. Some of the current implementation will > > >> get a > > >> > > lot slower for "large arrays", but we can likely fix those later > or in > > >> > this > > >> > > PR with a nested while loop instead of 2 for loops. > > >> > > > > >> > > Thanks, > > >> > > Micah > > >> > > > > >> > > > > >> > > On Saturday, August 10, 2019, Jacques Nadeau <jacq...@apache.org> > > >> wrote: > > >> > > > > >> > >> This is a pretty massive change to the apis. I wonder how nasty > it > > >> would > > >> > >> be to just support both paths. Have you evaluated how complex > that > > >> > would be? > > >> > >> > > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield < > > >> emkornfi...@gmail.com> > > >> > >> wrote: > > >> > >> > > >> > >>> After more investigation, it looks like Float8Benchmarks at > least > > >> on my > > >> > >>> machine are within the range of noise. > > >> > >>> > > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to bring > the > > >> > >>> BitVectorHelper benchmarks back inline (and even with some > > >> improvement > > >> > >>> for > > >> > >>> getNullCountBenchmark). > > >> > >>> > > >> > >>> Benchmark Mode Cnt > Score > > >> > >>> Error > > >> > >>> Units > > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 > 3.821 ± > > >> > >>> 0.031 > > >> > >>> ns/op > > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 > 14.884 ± > > >> > >>> 0.141 > > >> > >>> ns/op > > >> > >>> > > >> > >>> I applied the same pattern to other loops that I could find, > and for > > >> > any > > >> > >>> "for (long" loop on the critical path, I broke it up into two > loops. > > >> > the > > >> > >>> first loop does iteration by integer, the second finishes off > for > > >> any > > >> > >>> long > > >> > >>> values. As a side note it seems like optimization for loops > using > > >> long > > >> > >>> counters at least have a semi-recent open bug for the JVM [2] > > >> > >>> > > >> > >>> Thanks, > > >> > >>> Micah > > >> > >>> > > >> > >>> [1] > > >> > >>> > > >> > >>> > > >> > > > >> > https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 > > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 > > >> > >>> > > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield < > > >> emkornfi...@gmail.com> > > >> > >>> wrote: > > >> > >>> > > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables can > make a > > >> > big > > >> > >>> > difference. I didn't initially run the benchmarks with these > > >> turned > > >> > on > > >> > >>> > (you can see the result from above with Float8Benchmarks). > Here > > >> are > > >> > >>> new > > >> > >>> > numbers including with the flags enabled. It looks like using > > >> longs > > >> > >>> might > > >> > >>> > be a little bit slower, I'll see what I can do to mitigate > this. > > >> > >>> > > > >> > >>> > Ravindra also volunteered to try to benchmark the changes with > > >> > Dremio's > > >> > >>> > code on today's sync call. > > >> > >>> > > > >> > >>> > New > > >> > >>> > > > >> > >>> > Benchmark Mode Cnt > Score > > >> > >>> Error > > >> > >>> > Units > > >> > >>> > > > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 > > >> 4.176 ± > > >> > >>> 1.292 > > >> > >>> > ns/op > > >> > >>> > > > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 > > >> 26.102 ± > > >> > >>> 0.700 > > >> > >>> > ns/op > > >> > >>> > > > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± 0.084 > > >> us/op > > >> > >>> > > > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± 0.057 > > >> us/op > > >> > >>> > > > >> > >>> > > > >> > >>> > > > >> > >>> > old > > >> > >>> > > > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 > > >> 3.828 ± > > >> > >>> 0.030 > > >> > >>> > ns/op > > >> > >>> > > > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 > > >> 20.611 ± > > >> > >>> 0.188 > > >> > >>> > ns/op > > >> > >>> > > > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± 0.462 > > >> us/op > > >> > >>> > > > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± 0.027 > > >> us/op > > >> > >>> > > > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fa...@gmail.com > > > > >> > wrote: > > >> > >>> > > > >> > >>> >> Hi Gonzalo, > > >> > >>> >> > > >> > >>> >> Thanks for sharing the performance results. > > >> > >>> >> I am wondering if you have turned off the flag > > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. > > >> > >>> >> If not, the lower throughput should be expected. > > >> > >>> >> > > >> > >>> >> Best, > > >> > >>> >> Liya Fan > > >> > >>> >> > > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < > > >> > >>> emkornfi...@gmail.com> > > >> > >>> >> wrote: > > >> > >>> >> > > >> > >>> >>> Hi Gonzalo, > > >> > >>> >>> Thank you for the feedback. I wasn't aware of the JIT > > >> > >>> implications. At > > >> > >>> >>> least on the benchmark run they don't seem to have an > impact. > > >> > >>> >>> > > >> > >>> >>> If there are other benchmarks that people have that can > > >> validate if > > >> > >>> this > > >> > >>> >>> change will be problematic I would appreciate trying to run > them > > >> > >>> with the > > >> > >>> >>> PR. I will try to run the ones for zeroing/popcnt tonight > to > > >> see > > >> > if > > >> > >>> >>> there > > >> > >>> >>> is a change in those. > > >> > >>> >>> > > >> > >>> >>> -Micah > > >> > >>> >>> > > >> > >>> >>> > > >> > >>> >>> > > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar < > > >> > >>> >>> golthir...@gmail.com> wrote: > > >> > >>> >>> > > >> > >>> >>> > I would recommend to take care with this kind of changes. > > >> > >>> >>> > > > >> > >>> >>> > I didn't try Arrow in more than one year, but by then the > > >> > >>> performance > > >> > >>> >>> was > > >> > >>> >>> > quite bad in comparison with plain byte buffer access > > >> > >>> >>> > (see > http://git.net/apache-arrow-development/msg02353.html *) > > >> > and > > >> > >>> >>> > there are several optimizations that the JVM > (specifically, > > >> C2) > > >> > >>> does > > >> > >>> >>> not > > >> > >>> >>> > apply when dealing with int instead of longs. One of the > > >> > >>> >>> > most commons is the loop unrolling and vectorization. > > >> > >>> >>> > > > >> > >>> >>> > * It doesn't seem the best way to reference an old email > on > > >> the > > >> > >>> list, > > >> > >>> >>> but > > >> > >>> >>> > it is the only result shown by Google > > >> > >>> >>> > > > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< > > >> > liya.fa...@gmail.com > > >> > >>> >) > > >> > >>> >>> > escribió: > > >> > >>> >>> > > > >> > >>> >>> >> Hi Micah, > > >> > >>> >>> >> > > >> > >>> >>> >> Thanks for your effort. The performance result looks > good. > > >> > >>> >>> >> > > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 bytes > (4 > > >> > bytes > > >> > >>> for > > >> > >>> >>> each > > >> > >>> >>> >> of length, write index, and read index). > > >> > >>> >>> >> Similar overheads also exist for vectors like > > >> > >>> BaseFixedWidthVector, > > >> > >>> >>> >> BaseVariableWidthVector, etc. > > >> > >>> >>> >> > > >> > >>> >>> >> IMO, such overheads are small enough to justify the > change. > > >> > >>> >>> >> Let's check if there are other overheads. > > >> > >>> >>> >> > > >> > >>> >>> >> Best, > > >> > >>> >>> >> Liya Fan > > >> > >>> >>> >> > > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < > > >> > >>> emkornfi...@gmail.com > > >> > >>> >>> > > > >> > >>> >>> >> wrote: > > >> > >>> >>> >> > > >> > >>> >>> >> > Hi Liya Fan, > > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem to be > any > > >> > >>> >>> meaningful > > >> > >>> >>> >> > performance difference on my machine. At least for > me, the > > >> > >>> >>> benchmarks > > >> > >>> >>> >> are > > >> > >>> >>> >> > not stable enough to say one is faster than the other > (I've > > >> > >>> pasted > > >> > >>> >>> >> results > > >> > >>> >>> >> > below). That being said my machine isn't necessarily > the > > >> most > > >> > >>> >>> reliable > > >> > >>> >>> >> for > > >> > >>> >>> >> > benchmarking. > > >> > >>> >>> >> > > > >> > >>> >>> >> > On an intuitive level, this makes sense to me, for the > > >> most > > >> > >>> part it > > >> > >>> >>> >> seems > > >> > >>> >>> >> > like the change just moves casting from "int" to "long" > > >> > further > > >> > >>> up > > >> > >>> >>> the > > >> > >>> >>> >> > stack for "PlatformDepdendent" operations. If there > are > > >> > other > > >> > >>> >>> >> benchmarks > > >> > >>> >>> >> > that you think are worth running let me know. > > >> > >>> >>> >> > > > >> > >>> >>> >> > One downside performance wise I think for his change > is it > > >> > >>> >>> increases the > > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could > influence > > >> > cache > > >> > >>> >>> misses > > >> > >>> >>> >> at > > >> > >>> >>> >> > some level or increase the size of call-stacks, but > this > > >> > doesn't > > >> > >>> >>> seem to > > >> > >>> >>> >> > show up in the benchmark.. > > >> > >>> >>> >> > > > >> > >>> >>> >> > Thanks, > > >> > >>> >>> >> > Micah > > >> > >>> >>> >> > > > >> > >>> >>> >> > Sample benchmark numbers: > > >> > >>> >>> >> > > > >> > >>> >>> >> > [New Code] > > >> > >>> >>> >> > Benchmark Mode Cnt Score > > >> Error > > >> > >>> >>> Units > > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 15.441 > ± > > >> 0.469 > > >> > >>> >>> us/op > > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.057 > ± > > >> 0.115 > > >> > >>> >>> us/op > > >> > >>> >>> >> > > > >> > >>> >>> >> > > > >> > >>> >>> >> > [Old code] > > >> > >>> >>> >> > Benchmark Mode Cnt Score > > >> Error > > >> > >>> >>> Units > > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 16.248 > ± > > >> 1.409 > > >> > >>> >>> us/op > > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 14.150 > ± > > >> 0.084 > > >> > >>> >>> us/op > > >> > >>> >>> >> > > > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya < > > >> liya.fa...@gmail.com > > >> > > > > >> > >>> >>> wrote: > > >> > >>> >>> >> > > > >> > >>> >>> >> >> Hi Micah, > > >> > >>> >>> >> >> > > >> > >>> >>> >> >> Thanks a lot for doing this. > > >> > >>> >>> >> >> > > >> > >>> >>> >> >> I am a little concerned about if there is any negative > > >> > >>> performance > > >> > >>> >>> >> impact > > >> > >>> >>> >> >> on the current 32-bit-length based applications. > > >> > >>> >>> >> >> Can we do some performance comparison on our existing > > >> > >>> benchmarks? > > >> > >>> >>> >> >> > > >> > >>> >>> >> >> Best, > > >> > >>> >>> >> >> Liya Fan > > >> > >>> >>> >> >> > > >> > >>> >>> >> >> > > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < > > >> > >>> >>> emkornfi...@gmail.com> > > >> > >>> >>> >> >> wrote: > > >> > >>> >>> >> >> > > >> > >>> >>> >> >>> There have been some previous discussions on the > mailing > > >> > about > > >> > >>> >>> >> supporting > > >> > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this is what > the > > >> IPC > > >> > >>> >>> >> specification > > >> > >>> >>> >> >>> and C++ support). I created a PR [1] that changes > all > > >> APIs > > >> > >>> that I > > >> > >>> >>> >> could > > >> > >>> >>> >> >>> find that take an index to take an "long" instead of > an > > >> > "int" > > >> > >>> (and > > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs). > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >>> It is a big change, so I think it is worth > discussing if > > >> it > > >> > is > > >> > >>> >>> >> something > > >> > >>> >>> >> >>> we > > >> > >>> >>> >> >>> still want to move forward with. It would be nice to > > >> come > > >> > to > > >> > >>> a > > >> > >>> >>> >> >>> conclusion > > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid a > lot of > > >> > merge > > >> > >>> >>> >> conflicts. > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >>> The reason I did this work now is the C++ > implementation > > >> has > > >> > >>> added > > >> > >>> >>> >> >>> support > > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString arrays and > > >> based > > >> > on > > >> > >>> >>> prior > > >> > >>> >>> >> >>> discussions we need to have similar support in Java > > >> before > > >> > our > > >> > >>> >>> next > > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can have > full > > >> > >>> >>> compatibility > > >> > >>> >>> >> and > > >> > >>> >>> >> >>> make the most use of the types in Java. > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >>> Look forward to hearing feedback. > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >>> Thanks, > > >> > >>> >>> >> >>> Micah > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 > > >> > >>> >>> >> >>> > > >> > >>> >>> >> >> > > >> > >>> >>> >> > > >> > >>> >>> > > > >> > >>> >>> > > >> > >>> >> > > >> > >>> > > >> > >> > > >> > > > >> > > > >