hi Micah -- it makes sense to limit the scope for the time being to permitting LargeString/Binary work to proceed. Jacques, have you had a chance to look at this?
On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield <emkornfi...@gmail.com> wrote: > > Apologies for the long delay, I chose to do the minimal work of limiting this > change [1] to allowing ArrowBuf to 64-bit lengths. This would unblock work > on LargeString and LargeBinary. If this change looks OK, I think there is > some follow-up work to add more thorough unit/integration tests. > > As an aside, it does seem like the 2GB limit is affecting some users in Spark > [2][3], so hopefully LargeString would help with this. > > Allowing more than MAX_INT elements is Vectors/array still a blocker for > making LargeList useful. > > Thanks, > Micah > > [1] https://github.com/apache/arrow/pull/5020 > [2] > https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888 > [3] https://issues.apache.org/jira/browse/ARROW-4890 > > On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <jacq...@apache.org> wrote: >> >> >> >> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <emkornfi...@gmail.com> wrote: >>>> >>>> The vector indexes being limited to 32 bits doesn't limit the addressing >>>> to 32 bit chunks of memory. For example, you're prime example before was >>>> image data. Having 2 billion images of 1mb images would still be supported >>>> without changing the index addressing. >>> >>> This might be pre-coffee math, but I think we are limited to approximately >>> 2000 images because an ArrowBuf only holds up-to 2 billion bytes [1]. >>> While we have plenty of room for the offsets, we quickly run out of >>> contiguous data storage space. For LargeString and LargeBinary this could >>> be fixed by changing ArrowBuf. >>> >>> LargeArray faces the same problem only it applies to its child vectors. >>> Supporting LargeArray properly is really what drove the large-scale >>> interface change. >> >> >> My expressed concern about these changes was specifically about the use of >> long for get/set in the vector interfaces. I'm not saying that we constrain >> memory/ArrowBufs to 32bits. >> >>> >>>> Rebase would help if possible. >>> >>> I'll try to get to this in the next few days. >>> >>> [1] >>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164 >>> >>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <jacq...@apache.org> wrote: >>>> >>>> >>>> >>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <emkornfi...@gmail.com> >>>> wrote: >>>>>> >>>>>> I don't think we should couple this discussion with the implementation >>>>>> of large list, etc since I think those two concepts are independent. >>>>> >>>>> I'm still trying to balance in my mind which is a worse experience for >>>>> consumers of the libraries for these types. Claiming that Java supports >>>>> these types but throwing an exception when the Vectors exceed 32-bits or >>>>> just say they aren't supported until we have 64-bit support in Java. >>>> >>>> >>>> The vector indexes being limited to 32 bits doesn't limit the addressing >>>> to 32 bit chunks of memory. For example, you're prime example before was >>>> image data. Having 2 billion images of 1mb images would still be supported >>>> without changing the index addressing. >>>> >>>> >>>>> >>>>>> >>>>>> I've asked some others on my team their opinions on the risk here. I >>>>>> think we should probably review some our more complex vector >>>>>> interactions and see how the jvm's assembly changes with this kind of >>>>>> change. Using microbenchmarking is good but I think we also need to see >>>>>> whether we're constantly inserting additional instructions or if in most >>>>>> cases, this actually doesn't impact instruction count. >>>>> >>>>> >>>>> Is this something that your team will take on? >>>> >>>> >>>> >>>> Yeah, we need to look at this I think. >>>> >>>>> Do you need a rebased version of the PR or is the existing one sufficient? >>>> >>>> >>>> Rebase would help if possible. >>>> >>>> >>>>> >>>>> Thanks, >>>>> Micah >>>>> >>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <jacq...@apache.org> wrote: >>>>>> >>>>>> I don't think we should couple this discussion with the implementation >>>>>> of large list, etc since I think those two concepts are independent. >>>>>> >>>>>> I've asked some others on my team their opinions on the risk here. I >>>>>> think we should probably review some our more complex vector >>>>>> interactions and see how the jvm's assembly changes with this kind of >>>>>> change. Using microbenchmarking is good but I think we also need to see >>>>>> whether we're constantly inserting additional instructions or if in most >>>>>> cases, this actually doesn't impact instruction count. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <emkornfi...@gmail.com> >>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> With regards to the reference implementation point. It is a good >>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, >>>>>>>> can we pick this up and discuss more next week? >>>>>>> >>>>>>> >>>>>>> Hi Jacques, I hope you had a good rest. Any more thoughts on the >>>>>>> reference implementation aspect of this? >>>>>>> >>>>>>>> >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>>>> would be best to decouple this discussion from the release timeline >>>>>>>> given how many people we have relying on regular releases coming out. >>>>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>>>> release 1.0.0. >>>>>>> >>>>>>> >>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are >>>>>>> the way to go. As stated on the release thread [1] we need a better >>>>>>> mechanism to avoid this type of issue arising again. The release >>>>>>> thread also had some more discussion on compatibility. >>>>>>> >>>>>>> Thanks, >>>>>>> Micah >>>>>>> >>>>>>> [1] >>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E >>>>>>> >>>>>>> >>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <wesmck...@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield >>>>>>>> <emkornfi...@gmail.com> wrote: >>>>>>>> > >>>>>>>> > Hi Wes and Jacques, >>>>>>>> > See responses below. >>>>>>>> > >>>>>>>> > With regards to the reference implementation point. It is a good >>>>>>>> > point. I'm >>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can we >>>>>>>> > > pick this >>>>>>>> > > up and discuss more next week? >>>>>>>> > >>>>>>>> > >>>>>>>> > Sure thing, enjoy your vacation. I think the only practical >>>>>>>> > implications >>>>>>>> > are it delays choices around implementing LargeList, LargeBinary, >>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 release. >>>>>>>> > >>>>>>>> >>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it >>>>>>>> would be best to decouple this discussion from the release timeline >>>>>>>> given how many people we have relying on regular releases coming out. >>>>>>>> We can keep continue making major 0.x releases until we're ready to >>>>>>>> release 1.0.0. >>>>>>>> >>>>>>>> > My stance on this is that I don't know how important it is for Java >>>>>>>> > to >>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by >>>>>>>> > > having very large arrays seem to be concentrated in the native code >>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be >>>>>>>> > > implementation-centrism >>>>>>>> > > on my part, though. >>>>>>>> > >>>>>>>> > >>>>>>>> > A data point against this view is Spark has done work to eliminate >>>>>>>> > 2GB >>>>>>>> > memory limits on its block sizes [1]. I don't claim to understand >>>>>>>> > the >>>>>>>> > implications of this. Bryan might you have any thoughts here? I'm >>>>>>>> > OK with >>>>>>>> > INT32_MAX, as well, I think we should think about what this means for >>>>>>>> > adding Large types to Java and implications for reference >>>>>>>> > implementations. >>>>>>>> > >>>>>>>> > Thanks, >>>>>>>> > Micah >>>>>>>> > >>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235 >>>>>>>> > >>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <jacq...@apache.org> >>>>>>>> > wrote: >>>>>>>> > >>>>>>>> > > Hey Micah, >>>>>>>> > > >>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more >>>>>>>> > > concerned >>>>>>>> > > about the unknowns than the compiling issue itself. Any time >>>>>>>> > > you've been >>>>>>>> > > tuning for a while, changing something like this could be totally >>>>>>>> > > fine or >>>>>>>> > > cause a couple of major issues. For example, we've done a very >>>>>>>> > > large amount >>>>>>>> > > of work reducing heap memory footprint of the vectors. Are target >>>>>>>> > > is to >>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap >>>>>>>> > > per vector >>>>>>>> > > (not including arrow bufs). >>>>>>>> > > >>>>>>>> > > With regards to the reference implementation point. It is a good >>>>>>>> > > point. >>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this, can >>>>>>>> > > we pick >>>>>>>> > > this up and discuss more next week? >>>>>>>> > > >>>>>>>> > > thanks, >>>>>>>> > > Jacques >>>>>>>> > > >>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield >>>>>>>> > > <emkornfi...@gmail.com> >>>>>>>> > > wrote: >>>>>>>> > > >>>>>>>> > >> Hi Jacques, >>>>>>>> > >> I definitely understand these concerns and this change is risky >>>>>>>> > >> because it >>>>>>>> > >> is so large. Perhaps, creating a new hierarchy, might be the >>>>>>>> > >> cleanest way >>>>>>>> > >> of dealing with this. This could have other benefits like >>>>>>>> > >> cleaning up >>>>>>>> > >> some >>>>>>>> > >> cruft around dictionary encode and "orphaned" method. Per past >>>>>>>> > >> e-mail >>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference >>>>>>>> > >> implementations that can communicate fully, and my intent here >>>>>>>> > >> was to >>>>>>>> > >> close >>>>>>>> > >> that gap. >>>>>>>> > >> >>>>>>>> > >> Trying to >>>>>>>> > >> > determine the ramifications of these changes would be >>>>>>>> > >> > challenging and >>>>>>>> > >> time >>>>>>>> > >> > consuming against all the different ways we interact with the >>>>>>>> > >> > Arrow Java >>>>>>>> > >> > library. >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > >> Understood. I took a quick look at Dremio-OSS it seems like it >>>>>>>> > >> has a >>>>>>>> > >> simple java build system? If it is helpful, I can try to get a >>>>>>>> > >> fork >>>>>>>> > >> running that at least compiles against this PR. My plan would be >>>>>>>> > >> to cast >>>>>>>> > >> any place that was changed to return a long back to an int, so in >>>>>>>> > >> essence >>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations. >>>>>>>> > >> >>>>>>>> > >> I don't have the infrastructure to test this change properly >>>>>>>> > >> from a >>>>>>>> > >> distributed systems perspective, so it would still take some time >>>>>>>> > >> from >>>>>>>> > >> Dremio to validate for regressions. >>>>>>>> > >> >>>>>>>> > >> I'm not saying I'm against this but want to make sure we've >>>>>>>> > >> > explored all less disruptive options before considering changing >>>>>>>> > >> something >>>>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>>>> > >> > that large >>>>>>>> > >> cell >>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to >>>>>>>> > >> > scalable >>>>>>>> > >> > analytical processing--purely subjective of course). >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out of >>>>>>>> > >> the >>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we >>>>>>>> > >> do, then I >>>>>>>> > >> think we should consider a different strategy for reference >>>>>>>> > >> implementations. >>>>>>>> > >> >>>>>>>> > >> Thanks, >>>>>>>> > >> Micah >>>>>>>> > >> >>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau >>>>>>>> > >> <jacq...@apache.org> >>>>>>>> > >> wrote: >>>>>>>> > >> >>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was >>>>>>>> > >> > thinking more >>>>>>>> > >> along >>>>>>>> > >> > the lines of extra methods as opposed to separate classes. >>>>>>>> > >> > >>>>>>>> > >> > Arrow hasn't historically been a place where we're writing >>>>>>>> > >> > algorithms in >>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they don't >>>>>>>> > >> > exist. >>>>>>>> > >> We >>>>>>>> > >> > have a large amount of code that depends on the current >>>>>>>> > >> > behavior that is >>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our >>>>>>>> > >> > dremio >>>>>>>> > >> repo >>>>>>>> > >> > to see how extensively we leverage Arrow if interested). Trying >>>>>>>> > >> > to >>>>>>>> > >> > determine the ramifications of these changes would be >>>>>>>> > >> > challenging and >>>>>>>> > >> time >>>>>>>> > >> > consuming against all the different ways we interact with the >>>>>>>> > >> > Arrow Java >>>>>>>> > >> > library. I'm not saying I'm against this but want to make sure >>>>>>>> > >> > we've >>>>>>>> > >> > explored all less disruptive options before considering changing >>>>>>>> > >> something >>>>>>>> > >> > this fundamental (especially when I generally hold the view >>>>>>>> > >> > that large >>>>>>>> > >> cell >>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to >>>>>>>> > >> > scalable >>>>>>>> > >> > analytical processing--purely subjective of course). >>>>>>>> > >> > >>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield >>>>>>>> > >> > <emkornfi...@gmail.com> >>>>>>>> > >> > wrote: >>>>>>>> > >> > >>>>>>>> > >> > > Hi Jacques, >>>>>>>> > >> > > What avenue were you thinking for supporting both paths? I >>>>>>>> > >> > > didn't >>>>>>>> > >> want >>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like >>>>>>>> > >> > > that would >>>>>>>> > >> > > effectively fork the code base, but that is potentially an >>>>>>>> > >> > > option that >>>>>>>> > >> > > would allow us to have a complete reference implementation in >>>>>>>> > >> > > Java >>>>>>>> > >> that >>>>>>>> > >> > can >>>>>>>> > >> > > fully interact with C++, without major changes to this code. >>>>>>>> > >> > > >>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I >>>>>>>> > >> > > think they >>>>>>>> > >> > > roughly fall into three categories, changes to input >>>>>>>> > >> > > parameters, >>>>>>>> > >> changes >>>>>>>> > >> > to >>>>>>>> > >> > > output parameters and algorithm changes. >>>>>>>> > >> > > >>>>>>>> > >> > > For inputs, changing from int to long is essentially a no-op >>>>>>>> > >> > > from the >>>>>>>> > >> > > compiler perspective. From the limited micro-benchmarking >>>>>>>> > >> > > this also >>>>>>>> > >> > > doesn't seem to have a performance impact. So we could keep >>>>>>>> > >> > > two >>>>>>>> > >> versions >>>>>>>> > >> > > of the methods that only differ on inputs, but it is not >>>>>>>> > >> > > clear what >>>>>>>> > >> the >>>>>>>> > >> > > value of that would be. >>>>>>>> > >> > > >>>>>>>> > >> > > For outputs, we can't support methods "long getLength()" and >>>>>>>> > >> > > "int >>>>>>>> > >> > > getLength()" in the same class, so we would be forced into >>>>>>>> > >> > > something >>>>>>>> > >> like >>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less >>>>>>>> > >> > > desirable. >>>>>>>> > >> > > >>>>>>>> > >> > > For algorithm changes, there did not appear to be too many >>>>>>>> > >> > > places >>>>>>>> > >> where >>>>>>>> > >> > we >>>>>>>> > >> > > actually loop over all elements (it is quite possible I missed >>>>>>>> > >> something >>>>>>>> > >> > > here), the ones that I did find I was able to mitigate >>>>>>>> > >> > > performance >>>>>>>> > >> > > penalties as noted above. Some of the current implementation >>>>>>>> > >> > > will >>>>>>>> > >> get a >>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those >>>>>>>> > >> > > later or in >>>>>>>> > >> > this >>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops. >>>>>>>> > >> > > >>>>>>>> > >> > > Thanks, >>>>>>>> > >> > > Micah >>>>>>>> > >> > > >>>>>>>> > >> > > >>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau >>>>>>>> > >> > > <jacq...@apache.org> >>>>>>>> > >> wrote: >>>>>>>> > >> > > >>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how >>>>>>>> > >> > >> nasty it >>>>>>>> > >> would >>>>>>>> > >> > >> be to just support both paths. Have you evaluated how >>>>>>>> > >> > >> complex that >>>>>>>> > >> > would be? >>>>>>>> > >> > >> >>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield < >>>>>>>> > >> emkornfi...@gmail.com> >>>>>>>> > >> > >> wrote: >>>>>>>> > >> > >> >>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks at >>>>>>>> > >> > >>> least >>>>>>>> > >> on my >>>>>>>> > >> > >>> machine are within the range of noise. >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to >>>>>>>> > >> > >>> bring the >>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some >>>>>>>> > >> improvement >>>>>>>> > >> > >>> for >>>>>>>> > >> > >>> getNullCountBenchmark). >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> Benchmark Mode Cnt >>>>>>>> > >> > >>> Score >>>>>>>> > >> > >>> Error >>>>>>>> > >> > >>> Units >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 >>>>>>>> > >> > >>> 3.821 ± >>>>>>>> > >> > >>> 0.031 >>>>>>>> > >> > >>> ns/op >>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 >>>>>>>> > >> > >>> 14.884 ± >>>>>>>> > >> > >>> 0.141 >>>>>>>> > >> > >>> ns/op >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> I applied the same pattern to other loops that I could >>>>>>>> > >> > >>> find, and for >>>>>>>> > >> > any >>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into >>>>>>>> > >> > >>> two loops. >>>>>>>> > >> > the >>>>>>>> > >> > >>> first loop does iteration by integer, the second finishes >>>>>>>> > >> > >>> off for >>>>>>>> > >> any >>>>>>>> > >> > >>> long >>>>>>>> > >> > >>> values. As a side note it seems like optimization for >>>>>>>> > >> > >>> loops using >>>>>>>> > >> long >>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM >>>>>>>> > >> > >>> [2] >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> Thanks, >>>>>>>> > >> > >>> Micah >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> [1] >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> >>>>>>>> > >> > >>>>>>>> > >> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797 >>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051 >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield < >>>>>>>> > >> emkornfi...@gmail.com> >>>>>>>> > >> > >>> wrote: >>>>>>>> > >> > >>> >>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables >>>>>>>> > >> > >>> > can make a >>>>>>>> > >> > big >>>>>>>> > >> > >>> > difference. I didn't initially run the benchmarks with >>>>>>>> > >> > >>> > these >>>>>>>> > >> turned >>>>>>>> > >> > on >>>>>>>> > >> > >>> > (you can see the result from above with >>>>>>>> > >> > >>> > Float8Benchmarks). Here >>>>>>>> > >> are >>>>>>>> > >> > >>> new >>>>>>>> > >> > >>> > numbers including with the flags enabled. It looks like >>>>>>>> > >> > >>> > using >>>>>>>> > >> longs >>>>>>>> > >> > >>> might >>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to >>>>>>>> > >> > >>> > mitigate this. >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the changes >>>>>>>> > >> > >>> > with >>>>>>>> > >> > Dremio's >>>>>>>> > >> > >>> > code on today's sync call. >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > New >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Benchmark Mode >>>>>>>> > >> > >>> > Cnt Score >>>>>>>> > >> > >>> Error >>>>>>>> > >> > >>> > Units >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 >>>>>>>> > >> 4.176 ± >>>>>>>> > >> > >>> 1.292 >>>>>>>> > >> > >>> > ns/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 >>>>>>>> > >> 26.102 ± >>>>>>>> > >> > >>> 0.700 >>>>>>>> > >> > >>> > ns/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 7.398 ± >>>>>>>> > >> > >>> > 0.084 >>>>>>>> > >> us/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.711 ± >>>>>>>> > >> > >>> > 0.057 >>>>>>>> > >> us/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > old >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark avgt 5 >>>>>>>> > >> 3.828 ± >>>>>>>> > >> > >>> 0.030 >>>>>>>> > >> > >>> > ns/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark avgt 5 >>>>>>>> > >> 20.611 ± >>>>>>>> > >> > >>> 0.188 >>>>>>>> > >> > >>> > ns/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark avgt 5 6.597 ± >>>>>>>> > >> > >>> > 0.462 >>>>>>>> > >> us/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark avgt 5 2.615 ± >>>>>>>> > >> > >>> > 0.027 >>>>>>>> > >> us/op >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya >>>>>>>> > >> > >>> > <liya.fa...@gmail.com> >>>>>>>> > >> > wrote: >>>>>>>> > >> > >>> > >>>>>>>> > >> > >>> >> Hi Gonzalo, >>>>>>>> > >> > >>> >> >>>>>>>> > >> > >>> >> Thanks for sharing the performance results. >>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag >>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED. >>>>>>>> > >> > >>> >> If not, the lower throughput should be expected. >>>>>>>> > >> > >>> >> >>>>>>>> > >> > >>> >> Best, >>>>>>>> > >> > >>> >> Liya Fan >>>>>>>> > >> > >>> >> >>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield < >>>>>>>> > >> > >>> emkornfi...@gmail.com> >>>>>>>> > >> > >>> >> wrote: >>>>>>>> > >> > >>> >> >>>>>>>> > >> > >>> >>> Hi Gonzalo, >>>>>>>> > >> > >>> >>> Thank you for the feedback. I wasn't aware of the JIT >>>>>>>> > >> > >>> implications. At >>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an >>>>>>>> > >> > >>> >>> impact. >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that can >>>>>>>> > >> validate if >>>>>>>> > >> > >>> this >>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying to >>>>>>>> > >> > >>> >>> run them >>>>>>>> > >> > >>> with the >>>>>>>> > >> > >>> >>> PR. I will try to run the ones for zeroing/popcnt >>>>>>>> > >> > >>> >>> tonight to >>>>>>>> > >> see >>>>>>>> > >> > if >>>>>>>> > >> > >>> >>> there >>>>>>>> > >> > >>> >>> is a change in those. >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> -Micah >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar >>>>>>>> > >> > >>> >>> < >>>>>>>> > >> > >>> >>> golthir...@gmail.com> wrote: >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of >>>>>>>> > >> > >>> >>> > changes. >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then >>>>>>>> > >> > >>> >>> > the >>>>>>>> > >> > >>> performance >>>>>>>> > >> > >>> >>> was >>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access >>>>>>>> > >> > >>> >>> > (see >>>>>>>> > >> > >>> >>> > http://git.net/apache-arrow-development/msg02353.html >>>>>>>> > >> > >>> >>> > *) >>>>>>>> > >> > and >>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM >>>>>>>> > >> > >>> >>> > (specifically, >>>>>>>> > >> C2) >>>>>>>> > >> > >>> does >>>>>>>> > >> > >>> >>> not >>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of >>>>>>>> > >> > >>> >>> > the >>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization. >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old >>>>>>>> > >> > >>> >>> > email on >>>>>>>> > >> the >>>>>>>> > >> > >>> list, >>>>>>>> > >> > >>> >>> but >>>>>>>> > >> > >>> >>> > it is the only result shown by Google >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (< >>>>>>>> > >> > liya.fa...@gmail.com >>>>>>>> > >> > >>> >) >>>>>>>> > >> > >>> >>> > escribió: >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> >> Hi Micah, >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result looks >>>>>>>> > >> > >>> >>> >> good. >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 >>>>>>>> > >> > >>> >>> >> bytes (4 >>>>>>>> > >> > bytes >>>>>>>> > >> > >>> for >>>>>>>> > >> > >>> >>> each >>>>>>>> > >> > >>> >>> >> of length, write index, and read index). >>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like >>>>>>>> > >> > >>> BaseFixedWidthVector, >>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc. >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the >>>>>>>> > >> > >>> >>> >> change. >>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads. >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> Best, >>>>>>>> > >> > >>> >>> >> Liya Fan >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield < >>>>>>>> > >> > >>> emkornfi...@gmail.com >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> >> wrote: >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> >> > Hi Liya Fan, >>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem >>>>>>>> > >> > >>> >>> >> > to be any >>>>>>>> > >> > >>> >>> meaningful >>>>>>>> > >> > >>> >>> >> > performance difference on my machine. At least >>>>>>>> > >> > >>> >>> >> > for me, the >>>>>>>> > >> > >>> >>> benchmarks >>>>>>>> > >> > >>> >>> >> are >>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the >>>>>>>> > >> > >>> >>> >> > other (I've >>>>>>>> > >> > >>> pasted >>>>>>>> > >> > >>> >>> >> results >>>>>>>> > >> > >>> >>> >> > below). That being said my machine isn't >>>>>>>> > >> > >>> >>> >> > necessarily the >>>>>>>> > >> most >>>>>>>> > >> > >>> >>> reliable >>>>>>>> > >> > >>> >>> >> for >>>>>>>> > >> > >>> >>> >> > benchmarking. >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me, >>>>>>>> > >> > >>> >>> >> > for the >>>>>>>> > >> most >>>>>>>> > >> > >>> part it >>>>>>>> > >> > >>> >>> >> seems >>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to >>>>>>>> > >> > >>> >>> >> > "long" >>>>>>>> > >> > further >>>>>>>> > >> > >>> up >>>>>>>> > >> > >>> >>> the >>>>>>>> > >> > >>> >>> >> > stack for "PlatformDepdendent" operations. If >>>>>>>> > >> > >>> >>> >> > there are >>>>>>>> > >> > other >>>>>>>> > >> > >>> >>> >> benchmarks >>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know. >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his >>>>>>>> > >> > >>> >>> >> > change is it >>>>>>>> > >> > >>> >>> increases the >>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could >>>>>>>> > >> > >>> >>> >> > influence >>>>>>>> > >> > cache >>>>>>>> > >> > >>> >>> misses >>>>>>>> > >> > >>> >>> >> at >>>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, >>>>>>>> > >> > >>> >>> >> > but this >>>>>>>> > >> > doesn't >>>>>>>> > >> > >>> >>> seem to >>>>>>>> > >> > >>> >>> >> > show up in the benchmark.. >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > Thanks, >>>>>>>> > >> > >>> >>> >> > Micah >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers: >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > [New Code] >>>>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>>>> > >> > >>> >>> >> > Score >>>>>>>> > >> Error >>>>>>>> > >> > >>> >>> Units >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>>>> > >> > >>> >>> >> > 15.441 ± >>>>>>>> > >> 0.469 >>>>>>>> > >> > >>> >>> us/op >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>>>> > >> > >>> >>> >> > 14.057 ± >>>>>>>> > >> 0.115 >>>>>>>> > >> > >>> >>> us/op >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > [Old code] >>>>>>>> > >> > >>> >>> >> > Benchmark Mode Cnt >>>>>>>> > >> > >>> >>> >> > Score >>>>>>>> > >> Error >>>>>>>> > >> > >>> >>> Units >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark avgt 5 >>>>>>>> > >> > >>> >>> >> > 16.248 ± >>>>>>>> > >> 1.409 >>>>>>>> > >> > >>> >>> us/op >>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark avgt 5 >>>>>>>> > >> > >>> >>> >> > 14.150 ± >>>>>>>> > >> 0.084 >>>>>>>> > >> > >>> >>> us/op >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya < >>>>>>>> > >> liya.fa...@gmail.com >>>>>>>> > >> > > >>>>>>>> > >> > >>> >>> wrote: >>>>>>>> > >> > >>> >>> >> > >>>>>>>> > >> > >>> >>> >> >> Hi Micah, >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this. >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any >>>>>>>> > >> > >>> >>> >> >> negative >>>>>>>> > >> > >>> performance >>>>>>>> > >> > >>> >>> >> impact >>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications. >>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our >>>>>>>> > >> > >>> >>> >> >> existing >>>>>>>> > >> > >>> benchmarks? >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >> Best, >>>>>>>> > >> > >>> >>> >> >> Liya Fan >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield < >>>>>>>> > >> > >>> >>> emkornfi...@gmail.com> >>>>>>>> > >> > >>> >>> >> >> wrote: >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the >>>>>>>> > >> > >>> >>> >> >>> mailing >>>>>>>> > >> > about >>>>>>>> > >> > >>> >>> >> supporting >>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for Java ValueVectors (this is >>>>>>>> > >> > >>> >>> >> >>> what the >>>>>>>> > >> IPC >>>>>>>> > >> > >>> >>> >> specification >>>>>>>> > >> > >>> >>> >> >>> and C++ support). I created a PR [1] that >>>>>>>> > >> > >>> >>> >> >>> changes all >>>>>>>> > >> APIs >>>>>>>> > >> > >>> that I >>>>>>>> > >> > >>> >>> >> could >>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" >>>>>>>> > >> > >>> >>> >> >>> instead of an >>>>>>>> > >> > "int" >>>>>>>> > >> > >>> (and >>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs). >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth >>>>>>>> > >> > >>> >>> >> >>> discussing if >>>>>>>> > >> it >>>>>>>> > >> > is >>>>>>>> > >> > >>> >>> >> something >>>>>>>> > >> > >>> >>> >> >>> we >>>>>>>> > >> > >>> >>> >> >>> still want to move forward with. It would be >>>>>>>> > >> > >>> >>> >> >>> nice to >>>>>>>> > >> come >>>>>>>> > >> > to >>>>>>>> > >> > >>> a >>>>>>>> > >> > >>> >>> >> >>> conclusion >>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid >>>>>>>> > >> > >>> >>> >> >>> a lot of >>>>>>>> > >> > merge >>>>>>>> > >> > >>> >>> >> conflicts. >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ >>>>>>>> > >> > >>> >>> >> >>> implementation >>>>>>>> > >> has >>>>>>>> > >> > >>> added >>>>>>>> > >> > >>> >>> >> >>> support >>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString >>>>>>>> > >> > >>> >>> >> >>> arrays and >>>>>>>> > >> based >>>>>>>> > >> > on >>>>>>>> > >> > >>> >>> prior >>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in >>>>>>>> > >> > >>> >>> >> >>> Java >>>>>>>> > >> before >>>>>>>> > >> > our >>>>>>>> > >> > >>> >>> next >>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can >>>>>>>> > >> > >>> >>> >> >>> have full >>>>>>>> > >> > >>> >>> compatibility >>>>>>>> > >> > >>> >>> >> and >>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java. >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback. >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >>> Thanks, >>>>>>>> > >> > >>> >>> >> >>> Micah >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020 >>>>>>>> > >> > >>> >>> >> >>> >>>>>>>> > >> > >>> >>> >> >> >>>>>>>> > >> > >>> >>> >> >>>>>>>> > >> > >>> >>> > >>>>>>>> > >> > >>> >>> >>>>>>>> > >> > >>> >> >>>>>>>> > >> > >>> >>>>>>>> > >> > >> >>>>>>>> > >> > >>>>>>>> > >> >>>>>>>> > >