Re: [Discuss][Java] 64-bit lengths for ValueVectors

Wes McKinney Fri, 22 Nov 2019 11:37:05 -0800

hi Micah -- it makes sense to limit the scope for the time being to
permitting LargeString/Binary work to proceed. Jacques, have you had a
chance to look at this?


On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield <[email protected]> wrote:
>
> Apologies for the long delay, I chose to do the minimal work of limiting this 
> change [1] to allowing ArrowBuf to 64-bit lengths.  This would unblock work 
> on LargeString and LargeBinary.  If this change looks OK, I think there is 
> some follow-up work to add more thorough unit/integration tests.
>
> As an aside, it does seem like the 2GB limit is affecting some users in Spark 
> [2][3], so hopefully LargeString would help with this.
>
> Allowing more than MAX_INT elements is Vectors/array still a blocker for 
> making LargeList useful.
>
> Thanks,
> Micah
>
> [1]  https://github.com/apache/arrow/pull/5020
> [2] 
> https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
> [3] https://issues.apache.org/jira/browse/ARROW-4890
>
> On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <[email protected]> wrote:
>>
>>
>>
>> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <[email protected]> wrote:
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the addressing 
>>>> to 32 bit chunks of memory. For example, you're prime example before was 
>>>> image data. Having 2 billion images of 1mb images would still be supported 
>>>> without changing the index addressing.
>>>
>>> This might be pre-coffee math, but I think we are limited to approximately 
>>> 2000 images because an ArrowBuf only holds up-to 2 billion bytes [1].  
>>> While we have plenty of room for the offsets, we quickly run out of 
>>> contiguous data storage space. For LargeString and LargeBinary this could 
>>> be fixed by changing ArrowBuf.
>>>
>>> LargeArray faces the same problem only it applies to its child vectors.  
>>> Supporting LargeArray properly is really what drove the large-scale 
>>> interface change.
>>
>>
>> My expressed concern about these changes was specifically about the use of 
>> long for get/set in the vector interfaces. I'm not saying that we constrain 
>> memory/ArrowBufs to 32bits.
>>
>>>
>>>> Rebase would help if possible.
>>>
>>> I'll try to get to this in the next few days.
>>>
>>> [1] 
>>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>>
>>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <[email protected]> 
>>>> wrote:
>>>>>>
>>>>>> I don't think we should couple this discussion with the implementation 
>>>>>> of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I'm still trying to balance in my mind which is a worse experience for 
>>>>> consumers of the libraries for these types.  Claiming that Java supports 
>>>>> these types but throwing an exception when the Vectors exceed 32-bits or 
>>>>> just say they aren't supported until we have 64-bit support in Java.
>>>>
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the addressing 
>>>> to 32 bit chunks of memory. For example, you're prime example before was 
>>>> image data. Having 2 billion images of 1mb images would still be supported 
>>>> without changing the index addressing.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I 
>>>>>> think we should probably review some our more complex vector 
>>>>>> interactions and see how the jvm's assembly changes with this kind of 
>>>>>> change. Using microbenchmarking is good but I think we also need to see 
>>>>>> whether we're constantly inserting additional instructions or if in most 
>>>>>> cases, this actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>> Is this something that your team will take on?
>>>>
>>>>
>>>>
>>>> Yeah, we need to look at this I think.
>>>>
>>>>> Do you need a rebased version of the PR or is the existing one sufficient?
>>>>
>>>>
>>>> Rebase would help if possible.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>> I don't think we should couple this discussion with the implementation 
>>>>>> of large list, etc since I think those two concepts are independent.
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I 
>>>>>> think we should probably review some our more complex vector 
>>>>>> interactions and see how the jvm's assembly changes with this kind of 
>>>>>> change. Using microbenchmarking is good but I think we also need to see 
>>>>>> whether we're constantly inserting additional instructions or if in most 
>>>>>> cases, this actually doesn't impact instruction count.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <[email protected]> 
>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> With regards to the reference implementation point. It is a good 
>>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, 
>>>>>>>> can we pick this up and discuss more next week?
>>>>>>>
>>>>>>>
>>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the 
>>>>>>> reference implementation aspect of this?
>>>>>>>
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>>
>>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are 
>>>>>>> the way to go.  As stated on the release thread [1] we need a better 
>>>>>>> mechanism to avoid this type of issue arising again.  The release 
>>>>>>> thread also had some more discussion on compatibility.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> [1] 
>>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <[email protected]> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield 
>>>>>>>> <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > Hi Wes and Jacques,
>>>>>>>> > See responses below.
>>>>>>>> >
>>>>>>>> > With regards to the reference implementation point. It is a good 
>>>>>>>> > point. I'm
>>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can we 
>>>>>>>> > > pick this
>>>>>>>> > > up and discuss more next week?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical 
>>>>>>>> > implications
>>>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 release.
>>>>>>>> >
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>>
>>>>>>>> > My stance on this is that I don't know how important it is for Java 
>>>>>>>> > to
>>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>>>>>> > > having very large arrays seem to be concentrated in the native code
>>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be 
>>>>>>>> > > implementation-centrism
>>>>>>>> > > on my part, though.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > A data point against this view is Spark has done work to eliminate 
>>>>>>>> > 2GB
>>>>>>>> > memory limits on its block sizes [1].  I don't claim to understand 
>>>>>>>> > the
>>>>>>>> > implications of this. Bryan might you have any thoughts here?  I'm 
>>>>>>>> > OK with
>>>>>>>> > INT32_MAX, as well, I think we should think about what this means for
>>>>>>>> > adding Large types to Java and implications for reference 
>>>>>>>> > implementations.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Micah
>>>>>>>> >
>>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>>> >
>>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <[email protected]> 
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> > > Hey Micah,
>>>>>>>> > >
>>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more 
>>>>>>>> > > concerned
>>>>>>>> > > about the unknowns than the compiling issue itself. Any time 
>>>>>>>> > > you've been
>>>>>>>> > > tuning for a while, changing something like this could be totally 
>>>>>>>> > > fine or
>>>>>>>> > > cause a couple of major issues. For example, we've done a very 
>>>>>>>> > > large amount
>>>>>>>> > > of work reducing heap memory footprint of the vectors. Are target 
>>>>>>>> > > is to
>>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap 
>>>>>>>> > > per vector
>>>>>>>> > > (not including arrow bufs).
>>>>>>>> > >
>>>>>>>> > > With regards to the reference implementation point. It is a good 
>>>>>>>> > > point.
>>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this, can 
>>>>>>>> > > we pick
>>>>>>>> > > this up and discuss more next week?
>>>>>>>> > >
>>>>>>>> > > thanks,
>>>>>>>> > > Jacques
>>>>>>>> > >
>>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield 
>>>>>>>> > > <[email protected]>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi Jacques,
>>>>>>>> > >> I definitely understand these concerns and this change is risky 
>>>>>>>> > >> because it
>>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the 
>>>>>>>> > >> cleanest way
>>>>>>>> > >> of dealing with this.  This could have other benefits like 
>>>>>>>> > >> cleaning up
>>>>>>>> > >> some
>>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per past 
>>>>>>>> > >> e-mail
>>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>>> > >> implementations that can communicate fully, and my intent here 
>>>>>>>> > >> was to
>>>>>>>> > >> close
>>>>>>>> > >> that gap.
>>>>>>>> > >>
>>>>>>>> > >> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be 
>>>>>>>> > >> > challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with the 
>>>>>>>> > >> > Arrow Java
>>>>>>>> > >> > library.
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like it 
>>>>>>>> > >> has a
>>>>>>>> > >> simple java build system?  If it is helpful, I can try to get a 
>>>>>>>> > >> fork
>>>>>>>> > >> running that at least compiles against this PR.  My plan would be 
>>>>>>>> > >> to cast
>>>>>>>> > >> any place that was changed to return a long back to an int, so in 
>>>>>>>> > >> essence
>>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations.
>>>>>>>> > >>
>>>>>>>> > >> I don't  have the infrastructure to test this change properly 
>>>>>>>> > >> from a
>>>>>>>> > >> distributed systems perspective, so it would still take some time 
>>>>>>>> > >> from
>>>>>>>> > >> Dremio to validate for regressions.
>>>>>>>> > >>
>>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>>> > >> > explored all less disruptive options before considering changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view 
>>>>>>>> > >> > that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to 
>>>>>>>> > >> > scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out of 
>>>>>>>> > >> the
>>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we 
>>>>>>>> > >> do, then I
>>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>>> > >> implementations.
>>>>>>>> > >>
>>>>>>>> > >> Thanks,
>>>>>>>> > >> Micah
>>>>>>>> > >>
>>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau 
>>>>>>>> > >> <[email protected]>
>>>>>>>> > >> wrote:
>>>>>>>> > >>
>>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was 
>>>>>>>> > >> > thinking more
>>>>>>>> > >> along
>>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>>> > >> >
>>>>>>>> > >> > Arrow hasn't historically been a place where we're writing 
>>>>>>>> > >> > algorithms in
>>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they don't 
>>>>>>>> > >> > exist.
>>>>>>>> > >> We
>>>>>>>> > >> > have a large amount of code that depends on the current 
>>>>>>>> > >> > behavior that is
>>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our 
>>>>>>>> > >> > dremio
>>>>>>>> > >> repo
>>>>>>>> > >> > to see how extensively we leverage Arrow if interested). Trying 
>>>>>>>> > >> > to
>>>>>>>> > >> > determine the ramifications of these changes would be 
>>>>>>>> > >> > challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with the 
>>>>>>>> > >> > Arrow Java
>>>>>>>> > >> > library. I'm not saying I'm against this but want to make sure 
>>>>>>>> > >> > we've
>>>>>>>> > >> > explored all less disruptive options before considering changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view 
>>>>>>>> > >> > that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to 
>>>>>>>> > >> > scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >> >
>>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield 
>>>>>>>> > >> > <[email protected]>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > > Hi Jacques,
>>>>>>>> > >> > > What avenue were you thinking for supporting both paths?   I 
>>>>>>>> > >> > > didn't
>>>>>>>> > >> want
>>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like 
>>>>>>>> > >> > > that would
>>>>>>>> > >> > > effectively fork the code base, but that is potentially an 
>>>>>>>> > >> > > option that
>>>>>>>> > >> > > would allow us to have a complete reference implementation in 
>>>>>>>> > >> > > Java
>>>>>>>> > >> that
>>>>>>>> > >> > can
>>>>>>>> > >> > > fully interact with C++, without major changes to this code.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I 
>>>>>>>> > >> > > think they
>>>>>>>> > >> > > roughly fall into three categories, changes to input 
>>>>>>>> > >> > > parameters,
>>>>>>>> > >> changes
>>>>>>>> > >> > to
>>>>>>>> > >> > > output parameters and algorithm changes.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For inputs, changing from int to long is essentially a no-op 
>>>>>>>> > >> > > from the
>>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking 
>>>>>>>> > >> > > this also
>>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could keep 
>>>>>>>> > >> > > two
>>>>>>>> > >> versions
>>>>>>>> > >> > > of the methods that only differ on inputs, but it is not 
>>>>>>>> > >> > > clear what
>>>>>>>> > >> the
>>>>>>>> > >> > > value of that would be.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For outputs, we can't support methods "long getLength()" and 
>>>>>>>> > >> > > "int
>>>>>>>> > >> > > getLength()" in the same class, so we would be forced into 
>>>>>>>> > >> > > something
>>>>>>>> > >> like
>>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less 
>>>>>>>> > >> > > desirable.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For algorithm changes, there did not appear to be too many 
>>>>>>>> > >> > > places
>>>>>>>> > >> where
>>>>>>>> > >> > we
>>>>>>>> > >> > > actually loop over all elements (it is quite possible I missed
>>>>>>>> > >> something
>>>>>>>> > >> > > here), the ones that I did find I was able to mitigate 
>>>>>>>> > >> > > performance
>>>>>>>> > >> > > penalties as noted above.  Some of the current implementation 
>>>>>>>> > >> > > will
>>>>>>>> > >> get a
>>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those 
>>>>>>>> > >> > > later or in
>>>>>>>> > >> > this
>>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>>>>>> > >> > >
>>>>>>>> > >> > > Thanks,
>>>>>>>> > >> > > Micah
>>>>>>>> > >> > >
>>>>>>>> > >> > >
>>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau 
>>>>>>>> > >> > > <[email protected]>
>>>>>>>> > >> wrote:
>>>>>>>> > >> > >
>>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how 
>>>>>>>> > >> > >> nasty it
>>>>>>>> > >> would
>>>>>>>> > >> > >> be to just support both paths. Have you evaluated how 
>>>>>>>> > >> > >> complex that
>>>>>>>> > >> > would be?
>>>>>>>> > >> > >>
>>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>>> > >> [email protected]>
>>>>>>>> > >> > >> wrote:
>>>>>>>> > >> > >>
>>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks at 
>>>>>>>> > >> > >>> least
>>>>>>>> > >> on my
>>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to 
>>>>>>>> > >> > >>> bring the
>>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some
>>>>>>>> > >> improvement
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Benchmark                                        Mode  Cnt  
>>>>>>>> > >> > >>>  Score
>>>>>>>> > >> > >>>  Error
>>>>>>>> > >> > >>>  Units
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5  
>>>>>>>> > >> > >>>  3.821 ±
>>>>>>>> > >> > >>> 0.031
>>>>>>>> > >> > >>>  ns/op
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  
>>>>>>>> > >> > >>> 14.884 ±
>>>>>>>> > >> > >>> 0.141
>>>>>>>> > >> > >>>  ns/op
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> I applied the same pattern to other loops that I could 
>>>>>>>> > >> > >>> find, and for
>>>>>>>> > >> > any
>>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into 
>>>>>>>> > >> > >>> two loops.
>>>>>>>> > >> > the
>>>>>>>> > >> > >>> first loop does iteration by integer, the second finishes 
>>>>>>>> > >> > >>> off for
>>>>>>>> > >> any
>>>>>>>> > >> > >>> long
>>>>>>>> > >> > >>> values.  As a side note it seems like optimization for 
>>>>>>>> > >> > >>> loops using
>>>>>>>> > >> long
>>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM 
>>>>>>>> > >> > >>> [2]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Thanks,
>>>>>>>> > >> > >>> Micah
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> [1]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>>
>>>>>>>> > >> >
>>>>>>>> > >> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>>> > >> [email protected]>
>>>>>>>> > >> > >>> wrote:
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables 
>>>>>>>> > >> > >>> > can make a
>>>>>>>> > >> > big
>>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with 
>>>>>>>> > >> > >>> > these
>>>>>>>> > >> turned
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> > (you can see the result from above with 
>>>>>>>> > >> > >>> > Float8Benchmarks).  Here
>>>>>>>> > >> are
>>>>>>>> > >> > >>> new
>>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks like 
>>>>>>>> > >> > >>> > using
>>>>>>>> > >> longs
>>>>>>>> > >> > >>> might
>>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to 
>>>>>>>> > >> > >>> > mitigate this.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the changes 
>>>>>>>> > >> > >>> > with
>>>>>>>> > >> > Dremio's
>>>>>>>> > >> > >>> > code on today's sync call.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > New
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Benchmark                                        Mode  
>>>>>>>> > >> > >>> > Cnt   Score
>>>>>>>> > >> > >>>  Error
>>>>>>>> > >> > >>> > Units
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5
>>>>>>>> > >>  4.176 ±
>>>>>>>> > >> > >>> 1.292
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5
>>>>>>>> > >> 26.102 ±
>>>>>>>> > >> > >>> 0.700
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398 ± 
>>>>>>>> > >> > >>> > 0.084
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711 ± 
>>>>>>>> > >> > >>> > 0.057
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > old
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5
>>>>>>>> > >>  3.828 ±
>>>>>>>> > >> > >>> 0.030
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5
>>>>>>>> > >> 20.611 ±
>>>>>>>> > >> > >>> 0.188
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597 ± 
>>>>>>>> > >> > >>> > 0.462
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615 ± 
>>>>>>>> > >> > >>> > 0.027
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya 
>>>>>>>> > >> > >>> > <[email protected]>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Best,
>>>>>>>> > >> > >>> >> Liya Fan
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>>>> > >> > >>> [email protected]>
>>>>>>>> > >> > >>> >> wrote:
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>>>>>> > >> > >>> implications.   At
>>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an 
>>>>>>>> > >> > >>> >>> impact.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that can
>>>>>>>> > >> validate if
>>>>>>>> > >> > >>> this
>>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying to 
>>>>>>>> > >> > >>> >>> run them
>>>>>>>> > >> > >>> with the
>>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt 
>>>>>>>> > >> > >>> >>> tonight to
>>>>>>>> > >> see
>>>>>>>> > >> > if
>>>>>>>> > >> > >>> >>> there
>>>>>>>> > >> > >>> >>> is a change in those.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> -Micah
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar 
>>>>>>>> > >> > >>> >>> <
>>>>>>>> > >> > >>> >>> [email protected]> wrote:
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of 
>>>>>>>> > >> > >>> >>> > changes.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then 
>>>>>>>> > >> > >>> >>> > the
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> was
>>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access
>>>>>>>> > >> > >>> >>> > (see 
>>>>>>>> > >> > >>> >>> > http://git.net/apache-arrow-development/msg02353.html 
>>>>>>>> > >> > >>> >>> > *)
>>>>>>>> > >> > and
>>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM 
>>>>>>>> > >> > >>> >>> > (specifically,
>>>>>>>> > >> C2)
>>>>>>>> > >> > >>> does
>>>>>>>> > >> > >>> >>> not
>>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of 
>>>>>>>> > >> > >>> >>> > the
>>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old 
>>>>>>>> > >> > >>> >>> > email on
>>>>>>>> > >> the
>>>>>>>> > >> > >>> list,
>>>>>>>> > >> > >>> >>> but
>>>>>>>> > >> > >>> >>> > it is the only result shown by Google
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>>>>>> > >> > [email protected]
>>>>>>>> > >> > >>> >)
>>>>>>>> > >> > >>> >>> > escribió:
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result looks 
>>>>>>>> > >> > >>> >>> >> good.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 
>>>>>>>> > >> > >>> >>> >> bytes (4
>>>>>>>> > >> > bytes
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> >>> each
>>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the 
>>>>>>>> > >> > >>> >>> >> change.
>>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Best,
>>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>>>> > >> > >>> [email protected]
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> wrote:
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem 
>>>>>>>> > >> > >>> >>> >> > to be any
>>>>>>>> > >> > >>> >>> meaningful
>>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least 
>>>>>>>> > >> > >>> >>> >> > for me, the
>>>>>>>> > >> > >>> >>> benchmarks
>>>>>>>> > >> > >>> >>> >> are
>>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the 
>>>>>>>> > >> > >>> >>> >> > other (I've
>>>>>>>> > >> > >>> pasted
>>>>>>>> > >> > >>> >>> >> results
>>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't 
>>>>>>>> > >> > >>> >>> >> > necessarily the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> >>> reliable
>>>>>>>> > >> > >>> >>> >> for
>>>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,  
>>>>>>>> > >> > >>> >>> >> > for the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> part it
>>>>>>>> > >> > >>> >>> >> seems
>>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to 
>>>>>>>> > >> > >>> >>> >> > "long"
>>>>>>>> > >> > further
>>>>>>>> > >> > >>> up
>>>>>>>> > >> > >>> >>> the
>>>>>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.  If 
>>>>>>>> > >> > >>> >>> >> > there are
>>>>>>>> > >> > other
>>>>>>>> > >> > >>> >>> >> benchmarks
>>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his 
>>>>>>>> > >> > >>> >>> >> > change is it
>>>>>>>> > >> > >>> >>> increases the
>>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could 
>>>>>>>> > >> > >>> >>> >> > influence
>>>>>>>> > >> > cache
>>>>>>>> > >> > >>> >>> misses
>>>>>>>> > >> > >>> >>> >> at
>>>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, 
>>>>>>>> > >> > >>> >>> >> > but this
>>>>>>>> > >> > doesn't
>>>>>>>> > >> > >>> >>> seem to
>>>>>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   
>>>>>>>> > >> > >>> >>> >> > Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  
>>>>>>>> > >> > >>> >>> >> > 15.441 ±
>>>>>>>> > >> 0.469
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  
>>>>>>>> > >> > >>> >>> >> > 14.057 ±
>>>>>>>> > >> 0.115
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   
>>>>>>>> > >> > >>> >>> >> > Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  
>>>>>>>> > >> > >>> >>> >> > 16.248 ±
>>>>>>>> > >> 1.409
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  
>>>>>>>> > >> > >>> >>> >> > 14.150 ±
>>>>>>>> > >> 0.084
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>>>> > >> [email protected]
>>>>>>>> > >> > >
>>>>>>>> > >> > >>> >>> wrote:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any 
>>>>>>>> > >> > >>> >>> >> >> negative
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> >> impact
>>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications.
>>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our 
>>>>>>>> > >> > >>> >>> >> >> existing
>>>>>>>> > >> > >>> benchmarks?
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>>>>>> > >> > >>> >>> [email protected]>
>>>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the 
>>>>>>>> > >> > >>> >>> >> >>> mailing
>>>>>>>> > >> > about
>>>>>>>> > >> > >>> >>> >> supporting
>>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is 
>>>>>>>> > >> > >>> >>> >> >>> what the
>>>>>>>> > >> IPC
>>>>>>>> > >> > >>> >>> >> specification
>>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that 
>>>>>>>> > >> > >>> >>> >> >>> changes all
>>>>>>>> > >> APIs
>>>>>>>> > >> > >>> that I
>>>>>>>> > >> > >>> >>> >> could
>>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" 
>>>>>>>> > >> > >>> >>> >> >>> instead of an
>>>>>>>> > >> > "int"
>>>>>>>> > >> > >>> (and
>>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth 
>>>>>>>> > >> > >>> >>> >> >>> discussing if
>>>>>>>> > >> it
>>>>>>>> > >> > is
>>>>>>>> > >> > >>> >>> >> something
>>>>>>>> > >> > >>> >>> >> >>> we
>>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be 
>>>>>>>> > >> > >>> >>> >> >>> nice to
>>>>>>>> > >> come
>>>>>>>> > >> > to
>>>>>>>> > >> > >>> a
>>>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid 
>>>>>>>> > >> > >>> >>> >> >>> a lot of
>>>>>>>> > >> > merge
>>>>>>>> > >> > >>> >>> >> conflicts.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ 
>>>>>>>> > >> > >>> >>> >> >>> implementation
>>>>>>>> > >> has
>>>>>>>> > >> > >>> added
>>>>>>>> > >> > >>> >>> >> >>> support
>>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString 
>>>>>>>> > >> > >>> >>> >> >>> arrays and
>>>>>>>> > >> based
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> >>> prior
>>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in 
>>>>>>>> > >> > >>> >>> >> >>> Java
>>>>>>>> > >> before
>>>>>>>> > >> > our
>>>>>>>> > >> > >>> >>> next
>>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can 
>>>>>>>> > >> > >>> >>> >> >>> have full
>>>>>>>> > >> > >>> >>> compatibility
>>>>>>>> > >> > >>> >>> >> and
>>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>
>>>>>>>> > >> >
>>>>>>>> > >>
>>>>>>>> > >

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Reply via email to