Re: [Discuss][Java] 64-bit lengths for ValueVectors

Fan Liya Fri, 15 Nov 2019 02:06:34 -0800

I think the 2GB limit is overly restrictive for modern computers.
This is a problem we must face anyway.


Best,
Liya Fan

On Fri, Nov 15, 2019 at 5:07 PM Micah Kornfield <[email protected]>
wrote:

> Apologies for the long delay, I chose to do the minimal work of limiting
> this change [1] to allowing ArrowBuf to 64-bit lengths.  This would unblock
> work on LargeString and LargeBinary.  If this change looks OK, I think
> there is some follow-up work to add more thorough unit/integration tests.
>
> As an aside, it does seem like the 2GB limit is affecting some users in
> Spark [2][3], so hopefully LargeString would help with this.
>
> Allowing more than MAX_INT elements is Vectors/array still a blocker for
> making LargeList useful.
>
> Thanks,
> Micah
>
> [1]  https://github.com/apache/arrow/pull/5020
> [2]
> https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
> [3] https://issues.apache.org/jira/browse/ARROW-4890
>
> On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <[email protected]> wrote:
>
>>
>>
>> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <[email protected]>
>> wrote:
>>
>>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>>> to 32 bit chunks of memory. For example, you're prime example before was
>>>> image data. Having 2 billion images of 1mb images would still be supported
>>>> without changing the index addressing.
>>>
>>> This might be pre-coffee math, but I think we are limited to
>>> approximately 2000 images because an ArrowBuf only holds up-to 2 billion
>>> bytes [1].  While we have plenty of room for the offsets, we quickly run
>>> out of contiguous data storage space. For LargeString and LargeBinary this
>>> could be fixed by changing ArrowBuf.
>>>
>>> LargeArray faces the same problem only it applies to its child vectors.
>>> Supporting LargeArray properly is really what drove the large-scale
>>> interface change.
>>>
>>
>> My expressed concern about these changes was specifically about the use
>> of long for get/set in the vector interfaces. I'm not saying that we
>> constrain memory/ArrowBufs to 32bits.
>>
>>
>>> Rebase would help if possible.
>>>
>>> I'll try to get to this in the next few days.
>>>
>>> [1]
>>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>>
>>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <[email protected]>
>>>> wrote:
>>>>
>>>>> I don't think we should couple this discussion with the implementation
>>>>>> of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I'm still trying to balance in my mind which is a worse experience for
>>>>> consumers of the libraries for these types.  Claiming that Java supports
>>>>> these types but throwing an exception when the Vectors exceed 32-bits or
>>>>> just say they aren't supported until we have 64-bit support in Java.
>>>>>
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the
>>>> addressing to 32 bit chunks of memory. For example, you're prime example
>>>> before was image data. Having 2 billion images of 1mb images would still be
>>>> supported without changing the index addressing.
>>>>
>>>>
>>>>
>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>>> think we should probably review some our more complex vector interactions
>>>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>>> constantly inserting additional instructions or if in most cases, this
>>>>>> actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>> Is this something that your team will take on?
>>>>>
>>>>
>>>>
>>>> Yeah, we need to look at this I think.
>>>>
>>>> Do you need a rebased version of the PR or is the existing one
>>>>> sufficient?
>>>>>
>>>>
>>>> Rebase would help if possible.
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I don't think we should couple this discussion with the
>>>>>> implementation of large list, etc since I think those two concepts are
>>>>>> independent.
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>>> think we should probably review some our more complex vector interactions
>>>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>>> constantly inserting additional instructions or if in most cases, this
>>>>>> actually doesn't impact instruction count.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>> With regards to the reference implementation point. It is a good
>>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, 
>>>>>>>> can
>>>>>>>> we pick this up and discuss more next week?
>>>>>>>
>>>>>>>
>>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>>>>>> reference implementation aspect of this?
>>>>>>>
>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming
>>>>>>>> out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>>
>>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are
>>>>>>> the way to go.  As stated on the release thread [1] we need a better
>>>>>>> mechanism to avoid this type of issue arising again.  The release thread
>>>>>>> also had some more discussion on compatibility.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > Hi Wes and Jacques,
>>>>>>>> > See responses below.
>>>>>>>> >
>>>>>>>> > With regards to the reference implementation point. It is a good
>>>>>>>> point. I'm
>>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can
>>>>>>>> we pick this
>>>>>>>> > > up and discuss more next week?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical
>>>>>>>> implications
>>>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0
>>>>>>>> release.
>>>>>>>> >
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming
>>>>>>>> out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>>
>>>>>>>> > My stance on this is that I don't know how important it is for
>>>>>>>> Java to
>>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled
>>>>>>>> by
>>>>>>>> > > having very large arrays seem to be concentrated in the native
>>>>>>>> code
>>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>>>>>> implementation-centrism
>>>>>>>> > > on my part, though.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > A data point against this view is Spark has done work to
>>>>>>>> eliminate 2GB
>>>>>>>> > memory limits on its block sizes [1].  I don't claim to
>>>>>>>> understand the
>>>>>>>> > implications of this. Bryan might you have any thoughts here?
>>>>>>>> I'm OK with
>>>>>>>> > INT32_MAX, as well, I think we should think about what this means
>>>>>>>> for
>>>>>>>> > adding Large types to Java and implications for reference
>>>>>>>> implementations.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Micah
>>>>>>>> >
>>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>>> >
>>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > > Hey Micah,
>>>>>>>> > >
>>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>>>>>> concerned
>>>>>>>> > > about the unknowns than the compiling issue itself. Any time
>>>>>>>> you've been
>>>>>>>> > > tuning for a while, changing something like this could be
>>>>>>>> totally fine or
>>>>>>>> > > cause a couple of major issues. For example, we've done a very
>>>>>>>> large amount
>>>>>>>> > > of work reducing heap memory footprint of the vectors. Are
>>>>>>>> target is to
>>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap
>>>>>>>> per vector
>>>>>>>> > > (not including arrow bufs).
>>>>>>>> > >
>>>>>>>> > > With regards to the reference implementation point. It is a
>>>>>>>> good point.
>>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this,
>>>>>>>> can we pick
>>>>>>>> > > this up and discuss more next week?
>>>>>>>> > >
>>>>>>>> > > thanks,
>>>>>>>> > > Jacques
>>>>>>>> > >
>>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>>>>>> [email protected]>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi Jacques,
>>>>>>>> > >> I definitely understand these concerns and this change is
>>>>>>>> risky because it
>>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the
>>>>>>>> cleanest way
>>>>>>>> > >> of dealing with this.  This could have other benefits like
>>>>>>>> cleaning up
>>>>>>>> > >> some
>>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per
>>>>>>>> past e-mail
>>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>>> > >> implementations that can communicate fully, and my intent here
>>>>>>>> was to
>>>>>>>> > >> close
>>>>>>>> > >> that gap.
>>>>>>>> > >>
>>>>>>>> > >> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>>> challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with
>>>>>>>> the Arrow Java
>>>>>>>> > >> > library.
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like
>>>>>>>> it has a
>>>>>>>> > >> simple java build system?  If it is helpful, I can try to get
>>>>>>>> a fork
>>>>>>>> > >> running that at least compiles against this PR.  My plan would
>>>>>>>> be to cast
>>>>>>>> > >> any place that was changed to return a long back to an int, so
>>>>>>>> in essence
>>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations.
>>>>>>>> > >>
>>>>>>>> > >> I don't  have the infrastructure to test this change properly
>>>>>>>> from a
>>>>>>>> > >> distributed systems perspective, so it would still take some
>>>>>>>> time from
>>>>>>>> > >> Dremio to validate for regressions.
>>>>>>>> > >>
>>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>>> changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>>> that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>>> to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out
>>>>>>>> of the
>>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we
>>>>>>>> do, then I
>>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>>> > >> implementations.
>>>>>>>> > >>
>>>>>>>> > >> Thanks,
>>>>>>>> > >> Micah
>>>>>>>> > >>
>>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
>>>>>>>> [email protected]>
>>>>>>>> > >> wrote:
>>>>>>>> > >>
>>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
>>>>>>>> thinking more
>>>>>>>> > >> along
>>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>>> > >> >
>>>>>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>>>>>> algorithms in
>>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
>>>>>>>> don't exist.
>>>>>>>> > >> We
>>>>>>>> > >> > have a large amount of code that depends on the current
>>>>>>>> behavior that is
>>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse
>>>>>>>> our dremio
>>>>>>>> > >> repo
>>>>>>>> > >> > to see how extensively we leverage Arrow if interested).
>>>>>>>> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>>> challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with
>>>>>>>> the Arrow Java
>>>>>>>> > >> > library. I'm not saying I'm against this but want to make
>>>>>>>> sure we've
>>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>>> changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>>> that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>>> to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >> >
>>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>>>>>> [email protected]>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > > Hi Jacques,
>>>>>>>> > >> > > What avenue were you thinking for supporting both paths?
>>>>>>>>  I didn't
>>>>>>>> > >> want
>>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>>>>>> that would
>>>>>>>> > >> > > effectively fork the code base, but that is potentially an
>>>>>>>> option that
>>>>>>>> > >> > > would allow us to have a complete reference implementation
>>>>>>>> in Java
>>>>>>>> > >> that
>>>>>>>> > >> > can
>>>>>>>> > >> > > fully interact with C++, without major changes to this
>>>>>>>> code.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I
>>>>>>>> think they
>>>>>>>> > >> > > roughly fall into three categories, changes to input
>>>>>>>> parameters,
>>>>>>>> > >> changes
>>>>>>>> > >> > to
>>>>>>>> > >> > > output parameters and algorithm changes.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For inputs, changing from int to long is essentially a
>>>>>>>> no-op from the
>>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>>>>>> this also
>>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could
>>>>>>>> keep two
>>>>>>>> > >> versions
>>>>>>>> > >> > > of the methods that only differ on inputs, but it is not
>>>>>>>> clear what
>>>>>>>> > >> the
>>>>>>>> > >> > > value of that would be.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For outputs, we can't support methods "long getLength()"
>>>>>>>> and "int
>>>>>>>> > >> > > getLength()" in the same class, so we would be forced into
>>>>>>>> something
>>>>>>>> > >> like
>>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
>>>>>>>> desirable.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For algorithm changes, there did not appear to be too many
>>>>>>>> places
>>>>>>>> > >> where
>>>>>>>> > >> > we
>>>>>>>> > >> > > actually loop over all elements (it is quite possible I
>>>>>>>> missed
>>>>>>>> > >> something
>>>>>>>> > >> > > here), the ones that I did find I was able to mitigate
>>>>>>>> performance
>>>>>>>> > >> > > penalties as noted above.  Some of the current
>>>>>>>> implementation will
>>>>>>>> > >> get a
>>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those
>>>>>>>> later or in
>>>>>>>> > >> > this
>>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>>>>>> > >> > >
>>>>>>>> > >> > > Thanks,
>>>>>>>> > >> > > Micah
>>>>>>>> > >> > >
>>>>>>>> > >> > >
>>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>>>>>> [email protected]>
>>>>>>>> > >> wrote:
>>>>>>>> > >> > >
>>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how
>>>>>>>> nasty it
>>>>>>>> > >> would
>>>>>>>> > >> > >> be to just support both paths. Have you evaluated how
>>>>>>>> complex that
>>>>>>>> > >> > would be?
>>>>>>>> > >> > >>
>>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>>> > >> [email protected]>
>>>>>>>> > >> > >> wrote:
>>>>>>>> > >> > >>
>>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks
>>>>>>>> at least
>>>>>>>> > >> on my
>>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to
>>>>>>>> bring the
>>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with
>>>>>>>> some
>>>>>>>> > >> improvement
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Benchmark                                        Mode
>>>>>>>> Cnt   Score
>>>>>>>> > >> > >>>  Error
>>>>>>>> > >> > >>>  Units
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>>>> 5   3.821 ±
>>>>>>>> > >> > >>> 0.031
>>>>>>>> > >> > >>>  ns/op
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>>>> 5  14.884 ±
>>>>>>>> > >> > >>> 0.141
>>>>>>>> > >> > >>>  ns/op
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> I applied the same pattern to other loops that I could
>>>>>>>> find, and for
>>>>>>>> > >> > any
>>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up
>>>>>>>> into two loops.
>>>>>>>> > >> > the
>>>>>>>> > >> > >>> first loop does iteration by integer, the second
>>>>>>>> finishes off for
>>>>>>>> > >> any
>>>>>>>> > >> > >>> long
>>>>>>>> > >> > >>> values.  As a side note it seems like optimization for
>>>>>>>> loops using
>>>>>>>> > >> long
>>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the
>>>>>>>> JVM [2]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Thanks,
>>>>>>>> > >> > >>> Micah
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> [1]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>>
>>>>>>>> > >> >
>>>>>>>> > >>
>>>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>>> > >> [email protected]>
>>>>>>>> > >> > >>> wrote:
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet
>>>>>>>> variables can make a
>>>>>>>> > >> > big
>>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks
>>>>>>>> with these
>>>>>>>> > >> turned
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> > (you can see the result from above with
>>>>>>>> Float8Benchmarks).  Here
>>>>>>>> > >> are
>>>>>>>> > >> > >>> new
>>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks
>>>>>>>> like using
>>>>>>>> > >> longs
>>>>>>>> > >> > >>> might
>>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to
>>>>>>>> mitigate this.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the
>>>>>>>> changes with
>>>>>>>> > >> > Dremio's
>>>>>>>> > >> > >>> > code on today's sync call.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > New
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Benchmark                                        Mode
>>>>>>>> Cnt   Score
>>>>>>>> > >> > >>>  Error
>>>>>>>> > >> > >>> > Units
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>>>>   5
>>>>>>>> > >>  4.176 ±
>>>>>>>> > >> > >>> 1.292
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>>>>   5
>>>>>>>> > >> 26.102 ±
>>>>>>>> > >> > >>> 0.700
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  7.398
>>>>>>>> ± 0.084
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.711
>>>>>>>> ± 0.057
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > old
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt
>>>>>>>>   5
>>>>>>>> > >>  3.828 ±
>>>>>>>> > >> > >>> 0.030
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt
>>>>>>>>   5
>>>>>>>> > >> 20.611 ±
>>>>>>>> > >> > >>> 0.188
>>>>>>>> > >> > >>> > ns/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark   avgt    5  6.597
>>>>>>>> ± 0.462
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark  avgt    5  2.615
>>>>>>>> ± 0.027
>>>>>>>> > >> us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
>>>>>>>> [email protected]>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Best,
>>>>>>>> > >> > >>> >> Liya Fan
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>>>> > >> > >>> [email protected]>
>>>>>>>> > >> > >>> >> wrote:
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the
>>>>>>>> JIT
>>>>>>>> > >> > >>> implications.   At
>>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have
>>>>>>>> an impact.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that
>>>>>>>> can
>>>>>>>> > >> validate if
>>>>>>>> > >> > >>> this
>>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying
>>>>>>>> to run them
>>>>>>>> > >> > >>> with the
>>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
>>>>>>>> tonight to
>>>>>>>> > >> see
>>>>>>>> > >> > if
>>>>>>>> > >> > >>> >>> there
>>>>>>>> > >> > >>> >>> is a change in those.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> -Micah
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz
>>>>>>>> Jaureguizar <
>>>>>>>> > >> > >>> >>> [email protected]> wrote:
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of
>>>>>>>> changes.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by
>>>>>>>> then the
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> was
>>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer
>>>>>>>> access
>>>>>>>> > >> > >>> >>> > (see
>>>>>>>> http://git.net/apache-arrow-development/msg02353.html *)
>>>>>>>> > >> > and
>>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM
>>>>>>>> (specifically,
>>>>>>>> > >> C2)
>>>>>>>> > >> > >>> does
>>>>>>>> > >> > >>> >>> not
>>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One
>>>>>>>> of the
>>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and
>>>>>>>> vectorization.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old
>>>>>>>> email on
>>>>>>>> > >> the
>>>>>>>> > >> > >>> list,
>>>>>>>> > >> > >>> >>> but
>>>>>>>> > >> > >>> >>> > it is the only result shown by Google
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>>>>>> > >> > [email protected]
>>>>>>>> > >> > >>> >)
>>>>>>>> > >> > >>> >>> > escribió:
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result
>>>>>>>> looks good.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional
>>>>>>>> 12 bytes (4
>>>>>>>> > >> > bytes
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> >>> each
>>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify
>>>>>>>> the change.
>>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Best,
>>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>>>> > >> > >>> [email protected]
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> wrote:
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not
>>>>>>>> seem to be any
>>>>>>>> > >> > >>> >>> meaningful
>>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least
>>>>>>>> for me, the
>>>>>>>> > >> > >>> >>> benchmarks
>>>>>>>> > >> > >>> >>> >> are
>>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the
>>>>>>>> other (I've
>>>>>>>> > >> > >>> pasted
>>>>>>>> > >> > >>> >>> >> results
>>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
>>>>>>>> necessarily the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> >>> reliable
>>>>>>>> > >> > >>> >>> >> for
>>>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,
>>>>>>>> for the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> part it
>>>>>>>> > >> > >>> >>> >> seems
>>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int"
>>>>>>>> to "long"
>>>>>>>> > >> > further
>>>>>>>> > >> > >>> up
>>>>>>>> > >> > >>> >>> the
>>>>>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.
>>>>>>>> If there are
>>>>>>>> > >> > other
>>>>>>>> > >> > >>> >>> >> benchmarks
>>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his
>>>>>>>> change is it
>>>>>>>> > >> > >>> >>> increases the
>>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could
>>>>>>>> influence
>>>>>>>> > >> > cache
>>>>>>>> > >> > >>> >>> misses
>>>>>>>> > >> > >>> >>> >> at
>>>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks,
>>>>>>>> but this
>>>>>>>> > >> > doesn't
>>>>>>>> > >> > >>> >>> seem to
>>>>>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>>>>>  Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>>>>>> 15.441 ±
>>>>>>>> > >> 0.469
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>>>>>> 14.057 ±
>>>>>>>> > >> 0.115
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt
>>>>>>>>  Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5
>>>>>>>> 16.248 ±
>>>>>>>> > >> 1.409
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5
>>>>>>>> 14.150 ±
>>>>>>>> > >> 0.084
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>>>> > >> [email protected]
>>>>>>>> > >> > >
>>>>>>>> > >> > >>> >>> wrote:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any
>>>>>>>> negative
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> >> impact
>>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based
>>>>>>>> applications.
>>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
>>>>>>>> existing
>>>>>>>> > >> > >>> benchmarks?
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield
>>>>>>>> <
>>>>>>>> > >> > >>> >>> [email protected]>
>>>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on
>>>>>>>> the mailing
>>>>>>>> > >> > about
>>>>>>>> > >> > >>> >>> >> supporting
>>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this
>>>>>>>> is what the
>>>>>>>> > >> IPC
>>>>>>>> > >> > >>> >>> >> specification
>>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
>>>>>>>> changes all
>>>>>>>> > >> APIs
>>>>>>>> > >> > >>> that I
>>>>>>>> > >> > >>> >>> >> could
>>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long"
>>>>>>>> instead of an
>>>>>>>> > >> > "int"
>>>>>>>> > >> > >>> (and
>>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
>>>>>>>> discussing if
>>>>>>>> > >> it
>>>>>>>> > >> > is
>>>>>>>> > >> > >>> >>> >> something
>>>>>>>> > >> > >>> >>> >> >>> we
>>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be
>>>>>>>> nice to
>>>>>>>> > >> come
>>>>>>>> > >> > to
>>>>>>>> > >> > >>> a
>>>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to
>>>>>>>> avoid a lot of
>>>>>>>> > >> > merge
>>>>>>>> > >> > >>> >>> >> conflicts.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
>>>>>>>> implementation
>>>>>>>> > >> has
>>>>>>>> > >> > >>> added
>>>>>>>> > >> > >>> >>> >> >>> support
>>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
>>>>>>>> arrays and
>>>>>>>> > >> based
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> >>> prior
>>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support
>>>>>>>> in Java
>>>>>>>> > >> before
>>>>>>>> > >> > our
>>>>>>>> > >> > >>> >>> next
>>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can
>>>>>>>> have full
>>>>>>>> > >> > >>> >>> compatibility
>>>>>>>> > >> > >>> >>> >> and
>>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>
>>>>>>>> > >> >
>>>>>>>> > >>
>>>>>>>> > >
>>>>>>>>
>>>>>>>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Reply via email to