[DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Micah Kornfield Tue, 25 Aug 2020 21:33:25 -0700

I'm expanding the scope of this thread since it looks like work has also
started on making golang support Big Endian architectures.


I think as a community we should come to a consensus on whether we want to
support Big Endian architectures in general.  I don't think it is a good
outcome if some implementations accept PRs for Big Endian fixes and some
don't.

But maybe this is OK with others?

My current opinion on the matter is that we should support it under the
following conditions:

1.  CI is in place to catch regressions (right now I think
the CI is fairly unreliable?)
2.  No degradation in performance for little-endian architectures (verified
by additional micro benchmarks)
3.  No large amount of invasive code to distinguish between platforms.
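
As a rough illustration of conditions 2 and 3, here is a minimal Java sketch of the kind of word-at-a-time byte comparison that silently assumes little-endian (similar in spirit to the ByteFunctionHelpers case raised further down the thread). It is not Arrow code; the class and method names are hypothetical. The byte-order decision is confined to one static final flag, which the JIT can in principle treat as a constant so the little-endian path pays no runtime branch. That is an assumption to verify, not a guarantee.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Arrow code: compares byte arrays 8 bytes at a time.
final class EndianCompareSketch {
  // Evaluated once at class initialization. Because it is a static final,
  // HotSpot can treat it as a constant when compiling and drop the untaken branch.
  private static final boolean LITTLE_ENDIAN =
      ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

  // Compares two 8-byte words that were read in native byte order.
  static int compareWords(long leftNative, long rightNative) {
    // A word-at-a-time comparison matches byte-by-byte order only if the first
    // byte in memory is the most significant byte of the long. On little-endian
    // hardware it is the least significant, so the bytes are reversed first;
    // on big-endian hardware they are already in the right order.
    long l = LITTLE_ENDIAN ? Long.reverseBytes(leftNative) : leftNative;
    long r = LITTLE_ENDIAN ? Long.reverseBytes(rightNative) : rightNative;
    return Integer.signum(Long.compareUnsigned(l, r));
  }

  // Lexicographic comparison of unsigned bytes; returns -1, 0 or 1.
  static int compare(byte[] left, byte[] right) {
    int len = Math.min(left.length, right.length);
    // Read longs in native order, as raw memory reads (e.g. via Unsafe) would.
    ByteBuffer lb = ByteBuffer.wrap(left).order(ByteOrder.nativeOrder());
    ByteBuffer rb = ByteBuffer.wrap(right).order(ByteOrder.nativeOrder());
    int i = 0;
    for (; i + Long.BYTES <= len; i += Long.BYTES) {
      int c = compareWords(lb.getLong(i), rb.getLong(i));
      if (c != 0) {
        return c;
      }
    }
    // Compare any remaining tail bytes one at a time as unsigned values.
    for (; i < len; i++) {
      int c = Integer.compare(left[i] & 0xFF, right[i] & 0xFF);
      if (c != 0) {
        return Integer.signum(c);
      }
    }
    return Integer.signum(Integer.compare(left.length, right.length));
  }

  public static void main(String[] args) {
    byte[] a = {1, 9, 0, 0, 0, 0, 0, 0};
    byte[] b = {2, 0, 0, 0, 0, 0, 0, 0};
    // Without the reverseBytes step, a little-endian machine would report a > b.
    System.out.println(compare(a, b)); // prints -1
  }
}
```

Whether a given JVM really eliminates the branch on the little-endian path is exactly what the micro benchmarks in condition 2 would need to confirm.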

Kazuaki Ishizaki, I asked this question previously, but could you give some data
points on:
1.  The current state of C++ support (how much code needed to change)?
2.  How many more PRs you expect to need for Java (and their approximate size)?

I think this would help me and others in the decision-making process.

Thanks,
Micah

On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield <[email protected]>
wrote:

> My thoughts on the points raised so far:
>
> * Does supporting Big Endian increase the reach of Arrow by a lot?
>
> Probably not a significant amount, but it does provide one more avenue of
> adoption.
>
> * Does it increase code complexity?
>
> Yes.  I agree this is a concern.  The PR in question did not seem too bad
> to me but this is subjective.  I think the remaining question is how many
> more places need to be fixed up in the code base and how invasive are the
> changes.  In C++ IIUC it turned out to be a relatively small number of
> places.
>
> Kazuaki Ishizaki, have you been able to get the Java implementation working
> fully locally?  How many additional PRs will be needed and what do
> they look like (I think there are already a few more in the queue)?
>
> * Will it introduce performance regressions?
>
> If done properly, I suspect not, but I think if we continue with Big Endian
> support, the places that need to be touched should have benchmarks added to
> confirm this (including for PRs already merged).
>
> Thanks,
> Micah
>
> On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <[email protected]> wrote:
>
>> Thanks to Kazuaki Ishizaki for working on this.
>> IMO, supporting big-endian would be a large change, as in many
>> places in the code base we have implicitly assumed a little-endian
>> platform (e.g.
>> https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/ByteFunctionHelpers.java
>> ).
>> Supporting the big-endian platform may introduce branches (or virtual
>> calls) in such places, which could affect performance.
>> So it would be helpful to evaluate the performance impact.
>>
>> Best,
>> Liya Fan
>>
>>
>> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <[email protected]>
>> wrote:
>>
>>> Hey Micah, thanks for starting the discussion.
>>>
>>> I just skimmed that thread and it isn't entirely clear that there was a
>>> conclusion that the overhead was worth it. I think everybody agrees that
>>> it would be nice to have the code work on both platforms. On the flip
>>> side, the code noise for a rare case makes the cost-benefit questionable.
>>>
>>> In the Java code, we wrote the code to explicitly disallow big-endian
>>> platforms and put precondition checks in. I definitely think if we want
>>> to support this, it should be done holistically across the code with an
>>> appropriate test plan (both functional and perf).
>>>
>>> To me, the question is really about how many use cases are blocked by
>>> this. I'm not sure I've heard anyone say that the limiting factor to
>>> leveraging Java Arrow was the block on endianness. Keep in mind that until
>>> very recently, using any Arrow Java code on big-endian would fail a
>>> precondition check before you could even get started, and I don't think
>>> we've seen a bunch of messages about that exception. Adding if conditions
>>> throughout the codebase like this patch [1] isn't exactly awesome, and it
>>> can also risk performance impacts depending on how carefully it is done.
>>>
>>> If there isn't a preponderance of evidence of many users being blocked by
>>> the lack of this capability, I don't think we should accept the code. We
>>> already have a backlog of items that we need to address just to ensure
>>> existing use cases work well. Expanding to new use cases for which there
>>> is no clear demand will likely just increase development cost for little
>>> benefit.
>>>
>>> What do others think?
>>>
>>> [1] https://github.com/apache/arrow/pull/7923#issuecomment-674311119
>>>
>>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield <[email protected]>
>>> wrote:
>>>
>>> > Kazuaki Ishizaki has started working on Big Endian support in Java
>>> > (including setting up CI for it).  Thank you!
>>> >
>>> > We previously discussed support for Big Endian architectures in C++ [1]
>>> > and generally agreed that it was a reasonable thing to do.
>>> >
>>> > Similar to C++, I think as long as we have a working CI setup it is
>>> > reasonable for Java to support Big Endian machines.
>>> >
>>> > But I think there might be differing opinions, so it is worth a
>>> > discussion to see if there are technical blockers or other reasons for
>>> > not supporting Big Endian architectures in the existing Java
>>> > implementation.
>>> >
>>> > Thanks,
>>> > Micah
>>> >
>>> >
>>> > [1]
>>> > https://lists.apache.org/thread.html/rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019%40%3Cdev.arrow.apache.org%3E
>>> >
>>>
>>