Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Micah Kornfield Mon, 21 Sep 2020 22:16:41 -0700

I wanted to give this thread a bump. Does the proposal I made below sound
reasonable?


On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield <[email protected]> wrote:

> If I read the responses so far correctly, the following might be a good
> compromise/summary:
>
> 1. It does not seem too invasive to support native endianness in
> implementation libraries, as long as there is appropriate performance
> testing and CI infrastructure to demonstrate that the changes work.
> 2. It is up to implementation maintainers whether they wish to accept PRs
> that handle byte swapping between different architectures. (Right now it
> sounds like C++ is potentially OK with it, and for Java at least Jacques is
> opposed to it?)
>
> Changes that break big-endian support can be a drag on developer
> productivity, but there are ways to run big-endian tests locally (at least
> on more recent OSes).
>
> Thoughts?
>
> Thanks,
> Micah
>
> On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <[email protected]> wrote:
>
>> Thanks to Kazuaki for the survey and to Micah for starting the discussion.
>>
>> I do not oppose supporting BE. In fact, I am generally optimistic about
>> the performance impact (for Java).
>> IMO, this is going to be a painful road (many byte-order-related problems
>> are tricky to debug), so I hope we can keep it short.
>>
>> It is good that someone is willing to take this on, and I would like to
>> provide help if needed.
>>
>> Best,
>> Liya Fan
>>
>>
>>
>> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <[email protected]> wrote:
>>
>> > I also think this would be a worthwhile addition and would help the
>> > project expand into more areas. Beyond the Apache Spark optimization use
>> > case, having Arrow interoperability with the Python data science stack on
>> > BE would be very useful. I have looked at the remaining PRs for Java and
>> > they seem pretty minimal and straightforward. Implementing the equivalent
>> > record batch swapping as done in C++ at [1] would be a little more
>> > involved, but still reasonable. Would it make sense to create a branch to
>> > apply all remaining changes with CI to get a better picture before
>> > deciding whether to bring them into the master branch? I could help out
>> > with shepherding this effort and assist in maintenance, if we decide to
>> > accept it.
>> >
>> > Bryan
>> >
>> > [1] https://github.com/apache/arrow/pull/7507
>> >
>> > On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <[email protected]> wrote:
>> >
>> > > I think it's well within the rights of an implementation to reject BE
>> > > data (or non-native-endian data), but if an implementation chooses to
>> > > implement and maintain the endianness conversions, then it does not
>> > > seem so bad to me.
>> > >
>> > > On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <[email protected]> wrote:
>> > > >
>> > > > And yes, for those of you looking closely, I commented on ARROW-245
>> > > > when it was committed. I just forgot about it.
>> > > >
>> > > > It looks like I had mostly the same concerns then that I do now :)
>> > > > Now I'm just more worried about format sprawl...
>> > > >
>> > > > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <[email protected]> wrote:
>> > > >
>> > > > >> What do you mean? The Endianness field (a Big|Little enum) was added 4
>> > > > >> years ago:
>> > > > >> https://issues.apache.org/jira/browse/ARROW-245
>> > > > >
>> > > > >
>> > > > > I didn't realize that was done, my bad. Good example of format rot
>> > > > > from my pov.
>> > > > >
>> > > > >
>> > > > >
>> > >
>> >
>>
>
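Point 2 above mentions "byte swapping between different architectures." The Java sketch below shows what that amounts to for a single fixed-width buffer: the data buffer of an int32 column is converted from big-endian to little-endian byte order. It is an illustration under stated assumptions, not Arrow code. It uses only `java.nio` and does not touch Arrow's Java API. The class and method names (`EndianSwapSketch`, `swapInt32Buffer`, `readInt32s`) are hypothetical. A full record batch conversion, like the C++ work referenced at [1] above, would also have to handle the other buffer types and element widths.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only: not part of Arrow's Java API.
public class EndianSwapSketch {

    // Reverses the byte order of every 4-byte value in a fixed-width buffer,
    // e.g. the data buffer of an int32 column.
    static byte[] swapInt32Buffer(byte[] src) {
        if (src.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer in = ByteBuffer.wrap(src);          // default order: BIG_ENDIAN
        ByteBuffer out = ByteBuffer.allocate(src.length);
        while (in.hasRemaining()) {
            // Read 4 bytes, reverse them, write them back in the same order,
            // so each value's bytes end up reversed in the output.
            out.putInt(Integer.reverseBytes(in.getInt()));
        }
        return out.array();
    }

    // Decodes int32 values stored in the given byte order.
    static int[] readInt32s(byte[] data, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(order);
        int[] values = new int[data.length / Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        // Two int32 values as a big-endian producer would lay them out.
        byte[] bigEndian = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN)
                .putInt(1).putInt(0x01020304).array();
        byte[] littleEndian = swapInt32Buffer(bigEndian);
        // A little-endian reader now sees the original values.
        System.out.println(Arrays.toString(readInt32s(littleEndian, ByteOrder.LITTLE_ENDIAN)));
        // prints [1, 16909060]
    }
}
```

The swap is a per-element pass over the buffer. This is consistent with the concern in the thread that conversion is an extra cost and maintenance burden each implementation may choose to accept or reject.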
