Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Micah Kornfield Sun, 13 Sep 2020 21:58:18 -0700

Reading the responses so far, it seems like the following might be a good
compromise/summary:


1. It does not seem too invasive to support native endianness in
implementation libraries, as long as there is appropriate performance
testing and CI infrastructure to demonstrate that the changes work.
2. It is up to implementation maintainers whether they wish to accept PRs
that handle byte swapping between different architectures.  (Right now it
sounds like C++ is potentially OK with it, and for Java at least Jacques is
opposed to it?)
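
To make the difference between the two points concrete, here is a rough
sketch in plain Python (standard library only). It is not Arrow code: the
function names swap_int32_buffer and to_native are hypothetical, and it
assumes a buffer holding only 32-bit integers. Point 1 corresponds to the
early return in to_native (data already in the host's byte order is used
as-is); point 2 corresponds to the swap that runs when producer and consumer
disagree.

```python
import struct
import sys


def swap_int32_buffer(buf: bytes) -> bytes:
    """Reverse the byte order of every 4-byte value in buf (hypothetical helper)."""
    if len(buf) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4")
    count = len(buf) // 4
    # Read the values with one byte order and write them back with the other;
    # the net effect is reversing the bytes inside each 4-byte slot.
    values = struct.unpack(f"<{count}i", buf)
    return struct.pack(f">{count}i", *values)


def to_native(buf: bytes, data_is_little_endian: bool) -> bytes:
    """Return buf in the host's byte order, swapping only when required."""
    native_is_little = sys.byteorder == "little"
    if data_is_little_endian == native_is_little:
        return buf  # point 1: native-endian data needs no conversion
    return swap_int32_buffer(buf)  # point 2: cross-endian data is swapped


little = struct.pack("<3i", 1, 2, 3)
big = swap_int32_buffer(little)
```

An implementation that declines point 2 would raise an error at the swap
step instead of converting, which is the "reject non-native-endian data"
option mentioned further down the thread.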

Testing changes that break big-endian can be a potential drag on developer
productivity but there are methods to run locally (at least on more recent
OSes).

Thoughts?

Thanks,
Micah

On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <[email protected]> wrote:

> Thanks to Kazuaki for the survey, and thanks to Micah for starting the
> discussion.
>
> I do not oppose supporting BE. In fact, I am in general optimistic about
> the performance impact (for Java).
> IMO, this is going to be a painful path (many byte-order-related problems
> are tricky to debug), so I hope we can keep it short.
>
> It is good that someone is willing to take this on, and I would like to
> provide help if needed.
>
> Best,
> Liya Fan
>
>
>
> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <[email protected]> wrote:
>
> > I also think this would be a worthwhile addition and help the project
> > expand in more areas. Beyond the Apache Spark optimization use case, having
> > Arrow interoperability with the Python data science stack on BE would be
> > very useful. I have looked at the remaining PRs for Java and they seem
> > pretty minimal and straightforward. Implementing the equivalent record
> > batch swapping as done in C++ at [1] would be a little more involved, but
> > still reasonable. Would it make sense to create a branch to apply all
> > remaining changes with CI to get a better picture before deciding on
> > bringing into master branch?  I could help out with shepherding this effort
> > and assist in maintenance, if we decide to accept.
> >
> > Bryan
> >
> > [1] https://github.com/apache/arrow/pull/7507
> >
> > On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <[email protected]> wrote:
> >
> > > I think it's well within the right of an implementation to reject BE
> > > data (or non-native-endian), but if an implementation chooses to
> > > implement and maintain the endianness conversions, then it does not
> > > seem so bad to me.
> > >
> > > On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <[email protected]> wrote:
> > > >
> > > > And yes, for those of you looking closely, I commented on ARROW-245 when it
> > > > was committed. I just forgot about it.
> > > >
> > > > It looks like I had mostly the same concerns then that I do now :) Now I'm
> > > > just more worried about format sprawl...
> > > >
> > > > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <[email protected]> wrote:
> > > >
> > > > > >> What do you mean?  The Endianness field (a Big|Little enum) was added 4
> > > > > >> years ago:
> > > > > >> https://issues.apache.org/jira/browse/ARROW-245
> > > > > >
> > > > > >
> > > > > > I didn't realize that was done, my bad. Good example of format rot from my
> > > > > > pov.
> > > > >
> > > > >
> > > > >
> > >
> >
>
