Reading the responses so far, it seems like the following might be a good compromise/summary:
1. It does not seem too invasive to support native endianness in implementation libraries, as long as there is appropriate performance testing and CI infrastructure to demonstrate that the changes work.
2. It is up to implementation maintainers whether they wish to accept PRs that handle byte swapping between different architectures. (Right now it sounds like C++ is potentially OK with it, and for Java at least Jacques is opposed to it?) Testing changes that break big-endian can be a potential drag on developer productivity, but there are methods to run locally (at least on more recent OSes).

Thoughts?

Thanks,
Micah

On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Thank Kazuaki for the survey and thank Micah for starting the discussion.
>
> I do not oppose supporting BE. In fact, I am in general optimistic about
> the performance impact (for Java).
> IMO, this is going to be a painful way (many byte order related problems
> are tricky to debug), so I hope we can make it short.
>
> It is good that someone is willing to take this on, and I would like to
> provide help if needed.
>
> Best,
> Liya Fan
>
> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <cutl...@gmail.com> wrote:
>
> > I also think this would be a worthwhile addition and help the project
> > expand in more areas. Beyond the Apache Spark optimization use case,
> > having Arrow interoperability with the Python data science stack on BE
> > would be very useful. I have looked at the remaining PRs for Java and
> > they seem pretty minimal and straightforward. Implementing the
> > equivalent record batch swapping as done in C++ at [1] would be a
> > little more involved, but still reasonable. Would it make sense to
> > create a branch to apply all remaining changes with CI to get a better
> > picture before deciding on bringing into master branch? I could help
> > out with shepherding this effort and assist in maintenance, if we
> > decide to accept.
> >
> > Bryan
> >
> > [1] https://github.com/apache/arrow/pull/7507
> >
> > On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > I think it's well within the right of an implementation to reject BE
> > > data (or non-native-endian), but if an implementation chooses to
> > > implement and maintain the endianness conversions, then it does not
> > > seem so bad to me.
> > >
> > > On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > >
> > > > And yes, for those of you looking closely, I commented on ARROW-245
> > > > when it was committed. I just forgot about it.
> > > >
> > > > It looks like I had mostly the same concerns then that I do now :)
> > > > Now I'm just more worried about format sprawl...
> > > >
> > > > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > > >
> > > > >> What do you mean? The Endianness field (a Big|Little enum) was
> > > > >> added 4 years ago:
> > > > >> https://issues.apache.org/jira/browse/ARROW-245
> > > > >
> > > > > I didn't realize that was done, my bad. Good example of format rot
> > > > > from my pov.