I wanted to give this thread a bump. Does the proposal I made below sound reasonable?
On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> If I read the responses so far, it seems like the following might be a good
> compromise/summary:
>
> 1. It does not seem too invasive to support native endianness in
> implementation libraries, as long as there is appropriate performance
> testing and CI infrastructure to demonstrate the changes work.
> 2. It is up to implementation maintainers if they wish to accept PRs that
> handle byte swapping between different architectures. (Right now it sounds
> like C++ is potentially OK with it, and for Java at least Jacques is opposed
> to it?)
>
> Testing changes that break big-endian can be a potential drag on developer
> productivity, but there are methods to run locally (at least on more recent
> OSes).
>
> Thoughts?
>
> Thanks,
> Micah
>
> On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
>> Thank Kazuaki for the survey and thank Micah for starting the discussion.
>>
>> I do not oppose supporting BE. In fact, I am in general optimistic about
>> the performance impact (for Java).
>> IMO, this is going to be a painful way (many byte order related problems
>> are tricky to debug), so I hope we can make it short.
>>
>> It is good that someone is willing to take this on, and I would like to
>> provide help if needed.
>>
>> Best,
>> Liya Fan
>>
>> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>
>> > I also think this would be a worthwhile addition and help the project
>> > expand in more areas. Beyond the Apache Spark optimization use case, having
>> > Arrow interoperability with the Python data science stack on BE would be
>> > very useful. I have looked at the remaining PRs for Java and they seem
>> > pretty minimal and straightforward. Implementing the equivalent record
>> > batch swapping as done in C++ at [1] would be a little more involved, but
>> > still reasonable. Would it make sense to create a branch to apply all
>> > remaining changes with CI to get a better picture before deciding on
>> > bringing into master branch? I could help out with shepherding this effort
>> > and assist in maintenance, if we decide to accept.
>> >
>> > Bryan
>> >
>> > [1] https://github.com/apache/arrow/pull/7507
>> >
>> > On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> > > I think it's well within the right of an implementation to reject BE
>> > > data (or non-native-endian), but if an implementation chooses to
>> > > implement and maintain the endianness conversions, then it does not
>> > > seem so bad to me.
>> > >
>> > > On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >
>> > > > And yes, for those of you looking closely, I commented on ARROW-245 when it
>> > > > was committed. I just forgot about it.
>> > > >
>> > > > It looks like I had mostly the same concerns then that I do now :) Now I'm
>> > > > just more worried about format sprawl...
>> > > >
>> > > > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >
>> > > > >> What do you mean? The Endianness field (a Big|Little enum) was added 4
>> > > > >> years ago:
>> > > > >> https://issues.apache.org/jira/browse/ARROW-245
>> > > > >
>> > > > > I didn't realize that was done, my bad. Good example of format rot from my
>> > > > > pov.