In case anyone wants to comment further, I've opened https://github.com/apache/arrow/pull/8374 <https://github.com/apache/arrow/pull/8374#pullrequestreview-504324361> to canonicalize the details.

On Mon, Sep 28, 2020 at 9:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> OK, I will try to update documentation reflecting this in the next few
> days (in particular it would be good to document which implementations are
> willing to support byte flipping).
>
> On Tue, Sep 22, 2020 at 3:30 AM Antoine Pitrou <anto...@python.org> wrote:
>
>>
>> On 22/09/2020 at 06:36, Micah Kornfield wrote:
>> > I wanted to give this thread a bump, does the proposal I made below
>> > sound reasonable?
>>
>> It does!
>>
>> Regards
>>
>> Antoine.
>>
>> >
>> > On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield <emkornfi...@gmail.com>
>> > wrote:
>> >
>> >> If I read the responses so far it seems like the following might be a
>> >> good compromise/summary:
>> >>
>> >> 1. It does not seem too invasive to support native endianness in
>> >> implementation libraries, as long as there is appropriate performance
>> >> testing and CI infrastructure to demonstrate the changes work.
>> >> 2. It is up to implementation maintainers if they wish to accept PRs
>> >> that handle byte swapping between different architectures. (Right now
>> >> it sounds like C++ is potentially OK with it and for Java at least
>> >> Jacques is opposed to it?)
>> >>
>> >> Testing changes that break big-endian can be a potential drag on
>> >> developer productivity but there are methods to run locally (at least
>> >> on more recent OSes).
>> >>
>> >> Thoughts?
>> >>
>> >> Thanks,
>> >> Micah
>> >>
>> >> On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <liya.fa...@gmail.com> wrote:
>> >>
>> >>> Thanks to Kazuaki for the survey and to Micah for starting the
>> >>> discussion.
>> >>>
>> >>> I do not oppose supporting BE. In fact, I am in general optimistic
>> >>> about the performance impact (for Java).
>> >>> IMO, this is going to be a painful path (many byte-order-related
>> >>> problems are tricky to debug), so I hope we can make it short.
>> >>>
>> >>> It is good that someone is willing to take this on, and I would like
>> >>> to provide help if needed.
>> >>>
>> >>> Best,
>> >>> Liya Fan
>> >>>
>> >>> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <cutl...@gmail.com> wrote:
>> >>>
>> >>>> I also think this would be a worthwhile addition and help the project
>> >>>> expand in more areas. Beyond the Apache Spark optimization use case,
>> >>>> having Arrow interoperability with the Python data science stack on
>> >>>> BE would be very useful. I have looked at the remaining PRs for Java
>> >>>> and they seem pretty minimal and straightforward. Implementing the
>> >>>> equivalent record batch swapping as done in C++ at [1] would be a
>> >>>> little more involved, but still reasonable. Would it make sense to
>> >>>> create a branch to apply all remaining changes with CI to get a
>> >>>> better picture before deciding on bringing into master branch? I
>> >>>> could help out with shepherding this effort and assist in
>> >>>> maintenance, if we decide to accept.
>> >>>>
>> >>>> Bryan
>> >>>>
>> >>>> [1] https://github.com/apache/arrow/pull/7507
>> >>>>
>> >>>> On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <wesmck...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> I think it's well within the right of an implementation to reject BE
>> >>>>> data (or non-native-endian), but if an implementation chooses to
>> >>>>> implement and maintain the endianness conversions, then it does not
>> >>>>> seem so bad to me.
>> >>>>>
>> >>>>> On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <jacq...@apache.org>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> And yes, for those of you looking closely, I commented on ARROW-245
>> >>>>>> when it was committed. I just forgot about it.
>> >>>>>>
>> >>>>>> It looks like I had mostly the same concerns then that I do now :)
>> >>>>>> Now I'm just more worried about format sprawl...
>> >>>>>>
>> >>>>>> On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <jacq...@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>>> What do you mean? The Endianness field (a Big|Little enum) was
>> >>>>>>>> added 4 years ago:
>> >>>>>>>> https://issues.apache.org/jira/browse/ARROW-245
>> >>>>>>>
>> >>>>>>> I didn't realize that was done, my bad. Good example of format rot
>> >>>>>>> from my pov.