I also think this would be a worthwhile addition and would help the project expand into more areas. Beyond the Apache Spark optimization use case, having Arrow interoperability with the Python data science stack on BE would be very useful. I have looked at the remaining PRs for Java and they seem pretty minimal and straightforward. Implementing the equivalent record batch swapping as done in C++ at [1] would be a little more involved, but still reasonable. Would it make sense to create a branch that applies all remaining changes, with CI, to get a better picture before deciding whether to bring this into the master branch? I could help out with shepherding this effort and assist with maintenance if we decide to accept it.
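For anyone unfamiliar with what the record batch swapping entails, here is a minimal sketch of the kind of per-buffer byte reordering involved, using plain `java.nio` rather than Arrow's actual classes. The class and method names are illustrative only, not part of any Arrow API; a real implementation would dispatch on each field's type width, as the C++ PR does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianSwapSketch {
    // Illustrative only: reorder a buffer of 32-bit values from
    // big-endian to little-endian, one value at a time.
    static ByteBuffer swapInt32Buffer(ByteBuffer src) {
        ByteBuffer in = src.duplicate().order(ByteOrder.BIG_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(in.remaining())
                                   .order(ByteOrder.LITTLE_ENDIAN);
        while (in.remaining() >= Integer.BYTES) {
            out.putInt(in.getInt()); // read BE, write LE
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer be = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN);
        be.putInt(1).putInt(2);
        be.flip();
        ByteBuffer le = swapInt32Buffer(be);
        // Reading with little-endian order recovers the original values.
        System.out.println(le.getInt() + " " + le.getInt()); // prints "1 2"
    }
}
```

An actual conversion on load would apply this kind of swap to each fixed-width data buffer of a record batch (validity bitmaps need no swapping), which is why the Java work, while straightforward, is a little more involved than the metadata changes.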
Bryan

[1] https://github.com/apache/arrow/pull/7507

On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
> I think it's well within the right of an implementation to reject BE
> data (or non-native-endian), but if an implementation chooses to
> implement and maintain the endianness conversions, then it does not
> seem so bad to me.
>
> On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > And yes, for those of you looking closely, I commented on ARROW-245 when it
> > was committed. I just forgot about it.
> >
> > It looks like I had mostly the same concerns then that I do now :) Now I'm
> > just more worried about format sprawl...
> >
> > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > >> What do you mean? The Endianness field (a Big|Little enum) was added 4
> > >> years ago:
> > >> https://issues.apache.org/jira/browse/ARROW-245
> > >
> > > I didn't realize that was done, my bad. Good example of format rot from my
> > > pov.