I suspect that relaxing the constraint to native endianness (and including this in any IPC/RPC metadata, per ARROW-245) will not cause too many problems. One of the challenges for us will be testing and continuous integration -- what are the options for running the test suite on a regular basis on big-endian platforms? I know that in pandas we occasionally ran into esoteric test failures for the PPC / big-endian Debian package builds, but for the most part there haven't been any problems.
- Wes

On Fri, Aug 5, 2016 at 4:39 AM, Sanjay Rao <getsanjay...@live.com> wrote:
> Some places where an explicit check for little endian is present:
>
> ./memory/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java: if (!NATIVE_ORDER || buf.order() != ByteOrder.BIG_ENDIAN) {
> ./memory/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java: throw new IllegalStateException("Arrow only runs on LittleEndian systems.");
>
> Sanjay
>
>> From: pchan...@maprtech.com
>> Date: Thu, 4 Aug 2016 17:04:34 -0700
>> Subject: Re: Is there plan to support BigEndian Systems like SUN SPARC Hardware ?
>> To: dev@arrow.apache.org; emkornfi...@gmail.com
>> CC: jul...@dremio.com
>>
>> Drill's assumption of little endian is in the ValueVector code, and Arrow
>> has inherited the same assertion
>> (https://github.com/apache/arrow/blob/master/java/memory/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java#L58).
>>
>> In the Java implementation, the underlying Netty implementation handles the
>> conversion between endianness fairly well, so potentially this assert can
>> be removed from here and Drill can move this higher up in the Drill code.
>>
>> Parth
>>
>> On Thu, Aug 4, 2016 at 1:14 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> > Hi Julien,
>> > That's the theory. I don't think that there is anything in the C++ code
>> > base that should break, but we don't have access to hardware to verify that.
>> >
>> > The Java Arrow code currently asserts that it is running on a little endian
>> > machine. I did a very quick scan of the Java code and didn't see anything
>> > there that would break on a big-endian system, but according to at least one
>> > person who is working on Drill, it seems that Drill assumes little
>> > endianness (I don't know if this is in Arrow/ValueVector code or it is
>> > higher up the stack in the Drill code).
>> >
>> > Thanks,
>> > Micah
>> >
>> > On Thu, Aug 4, 2016 at 11:36 AM, Julien Le Dem <jul...@dremio.com> wrote:
>> >
>> > > So it sounds like right now it just works as long as there is no
>> > > inter-system communication (with different endianness) because both Java
>> > > and C++ code just use the underlying endianness.
>> > > Is that correct?
>> > >
>> > > On Thu, Aug 4, 2016 at 11:17 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>> > >
>> > >> Hi Sanjay,
>> > >> I think we are trying to work that out now. As you've seen with some of
>> > >> your initial investigation, we have no coverage for big-endian machines
>> > >> yet. But in the long run, we should be able to make it work (it seems
>> > >> like there might be some difference of opinion on how to make it work).
>> > >>
>> > >> Thanks,
>> > >> Micah
>> > >>
>> > >> On Mon, Aug 1, 2016 at 11:16 AM, Sanjay Rao <getsanjay...@live.com> wrote:
>> > >>
>> > >> > Hi Wes, Hi Micah,
>> > >> > I understood what you meant, so for point 2, Arrow working from a big
>> > >> > endian machine to a big endian machine shouldn't be an issue, right?
>> > >> > Please confirm.
>> > >> > Thanks,
>> > >> > Sanjay
>> > >> >
>> > >> > > From: wesmck...@gmail.com
>> > >> > > Date: Mon, 1 Aug 2016 11:07:07 -0700
>> > >> > > Subject: Re: Is there plan to support BigEndian Systems like SUN SPARC Hardware ?
>> > >> > > To: dev@arrow.apache.org; emkornfi...@gmail.com
>> > >> > >
>> > >> > > hey Micah,
>> > >> > >
>> > >> > > On Mon, Aug 1, 2016 at 11:02 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>> > >> > > > Hi Wes,
>> > >> > > > The point I was trying to argue from an earlier thread is that the
>> > >> > > > most common cases for relocation are:
>> > >> > > > 1. Little endian machine to little endian machine (most likely same
>> > >> > > >    machine)
>> > >> > > > 2. Big endian machine to big endian machine (most likely same
>> > >> > > >    machine)
>> > >> > > > 3. Big endian machine to little endian machine or vice versa
>> > >> > > >
>> > >> > > > The purpose of the metadata would be to make use cases 1 and 2
>> > >> > > > possible without byte-swapping. Use case 3 would obviously require
>> > >> > > > byte swapping, but for an initial implementation the code could
>> > >> > > > simply indicate that it is not supported.
>> > >> > > >
>> > >> > > > This seems less complex to me than actually implementing any sort
>> > >> > > > of byte-swapping logic while still supporting the widest variety of
>> > >> > > > hardware with the same code for the most common use cases.
>> > >> > >
>> > >> > > This makes sense. My comments were for the situation that a big endian
>> > >> > > system would be exposing memory to an unknown consumer -- for example,
>> > >> > > if we implemented an RPC wire format for Arrow memory, then in general
>> > >> > > a big endian system would need to send little-endian integers to an
>> > >> > > arbitrary receiver. I'm not sure of the best way to provide for easy
>> > >> > > native-endianness support for cases 1/2, but trying to fully solve
>> > >> > > this problem now seems premature until we've established some of these
>> > >> > > tools (so long as we haven't painted ourselves into a corner).
>> > >> > >
>> > >> > > - Wes
>> > >> > >
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Micah
>> > >> > > >
>> > >> > > > P.S. If anybody can provide pointers I'd be interested to understand
>> > >> > > > which pieces of the Java code make assumptions about
>> > >> > > > little-endianness.
>> > >
>> > > --
>> > > Julien
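
For reference, the metadata idea discussed in this thread (cases 1 and 2 pass through with no byte swapping, case 3 is simply reported as unsupported) could be sketched roughly as below. This is only an illustration under assumptions: the class EndiannessCheck, its methods, and the boolean "producer is little endian" flag are hypothetical names, not anything in Arrow or in the ARROW-245 metadata. Only java.nio.ByteBuffer and java.nio.ByteOrder are real APIs here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow: illustrates "record the producer's
// byte order in metadata, refuse to read on mismatch".
final class EndiannessCheck {

    private EndiannessCheck() {}

    /** Value a producer would write into the (assumed) metadata flag. */
    static boolean nativeIsLittleEndian() {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
    }

    /** Maps the metadata flag back to a java.nio.ByteOrder. */
    static ByteOrder orderFromFlag(boolean littleEndian) {
        return littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
    }

    /**
     * Cases 1 and 2: producer and consumer agree, so the buffer is viewed in
     * native order without any byte swapping.
     * Case 3: orders differ, so the read is rejected instead of swapped.
     */
    static ByteBuffer viewAsNative(ByteBuffer data, boolean producerIsLittleEndian) {
        ByteOrder producerOrder = orderFromFlag(producerIsLittleEndian);
        if (producerOrder != ByteOrder.nativeOrder()) {
            throw new UnsupportedOperationException(
                "Producer byte order " + producerOrder
                    + " differs from native order " + ByteOrder.nativeOrder()
                    + "; byte swapping is not implemented");
        }
        // duplicate() shares the bytes; only the view's byte order is set.
        return data.duplicate().order(ByteOrder.nativeOrder());
    }
}
```

A producer would store nativeIsLittleEndian() alongside the buffers, and a consumer would pass that flag to viewAsNative(...) before reading. The same code then runs unchanged on little-endian and big-endian hosts, and the cross-endian case fails loudly rather than silently misreading values.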