OK got everything to work, https://github.com/apache/arrow/pull/8644 (part of ARROW-10573 now) is ready for review. I've updated the test case to show it is possible to zero-copy a pandas DataFrame! The next step is to dig into `arrow_to_pandas.cc` to make it work automagically...
On Wed, 11 Nov 2020 at 22:52, Nicholas White <n.j.wh...@gmail.com> wrote:
> Thanks all, this has been interesting. I've made a patch that sort-of does
> what I want[1] - I hope the test case is clear! I made the batch writer use
> the `alignment` field that was already in the `IpcWriteOptions` to align
> the buffers, instead of fixing their alignment at 8. Arrow then writes out
> the buffers consecutively, so you can map them as a 2D memory array like I
> wanted. There's one problem though...the test case thinks the arrow data is
> invalid as it can't read the metadata properly (error below). Do you have
> any idea why? I think it's because Arrow puts the metadata at the end of
> the file after the now-unaligned buffers yet assumes the metadata is still
> 8-byte aligned (which it probably no longer is).
>
> Nick
>
> ````
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/ipc.pxi:494: in pyarrow.lib.RecordBatchReader.read_all
>     check_status(self.reader.get().ReadAll(&table))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >       raise ArrowInvalid(message)
> E   pyarrow.lib.ArrowInvalid: Expected to read 117703432 metadata bytes,
> but only read 19
> ````
>
> [1] https://github.com/apache/arrow/pull/8644