OK got everything to work, https://github.com/apache/arrow/pull/8644 (part of ARROW-10573 now) is ready for review. I've updated the test case to show it is possible to zero-copy a pandas DataFrame! The next step is to dig into `arrow_to_pandas.cc` to make it work automagically...
On Wed, 11 Nov 2020 at 22:52, Nicholas White <n.j.wh...@gmail.com> wrote:
> Thanks all, this has been interesting. I've made a patch that sort-of does
> what I want[1] - I hope the test case is clear! I made the batch writer use
> the `alignment` field that was already in the `IpcWriteOptions` to align
> the buffers, instead of fixing their alignment at 8. Arrow then writes out
> the buffers consecutively, so you can map them as a 2D memory array like I
> wanted. There's one problem though...the test case thinks the arrow data is
> invalid as it can't read the metadata properly (error below). Do you have
> any idea why? I think it's because Arrow puts the metadata at the end of
> the file after the now-unaligned buffers yet assumes the metadata is still
> 8-byte aligned (which it probably no longer is).
>
> Nick
>
> ````
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/ipc.pxi:494: in pyarrow.lib.RecordBatchReader.read_all
>     check_status(self.reader.get().ReadAll(&table))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >       raise ArrowInvalid(message)
> E   pyarrow.lib.ArrowInvalid: Expected to read 117703432 metadata bytes,
> but only read 19
> ````
>
> [1] https://github.com/apache/arrow/pull/8644