[
https://issues.apache.org/jira/browse/ARROW-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504715#comment-17504715
]
Dan Coates commented on ARROW-15642:
------------------------------------
Thank you for the explanation, [~westonpace]; it is much appreciated.
I think my confusion stemmed from the fact that arquero writes Arrow data
in the IPC streaming format. I'll see whether it is possible to add an
option to arquero to output the IPC file format instead.
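
For reference, the two IPC formats can be told apart from their leading bytes. The following is a minimal sketch, not pyarrow's own detection logic. It assumes the file format's `ARROW1` magic at both ends of the data and the `0xFFFFFFFF` continuation marker that starts each message in streams written by Arrow 0.15 or later. `detect_ipc_format` and `read_table` are hypothetical helper names; `read_table` imports pyarrow lazily and dispatches to `pa.ipc.open_file` or `pa.ipc.open_stream`.

```python
import struct

ARROW_MAGIC = b"ARROW1"    # magic bytes that open and close an IPC *file*
CONTINUATION = 0xFFFFFFFF  # marker preceding each message in an IPC *stream*
                           # (assumption: stream written by Arrow >= 0.15)


def detect_ipc_format(data: bytes) -> str:
    """Guess whether `data` is Arrow IPC 'file', 'stream', or 'unknown'."""
    if (
        len(data) >= 2 * len(ARROW_MAGIC)
        and data.startswith(ARROW_MAGIC)
        and data.endswith(ARROW_MAGIC)
    ):
        return "file"
    if len(data) >= 4 and struct.unpack_from("<I", data)[0] == CONTINUATION:
        return "stream"
    return "unknown"


def read_table(path: str):
    """Hypothetical helper: read either IPC format into a pyarrow Table."""
    import pyarrow as pa  # imported lazily so the sniffing works without it

    with open(path, "rb") as f:
        data = f.read()

    kind = detect_ipc_format(data)
    if kind == "file":
        return pa.ipc.open_file(pa.BufferReader(data)).read_all()
    if kind == "stream":
        return pa.ipc.open_stream(pa.BufferReader(data)).read_all()
    raise ValueError(f"{path} does not look like Arrow IPC data")
```

With a helper like this, `read_table('jsarrow.arrow')` should read the output of the example below through `pa.ipc.open_stream`, since that data is in the streaming format.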
> [Python] [JavaScript] Arrow IPC file output by apache-arrow tableToIPC method
> cannot be read by pyarrow
> -------------------------------------------------------------------------------------------------------
>
> Key: ARROW-15642
> URL: https://issues.apache.org/jira/browse/ARROW-15642
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript, Python
> Affects Versions: 7.0.0
> Reporter: Dan Coates
> Assignee: Weston Pace
> Priority: Major
>
> IPC files created by the Node.js library `apache-arrow` do not appear to be
> readable by pyarrow. There is an example of this issue here:
> [https://github.com/dancoates/pyarrow-jsarrow-test|https://github.com/dancoates/pyarrow-jsarrow-test]
>
> Writing the Arrow file from JavaScript:
> {code:javascript}
> import {tableToIPC, tableFromArrays} from 'apache-arrow';
> import fs from 'fs';
> const LENGTH = 2000;
> const rainAmounts = Float32Array.from(
>     { length: LENGTH },
>     () => Number((Math.random() * 20).toFixed(1)));
> const rainDates = Array.from(
>     { length: LENGTH },
>     (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));
> const rainfall = tableFromArrays({
>     precipitation: rainAmounts,
>     date: rainDates
> });
> const outputTable = tableToIPC(rainfall);
> fs.writeFileSync('jsarrow.arrow', outputTable);
> {code}
>
> Reading in Python:
> {code:python}
> import pyarrow as pa
> with open('jsarrow.arrow', 'rb') as f:
>     with pa.ipc.open_file(f) as reader:
>         df = reader.read_pandas()
> print(df.head())
> {code}
>
> produces the error:
> {code:java}
> pyarrow.lib.ArrowInvalid: Not an Arrow file
> {code}
>
>