[
https://issues.apache.org/jira/browse/ARROW-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-17123:
-----------------------------------
Labels: pull-request-available (was: )
> [JS] Unable to open reader on .arrow file after fetch: Uncaught (in promise)
> Error: Expected to read 1329865020 metadata bytes, but only read 1123.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17123
> URL: https://issues.apache.org/jira/browse/ARROW-17123
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 8.0.1
> Reporter: Benoit Cantin
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I created a file in the raw Arrow format with the script given in the PyArrow
> cookbook here:
> [https://arrow.apache.org/cookbook/py/io.html#saving-arrow-arrays-to-disk]
>
> In a Node.js application, this file can be read as follows:
>
> {code:java}
> const r = await RecordBatchReader.from(fs.createReadStream(filePath));
>
> await r.open();
> for (let i = 0; i < r.numRecordBatches; i++) {
>   const rb = await r.readRecordBatch(i);
>   if (rb !== null) {
>     console.log(rb.numRows);
>   }
> }
> {code}
> However, this method loads the whole file into memory (is that a bug?), which
> does not scale.
>
> To solve this scalability issue, I tried to load the data with fetch as
> described in the [README.md|#load-data-with-fetch]. Both:
>
> {code:java}
> import { tableFromIPC } from "apache-arrow";
> const table = await tableFromIPC(fetch(filePath));
> console.table([...table]);{code}
> and
> {code:java}
> const r = await RecordBatchReader.from(await fetch(filePath));
> await r.open(); {code}
> fail with the error:
> Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but
> only read 1123.
>
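One way to investigate the reported length is to look at the raw bytes behind it. The sketch below assumes the reader took the first four bytes of the response as a little-endian 32-bit metadata length (an assumption about how the number arose, not something confirmed in this report) and turns the number back into those bytes. The helper name `lengthPrefixToAscii` is hypothetical.

```javascript
// Hypothetical diagnostic helper: reinterpret a 32-bit "metadata length"
// as the four raw bytes it would have been read from (little-endian).
function lengthPrefixToAscii(length) {
  const buffer = new ArrayBuffer(4);
  // Write the number back as a little-endian int32.
  new DataView(buffer).setInt32(0, length, true);
  // Render each byte as a character.
  return String.fromCharCode(...new Uint8Array(buffer));
}

// The value from the error message above.
console.log(lengthPrefixToAscii(1329865020)); // prints "<!DO"
```

Under that assumption, the bytes spell the start of `<!DOCTYPE`, which would suggest that the fetch returned an HTML page (for example an error page or a dev-server fallback) rather than the Arrow file. Checking `response.ok` and the response's `Content-Type` header before handing the response to the reader may help confirm or rule this out.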