temporaryfix commented on issue #315: URL: https://github.com/apache/arrow-js/issues/315#issuecomment-4707910457
We're hitting what looks like the same bug, `tableToIPC()` producing an IPC buffer that `tableFromIPC()` then can't read, and we have a **deterministic, purely in-memory repro** (no OPFS/storage involved), which might help isolate it.

**Symptom.** A defensive round-trip

```js
tableFromIPC(tableToIPC(new Table([batch])))
```

throws:

```
Error: Expected to read <N> bytes for message body, but only read <N−K>
```

i.e. `tableToIPC` emits a RecordBatch message whose declared `bodyLength` is larger than the bytes it actually writes, so `tableFromIPC` runs off the end of the body. (In your OPFS case the same over/under-count could surface as *either* a throw or a hang depending on where the reader lands, so this may be data-shape-dependent rather than truly "occasional".)

**Versions.** Reproduces identically on `apache-arrow` **18.1.0** and **21.1.0** (latest). Upgrading does not help.

**Trigger (narrow + deterministic).** It only fires for a *full* (`nullCount === 0`) RecordBatch carrying **nested** columns. In our case those are GeoArrow:

- `geometry: Struct<x: float64, y: float64>` (point)
- `uncertainty: List<List<Struct<x: float64, y: float64>>>` (polygon)

and the batch was imported via [`arrow-js-ffi`](https://github.com/kylebarron/arrow-js-ffi) from the **C Data Interface** (zero-copy `Data` views into WASM linear memory), rather than built from fresh JS buffers.

What *passes* the same round-trip:

- sparse / null-padded chunks of the same schema,
- flat (non-nested) schemas,
- round-tripping each column **individually**.

So it looks like a **cumulative `bodyLength` miscount across the full nested batch**, not a per-column error. We could not reproduce by hand-constructing the nested types from fresh JS `Data`; it seems to need the actual buffer layout produced by the CDI import.
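The whole-batch versus per-column comparison above can be sketched as a small helper. This is only an illustration: `roundTripReport` is a hypothetical name, the `ipc` argument is assumed to be the `apache-arrow` module namespace (so `tableToIPC` and `tableFromIPC` come from there), and `table.select([name])` is assumed to be an acceptable way to isolate a single column.

```js
// Hypothetical helper: round-trips the whole table and each column on its own,
// and reports which attempts throw. `ipc` is assumed to expose tableToIPC and
// tableFromIPC (e.g. `import * as ipc from "apache-arrow"`).
function roundTripReport(table, ipc) {
  const attempt = (t) => {
    try {
      ipc.tableFromIPC(ipc.tableToIPC(t));
      return { ok: true };
    } catch (err) {
      // Keep only the message so reports are easy to diff between runs.
      return { ok: false, error: String((err && err.message) || err) };
    }
  };

  const columns = {};
  for (const field of table.schema.fields) {
    // Assumption: select() yields a single-column table for this field.
    columns[field.name] = attempt(table.select([field.name]));
  }
  return { whole: attempt(table), columns };
}
```

A report where `whole.ok` is `false` while every entry in `columns` is `ok` would match the pattern described above.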
**Workaround.** We replaced the `tableToIPC` → `tableFromIPC` round-trip (we used it only as a defensive copy onto the JS heap) with a direct recursive buffer deep-copy of each column's `Data` (slice each typed-array buffer, clone children). Correct, and faster.

**Offer.** We have a ~190 KB `.arrow` fixture (150 real GeoArrow rows) that reproduces this deterministically, and can assemble a minimal `apache-arrow` + `arrow-js-ffi` repro harness if that would help triage. Happy to share.
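A minimal sketch of the kind of recursive deep copy described in the workaround, not the exact code behind it. `deepCopyData` is a hypothetical name, and the sketch assumes each node behaves like `apache-arrow`'s `Data`: an array-like `buffers`, a `children` array, and a `clone(type, offset, length, nullCount, buffers, children)` method. Dictionaries are not copied here.

```js
// Hypothetical helper: copies every buffer of a Data-like node (and of its
// children, recursively) into freshly allocated memory on the JS heap.
function deepCopyData(data) {
  // TypedArray.prototype.slice() allocates a new ArrayBuffer containing only
  // the viewed bytes, so the copy no longer aliases WASM linear memory.
  const buffers = Array.from(data.buffers, (buf) =>
    ArrayBuffer.isView(buf) ? buf.slice() : buf
  );

  // Nested types (Struct, List, ...) keep their payload in child nodes.
  const children = data.children.map(deepCopyData);

  // Assumption: clone() accepts replacement buffers and children, leaving
  // type, offset, length and nullCount unchanged.
  return data.clone(
    data.type, data.offset, data.length, data.nullCount, buffers, children
  );
}
```

Absent buffer slots (for example `undefined`) are passed through unchanged, so the slot positions stay the same in the copy.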
