temporaryfix commented on issue #315:
URL: https://github.com/apache/arrow-js/issues/315#issuecomment-4707910457

   We're hitting what looks like the same bug — `tableToIPC()` producing an IPC 
buffer that `tableFromIPC()` then can't read — and we have a **deterministic, 
purely in-memory repro** (no OPFS/storage involved), which might help isolate 
it.
   
   **Symptom.** A defensive round-trip
   
   ```js
   tableFromIPC(tableToIPC(new Table([batch])))
   ```
   
   throws:
   
   ```
   Error: Expected to read <N> bytes for message body, but only read <N−K>
   ```
   
   i.e. `tableToIPC` emits a RecordBatch message whose declared `bodyLength` is 
larger than the bytes it actually writes, so `tableFromIPC` runs off the end of 
the body. (In your OPFS case the same over/under-count could surface as 
*either* a throw or a hang depending on where the reader lands — so this may be 
data-shape-dependent rather than truly "occasional".)
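   
   To narrow down which side is off, it may help to compare the declared 
   `bodyLength` against an independent estimate. A rough sketch, assuming the 
   body is simply every buffer in the tree written in full and padded to a 
   multiple of 8 bytes, as the Arrow IPC format requires. `paddedLength`, 
   `estimateBodyLength`, and the `{ buffers, children }` node shape are 
   hypothetical stand-ins, not `apache-arrow` API, and a real writer may slice 
   or omit buffers, so a difference is a hint rather than proof:
   
   ```js
   // Round a byte count up to the next multiple of 8 (IPC body buffer padding).
   function paddedLength(byteLength) {
     return Math.ceil(byteLength / 8) * 8;
   }
   
   // Sum the padded sizes of all buffers in a nested node.
   // Assumed (hypothetical) node shape:
   //   { buffers: [TypedArray | undefined, ...], children: [node, ...] }
   function estimateBodyLength(node) {
     let total = 0;
     for (const buf of Array.from(node.buffers ?? [])) {
       // Missing buffers (e.g. no validity bitmap) contribute nothing.
       if (buf) total += paddedLength(buf.byteLength);
     }
     for (const child of node.children ?? []) {
       total += estimateBodyLength(child);
     }
     return total;
   }
   ```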
   
   **Versions.** Reproduces identically on `apache-arrow` **18.1.0** and 
**21.1.0** (latest). Upgrading does not help.
   
   **Trigger (narrow + deterministic).** It only fires for a *full* (`nullCount 
=== 0`) RecordBatch carrying **nested** columns. In our case those are GeoArrow:
   
   - `geometry: Struct<x: float64, y: float64>` (point)
   - `uncertainty: List<List<Struct<x: float64, y: float64>>>` (polygon)
   
   and the batch was imported via 
[`arrow-js-ffi`](https://github.com/kylebarron/arrow-js-ffi) from the **C Data 
Interface** (zero-copy `Data` views into WASM linear memory), rather than built 
from fresh JS buffers.
   
   What *passes* the same round-trip:
   - sparse / null-padded chunks of the same schema,
   - flat (non-nested) schemas,
   - round-tripping each column **individually**.
   
   So it looks like a **cumulative `bodyLength` miscount across the full nested 
batch**, not a per-column error. We could not reproduce it by hand-constructing 
the nested types from fresh JS `Data`; it seems to need the actual buffer 
layout produced by the C Data Interface import.
   
   **Workaround.** We replaced the `tableToIPC` → `tableFromIPC` round-trip (we 
used it only as a defensive copy onto the JS heap) with a direct recursive 
buffer deep-copy of each column's `Data` (slice each typed-array buffer, clone 
children). Correct, and faster.
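   
   A minimal sketch of that kind of deep copy, for reference. It works on a 
   plain `{ buffers, children }` shape rather than on `apache-arrow`'s `Data` 
   class directly, so `copyView` and `deepCopyNode` are hypothetical names, and 
   rebuilding a real `Data` from the copied buffers is left out because it 
   depends on the library version:
   
   ```js
   // Copy a typed-array view onto a fresh ArrayBuffer on the JS heap.
   // TypedArray.prototype.slice() allocates new backing storage holding only
   // the viewed elements, so the result starts at byteOffset 0.
   function copyView(view) {
     return view == null ? view : view.slice();
   }
   
   // Recursively copy every buffer of a node and of its children.
   // Assumed (hypothetical) node shape:
   //   { buffers: [TypedArray | undefined, ...], children: [node, ...] }
   function deepCopyNode(node) {
     return {
       ...node, // keep any other fields (type, length, ...) as they are
       buffers: Array.from(node.buffers ?? [], (buf) => copyView(buf)),
       children: (node.children ?? []).map(deepCopyNode),
     };
   }
   ```
   
   Because `slice()` always allocates new storage, the copies no longer alias 
   WASM linear memory.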
   
   **Offer.** We have a ~190 KB `.arrow` fixture (150 real GeoArrow rows) that 
reproduces this deterministically, and can assemble a minimal `apache-arrow` + 
`arrow-js-ffi` repro harness if that would help triage. Happy to share.
   

