offset when writing list rows [arrow-adbc]

via GitHub Mon, 18 May 2026 03:34:17 -0700


mediprtl opened a new pull request, #4320:
URL: https://github.com/apache/arrow-adbc/pull/4320


   ## Summary
   
   `PostgresCopyListFieldWriter::Write` computed each row's child range from 
the *logical* row index without adding `array_view_->offset`. When the parent 
`List` / `LargeList` / `FixedSizeList` array has `offset > 0` (a sliced 
parent), the variable-length branch read the wrong slot of the offsets buffer 
(`ArrowArrayViewListChildOffset` does not honor `offset`, unlike 
`ArrowArrayViewGetIntUnsafe`), and the fixed-size branch multiplied the wrong 
base index by the element size. The child ranges still indexed into the 
still-full child values buffer, so list elements ended up attached to the wrong 
rows.
   
   Practical impact: silent, per-row drift of list-column values when an Arrow 
table is sliced into multiple batches and ingested via `adbc_ingest` with 
`mode="create"` then `mode="append"`. The first chunk (`offset=0`) is always 
correct; every subsequent chunk's list column is shifted by the chunk's 
`parent.offset`. Scalar columns are unaffected because their writers route 
through `ArrowArrayViewGetInt*`, which honors `offset`.
   
   Fixes #4319.
   
   ## Fix
   
   ```cpp
   const int64_t logical = array_view_->offset + index;
   if constexpr (IsFixedSize) {
     start = logical * array_view_->layout.child_size_elements;
     end   = start + array_view_->layout.child_size_elements;
   } else {
     start = ArrowArrayViewListChildOffset(array_view_, logical);
     end   = ArrowArrayViewListChildOffset(array_view_, logical + 1);
   }
   ```
   
   ## Test plan
   
   - [x] Added `PostgresCopyWriteListSlicedMatchesDirect` (parameterized over 
`LIST` and `LARGE_LIST`) and 
`PostgresCopyWriteFixedSizeListSlicedMatchesDirect`. Each writes rows 3..5 of a 
6-row source via `offset`/`length` and asserts the output body equals what the 
writer produces for those same rows passed as a fresh 3-row array.
   - [x] Verified the new tests **fail on `main`** (sliced output is the wrong 
size — extra bytes from longer rows the writer mis-attributes) and **pass with 
this change**.
   - [x] Full `adbc-driver-postgresql-copy-test` suite (73 tests) passes with 
the change — no regressions in existing list / fixed-size-list / multi-batch 
coverage.
   - [x] End-to-end reproduction against the `postgres-test` service from 
`compose.yaml`, using `adbc-driver-postgresql 1.11.0` with the patched `.so` 
swapped in: a multi-chunk PyArrow `Table.slice` → `adbc_ingest` flow that 
drifted before now round-trips correctly for `list<string>`, 
`large_list<string>`, and `fixed_size_list<string, 2>`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] fix(c/driver/postgresql): use array_view->offset when writing list rows [arrow-adbc]

Reply via email to