mediprtl opened a new pull request, #4320:
URL: https://github.com/apache/arrow-adbc/pull/4320
## Summary
`PostgresCopyListFieldWriter::Write` computed each row's child range from
the *logical* row index without adding `array_view_->offset`. When the parent
`List` / `LargeList` / `FixedSizeList` array has `offset > 0` (a sliced
parent), the variable-length branch read the wrong slot of the offsets buffer
(`ArrowArrayViewListChildOffset` does not honor `offset`, unlike
`ArrowArrayViewGetIntUnsafe`), and the fixed-size branch multiplied the wrong
base index by the element size. The child ranges still indexed into the
still-full child values buffer, so list elements ended up attached to the wrong
rows.
Practical impact: silent, per-row drift of list-column values when an Arrow
table is sliced into multiple batches and ingested via `adbc_ingest` with
`mode="create"` then `mode="append"`. The first chunk (`offset=0`) is always
correct; every subsequent chunk's list column is shifted by the chunk's
`parent.offset`. Scalar columns are unaffected because their writers route
through `ArrowArrayViewGetInt*`, which honors `offset`.
Fixes #4319.
## Fix
```cpp
const int64_t logical = array_view_->offset + index;
if constexpr (IsFixedSize) {
start = logical * array_view_->layout.child_size_elements;
end = start + array_view_->layout.child_size_elements;
} else {
start = ArrowArrayViewListChildOffset(array_view_, logical);
end = ArrowArrayViewListChildOffset(array_view_, logical + 1);
}
```
## Test plan
- [x] Added `PostgresCopyWriteListSlicedMatchesDirect` (parameterized over
`LIST` and `LARGE_LIST`) and
`PostgresCopyWriteFixedSizeListSlicedMatchesDirect`. Each writes rows 3..5 of a
6-row source via `offset`/`length` and asserts the output body equals what the
writer produces for those same rows passed as a fresh 3-row array.
- [x] Verified the new tests **fail on `main`** (sliced output is the wrong
size — extra bytes from longer rows the writer mis-attributes) and **pass with
this change**.
- [x] Full `adbc-driver-postgresql-copy-test` suite (73 tests) passes with
the change — no regressions in existing list / fixed-size-list / multi-batch
coverage.
- [x] End-to-end reproduction against the `postgres-test` service from
`compose.yaml`, using `adbc-driver-postgresql 1.11.0` with the patched `.so`
swapped in: a multi-chunk PyArrow `Table.slice` → `adbc_ingest` flow that
drifted before now round-trips correctly for `list<string>`,
`large_list<string>`, and `fixed_size_list<string, 2>`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]