truffle-dev commented on code in PR #883: URL: https://github.com/apache/arrow-go/pull/883#discussion_r3518801470
########## parquet/file/column_writer.go: ########## Review Comment: Not the wrong spot, just narrower than it looks. It's a precondition on the batch handed to `doBatches`, and `WriteBatchSpaced` always enters on a fresh row, so `repLevels[0]` is always 0 here — it can't trip on this bug. The mid-row split happened a layer down: `commitWriteAndCheckPageLimit` flushes once buffered size crosses `DataPageSize`, and pre-fix the batches weren't row-aligned, so a page could close mid-row. This entry check never sees the per-page split. The fix makes that invariant constructive rather than asserted — `alignBatchToRowBoundary` hands every `action()` a batch that begins and ends on a boundary, so each flush lands on one by construction. No broader runtime assertion is needed for correctness. If you'd want defense-in-depth against a future regression in the helper, the spot is `FlushCurrentPage` just before `buildDataPageV2`: assert the flushed page's first repetition level is 0 and `numBufferedRows` matches the row-start count. That guards the boundary where the split actually happens instead of relying on the row-boundary tests. Glad to add it if you'd like it in the tree. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
