truffle-dev commented on code in PR #883:
URL: https://github.com/apache/arrow-go/pull/883#discussion_r3518801470


##########
parquet/file/column_writer.go:
##########


Review Comment:
   Not the wrong spot, just narrower than it looks. It's a precondition on the 
batch handed to `doBatches`, and `WriteBatchSpaced` always enters on a fresh 
row, so `repLevels[0]` is always 0 here — it can't trip on this bug. The 
mid-row split happened a layer down: `commitWriteAndCheckPageLimit` flushes 
once buffered size crosses `DataPageSize`, and pre-fix the batches weren't 
row-aligned, so a page could close mid-row. This entry check never sees the 
per-page split.
   
   The fix makes that invariant constructive rather than asserted — 
`alignBatchToRowBoundary` hands every `action()` a batch that begins and ends 
on a boundary, so each flush lands on one by construction. No broader runtime 
assertion is needed for correctness.
   
   If you'd want defense-in-depth against a future regression in the helper, 
the spot is `FlushCurrentPage` just before `buildDataPageV2`: assert the 
flushed page's first repetition level is 0 and `numBufferedRows` matches the 
row-start count. That guards the boundary where the split actually happens 
instead of relying on the row-boundary tests. Glad to add it if you'd like it 
in the tree.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to