vigneshsiva11 commented on PR #9362: URL: https://github.com/apache/arrow-rs/pull/9362#issuecomment-3909469984
Hi @alamb,

You're absolutely right - this PR doesn't actually implement batch splitting as described. After further investigation, I realized that:

**This PR (#9362) only:**
- Moves the overflow check to happen BEFORE buffer mutation
- Prevents buffer corruption by detecting overflow early
- Returns a clean error instead of potentially panicking

**This PR does NOT:**
- Emit smaller batches
- Stop decoding early to avoid overflow
- Change the batch size

**The complete solution is in PR #9369:** https://github.com/apache/arrow-rs/pull/9369

PR #9369 includes the same defensive overflow check from this PR, PLUS the full implementation that actually splits RecordBatches when binary offsets would overflow. It properly detects the overflow condition during decoding and stops early, emitting smaller batches as described.

I'll close this PR in favor of #9369, which is the comprehensive solution that addresses issue #7973 as intended.

Thank you for catching this!
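For readers following the thread, here is a minimal Rust sketch of the two ideas contrasted above: validating the new offset before touching any buffer, and starting a new batch when a push would overflow. It is not the code from either PR and does not use arrow-rs APIs. `OffsetBuffer`, `OffsetOverflow`, `with_limit`, `try_push`, and `split_into_batches` are hypothetical names, and the configurable `max_bytes` limit is an assumption that lets the overflow path be exercised without allocating 2 GiB.

```rust
/// Hypothetical stand-in for a variable-length binary buffer with i32 offsets.
/// Illustration only; this is not the arrow-rs implementation.
#[derive(Debug)]
struct OffsetBuffer {
    offsets: Vec<i32>, // offsets[i]..offsets[i + 1] bounds value i
    values: Vec<u8>,   // concatenated value bytes
    max_bytes: usize,  // assumed limit, never above i32::MAX
}

/// Hypothetical error returned when a push would exceed the offset range.
#[derive(Debug, PartialEq)]
struct OffsetOverflow;

impl OffsetBuffer {
    fn with_limit(max_bytes: usize) -> Self {
        // Clamp so that every accepted length fits in an i32 offset.
        let max_bytes = max_bytes.min(i32::MAX as usize);
        Self { offsets: vec![0], values: Vec::new(), max_bytes }
    }

    /// Number of values stored so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Validate first, mutate second: on error both buffers are unchanged.
    fn try_push(&mut self, data: &[u8]) -> Result<(), OffsetOverflow> {
        let new_len = self
            .values
            .len()
            .checked_add(data.len())
            .ok_or(OffsetOverflow)?;
        if new_len > self.max_bytes {
            return Err(OffsetOverflow);
        }
        // No failure is possible past this point, so mutation is safe.
        self.values.extend_from_slice(data);
        self.offsets.push(new_len as i32); // lossless: new_len <= i32::MAX
        Ok(())
    }
}

/// On overflow, emit the current batch and retry the value in a fresh one.
/// A single value larger than the limit is still reported as an error.
fn split_into_batches(
    inputs: &[&[u8]],
    max_bytes: usize,
) -> Result<Vec<OffsetBuffer>, OffsetOverflow> {
    let mut batches = Vec::new();
    let mut current = OffsetBuffer::with_limit(max_bytes);
    for data in inputs {
        if current.try_push(data).is_err() {
            if current.len() == 0 {
                return Err(OffsetOverflow);
            }
            // The failed push left `current` intact, so it can be emitted as is.
            let full = std::mem::replace(&mut current, OffsetBuffer::with_limit(max_bytes));
            batches.push(full);
            current.try_push(data)?;
        }
    }
    if current.len() > 0 {
        batches.push(current);
    }
    Ok(batches)
}
```

In this sketch, `try_push` alone corresponds to the defensive check (a clean error, no corruption), while `split_into_batches` depends on that guarantee to emit the untouched batch and continue. With a limit of 8 bytes, the inputs `abcd`, `efgh`, `ij` would come out as two batches holding two values and one value.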
