eric-wang-1990 opened a new pull request, #3656: URL: https://github.com/apache/arrow-adbc/pull/3656
## Summary

This PR contains two changes:

1. **Reapplies PR #3652** - The memory utilization improvements that were reverted in #3655
2. **Clarifies memory manager behavior** - Documents that `CloudFetchMemoryBufferManager` tracks in-flight compressed download sizes and reduces the default from 200MB to 100MB

## Reapplying PR #3652

The original PR #3652 improved memory utilization by:

- Properly disposing of IPC readers after reading record batches
- Using `ReadNextRecordBatchAsync` instead of synchronous reads
- Better resource cleanup in the download pipeline

PR #3652 was reverted due to a suspected regression, but the root cause was actually related to the memory manager behavior, which is now clarified.

## Memory Manager Clarification

The `CloudFetchMemoryBufferManager` tracks **in-flight download memory based on compressed file sizes**, not decompressed sizes. This design is intentional:

1. **Limits concurrent downloads** - Prevents unbounded parallel downloads from exhausting system resources
2. **Natural decompression bounds** - Decompressed data memory is naturally bounded by the result queue capacity and batch processing flow
3. **Lightweight concurrency control** - Tracking compressed sizes provides efficient download throttling without the overhead of tracking decompressed memory

### Changes

- Added comprehensive documentation to `CloudFetchMemoryBufferManager` explaining that it tracks in-flight compressed data sizes
- Reduced `DefaultMemoryBufferSizeMB` from 200 to 100 in `CloudFetchDownloadManager`
- Added inline comments clarifying that size parameters represent compressed file sizes from the server

### Impact

With typical CloudFetch file sizes of ~1MB compressed, the 100MB default allows for approximately 100 concurrent in-flight downloads, providing better control over concurrent download behavior while maintaining good throughput.
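
To make the arithmetic above concrete, here is a minimal Python sketch of a byte budget that admits downloads based on their compressed size. It is a language-neutral model for illustration only, not the driver's C# code: the names `CompressedSizeBudget`, `try_acquire`, `release`, and `max_concurrent_downloads` are hypothetical, and the real `CloudFetchMemoryBufferManager` may differ in its API and waiting behavior.

```python
import threading

MB = 1024 * 1024


class CompressedSizeBudget:
    """Hypothetical model of a byte budget for in-flight downloads.

    Only the compressed size reported for each file is counted;
    decompressed memory is assumed to be bounded elsewhere.
    """

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._used_bytes = 0
        self._lock = threading.Lock()

    @property
    def used_bytes(self) -> int:
        with self._lock:
            return self._used_bytes

    def try_acquire(self, compressed_size: int) -> bool:
        """Reserve budget for one download; return False if it does not fit."""
        with self._lock:
            if self._used_bytes + compressed_size > self._max_bytes:
                return False
            self._used_bytes += compressed_size
            return True

    def release(self, compressed_size: int) -> None:
        """Return budget once a download has been consumed."""
        with self._lock:
            self._used_bytes = max(0, self._used_bytes - compressed_size)


def max_concurrent_downloads(budget_mb: int, file_size_mb: int) -> int:
    """Count how many equally sized files fit in the budget at once."""
    budget = CompressedSizeBudget(budget_mb * MB)
    count = 0
    while budget.try_acquire(file_size_mb * MB):
        count += 1
    return count
```

Under these assumptions, `max_concurrent_downloads(100, 1)` admits 100 downloads and `max_concurrent_downloads(200, 1)` admits 200, which matches the estimate that halving the default roughly halves the number of concurrent in-flight downloads for ~1MB files.
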
## Test plan

- [ ] Existing CloudFetch tests pass
- [ ] Manual testing with CloudFetch queries to verify download behavior
- [ ] Verify the original regression from #3655 is resolved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
