[PR] feat(csharp/databricks): Clarify CloudFetch memory manager behavior and reapply memory improvements [arrow-adbc]

via GitHub Fri, 31 Oct 2025 01:00:56 -0700


eric-wang-1990 opened a new pull request, #3656:
URL: https://github.com/apache/arrow-adbc/pull/3656


   ## Summary
   
   This PR contains two changes:
   
   1. **Reapplies PR #3652** - The memory utilization improvements that were 
reverted in #3655
   2. **Clarifies memory manager behavior** - Documents that 
`CloudFetchMemoryBufferManager` tracks in-flight compressed download sizes, and 
reduces the default memory buffer size from 200MB to 100MB
   
   ## Reapplying PR #3652
   
   The original PR #3652 improved memory utilization by:
   - Properly disposing of IPC readers after reading record batches
   - Using `ReadNextRecordBatchAsync` instead of synchronous reads
   - Better resource cleanup in the download pipeline
   
   PR #3652 was reverted in #3655 due to a suspected regression, but the root 
cause was actually related to the memory manager behavior, which this PR 
clarifies.
   
   ## Memory Manager Clarification
   
   The `CloudFetchMemoryBufferManager` tracks **in-flight download memory based 
on compressed file sizes**, not decompressed sizes. This design is intentional:
   
   1. **Limits concurrent downloads** - Prevents unbounded parallel downloads 
from exhausting system resources
   2. **Natural decompression bounds** - Decompressed data memory is naturally 
bounded by the result queue capacity and batch processing flow
   3. **Lightweight concurrency control** - Tracking compressed sizes provides 
efficient download throttling without the overhead of tracking decompressed 
memory
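
   The throttling idea above can be sketched as a small byte budget that 
admits a download only while the sum of in-flight compressed sizes stays under 
a limit. This is a language-neutral model written in Python, not the driver's 
C# code: the class name `CompressedSizeBudget` and its methods are hypothetical 
and do not reflect the actual `CloudFetchMemoryBufferManager` API.

```python
import threading


class CompressedSizeBudget:
    """Hypothetical model of a budget that gates downloads.

    It counts only the compressed size of files that are in flight,
    never the decompressed size of the data they expand into.
    """

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._used_bytes = 0
        self._cond = threading.Condition()

    @property
    def used_bytes(self) -> int:
        with self._cond:
            return self._used_bytes

    def try_acquire(self, compressed_size: int) -> bool:
        """Reserve budget without blocking; False if it does not fit."""
        with self._cond:
            if self._used_bytes + compressed_size > self._max_bytes:
                return False
            self._used_bytes += compressed_size
            return True

    def acquire(self, compressed_size: int) -> None:
        """Block the caller until the compressed size fits in the budget."""
        if compressed_size > self._max_bytes:
            raise ValueError("file is larger than the whole budget")
        with self._cond:
            while self._used_bytes + compressed_size > self._max_bytes:
                self._cond.wait()
            self._used_bytes += compressed_size

    def release(self, compressed_size: int) -> None:
        """Return budget once a downloaded file has been handed off."""
        with self._cond:
            self._used_bytes -= compressed_size
            self._cond.notify_all()


MB = 1024 * 1024
budget = CompressedSizeBudget(max_bytes=100 * MB)

# Admit 1MB (compressed) downloads until the budget refuses one.
admitted = 0
while budget.try_acquire(1 * MB):
    admitted += 1

# Finishing one download frees room for exactly one more.
budget.release(1 * MB)
admitted_after_release = budget.try_acquire(1 * MB)
```

   In this model a downloader would call `acquire` before fetching a file and 
`release` after the file has been consumed, so the budget bounds how much 
compressed data is in flight and leaves decompressed memory to be bounded 
elsewhere, as described in point 2 above.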
   
   ### Changes
   
   - Added comprehensive documentation to `CloudFetchMemoryBufferManager` 
explaining it tracks in-flight compressed data sizes
   - Reduced `DefaultMemoryBufferSizeMB` from 200 to 100 in 
`CloudFetchDownloadManager`
   - Added inline comments clarifying that size parameters represent compressed 
file sizes from the server
   
   ### Impact
   
   With typical CloudFetch file sizes of ~1MB compressed, the 100MB default 
allows for approximately 100 concurrent in-flight downloads, providing better 
control over concurrent download behavior while maintaining good throughput.
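
   The estimate above is plain division of the budget by the compressed file 
size. The helper below is a hypothetical illustration of that arithmetic, 
assuming every file has the same compressed size and ignoring any other limits 
the driver may apply.

```python
def max_in_flight_downloads(budget_mb: float, compressed_file_mb: float) -> int:
    """Upper bound on concurrent downloads under a compressed-size budget.

    Assumes all files share the same compressed size, which is a
    simplification of real CloudFetch results.
    """
    if compressed_file_mb <= 0:
        raise ValueError("compressed_file_mb must be positive")
    return int(budget_mb // compressed_file_mb)


new_default = max_in_flight_downloads(100, 1)       # 100MB budget, ~1MB files
previous_default = max_in_flight_downloads(200, 1)  # 200MB budget, ~1MB files
```

   Under these assumptions the new default admits about 100 files at once, 
half of what the previous 200MB default would have admitted.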
   
   ## Test plan
   
   - [ ] Existing CloudFetch tests pass
   - [ ] Manual testing with CloudFetch queries to verify download behavior
   - [ ] Verify the regression that prompted the revert in #3655 is resolved
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
