Re: [PR] (DO NOT MERGE)feat(csharp/databricks): Optimize CloudFetch LZ4 decompression with streaming approach [arrow-adbc]

via GitHub Mon, 03 Nov 2025 08:43:04 -0800


CurtHagenlocher commented on PR #3657:
URL: https://github.com/apache/arrow-adbc/pull/3657#issuecomment-3481480340


   The main concern with a purely streaming approach is being able to handle 
retries. That is, let's say that we read half of a response and then the 
connection is reset for some reason. Will the end-to-end system reissue the 
command to re-fetch the data and stream it again, or are we forced to return an 
error to the user? Buffering a response in memory ensures that we know the 
entire response was read before any of it is handed to the caller, so a failed 
download can simply be retried.
   
   If a single cloud fetch is always just a single Arrow record batch, then 
addressing this concern is relatively straightforward. But it gets more 
complicated if a single fetched stream has multiple record batches and one or 
more have already been returned to the caller when the connection goes awry.
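   
   To make the difference concrete, here is a minimal sketch in Python rather 
than the driver's C#. Everything in it is hypothetical: `FlakySource`, 
`fetch_buffered` and `fetch_streaming` are illustration-only names, not part 
of the driver. The streaming variant also assumes that a re-fetch returns the 
same batches in the same order, which may not hold in practice.

```python
from typing import Callable, Iterator, List


class FlakySource:
    """Hypothetical stand-in for a download that yields record batches."""

    def __init__(self, batches: List[bytes], fail_after: int, failures: int = 1):
        self.batches = batches
        self.fail_after = fail_after    # batches delivered before the reset
        self.failures_left = failures   # how many attempts get interrupted
        self.opens = 0                  # how many times the download started

    def open(self) -> Iterator[bytes]:
        self.opens += 1
        fail = self.failures_left > 0
        if fail:
            self.failures_left -= 1
        for i, batch in enumerate(self.batches):
            if fail and i == self.fail_after:
                raise ConnectionResetError("connection reset mid-stream")
            yield batch


def fetch_buffered(open_stream: Callable[[], Iterator[bytes]],
                   max_attempts: int = 3) -> List[bytes]:
    """Read the whole response first; nothing reaches the caller until it is complete."""
    for attempt in range(1, max_attempts + 1):
        try:
            return list(open_stream())
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of retries: surface the error
    raise ValueError("max_attempts must be at least 1")


def fetch_streaming(open_stream: Callable[[], Iterator[bytes]],
                    max_attempts: int = 3) -> Iterator[bytes]:
    """Yield batches as they arrive; a retry must skip what was already delivered."""
    delivered = 0  # batches the caller has already seen
    for attempt in range(1, max_attempts + 1):
        try:
            for index, batch in enumerate(open_stream()):
                if index < delivered:
                    continue  # assumption: the re-fetch replays identical batches
                delivered += 1
                yield batch
            return
        except ConnectionError:
            if attempt == max_attempts:
                raise


batches = [b"batch-0", b"batch-1", b"batch-2", b"batch-3"]
buffered = fetch_buffered(FlakySource(batches, fail_after=2).open)
streamed = list(fetch_streaming(FlakySource(batches, fail_after=2).open))
```

   In the buffered case the retry is invisible to the caller. In the streaming 
case the retry logic has to track how many batches were already returned, and 
it is only correct if the re-fetched stream is identical to the first one.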


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
