[PR] feat(csharp/src/Drivers/Apache): Add prefetch functionality to CloudFetch in Spark ADBC driver [arrow-adbc]

via GitHub Thu, 10 Apr 2025 13:23:49 -0700


jadewang-db opened a new pull request, #2678:
URL: https://github.com/apache/arrow-adbc/pull/2678


   # Add Prefetch Functionality to CloudFetch in Spark ADBC Driver
   
   This PR enhances the CloudFetch feature in the Spark ADBC driver by 
implementing prefetch functionality, which improves performance by fetching 
multiple batches of results ahead of time.
   
   ## Changes
   
   ### CloudFetchResultFetcher Enhancements
   
   - **Initial Prefetch**: Added code to perform an initial prefetch of 
multiple batches when the fetcher starts, ensuring data is available 
immediately when needed.
   - **State Management**: Added tracking for current batch offset and size, 
with proper state reset when starting the fetcher.
   - **Link Caching**: Implemented caching of result links to avoid unnecessary 
server requests.
   - **Enhanced Refresh Logic**: Improved the link refresh mechanism to handle 
cases when the requested offset is not in the current batch.
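
The prefetch and link-caching behavior listed above can be pictured with a small sketch. This is not the driver's C# code; it is a language-neutral illustration written in Python, and every name in it (`Link`, `PrefetchingFetcher`, `fetch_batch`, `prefetch_count`, `fake_server`) is hypothetical. It assumes a server call that returns a batch of links for a given row offset and an empty batch when no results remain.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Link:
    start_row: int   # first row covered by this link
    row_count: int   # number of rows behind this link
    url: str


class PrefetchingFetcher:
    """Hypothetical fetcher that prefetches link batches and caches them by offset."""

    def __init__(self, fetch_batch, prefetch_count=2):
        self._fetch_batch = fetch_batch        # async callable: offset -> list of Link
        self._prefetch_count = prefetch_count  # batches to request eagerly on start
        self._cache = {}                       # start_row -> Link
        self._next_offset = 0
        self.server_calls = 0

    async def _fetch_next(self):
        # One simulated server round trip; returns False when no links remain.
        links = await self._fetch_batch(self._next_offset)
        self.server_calls += 1
        for link in links:
            self._cache[link.start_row] = link
            self._next_offset = link.start_row + link.row_count
        return bool(links)

    async def start(self):
        # Reset state, then prefetch several batches before any link is requested.
        self._cache.clear()
        self._next_offset = 0
        self.server_calls = 0
        for _ in range(self._prefetch_count):
            if not await self._fetch_next():
                break

    async def get_link(self, start_row):
        # Serve from the cache; contact the server only for offsets not seen yet.
        while start_row not in self._cache:
            if not await self._fetch_next():
                raise KeyError(start_row)
        return self._cache[start_row]


async def fake_server(offset, rows_per_link=100, links_per_batch=2, total_rows=600):
    # Stand-in for the real service: two links per batch, 600 rows in total.
    links = []
    for i in range(links_per_batch):
        start = offset + i * rows_per_link
        if start >= total_rows:
            break
        links.append(Link(start, rows_per_link, f"https://example.invalid/chunk/{start}"))
    return links


async def demo():
    fetcher = PrefetchingFetcher(fake_server, prefetch_count=2)
    await fetcher.start()
    calls_after_start = fetcher.server_calls
    await fetcher.get_link(300)            # already cached by the initial prefetch
    calls_after_cached = fetcher.server_calls
    await fetcher.get_link(400)            # not cached, needs one more round trip
    return calls_after_start, calls_after_cached, fetcher.server_calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (2, 2, 3)
```

In this sketch the initial prefetch costs two server calls, the lookup for row 300 is answered from the cache, and only the lookup for row 400 triggers a third call.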
   
   ### Interface Updates
   
   - Added new methods to the `ICloudFetchResultFetcher` interface:
     - `RefreshLinkAsync`: Refreshes a link for a specified row offset
     - `RefreshCurrentBatchAsync`: Refreshes all links in the current batch
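
As a rough picture of how the two methods might divide the work, here is a Python analogue. The PR description does not give the C# signatures, so the parameter and return types below are assumptions, as are the names `ResultFetcher`, `InMemoryFetcher`, `LINK_ROWS`, and `BATCH_ROWS`. The `gen` query parameter only makes refreshed URLs distinguishable in the example.

```python
import asyncio
from typing import Dict, List, Protocol

LINK_ROWS = 100    # rows behind one link (assumed for the example)
BATCH_ROWS = 200   # rows covered by one batch of links (assumed for the example)


class ResultFetcher(Protocol):
    """Hypothetical analogue of the two methods added to the interface."""

    async def refresh_link(self, row_offset: int) -> str: ...
    async def refresh_current_batch(self) -> List[str]: ...


class InMemoryFetcher:
    """Toy implementation that regenerates URLs instead of calling a server."""

    def __init__(self) -> None:
        self._generation = 0
        self._batch_start = 0
        self._links: Dict[int, str] = {}
        self._load_batch(0)

    def _load_batch(self, row_offset: int) -> None:
        # Simulates one server round trip for the batch containing row_offset.
        self._generation += 1
        self._batch_start = (row_offset // BATCH_ROWS) * BATCH_ROWS
        self._links = {
            start: f"https://example.invalid/rows/{start}?gen={self._generation}"
            for start in range(self._batch_start, self._batch_start + BATCH_ROWS, LINK_ROWS)
        }

    async def refresh_link(self, row_offset: int) -> str:
        # Works whether or not row_offset lies in the current batch:
        # the batch containing it is (re)loaded first.
        self._load_batch(row_offset)
        return self._links[(row_offset // LINK_ROWS) * LINK_ROWS]

    async def refresh_current_batch(self) -> List[str]:
        # Re-request every link in the batch that is currently loaded.
        self._load_batch(self._batch_start)
        return [self._links[start] for start in sorted(self._links)]
```

Calling `refresh_current_batch()` on a new `InMemoryFetcher` returns fresh URLs for rows 0 and 100, while `refresh_link(350)` switches to the batch starting at row 200 and returns the link that covers row 300.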
   
   ### Testing Infrastructure
   
   - Created `ITestableHiveServer2Statement` interface to facilitate testing
   - Updated tests to account for prefetch behavior
   - Ensured all tests pass with the new prefetch functionality
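
The idea of a narrow, test-only interface can be sketched as follows. This is an assumption about the general pattern, not a description of `ITestableHiveServer2Statement` itself, whose members are not listed in this message. `StatementLike`, `FakeStatement`, and `start_with_prefetch` are hypothetical names, and the example is in Python rather than the driver's C#.

```python
import unittest
from typing import Dict, List, Protocol


class StatementLike(Protocol):
    """Hypothetical narrow interface: only what the fetcher needs from a statement."""

    def fetch_results(self, start_offset: int) -> List[int]: ...


class FakeStatement:
    """Test double that records every offset it is asked for."""

    def __init__(self, batches: Dict[int, List[int]]) -> None:
        self._batches = batches
        self.calls: List[int] = []

    def fetch_results(self, start_offset: int) -> List[int]:
        self.calls.append(start_offset)
        return self._batches.get(start_offset, [])


def start_with_prefetch(statement: StatementLike, prefetch_count: int) -> List[List[int]]:
    """Stand-in for fetcher start-up: eagerly pull up to prefetch_count batches."""
    offset, fetched = 0, []
    for _ in range(prefetch_count):
        batch = statement.fetch_results(offset)
        if not batch:
            break
        fetched.append(batch)
        offset += len(batch)
    return fetched


class PrefetchTest(unittest.TestCase):
    def test_start_issues_prefetch_calls(self):
        stmt = FakeStatement({0: [1, 2], 2: [3, 4], 4: [5]})
        fetched = start_with_prefetch(stmt, prefetch_count=2)
        # With prefetch, start-up alone produces two requests, not one.
        self.assertEqual(stmt.calls, [0, 2])
        self.assertEqual(fetched, [[1, 2], [3, 4]])


if __name__ == "__main__":
    unittest.main()
```

Because the code under test depends only on the small interface, the fake can count requests, which is what a test needs in order to account for the extra calls that prefetching introduces.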
   
   ## Benefits
   
   - **Improved Performance**: Prefetching multiple batches makes data 
available sooner and reduces wait times.
   - **Better Reliability**: Enhanced error handling, state reset when the 
fetcher starts, and link refresh for offsets outside the current batch make 
the fetcher more robust.
   - **More Efficient Resource Usage**: Link caching reduces unnecessary server 
requests.
   
   This implementation maintains backward compatibility while providing 
significant performance improvements for CloudFetch operations.


