jadewang-db opened a new pull request, #2678: URL: https://github.com/apache/arrow-adbc/pull/2678
# Add Prefetch Functionality to CloudFetch in Spark ADBC Driver

This PR enhances the CloudFetch feature in the Spark ADBC driver by implementing prefetch functionality, which improves performance by fetching multiple batches of results ahead of time.

## Changes

### CloudFetchResultFetcher Enhancements

- **Initial Prefetch**: Added code to perform an initial prefetch of multiple batches when the fetcher starts, ensuring data is available immediately when needed.
- **State Management**: Added tracking for current batch offset and size, with proper state reset when starting the fetcher.
- **Link Caching**: Implemented caching of result links to avoid unnecessary server requests.
- **Enhanced Refresh Logic**: Improved the link refresh mechanism to handle cases when the requested offset is not in the current batch.

### Interface Updates

- Added new methods to `ICloudFetchResultFetcher` interface:
  - `RefreshLinkAsync`: Refreshes a link for a specified row offset
  - `RefreshCurrentBatchAsync`: Refreshes all links in the current batch

### Testing Infrastructure

- Created `ITestableHiveServer2Statement` interface to facilitate testing
- Updated tests to account for prefetch behavior
- Ensured all tests pass with the new prefetch functionality

## Benefits

- **Improved Performance**: By prefetching multiple batches, data is available sooner, reducing wait times.
- **Better Reliability**: Enhanced error handling and state management make the system more robust.
- **More Efficient Resource Usage**: Link caching reduces unnecessary server requests.

This implementation maintains backward compatibility while providing significant performance improvements for CloudFetch operations.
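## Illustrative Sketch

For reviewers, the interplay of initial prefetch, link caching, and per-offset refresh described above can be pictured with a small sketch. It is written in Python rather than the driver's C#, and it is not the code in this PR: `ResultLink`, `PrefetchingLinkFetcher`, `make_fake_server`, and every parameter name are hypothetical, and the fake server simply hands out two 100-row links per request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResultLink:
    """Hypothetical stand-in for a CloudFetch result link."""
    start_row_offset: int
    row_count: int
    url: str


class PrefetchingLinkFetcher:
    """Illustrative only; not the driver's CloudFetchResultFetcher."""

    def __init__(self, fetch_batch: Callable[[int], List[ResultLink]],
                 prefetch_batches: int = 2) -> None:
        self._fetch_batch = fetch_batch  # assumed server call: offset -> links
        self._prefetch_batches = prefetch_batches
        self._cache: Dict[int, ResultLink] = {}
        self._next_offset = 0
        self.server_requests = 0

    def start(self) -> None:
        # Reset state, then prefetch several batches before any link is asked for.
        self._cache.clear()
        self._next_offset = 0
        self.server_requests = 0
        for _ in range(self._prefetch_batches):
            if not self._load_batch(self._next_offset):
                break  # the server has no more results

    def _load_batch(self, offset: int) -> List[ResultLink]:
        # One server round trip; every returned link is cached by start offset.
        self.server_requests += 1
        links = self._fetch_batch(offset)
        for link in links:
            self._cache[link.start_row_offset] = link
            self._next_offset = max(self._next_offset,
                                    link.start_row_offset + link.row_count)
        return links

    def get_link(self, offset: int) -> ResultLink:
        # Serve from the cache; go to the server only on a miss.
        # Raises KeyError if the server has no link starting at this offset.
        if offset not in self._cache:
            self._load_batch(offset)
        return self._cache[offset]

    def refresh_link(self, offset: int) -> ResultLink:
        # Drop the cached link (for example, an expired URL) and request it again.
        self._cache.pop(offset, None)
        self._load_batch(offset)
        return self._cache[offset]


def make_fake_server(total_rows: int = 600, rows_per_link: int = 100,
                     links_per_batch: int = 2) -> Callable[[int], List[ResultLink]]:
    """Build a fake server whose URLs carry the request number as `sig`."""
    calls = {"n": 0}

    def fetch(offset: int) -> List[ResultLink]:
        calls["n"] += 1
        links = []
        for i in range(links_per_batch):
            start = offset + i * rows_per_link
            if start >= total_rows:
                break
            url = f"https://example.invalid/part-{start}?sig={calls['n']}"
            links.append(ResultLink(start, rows_per_link, url))
        return links

    return fetch
```

With `prefetch_batches=2`, calling `start()` issues two requests up front, so a later `get_link(100)` is answered from the cache, while `refresh_link(100)` always costs one more request and returns a link with a new `sig`.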