jadewang-db opened a new pull request, #2855:
URL: https://github.com/apache/arrow-adbc/pull/2855
### Problem
The Databricks driver's CloudFetch functionality did not properly handle
expired cloud file URLs, which could cause failed downloads and errors during
query execution. The driver needed a way to track, cache, and refresh presigned
URLs before they expire.
### Solution
- Implemented a new `CloudFetchUrlManager` class that:
  - Manages a cache of cloud file URLs with their expiration times
  - Proactively refreshes URLs that are about to expire
  - Efficiently fetches and caches URLs in batches
  - Provides thread-safe access to URL information
- Added an `IClock` interface and implementations to facilitate testing with
  controlled time
- Extended the `IDownloadResult` interface to support URL refreshing and
  expiration checking
- Updated namespace from
  `Apache.Arrow.Adbc.Drivers.Apache.Databricks.CloudFetch` to
  `Apache.Arrow.Adbc.Drivers.Databricks.CloudFetch` for better organization
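
To illustrate the approach described above, here is a minimal sketch of an expiry-aware URL cache driven by an injectable clock. It is not the code in this PR: apart from `IClock`, every name here (`ControlledClock`, `UrlEntry`, `UrlCacheSketch`, the `fetchBatch` callback, the refresh buffer) is hypothetical, and the members of `IClock` are assumed, since the description does not list them.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the clock abstraction; the PR's actual members may differ.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Hypothetical test clock: time only moves when Advance is called.
public sealed class ControlledClock : IClock
{
    public ControlledClock(DateTime start) { UtcNow = start; }
    public DateTime UtcNow { get; private set; }
    public void Advance(TimeSpan delta) { UtcNow = UtcNow.Add(delta); }
}

// Hypothetical cache entry: one presigned URL and its expiration time.
public sealed class UrlEntry
{
    public UrlEntry(long offset, string url, DateTime expiresAtUtc)
    {
        Offset = offset;
        Url = url;
        ExpiresAtUtc = expiresAtUtc;
    }

    public long Offset { get; }
    public string Url { get; }
    public DateTime ExpiresAtUtc { get; }
}

// Hypothetical stand-in for the URL manager; not the PR's implementation.
public sealed class UrlCacheSketch
{
    private readonly object _lock = new object();
    private readonly Dictionary<long, UrlEntry> _cache = new Dictionary<long, UrlEntry>();
    private readonly IClock _clock;
    private readonly TimeSpan _refreshBuffer;
    // Assumed callback: returns a batch of fresh URLs starting at the given offset.
    private readonly Func<long, IReadOnlyList<UrlEntry>> _fetchBatch;

    public UrlCacheSketch(
        IClock clock,
        TimeSpan refreshBuffer,
        Func<long, IReadOnlyList<UrlEntry>> fetchBatch)
    {
        _clock = clock;
        _refreshBuffer = refreshBuffer;
        _fetchBatch = fetchBatch;
    }

    // Number of batch fetches performed so far (handy in tests).
    public int FetchCount { get; private set; }

    public UrlEntry GetUrl(long offset)
    {
        // A single lock keeps the sketch simple; it also serializes fetches.
        lock (_lock)
        {
            if (_cache.TryGetValue(offset, out UrlEntry cached) && !IsExpiringSoon(cached))
            {
                return cached;
            }

            // Missing or about to expire: fetch a batch and cache every entry in it.
            FetchCount++;
            foreach (UrlEntry entry in _fetchBatch(offset))
            {
                _cache[entry.Offset] = entry;
            }

            if (!_cache.TryGetValue(offset, out UrlEntry fresh))
            {
                throw new KeyNotFoundException("No URL returned for offset " + offset);
            }
            return fresh;
        }
    }

    // An entry counts as expiring once "now + buffer" reaches its expiration time.
    private bool IsExpiringSoon(UrlEntry entry)
    {
        return _clock.UtcNow + _refreshBuffer >= entry.ExpiresAtUtc;
    }
}
```

With a `ControlledClock`, a test can request a URL, advance the clock to within the refresh buffer of its expiration, and request it again to confirm that a second batch fetch happens without any real waiting. Holding one lock across the fetch is the simplest way to get thread safety in a sketch like this; a real implementation might prefer finer-grained or asynchronous coordination.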
### Testing
- Created comprehensive unit tests in `CloudFetchUrlManagerTest.cs` that
  verify:
  - URL caching behavior
  - Proper handling of URL expiration
  - Batch fetching of URLs
  - Refreshing of expired URLs
  - Thread safety of the implementation

This change improves the reliability of the CloudFetch functionality by
ensuring that cloud file URLs are refreshed before they expire, preventing
download failures during query execution.