pradeeee opened a new pull request, #18641:
URL: https://github.com/apache/pinot/pull/18641
## Problem
The `PredownloadScheduler` runs in a predownload init-container before the
main Pinot server starts. It downloads segments from deep storage (e.g., S3,
HDFS, GCS) to local disk so the server can start faster.
Currently, if a deep storage download fails (e.g., due to transient network
issues, throttling, or deep store unavailability), the segment is marked as
failed and the predownload container reports partial failure. There is no
fallback mechanism.
In production, deep storage failures can be transient while peer servers
holding the same segment are available and healthy. The main Pinot server
already supports peer download as a fallback, but the predownload container
does not.
## Changes
### PredownloadScheduler.java
1. Constructor now accepts a `boolean peerDownloadEnabled` parameter
2. Reads the peer download scheme from `InstanceDataManagerConfig`
3. Extracted existing deep store download logic into
`downloadFromDeepStore()` for clarity
4. Added `downloadFromPeers()` method that:
   - Discovers ONLINE peer servers via ExternalView from ZooKeeper
   - Shuffles the peer list for load distribution
   - Downloads the segment tar file from a peer using
     `SegmentFetcherFactory.fetchAndDecryptSegmentToLocal()`
   - Untars and moves the segment to the final data directory
5. When deep store download fails and peer download is enabled, catches the
exception and falls back to peer download
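
For reviewers, the control flow in items 3 to 5 can be sketched roughly as below. This is a simplified illustration, not the code in this PR: `PeerFallbackSketch`, `Fetcher`, and `download()` are hypothetical names, the fetcher stands in for the real segment fetcher, and trying each shuffled peer in turn is an assumption about the retry behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of deep-store-first download with peer fallback. */
final class PeerFallbackSketch {

  /** Stand-in for whatever actually fetches a segment tar file from a URI. */
  interface Fetcher {
    void fetch(String uri) throws Exception;
  }

  /**
   * Tries the deep store first. If that fails and peer download is enabled,
   * tries the peers in random order. Returns the URI that succeeded.
   */
  static String download(String deepStoreUri, List<String> peerUris,
      boolean peerDownloadEnabled, Fetcher fetcher) throws Exception {
    try {
      fetcher.fetch(deepStoreUri);
      return deepStoreUri;
    } catch (Exception deepStoreFailure) {
      if (!peerDownloadEnabled) {
        // No fallback configured: propagate the original failure.
        throw deepStoreFailure;
      }
      // Shuffle a copy so load is spread across peers (assumed behavior).
      List<String> shuffled = new ArrayList<>(peerUris);
      Collections.shuffle(shuffled);
      for (String peerUri : shuffled) {
        try {
          fetcher.fetch(peerUri);
          return peerUri;
        } catch (Exception peerFailure) {
          // Keep the peer error attached to the original failure.
          deepStoreFailure.addSuppressed(peerFailure);
        }
      }
      // Every peer failed as well: surface the deep store error.
      throw deepStoreFailure;
    }
  }
}
```

Attaching peer errors as suppressed exceptions keeps the deep store failure as the primary cause in logs, which is one possible choice; the PR may report failures differently.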
### PredownloadZKClient.java
1. Added `getPeerServerURIs()` method that discovers ONLINE peer servers
hosting a given segment
2. Uses ExternalView (not IdealState) to find servers that actually have the
segment ready
3. Builds download URIs using the peer's hostname and admin port
4. Mirrors the logic of `PeerServerSegmentFinder` but does not require a
`HelixManager` instance, which is not available in the predownload
container
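
As a rough illustration of the discovery step, the sketch below filters an ExternalView-style state map down to ONLINE instances and builds one download URI per peer. It is not the code in this PR: `PeerUriSketch` and `onlinePeerUris()` are hypothetical names, the plain maps stand in for the ExternalView and instance config lookups, and the `/segments/<table>/<segment>` path layout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of building peer download URIs from ExternalView state. */
final class PeerUriSketch {

  /**
   * @param instanceStates instance id to segment state, as found in the
   *                       ExternalView entry for one segment (may be null)
   * @param adminHostPorts instance id to "host:adminPort" (assumed lookup)
   * @param scheme         peer download scheme, e.g. "http"
   */
  static List<String> onlinePeerUris(Map<String, String> instanceStates,
      Map<String, String> adminHostPorts, String scheme, String table, String segment) {
    List<String> uris = new ArrayList<>();
    if (instanceStates == null) {
      // No ExternalView entry for this segment: no peers to offer.
      return uris;
    }
    for (Map.Entry<String, String> entry : instanceStates.entrySet()) {
      // Only servers that actually serve the segment are usable peers.
      if (!"ONLINE".equals(entry.getValue())) {
        continue;
      }
      String hostPort = adminHostPorts.get(entry.getKey());
      if (hostPort == null) {
        continue; // Instance config missing: skip rather than guess.
      }
      // Real code would also URL-encode the segment name.
      uris.add(scheme + "://" + hostPort + "/segments/" + table + "/" + segment);
    }
    return uris;
  }
}
```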
### Tests
- `PredownloadSchedulerTest.java`: Added tests for deep store fallback to
peer download, peer-only download failure, and constructor with peer download
flag
- `PredownloadZKClientTest.java`: Added tests for `getPeerServerURIs()`
covering no external view, no online peers, and successful peer discovery
## Test Plan
- Unit tests cover:
  - Deep store success (no peer fallback triggered)
  - Deep store failure with peer download enabled (fallback to peer)
  - Deep store failure with peer download disabled (exception propagated)
  - Peer discovery with various ExternalView states
- Verified constructor accepts the new parameter without breaking existing
callers
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]