pradeeee opened a new pull request, #18641:
URL: https://github.com/apache/pinot/pull/18641

   ## Problem
   
   The `PredownloadScheduler` runs in a predownload init-container before the 
main Pinot server starts. It downloads segments from deep storage (e.g., S3, 
HDFS, GCS) to local disk so the server can start faster.
   
   Currently, if a deep store download fails (e.g., due to transient network 
issues, throttling, or deep store unavailability), the segment is marked as 
failed and the predownload container reports a partial failure. There is no 
fallback mechanism.
   
   In production, deep storage failures can be transient while peer servers 
holding the same segment are available and healthy. The main Pinot server 
already supports peer download as a fallback, but the predownload container 
does not.
   
   ## Changes
   
   ### PredownloadScheduler.java
   1. Constructor now accepts a `boolean peerDownloadEnabled` parameter
   2. Reads the peer download scheme from `InstanceDataManagerConfig`
   3. Extracted existing deep store download logic into 
`downloadFromDeepStore()` for clarity
   4. Added `downloadFromPeers()` method that:
      - Discovers ONLINE peer servers via ExternalView from ZooKeeper
      - Shuffles the peer list for load distribution
      - Downloads the segment tar file from a peer using 
`SegmentFetcherFactory.fetchAndDecryptSegmentToLocal()`
      - Untars and moves the segment to the final data directory
   5. When the deep store download fails and peer download is enabled, the 
scheduler catches the exception and falls back to peer download; when peer 
download is disabled, the exception propagates
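   
   The fallback order described above can be sketched roughly as follows. This 
is a minimal illustration, not the code in this PR: `FallbackSketch`, the use of 
`Runnable` for each download step, and the string return values are 
hypothetical, and trying each shuffled peer in turn is an assumption (the real 
code delegates the fetch to `SegmentFetcherFactory`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "deep store first, peers as fallback".
// Only the names downloadFromDeepStore/downloadFromPeers come from the PR text.
class FallbackSketch {

  /** Returns "DEEP_STORE" or "PEER" depending on which path succeeded. */
  static String downloadSegment(Runnable downloadFromDeepStore, List<Runnable> peerDownloads,
      boolean peerDownloadEnabled) {
    try {
      downloadFromDeepStore.run();
      return "DEEP_STORE";
    } catch (RuntimeException deepStoreFailure) {
      if (!peerDownloadEnabled) {
        // Peer download disabled: propagate the original failure unchanged.
        throw deepStoreFailure;
      }
      return downloadFromPeers(peerDownloads, deepStoreFailure);
    }
  }

  /** Tries peers in random order; rethrows the most recent failure if none succeeds. */
  static String downloadFromPeers(List<Runnable> peerDownloads, RuntimeException deepStoreFailure) {
    // Copy before shuffling so the caller's list is not reordered.
    List<Runnable> shuffled = new ArrayList<>(peerDownloads);
    Collections.shuffle(shuffled);
    RuntimeException lastFailure = deepStoreFailure;
    for (Runnable peerDownload : shuffled) {
      try {
        peerDownload.run();
        return "PEER";
      } catch (RuntimeException e) {
        lastFailure = e;
      }
    }
    throw lastFailure;
  }
}
```

   With an empty peer list the sketch rethrows the deep store failure, so the 
caller still sees the original cause.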
   
   ### PredownloadZKClient.java
   1. Added `getPeerServerURIs()` method that discovers ONLINE peer servers 
hosting a given segment
   2. Uses ExternalView (not IdealState) to find servers that actually have the 
segment ready
   3. Builds download URIs using the peer's hostname and admin port
   4. Mirrors the logic of `PeerServerSegmentFinder` but works without 
requiring a `HelixManager` instance (which is not available in the predownload 
container)
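   
   As a rough illustration of the discovery step, the sketch below filters a 
segment's ExternalView state map down to ONLINE instances and builds one URI 
per peer. It is not the code in this PR: `PeerUriSketch`, the 
`instanceEndpoints` lookup map, and the `scheme://host:port/segments/<table>/<segment>` 
path layout are assumptions made for the example, and it does not URL-encode 
the segment name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of peer discovery from an ExternalView state map.
class PeerUriSketch {

  /**
   * @param instanceStates    ExternalView entry for one segment: instance name -> state,
   *                          or null when the table has no ExternalView
   * @param instanceEndpoints assumed lookup: instance name -> "hostname:adminPort"
   */
  static List<URI> getPeerServerURIs(Map<String, String> instanceStates,
      Map<String, String> instanceEndpoints, String scheme, String tableNameWithType,
      String segmentName) {
    List<URI> uris = new ArrayList<>();
    if (instanceStates == null) {
      return uris; // no ExternalView: nothing to download from
    }
    // TreeMap only makes the iteration order deterministic for this example.
    for (Map.Entry<String, String> entry : new TreeMap<>(instanceStates).entrySet()) {
      if (!"ONLINE".equals(entry.getValue())) {
        continue; // skip peers that do not have the segment ready
      }
      String endpoint = instanceEndpoints.get(entry.getKey());
      if (endpoint == null) {
        continue; // no known host/admin port for this instance
      }
      // Assumed URI layout; the real path is whatever the server's download API expects.
      uris.add(URI.create(scheme + "://" + endpoint + "/segments/" + tableNameWithType + "/" + segmentName));
    }
    return uris;
  }
}
```

   Reading the ExternalView rather than the IdealState matters here because 
only the former reflects which replicas have actually finished loading the 
segment.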
   
   ### Tests
   - `PredownloadSchedulerTest.java`: Added tests for deep store fallback to 
peer download, peer-only download failure, and constructor with peer download 
flag
   - `PredownloadZKClientTest.java`: Added tests for `getPeerServerURIs()` 
covering no external view, no online peers, and successful peer discovery
   
   ## Test Plan
   - Unit tests cover:
     - Deep store success (no peer fallback triggered)
     - Deep store failure with peer download enabled (fallback to peer)
     - Deep store failure with peer download disabled (exception propagated)
     - Peer discovery with various ExternalView states
   - Verified constructor accepts the new parameter without breaking existing 
callers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

