rohityadav1993 opened a new pull request, #17754:

URL: https://github.com/apache/pinot/pull/17754
- `feature`
- `release-notes`
- `bugfix`

## Description

This change introduces automated repair for partially offline replicas in realtime segment consumption. It addresses scenarios (issue #11314) where some replicas fail during initialization (e.g., KafkaConsumer errors) and mark themselves OFFLINE while the other replicas continue consuming normally.

### Changes

- Added a new configuration flag `controller.realtime.segment.partialOfflineReplicaRepairEnabled` (defaults to `false`)
- Enhanced `PinotLLCRealtimeSegmentManager` validation to detect and repair mixed CONSUMING/OFFLINE replica states
- When enabled, the controller automatically resets OFFLINE replicas back to CONSUMING state for IN_PROGRESS segments, allowing them to retry consumption

### Implementation Details

**Configuration:**

- New property: `controller.realtime.segment.partialOfflineReplicaRepairEnabled` in `ControllerConf`
- Default: `false` (opt-in for backward compatibility)

**Repair Logic:**

- Detects segments with mixed CONSUMING/OFFLINE replica states during validation
- Logs repair actions with details (segment name, offline count, instance list)
- Resets the identified OFFLINE replicas to CONSUMING state

**Testing:**

Unit tests:

- Added unit tests for the enabled scenario (verifies the OFFLINE→CONSUMING transition)
- Added unit tests for the disabled scenario (verifies no-op behavior)

Local cluster test plan:

- Set up a realtime table (Kafka) with two servers and two replicas, with `partialOfflineReplicaRepairEnabled = true`
- Mangle the DNS config on server-1: `echo "nameserver 0.0.0.0" > /etc/resolv.conf`
- Force commit. The new consuming segment on server-1 comes up in ERROR state and moves to OFFLINE, while server-2 stays in CONSUMING state
- Run the controller validation job `RealtimeSegmentValidationManager`
- The server-1 replica becomes healthy again in CONSUMING state

## Upgrade Notes

This feature is disabled by default.
To enable it, set `controller.realtime.segment.partialOfflineReplicaRepairEnabled=true`.
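The repair logic described under Implementation Details can be sketched as follows. This is a minimal, hypothetical Java illustration, not the code in this PR. It makes several assumptions. It models one IN_PROGRESS segment's replicas as a plain `Map` from instance name to state, assumed to come from the segment's ideal state. It reads the flag from a `java.util.Properties` object rather than `ControllerConf`, and it prints to `System.out` in place of the controller's logger. The class, method, instance, and segment names are all illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of partial-offline replica repair; NOT the actual
 * PinotLLCRealtimeSegmentManager implementation.
 */
public class PartialOfflineReplicaRepairSketch {
  static final String CONSUMING = "CONSUMING";
  static final String OFFLINE = "OFFLINE";
  static final String REPAIR_ENABLED_KEY =
      "controller.realtime.segment.partialOfflineReplicaRepairEnabled";

  /** Reads the opt-in flag; a missing property means repair is disabled. */
  static boolean isRepairEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(REPAIR_ENABLED_KEY, "false"));
  }

  /**
   * Resets OFFLINE replicas to CONSUMING when an IN_PROGRESS segment has a mix of
   * CONSUMING and OFFLINE replicas. Mutates instanceStateMap in place and returns the
   * repaired instances (sorted), or an empty list when nothing is changed.
   */
  static List<String> repairPartialOfflineReplicas(
      String segmentName, Map<String, String> instanceStateMap, boolean repairEnabled) {
    if (!repairEnabled) {
      return Collections.emptyList(); // Disabled: no-op, matching the default.
    }
    boolean hasConsuming = false;
    List<String> offlineInstances = new ArrayList<>();
    for (Map.Entry<String, String> entry : instanceStateMap.entrySet()) {
      if (CONSUMING.equals(entry.getValue())) {
        hasConsuming = true;
      } else if (OFFLINE.equals(entry.getValue())) {
        offlineInstances.add(entry.getKey());
      }
    }
    // This sketch only handles the mixed CONSUMING/OFFLINE case described in the PR.
    if (!hasConsuming || offlineInstances.isEmpty()) {
      return Collections.emptyList();
    }
    Collections.sort(offlineInstances); // Deterministic order for logging.
    System.out.printf("Repairing segment %s: resetting %d OFFLINE replica(s) to CONSUMING: %s%n",
        segmentName, offlineInstances.size(), offlineInstances);
    // Updates happen after the loop, so the map is not modified while iterating.
    for (String instance : offlineInstances) {
      instanceStateMap.put(instance, CONSUMING);
    }
    return offlineInstances;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(REPAIR_ENABLED_KEY, "true");

    // Mirrors the local test plan: server-1 failed to start consuming, server-2 is healthy.
    Map<String, String> states = new HashMap<>();
    states.put("server-1", OFFLINE);
    states.put("server-2", CONSUMING);

    // Illustrative segment name only.
    List<String> repaired = repairPartialOfflineReplicas(
        "myTable__0__5__20240101T0000Z", states, isRepairEnabled(conf));
    System.out.println("Repaired: " + repaired);
    System.out.println("States: " + states);
  }
}
```

If the flag is left unset, `isRepairEnabled` returns `false`. The same call then returns an empty list and leaves the map untouched, mirroring the opt-in default and the "disabled scenario" unit test. A segment whose replicas are all OFFLINE is also left alone here, because the sketch only targets the mixed state.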
