jadami10 opened a new issue, #11488: URL: https://github.com/apache/pinot/issues/11488
We caught this while trying to use the new ingestion lag metric. There is a [5 minute delay](https://github.com/apache/pinot/blob/2f48060d37f94a4570a0b15b13a58391e177ee8a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java#L109C41-L109C41) before the lead controller resource rebalances partitions. When we unexpectedly lose a controller, maybe this helps. But Pinot has a path for gracefully shutting down a controller, and a graceful shutdown shouldn't impact ingestion lag.

Potential solutions here:

- Interestingly enough, Helix added [on demand rebalance](https://github.com/apache/helix/pull/2601) support as of last week. I know we recently upgraded to Helix 1.x, so maybe a further upgrade is possible.
- We can make the rebalance delay configurable.

I think for us, we're just going to test with 0, or no delay. It doesn't seem like rebalancing is particularly expensive, and servers retry segment commits quite frequently. This raises a secondary issue: the `leadControllerResource` cannot be updated once created, though that's a simple enough fix.

Happy to hear other thoughts here, or maybe even some background on whether 5 minutes was chosen for a reason.
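
For the configurable-delay option, here is a rough sketch of how the delay might be resolved from controller config. This is only an illustration under stated assumptions, not Pinot's actual code: the class name, the config key, and the `resolveDelayMs` helper are all hypothetical. The default mirrors the current 5 minute delay, and a value of 0 means no delay.

```java
import java.util.Map;

public class RebalanceDelayConfig {
  // Hypothetical config key for illustration; not an existing Pinot property.
  static final String REBALANCE_DELAY_KEY = "controller.leadControllerResource.rebalanceDelayMs";

  // Mirrors the current hard-coded 5 minute delay.
  static final long DEFAULT_REBALANCE_DELAY_MS = 5 * 60 * 1000L;

  // Returns the configured delay in milliseconds, falling back to the default
  // when the key is absent or blank. Negative values are rejected.
  static long resolveDelayMs(Map<String, String> controllerConf) {
    String raw = controllerConf.get(REBALANCE_DELAY_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_REBALANCE_DELAY_MS;
    }
    long delayMs = Long.parseLong(raw.trim());
    if (delayMs < 0) {
      throw new IllegalArgumentException("Rebalance delay must be >= 0 ms, got " + delayMs);
    }
    return delayMs;
  }

  public static void main(String[] args) {
    // No override configured: falls back to the 5 minute default.
    System.out.println(resolveDelayMs(Map.of())); // prints 300000
    // Explicit 0: rebalance with no delay, as we plan to test.
    System.out.println(resolveDelayMs(Map.of(REBALANCE_DELAY_KEY, "0"))); // prints 0
  }
}
```

The resolved value would still need to be applied to the existing `leadControllerResource`, which is where the secondary issue about updating the resource after creation comes in.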
