jadami10 opened a new issue, #11488:
URL: https://github.com/apache/pinot/issues/11488

   We caught this while trying out the new ingestion lag metric. There is 
a [5 minute 
delay](https://github.com/apache/pinot/blob/2f48060d37f94a4570a0b15b13a58391e177ee8a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java#L109C41-L109C41)
 before the lead controller resource rebalances partitions. The delay may help 
when we unexpectedly lose a controller. But Pinot also has a path for 
gracefully shutting down a controller, and a graceful shutdown shouldn't 
impact ingestion lag.
   
   Potential solutions here:
   - Interestingly enough, Helix added [on-demand 
rebalance](https://github.com/apache/helix/pull/2601) support as of last week. 
I know we recently upgraded to Helix 1.x, so maybe a further upgrade is possible.
   - We can make the rebalance delay configurable. For us, we're just 
going to test with 0, i.e. no delay. Rebalancing doesn't seem to be 
particularly expensive, and servers retry segment commits quite frequently. This 
raises a secondary issue: the `leadControllerResource` cannot be updated 
once created, though that's a simple enough fix.
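
   To make the second option concrete, here is a minimal sketch of how a 
configurable delay could be resolved, assuming a plain string property map. 
The key name `controller.leadControllerResource.rebalanceDelayMs` and the 
class `RebalanceDelayConfig` are hypothetical and do not exist in Pinot today. 
The default mirrors the current 5 minute constant.

```java
import java.util.Map;

class RebalanceDelayConfig {
    // Hypothetical property key, used only for illustration; not a real Pinot config name.
    static final String DELAY_KEY = "controller.leadControllerResource.rebalanceDelayMs";

    // Default mirrors the 5 minute delay described in this issue.
    static final long DEFAULT_DELAY_MS = 5 * 60 * 1000L;

    // Returns the configured delay in milliseconds, falling back to the default
    // when the key is missing, blank, negative, or not a valid number.
    static long resolveDelayMs(Map<String, String> props) {
        String raw = props.get(DELAY_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_DELAY_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value < 0 ? DEFAULT_DELAY_MS : value;
        } catch (NumberFormatException e) {
            return DEFAULT_DELAY_MS;
        }
    }

    public static void main(String[] args) {
        // No override configured: the 5 minute default applies.
        System.out.println(resolveDelayMs(Map.of()));
        // Explicit 0 means "rebalance immediately", the setting we plan to test with.
        System.out.println(resolveDelayMs(Map.of(DELAY_KEY, "0")));
    }
}
```

   The resolved value would still have to be applied to the existing 
`leadControllerResource`, which is the secondary issue mentioned above.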
   
   Happy to hear other thoughts here, or some background on whether 5 minutes 
was chosen for a reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

