mluvin-stripe commented on issue #17599:
URL: https://github.com/apache/pinot/issues/17599#issuecomment-3843951021

   @Jackie-Jiang revisiting this after our meeting last week. Zooming out, I 
think we were discussing two different problems:
   1. Preventing segment uploads from succeeding on a controller when it has 
UNDETERMINED status for the disk utilization check, while other controllers 
have already started denying uploads due to high disk usage (i.e. incorrectly 
failing open)
   2. Ensuring that the controller’s disk utilization cache is populated before 
running the disk utilization check
   
   For (2), we discussed potentially updating the controller's disk utilization 
cache synchronously during each segment upload API call (probably with some 
TTL). My concern with this approach is that we’ve seen the disk utilization 
checker task (which updates the controller’s disk utilization cache) take up 
to 3 minutes in the worst case (though it typically averages under 500 ms), 
and the segment upload HTTP request would time out by then. I think a ~20s 
timeout on the disk utilization checker is reasonable, but a check that hits 
that timeout leaves the status UNDETERMINED, which brings us back to the 
problem described in (1) above.
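   
   To make the trade-off in (2) concrete, here is a minimal sketch of a 
synchronous refresh with a TTL and a bounded wait. It is not Pinot code: 
`BoundedDiskCheck`, its `Status` enum, and the `Supplier` standing in for the 
disk utilization checker are all hypothetical names. It only illustrates how a 
slow checker ends up reporting UNDETERMINED once the timeout is hit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch, not Pinot's API: a cached disk check refreshed on demand.
class BoundedDiskCheck {
  enum Status { OK, EXCEEDED, UNDETERMINED }

  private final Supplier<Status> checker; // stand-in for the real checker task
  private final long ttlNanos;
  private final long timeoutMillis;
  // Daemon thread so an abandoned slow check never blocks JVM shutdown.
  // Assumption: the checker responds to interruption; otherwise later
  // refreshes would queue behind it on this single thread.
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "disk-check");
    t.setDaemon(true);
    return t;
  });
  private volatile Status cached = Status.UNDETERMINED;
  private volatile long cachedAtNanos;

  BoundedDiskCheck(Supplier<Status> checker, long ttlMillis, long timeoutMillis) {
    this.checker = checker;
    this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    this.timeoutMillis = timeoutMillis;
  }

  // Called from the upload path: serve a fresh cached value, else refresh
  // synchronously but wait no longer than the timeout.
  Status check() {
    if (cached != Status.UNDETERMINED && System.nanoTime() - cachedAtNanos <= ttlNanos) {
      return cached;
    }
    Callable<Status> task = checker::get;
    Future<Status> future = executor.submit(task);
    try {
      Status fresh = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      cachedAtNanos = System.nanoTime();
      cached = fresh;
      return fresh;
    } catch (TimeoutException e) {
      future.cancel(true); // give up on the slow check
      return Status.UNDETERMINED;
    } catch (ExecutionException e) {
      return Status.UNDETERMINED; // the checker itself failed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Status.UNDETERMINED;
    }
  }

  public static void main(String[] args) {
    BoundedDiskCheck slow = new BoundedDiskCheck(() -> {
      try {
        Thread.sleep(2000); // simulates a checker slower than the timeout
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return Status.OK;
    }, 60_000, 50);
    System.out.println(slow.check());
  }
}
```

   With a checker that outlives the timeout, `check()` returns UNDETERMINED, so 
the upload path still needs a rule for that status, which is problem (1).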
   
   To address (1), I was thinking we could implement something similar to what 
the realtime ingestion pausing does: have a field in the segment’s ideal state 
that marks whether to “pause” consumption ([realtime 
code](https://github.com/apache/pinot/blob/7305eec8581a4fbbd200b1e7fcb1c8bef380ab64/pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java#L159-L179)),
 which for offline tables would mean rejecting all future segment uploads. This 
makes the check “sticky.” The first controller to recognize that disk 
utilization has been exceeded will mark the field in the offline segment’s 
ideal state, so all future segment uploads will correctly fail even when the 
receiving controller’s disk utilization check is UNDETERMINED (this is what the [realtime 
code](https://github.com/apache/pinot/blob/7305eec8581a4fbbd200b1e7fcb1c8bef380ab64/pinot-controller/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java#L192)
 does).
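   
   As a rough sketch of the “sticky” behavior for (1), assuming the flag lives 
in some state shared by all controllers: the `Map` below is a stand-in for 
ideal state fields, and `StickyUploadGate`, `PAUSE_FIELD`, and the rule that an 
OK result clears the flag are hypothetical choices for illustration, not 
Pinot's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Pinot's API: upload gating with a shared sticky flag.
class StickyUploadGate {
  enum Status { OK, EXCEEDED, UNDETERMINED }

  // Hypothetical field name; the real field would live in the ideal state.
  static final String PAUSE_FIELD = "uploadsPaused";

  // Stand-in for state that every controller reads and writes.
  private final Map<String, String> sharedFields;

  StickyUploadGate(Map<String, String> sharedFields) {
    this.sharedFields = sharedFields;
  }

  // Decides whether this controller should accept a segment upload,
  // given the result of its own local disk utilization check.
  boolean allowUpload(Status localStatus) {
    switch (localStatus) {
      case EXCEEDED:
        // First controller to see high usage records it for everyone.
        sharedFields.put(PAUSE_FIELD, "true");
        return false;
      case OK:
        // Assumption: a definite OK result is allowed to clear the flag.
        sharedFields.remove(PAUSE_FIELD);
        return true;
      default:
        // UNDETERMINED: defer to the shared flag instead of failing open.
        return !Boolean.parseBoolean(sharedFields.get(PAUSE_FIELD));
    }
  }

  public static void main(String[] args) {
    Map<String, String> shared = new ConcurrentHashMap<>();
    StickyUploadGate controllerA = new StickyUploadGate(shared);
    StickyUploadGate controllerB = new StickyUploadGate(shared);

    controllerA.allowUpload(Status.EXCEEDED); // A sets the flag
    // B has no local result yet, but now rejects because of the flag.
    System.out.println(controllerB.allowUpload(Status.UNDETERMINED));
  }
}
```

   In this sketch a controller with an UNDETERMINED status keeps today's 
behavior until some controller has observed EXCEEDED. After that it rejects 
uploads until a controller observes OK again.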
   
   I think making the disk utilization check “sticky” is more useful than 
populating the disk utilization cache synchronously, if I had to choose just 
one, though doing both would be ideal. Let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

