shekhars-li opened a new pull request #1434:
URL: https://github.com/apache/samza/pull/1434


   Symptoms:
   - After deploying a new monitor whose execution time varies, we observed an
immediate drop in the metrics emitted by the other samza-admin monitors.
   
   Cause:
   - We schedule all samza-admin monitors on a single thread pool of size 1,
using fixed-rate scheduling.
   - The new monitor depends on a script to collect data from the hosts. The
script's run time varies, and it can take a long time to return when the
number of files to be scanned for changes is high.
   - Because the pool has only one thread, the other monitors must wait for the
new monitor to finish executing before they can run.
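To make the head-of-line blocking concrete, here is a minimal Java reproduction. It assumes each monitor is a plain Runnable scheduled with scheduleAtFixedRate on a one-thread pool; the class SharedPoolDemo, the method fastRunsWithSharedThread, and the timings are hypothetical and are not samza-admin code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical reproduction: two monitors share ONE scheduler thread.
 * Names and timings are illustrative only, not samza-admin code.
 */
public class SharedPoolDemo {

  /** Sleeps without propagating InterruptedException (restores the interrupt flag). */
  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Schedules a slow monitor and a fast monitor on a single-threaded pool and
   * returns how many times the fast monitor ran during the observation window.
   */
  public static int fastRunsWithSharedThread(long slowMillis, long windowMillis) {
    ScheduledExecutorService shared = Executors.newScheduledThreadPool(1);
    AtomicInteger fastRuns = new AtomicInteger();
    // Stand-in for the script-backed monitor: each run blocks the only thread.
    shared.scheduleAtFixedRate(() -> sleepQuietly(slowMillis), 0, 100, TimeUnit.MILLISECONDS);
    // Stand-in for any other monitor: it only counts its own executions.
    shared.scheduleAtFixedRate(fastRuns::incrementAndGet, 0, 100, TimeUnit.MILLISECONDS);
    sleepQuietly(windowMillis);
    int observed = fastRuns.get(); // read before shutdown so teardown cannot affect it
    shared.shutdownNow();          // interrupts the slow monitor's sleep
    return observed;
  }

  public static void main(String[] args) {
    System.out.println(fastRunsWithSharedThread(1500, 500));
  }
}
```

With a 1500 ms slow run and a 500 ms observation window, this should print 0: the slow monitor is first in the queue and holds the only thread, so the fast monitor never gets a turn, which mirrors the drop in metrics from the other monitors.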
   
   Fix:
   - We create a separate ScheduledExecutorService, with a thread pool of size
1, for each monitor.
   - Each monitor now runs on its own thread and does not block other monitors
from being scheduled or started.
   - If a monitor takes longer to execute than its scheduling period (time to
return > scheduledTime), its next run does not start until the previous run
completes. This prevents work from queueing up.
   - Updated the default scheduling jitter to 100 so that each monitor's thread
gets some jitter before its first scheduled run.
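The fix can be sketched as follows, assuming monitors are plain Runnables and the jitter is a random initial delay in milliseconds (the PR does not state the unit of the jitter value). PerMonitorScheduler, schedule, stop, and fastRunsWithIsolatedThreads are hypothetical names, not Samza's actual classes or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: one single-threaded ScheduledExecutorService per monitor,
 * plus a random first-run delay (jitter). Names are illustrative only.
 */
public class PerMonitorScheduler {
  private final List<ScheduledExecutorService> executors = new ArrayList<>();
  private final long maxJitterMillis; // assumption: jitter is expressed in milliseconds

  public PerMonitorScheduler(long maxJitterMillis) {
    this.maxJitterMillis = maxJitterMillis;
  }

  /** Runs the monitor on a dedicated thread so a slow run delays only this monitor. */
  public synchronized void schedule(String name, Runnable monitor, long periodMillis) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "monitor-" + name);
      t.setDaemon(true); // do not keep the JVM alive just for monitors
      return t;
    });
    // Random first-run delay in [0, maxJitterMillis] so monitors do not all fire at once.
    long initialDelay = maxJitterMillis > 0
        ? ThreadLocalRandom.current().nextLong(maxJitterMillis + 1)
        : 0;
    // Fixed rate: a run that overruns its period delays the next run; runs never overlap.
    executor.scheduleAtFixedRate(monitor, initialDelay, periodMillis, TimeUnit.MILLISECONDS);
    executors.add(executor);
  }

  /** Stops every monitor, interrupting any run still in progress. */
  public synchronized void stop() {
    executors.forEach(ScheduledExecutorService::shutdownNow);
    executors.clear();
  }

  /** Sleeps without propagating InterruptedException (restores the interrupt flag). */
  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Slow monitor plus fast monitor, each on its own thread; returns the fast monitor's run count. */
  public static int fastRunsWithIsolatedThreads(long slowMillis, long windowMillis) {
    PerMonitorScheduler scheduler = new PerMonitorScheduler(100);
    AtomicInteger fastRuns = new AtomicInteger();
    scheduler.schedule("slow", () -> sleepQuietly(slowMillis), 100);
    scheduler.schedule("fast", fastRuns::incrementAndGet, 100);
    sleepQuietly(windowMillis);
    int observed = fastRuns.get();
    scheduler.stop();
    return observed;
  }

  public static void main(String[] args) {
    System.out.println("fast monitor ran: " + fastRunsWithIsolatedThreads(1500, 500) + " times");
  }
}
```

With the same 1500 ms slow monitor, the fast monitor now runs several times in a 500 ms window because it no longer shares a thread. One design note: with scheduleAtFixedRate in the JDK's ScheduledThreadPoolExecutor, a run that overruns by several periods is followed by back-to-back catch-up runs; scheduleWithFixedDelay is the usual alternative if that burst is undesirable.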
   
   Test:
   - Updated relevant unit tests.  

