shekhars-li opened a new pull request #1434: URL: https://github.com/apache/samza/pull/1434
Symptoms:
- On deploying a new monitor whose execution time varies, we observed an immediate drop in the metrics emitted by the other samza-admin monitors.

Cause:
- We schedule all the monitors in samza-admin in a thread pool of size 1 with a fixed-rate scheduling strategy.
- Our new Monitor depends on a script to get data from the hosts. The script takes a variable amount of time and can sometimes take a long time to return if the number of files to be scanned for changes is large.
- Because the thread pool size is 1, the other monitors wait for the new Monitor to complete execution before they can run.

Fix:
- We create a new ScheduledExecutorService for every monitor, each with a thread pool of size 1.
- Every monitor now runs in its own thread and does not block other monitors from being scheduled or started.
- If a monitor takes too long to execute (time to return > scheduledTime), new work is not scheduled until the previous execution is complete. This prevents work from queueing up.
- Updated the default scheduling jitter to 100, so that every thread/monitor has some jitter the first time it is scheduled.

Test:
- Updated relevant unit tests.
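
The fix can be illustrated with a minimal sketch. This is not the code from the PR: `PerMonitorScheduling`, the nested `Monitor` interface, `startAll`, `stopAll`, and the parameter names are hypothetical stand-ins, and the jitter is assumed to be an upper bound in milliseconds on a random initial delay. It only shows the general idea of giving each monitor its own single-threaded `ScheduledExecutorService`, so that a slow monitor delays only its own next run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

final class PerMonitorScheduling {

    /** Hypothetical stand-in for a samza-admin monitor. */
    interface Monitor {
        void monitor() throws Exception;
    }

    /**
     * Starts every monitor on its own single-threaded scheduler.
     * maxJitterMs is assumed to be >= 0 and bounds the random initial delay.
     */
    static List<ScheduledExecutorService> startAll(
            Map<String, Monitor> monitors, long periodMs, long maxJitterMs) {
        List<ScheduledExecutorService> executors = new ArrayList<>();
        for (Map.Entry<String, Monitor> entry : monitors.entrySet()) {
            String name = entry.getKey();
            Monitor monitor = entry.getValue();

            // One executor (and therefore one thread) per monitor.
            ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "monitor-" + name);
                    thread.setDaemon(true);
                    return thread;
                });

            // Random initial delay in [0, maxJitterMs] so monitors do not all start together.
            long initialDelayMs = ThreadLocalRandom.current().nextLong(maxJitterMs + 1);

            // With scheduleAtFixedRate, a run that exceeds the period makes later runs
            // start late; runs of the same task never execute concurrently.
            executor.scheduleAtFixedRate(() -> {
                try {
                    monitor.monitor();
                } catch (Exception e) {
                    // An exception escaping the task would suppress all future runs,
                    // so it is caught and reported here instead.
                    System.err.println("Monitor " + name + " failed: " + e);
                }
            }, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);

            executors.add(executor);
        }
        return executors;
    }

    /** Stops all schedulers, interrupting any monitor that is still running. */
    static void stopAll(List<ScheduledExecutorService> executors) {
        for (ScheduledExecutorService executor : executors) {
            executor.shutdownNow();
        }
    }
}
```

The trade-off in this sketch is one thread per monitor instead of one shared thread, which is cheap for a small number of monitors and removes the head-of-line blocking described under Cause.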
