[jira] [Commented] (SLING-5965) Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs

Stefan Egli (JIRA) Tue, 22 Aug 2017 02:04:38 -0700

    [ 
https://issues.apache.org/jira/browse/SLING-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136537#comment-16136537
 ]


Stefan Egli commented on SLING-5965:
------------------------------------

[~cziegeler]
Looks good, tested, works fine, except for the new dropmetrics 3.2 dependency:
bq. updated dependency to dropmetrics 3.2.3
I've seen the discussion in SLING-7047. As I understand its goal, we should 
use dropmetrics 3.2(.x) rather than 3.1. However, that dropmetrics bundle is 
not part of scheduler (nor metrics), so why does this justify upgrading the 
dependency? For those running on Sun JDKs where 3.1 works fine, why would we 
not still support them? IMO we keep broader backwards compatibility if we 
remain on 3.1 (assuming it will also work with 3.2.x when the dependency is 
on 3.1).

> Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs
> ---------------------------------------------------------------------------
>
>                 Key: SLING-5965
>                 URL: https://issues.apache.org/jira/browse/SLING-5965
>             Project: Sling
>          Issue Type: New Feature
>          Components: Commons
>    Affects Versions: Commons Scheduler 2.5.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Commons Scheduler 2.6.4
>
>         Attachments: numRunningJobs.jpg, oldestRunningJob.jpg, patch.txt, 
> SchedulerHealthCheck.jpg, SLING-5965.patch, SLING-5965.v2.patch.txt, 
> SLING-5965.v3.patch.txt, SLING-5965.v4.patch.txt, SLING-5965.v5.patch.txt, 
> timers.jpg
>
>
> Sling Scheduler jobs (aka Quartz-Jobs) should typically be fast-running jobs. 
> They are served from a thread-pool and should occupy a thread only for a 
> short amount of time.
> If there are 'misbehaving' quartz-jobs that run for a very long time, they 
> occupy threads from that thread-pool and thus degrade the performance of 
> other scheduled/quartz-jobs.
> We should have metrics (using 
> [sling.commons.metrics|https://sling.apache.org/documentation/bundles/metrics.html])
>  that provide information about the internals of Sling Scheduler, such as 
> average, max, etc. duration of scheduled jobs, as well as how many jobs are 
> currently running and how long the oldest running job has been running.
> Based on this, a Health-Check can monitor the 'oldest job running' metric and 
> flag {{critical}} when e.g. the oldest job is older than {{60'000ms}} 
> (configurable, default).
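
A minimal sketch of the check described above, in plain Java. It is not the code from the attached patches and uses neither the Sling Metrics nor the Health Check API. The class {{RunningJobTracker}}, its methods and the {{Status}} enum are hypothetical names chosen for illustration. Timestamps are passed in explicitly so the threshold behaviour is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker; an illustration only, not the Sling Scheduler implementation. */
class RunningJobTracker {

    /** Hypothetical result type standing in for a health-check status. */
    enum Status { OK, CRITICAL }

    // job name -> start time in milliseconds
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    // threshold above which the oldest running job is considered too old
    private final long maxAgeMillis;

    RunningJobTracker(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    void jobStarted(String jobName, long nowMillis) {
        startTimes.put(jobName, nowMillis);
    }

    void jobFinished(String jobName) {
        startTimes.remove(jobName);
    }

    /** Number of jobs currently running. */
    int numRunningJobs() {
        return startTimes.size();
    }

    /** Age of the oldest running job in milliseconds, or 0 if nothing is running. */
    long oldestRunningJobMillis(long nowMillis) {
        long oldestStart = nowMillis;
        for (long start : startTimes.values()) {
            oldestStart = Math.min(oldestStart, start);
        }
        return nowMillis - oldestStart;
    }

    /** CRITICAL when the oldest running job is older than the threshold. */
    Status check(long nowMillis) {
        return oldestRunningJobMillis(nowMillis) > maxAgeMillis
                ? Status.CRITICAL
                : Status.OK;
    }
}
```

With a threshold of 60000 ms, a job that started at time 0 and is still running at 60001 would make {{check}} report {{CRITICAL}}; once that job finishes, the result goes back to {{OK}} as long as no other job has exceeded the threshold.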



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
