[
https://issues.apache.org/jira/browse/SLING-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426470#comment-15426470
]
Stefan Egli commented on SLING-5965:
------------------------------------
[~cziegeler], re
bq. I think we should move the HC to a separate bundle
I believe we don't consistently separate health-checks from their bundles, so
we have probably introduced a dependency on the HC bundle elsewhere. The exception
to this seems to be
[installer/hc|https://github.com/apache/sling/tree/trunk/installer/hc].
How do you see this separation: a _new_ bundle with just 1 class?
> Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs
> ---------------------------------------------------------------------------
>
> Key: SLING-5965
> URL: https://issues.apache.org/jira/browse/SLING-5965
> Project: Sling
> Issue Type: New Feature
> Components: Commons
> Affects Versions: Commons Scheduler 2.5.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Commons Scheduler 2.5.2
>
> Attachments: SLING-5965.patch
>
>
> Sling Scheduler jobs (aka Quartz-Jobs) should typically be fast running jobs.
> They are served from a thread-pool and should occupy that thread only for a
> short amount of time.
> If there are 'misbehaving' quartz-jobs that run for a very long time, they
> start to occupy threads from that thread-pool and thus affect the
> performance of other scheduled/quartz-jobs.
> We should have metrics (using
> [sling.commons.metrics|https://sling.apache.org/documentation/bundles/metrics.html])
> that provide information about the internals of the Sling Scheduler, such as the
> average and maximum duration of scheduled jobs, how many jobs are currently
> running, and how long the oldest running job has been running.
> Based on this, a Health-Check can monitor the 'oldest job running' metric and
> flag {{critical}} when, e.g., the oldest job has been running for more than
> {{60'000ms}} (configurable; this is the default).
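
To make the 'oldest job running' idea concrete, here is a minimal sketch in plain Java. It deliberately uses neither the sling.commons.metrics nor the health-check API: the class {{OldestRunningJobTracker}} and all of its method names are hypothetical and are not taken from SLING-5965.patch. It only illustrates how the age of the oldest running job could be derived from recorded start times and compared against the 60'000ms default.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of Sling): records job start times and
// derives the 'oldest job running' value that a health-check could inspect.
class OldestRunningJobTracker {

    // Default threshold from the issue description; assumed to be configurable.
    static final long DEFAULT_CRITICAL_MILLIS = 60_000L;

    // job id -> start timestamp in milliseconds
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    void jobStarted(String jobId, long nowMillis) {
        startTimes.put(jobId, nowMillis);
    }

    void jobFinished(String jobId) {
        startTimes.remove(jobId);
    }

    // Number of jobs currently running.
    int runningCount() {
        return startTimes.size();
    }

    // Age of the oldest running job in milliseconds, or 0 if nothing is running.
    long oldestRunningAgeMillis(long nowMillis) {
        long oldestStart = Long.MAX_VALUE;
        for (long start : startTimes.values()) {
            oldestStart = Math.min(oldestStart, start);
        }
        return oldestStart == Long.MAX_VALUE ? 0L : Math.max(0L, nowMillis - oldestStart);
    }

    // True when the oldest running job has been running longer than the threshold.
    boolean isCritical(long nowMillis, long thresholdMillis) {
        return oldestRunningAgeMillis(nowMillis) > thresholdMillis;
    }
}
```

In a real bundle the start and finish calls would presumably be driven by the scheduler around each job execution, and the resulting value exposed as a gauge; timestamps are passed in explicitly here only to keep the sketch easy to test.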