[
https://issues.apache.org/jira/browse/SLING-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457294#comment-13457294
]
Jörg Hoh commented on SLING-2597:
---------------------------------
I agree with both of you.
Currently things like Sling eventing are hard to check, because they don't
expose (enough) internal state. Exposing that state through the API would
probably bloat the API, so JMX might be a viable solution here.
A problem with JMX is the dynamic nature of OSGi. E.g., when you add a new Sling
event queue, the statistics of that queue might appear under a certain JMX
object name. But your external monitoring (a custom service on top of Sling) is
not notified of that new event queue and its statistics, unless you check for
Sling event queues via OSGi and then get their statistics from JMX. This is a
scenario I want to avoid.
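For illustration only: one way a monitor might pick up newly added queues
without a separate OSGi lookup is to query the MBean server with an object name
pattern on every poll. This is a minimal sketch, not how Sling does it. The
domain "org.example.eventing", the key properties, the QueueStatsMBean
interface and the QueueLength attribute are all hypothetical names invented
for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class QueueDiscovery {
    // Hypothetical naming scheme; real object names may look different.
    static final String QUEUE_PATTERN = "org.example.eventing:type=queue,name=*";

    // Hypothetical statistics MBean exposing a single attribute, QueueLength.
    public interface QueueStatsMBean {
        long getQueueLength();
    }

    public static final class QueueStats implements QueueStatsMBean {
        private final long queueLength;

        public QueueStats(long queueLength) {
            this.queueLength = queueLength;
        }

        @Override
        public long getQueueLength() {
            return queueLength;
        }
    }

    // Returns queue name -> queue length for every MBean that matches the
    // pattern at the time of the call, so queues added later show up on the
    // next poll without any OSGi lookup.
    static Map<String, Long> pollQueueLengths(MBeanServer server) throws JMException {
        Map<String, Long> lengths = new TreeMap<>();
        for (ObjectName name : server.queryNames(new ObjectName(QUEUE_PATTERN), null)) {
            Long length = (Long) server.getAttribute(name, "QueueLength");
            lengths.put(name.getKeyProperty("name"), length);
        }
        return lengths;
    }

    public static void main(String[] args) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new QueueStats(42),
                new ObjectName("org.example.eventing:type=queue,name=first"));
        System.out.println(pollQueueLengths(server));

        // A queue registered later is found by the next poll.
        server.registerMBean(new QueueStats(7),
                new ObjectName("org.example.eventing:type=queue,name=second"));
        System.out.println(pollQueueLengths(server));
    }
}
```

A queue can disappear between the query and the attribute read, so a real
monitor would have to tolerate an InstanceNotFoundException at that point.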
Regarding the definition of CRITICAL: I know that this is hard to do. But maybe
some basic configurable rules could be sufficient (e.g. queue length > 100k). Or
we could define multiple aspects of Sling eventing (delay, throughput, average
processing time, ...) and handle and configure each of them individually.
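As a rough sketch of what such per-aspect rules could look like: each aspect
gets its own pair of configurable limits, and a missing measurement is reported
as UNKNOWN. The class and method names, the aspect names and the limits below
are assumptions made up for this example; only the four status values come
from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AspectRules {
    // Status values as proposed in the issue description.
    static final int OK = 0;
    static final int WARNING = 1;
    static final int CRITICAL = 2;
    static final int UNKNOWN = 3;

    // One configurable rule: a value above a limit raises the status.
    static final class Threshold {
        final long warningAbove;
        final long criticalAbove;

        Threshold(long warningAbove, long criticalAbove) {
            if (warningAbove > criticalAbove) {
                throw new IllegalArgumentException("warning limit exceeds critical limit");
            }
            this.warningAbove = warningAbove;
            this.criticalAbove = criticalAbove;
        }

        int evaluate(Long value) {
            if (value == null) {
                return UNKNOWN; // no measurement available
            }
            if (value > criticalAbove) {
                return CRITICAL;
            }
            if (value > warningAbove) {
                return WARNING;
            }
            return OK;
        }
    }

    private final Map<String, Threshold> rules = new LinkedHashMap<>();

    // Registers or replaces the rule for one aspect.
    AspectRules configure(String aspect, long warningAbove, long criticalAbove) {
        rules.put(aspect, new Threshold(warningAbove, criticalAbove));
        return this;
    }

    // Evaluates every configured aspect individually.
    Map<String, Integer> evaluate(Map<String, Long> measurements) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Threshold> rule : rules.entrySet()) {
            Long value = measurements.get(rule.getKey());
            result.put(rule.getKey(), rule.getValue().evaluate(value));
        }
        return result;
    }

    public static void main(String[] args) {
        AspectRules rules = new AspectRules()
                .configure("queueLength", 50_000, 100_000)
                .configure("avgProcessingTimeMs", 500, 2_000);

        Map<String, Long> measurements = new LinkedHashMap<>();
        measurements.put("queueLength", 120_000L);

        // queueLength is above its critical limit; avgProcessingTimeMs has no value.
        System.out.println(rules.evaluate(measurements));
    }
}
```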
> Provide interface for monitoring services
> -----------------------------------------
>
> Key: SLING-2597
> URL: https://issues.apache.org/jira/browse/SLING-2597
> Project: Sling
> Issue Type: New Feature
> Reporter: Jörg Hoh
>
> There should be an interface which one can query to get information about the
> status of a service implementing this interface.
> e.g.
> public interface HealthCheckable {
>     public int getStatus();
> }
> For the return value for this method we could use:
> static int OK = 0;
> static int WARNING = 1;
> static int CRITICAL = 2;
> static int UNKNOWN = 3;
> (these are the values which Nagios uses as return values for its plugins, see
> http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN76).
> The decision which value is returned is delegated to the service, so services
> may need some configuration to define the point at which an "OK" becomes a
> "WARNING".
> Via the OSGi whiteboard pattern we can then collect all services providing
> status information and calculate an overall status of the system.
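The aggregation described above could be sketched roughly as follows. The issue
does not say how the overall status is calculated, so this example assumes that
the most severe status wins, ranked OK < WARNING < UNKNOWN < CRITICAL, and that
an empty set of services counts as OK. A plain list stands in for the services
a whiteboard tracker would collect, and the OverallStatus class is a
hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

interface HealthCheckable {
    int OK = 0;
    int WARNING = 1;
    int CRITICAL = 2;
    int UNKNOWN = 3;

    int getStatus();
}

final class OverallStatus {
    // Assumed ranking: OK < WARNING < UNKNOWN < CRITICAL.
    private static int severity(int status) {
        switch (status) {
            case HealthCheckable.OK:
                return 0;
            case HealthCheckable.WARNING:
                return 1;
            case HealthCheckable.CRITICAL:
                return 3;
            default:
                return 2; // UNKNOWN
        }
    }

    // Returns the most severe status reported by any service.
    static int of(List<? extends HealthCheckable> services) {
        int overall = HealthCheckable.OK; // assumption: no services means OK
        for (HealthCheckable service : services) {
            int status = service.getStatus();
            if (status < HealthCheckable.OK || status > HealthCheckable.UNKNOWN) {
                status = HealthCheckable.UNKNOWN; // treat unexpected values as UNKNOWN
            }
            if (severity(status) > severity(overall)) {
                overall = status;
            }
        }
        return overall;
    }

    public static void main(String[] args) {
        HealthCheckable healthy = () -> HealthCheckable.OK;
        HealthCheckable degraded = () -> HealthCheckable.WARNING;
        List<HealthCheckable> services = Arrays.asList(healthy, degraded);
        System.out.println(OverallStatus.of(services));
    }
}
```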