[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872031#comment-13872031 ]

Bertrand Delacretaz commented on SLING-3278:

Is the 500ms caching time used to expire results from the cache, and is it the same for all results? If yes I think results will need different times to live depending on their nature - I'd suggest adding a method to the HealthCheckExecution result that is used to expire results from the cache, maybe getTimeToLiveMsec(). For now this can be based on this default value, but it will allow results to provide appropriate values later on, without changing the API.

Provide a HealthCheckExecutor service

Key: SLING-3278
URL: https://issues.apache.org/jira/browse/SLING-3278
Project: Sling
Issue Type: New Feature
Components: Health Check
Reporter: Georg Henzler
Assignee: Carsten Ziegeler
Fix For: Health Check Core 1.0.8
Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-19.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-21-withExecutorResult.patch, SLING-3278-hc.webconsole-2013-12-19.patch, SLING-3278-hc.webconsole-2013-12-21.patch, SLING-3278-more-explicit-use-of-constructor.patch, hc-it.patch

Goals:
* Be able to get an overall (aggregated) result as quickly as possible (ideally 2sec)
* Whenever possible, return most current results (e.g. for a memory check)
* Provide a declarative way for async checks (async checks should be the exception though)

Approach
* Run checks in parallel
* Make sure long running (or even stuck) checks are timed out
* If a health check must run asynchronously (because its execution time cannot be optimized), it should be enough to just specify a service property (e.g. hc.async).

See also
* http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
* http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
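The first two points of the approach in the issue description (run checks in parallel, time out long running or stuck checks) can be sketched roughly as follows. This is only an illustration built on java.util.concurrent, not the code from the attached patches; the class ParallelCheckRunner, the method runAll and the plain String results are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not the Sling implementation. */
final class ParallelCheckRunner {

    /**
     * Runs all checks in parallel. A check that has not finished within
     * timeoutMs is cancelled and reported as "TIMED_OUT".
     */
    static List<String> runAll(List<Callable<String>> checks, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, checks.size()));
        try {
            // invokeAll cancels every task that is still running when the timeout elapses,
            // so each returned future is either completed or cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(checks, timeoutMs, TimeUnit.MILLISECONDS);
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (CancellationException e) {
                    results.add("TIMED_OUT");
                } catch (ExecutionException e) {
                    results.add("FAILED: " + e.getCause());
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running checks", e);
        } finally {
            // Interrupts any worker thread that is still busy with a cancelled check.
            pool.shutdownNow();
        }
    }
}
```

Results come back in the same order as the submitted checks, so a caller can match each entry to its check by index.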
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872036#comment-13872036 ]

Carsten Ziegeler commented on SLING-3278:

Thanks for your patch, I've just applied it. We should at least cache for 1500ms - this was the time used within the jmx implementation; right now we have 2000ms, so I think this is fine. In general I personally would even increase the caching time :)
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872059#comment-13872059 ]

Georg Henzler commented on SLING-3278:

Ok, let's settle for 2000ms, the most important thing is that it is configurable :)
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872164#comment-13872164 ]

Bertrand Delacretaz commented on SLING-3278:

Any thoughts about my getTimeToLiveMsec()? See above, my comment of 2 hours ago.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872179#comment-13872179 ]

Carsten Ziegeler commented on SLING-3278:

I'm not sure how a getter method would help in having different TTLs. As a client of the executor service, when you get a result you usually do not care what the TTL of the check is. There is not much you can do with this value.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872209#comment-13872209 ]

Bertrand Delacretaz commented on SLING-3278:

The executor needs it to set a per-result (or per-HC) time to live, see the "add resultCacheTtlInMs as property..." comment in HealthCheckResultCache.

But actually, looking at it again, it's only HealthCheckMetadata that needs to provide that: it makes sense that this is linked to a particular HealthCheck instead of a particular result.

So I think what we need is a HealthCheckMetadata.getResultTTLMsec() method, and if that returns zero the HealthCheckResultCache uses the globally configured TTL.
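The fallback rule proposed in the comment above (a per-check TTL of zero means "use the globally configured TTL") could look roughly like this. It is a minimal sketch under stated assumptions: HealthCheckMetadataSketch and ResultCacheSketch are stand-ins invented for the example, and only the method name getResultTTLMsec() is taken from the comment.

```java
/** Hypothetical stand-in for the metadata of one health check. */
final class HealthCheckMetadataSketch {
    private final long resultTtlMsec;

    HealthCheckMetadataSketch(long resultTtlMsec) {
        this.resultTtlMsec = resultTtlMsec;
    }

    /** Method name as proposed in the comment; zero means "no per-check value". */
    long getResultTTLMsec() {
        return resultTtlMsec;
    }
}

/** Hypothetical stand-in for the result cache; only the TTL logic is shown. */
final class ResultCacheSketch {
    private final long globalTtlMsec;

    ResultCacheSketch(long globalTtlMsec) {
        this.globalTtlMsec = globalTtlMsec;
    }

    /** Per-check TTL if one is set, otherwise the globally configured TTL. */
    long effectiveTtlMsec(HealthCheckMetadataSketch metadata) {
        long ttl = metadata.getResultTTLMsec();
        return ttl == 0L ? globalTtlMsec : ttl;
    }

    /** A cached result is usable while its age is below the effective TTL. */
    boolean isFresh(HealthCheckMetadataSketch metadata, long finishedAtMsec, long nowMsec) {
        return nowMsec - finishedAtMsec < effectiveTtlMsec(metadata);
    }
}
```

With a global TTL of 2000ms, a check that reports 0 is cached for 2000ms while a check that reports 500 is cached for 500ms. Under this convention zero cannot also mean "do not cache", which is the point taken up in the follow-up comments.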
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872234#comment-13872234 ]

Carsten Ziegeler commented on SLING-3278:

Got it, so we define a new service property for this, yes sounds good to me. What about defining that a provided value of less than 1 means no caching at all?
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872246#comment-13872246 ]

Bertrand Delacretaz commented on SLING-3278:

We need two special values:

1. do not cache, can be zero?
2. use default TTL, can be less than zero? I'd make this the default value, if the HC service doesn't provide a value via a service property.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873087#comment-13873087 ]

Georg Henzler commented on SLING-3278:

+1 for adding resultCacheTtlInMs as service prop + HealthCheckMetadata.getResultTTLMsec()

Regarding the special values: using zero for "do not cache" is good; for 2. "use default TTL" we could maybe use the type Long (instead of long) and the value null (and an empty string in the console config UI). "Leave empty for using the global default" reads better in the documentation than "Use -1 for the global default".
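The convention suggested in this comment (a nullable Long, where null or an empty config field means "use the global default" and zero means "do not cache") might be sketched as below. The class TtlPolicy and its method names are hypothetical and only illustrate the special-value handling; they are not part of the Sling API.

```java
/** Hypothetical helper illustrating the proposed special values. */
final class TtlPolicy {

    private TtlPolicy() {
    }

    /** An empty (or missing) config field maps to null, meaning "use the global default". */
    static Long parseConfiguredTtl(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return Long.valueOf(raw.trim());
    }

    /** Zero is the explicit "do not cache" value. */
    static boolean isCachingDisabled(Long configuredTtlMsec) {
        return configuredTtlMsec != null && configuredTtlMsec.longValue() == 0L;
    }

    /** Null falls back to the global default; any other value is used as given. */
    static long effectiveTtlMsec(Long configuredTtlMsec, long globalDefaultMsec) {
        return configuredTtlMsec == null ? globalDefaultMsec : configuredTtlMsec.longValue();
    }
}
```

Compared with a -1 sentinel, the nullable type keeps every numeric value available for real TTLs, at the cost of a null check wherever the value is read.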
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870750#comment-13870750 ]

Carsten Ziegeler commented on SLING-3278:

As discussed on the mailing list, we switched back to execute(String) and merged the jmx implementation into the core in order to use the caching/executor service. I think with the current implementation we're fine and can close this issue.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859422#comment-13859422 ]

Bertrand Delacretaz commented on SLING-3278:

I have tried to clarify the use cases, and also suggested API changes, at https://cwiki.apache.org/confluence/display/SLING/Health+Checks+Executor+Design
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857487#comment-13857487 ]

Bertrand Delacretaz commented on SLING-3278:

Also, for my use cases I'll need a way to selectively clear cached results and specify a timeout when executing HCs. We can discuss this separately (started that in [1]), just wanted to mention it here so we don't forget.

[1] http://sling.markmail.org/thread/xg4k2pu4ii7xdgbw
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856225#comment-13856225 ]

Bertrand Delacretaz commented on SLING-3278:

I still think execute(ServiceReference) is an unnecessary leak of implementation details, will discuss it on the dev list.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855715#comment-13855715 ]

Carsten Ziegeler commented on SLING-3278:

Thanks again Georg for your updated patch - I've committed a modified version in rev 1553133 in order to make going forward much easier. I left out the async stuff, used different names/signatures for the executor service and also did some minor code cleanups/changes.

Now, let's discuss the things - as a first step I think we should focus on the API.

1) I renamed the runXX methods to execute - as the service is named Executor - and changed the result to CollectionHealthCheckResult.

2) I totally agree that there is rarely a need to call this executor from within one's own code, so jmx and web console are the number one clients for this. Therefore I think we can go with an OSGi-free interface and directly use the service. The JMX code already has the service anyway, and doing it once in the web console code is not too hard either. So we don't make it harder for users, as they don't use this anyway :)

3) I removed the async execution for now. As Bertrand suggested, let's discuss this on the list separately from this and keep the focus on the executor service.

4) I think the HealthCheckResult interface is fine for now; we might need to tweak it a little bit before we close this issue.
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855763#comment-13855763 ]

Carsten Ziegeler commented on SLING-3278:

I've updated the implementation a little bit - now the service.id is used as a cache key. With this we don't need to hold any service reference objects (or anything else) anymore. And therefore I now agree to use the ServiceReference as a single argument for executing a health check, so forget comment 2) from above. Without the service reference we don't have anything we could use as a cache key.
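A cache keyed by service.id, as described in this comment, could be sketched as follows. This is an assumption-laden illustration rather than the committed code: ServiceIdResultCache, its methods and the String status are invented for the example, and the cache only stores the numeric id, never the ServiceReference itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache keyed by the OSGi service.id; not the Sling implementation. */
final class ServiceIdResultCache {

    /** Cached status plus the time at which the check finished. */
    private static final class Entry {
        final String status;
        final long finishedAtMsec;

        Entry(String status, long finishedAtMsec) {
            this.status = status;
            this.finishedAtMsec = finishedAtMsec;
        }
    }

    private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMsec;

    ServiceIdResultCache(long ttlMsec) {
        this.ttlMsec = ttlMsec;
    }

    // In an OSGi container the id would typically be read from the reference, e.g.
    // (Long) serviceReference.getProperty(Constants.SERVICE_ID); here it is passed in
    // as a plain long so that the sketch runs without an OSGi framework.
    void put(long serviceId, String status, long nowMsec) {
        entries.put(serviceId, new Entry(status, nowMsec));
    }

    /** Returns the cached status, or null if there is no entry or it has expired. */
    String getIfFresh(long serviceId, long nowMsec) {
        Entry entry = entries.get(serviceId);
        if (entry == null || nowMsec - entry.finishedAtMsec >= ttlMsec) {
            return null;
        }
        return entry.status;
    }
}
```

Because the key is just a Long, the cache holds no references into the service registry, which matches the point made in the comment about not keeping service reference objects around.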
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854799#comment-13854799 ]

Georg Henzler commented on SLING-3278:

Find another patch attached that uses a bundle-private ExecutionResult. Changes to the existing API are really minimal now (the biggest change being the new HealthCheckResult interface, but I think that's good). Other than that, from a HealthCheck implementor's point of view, nothing has changed, as the Result is still the class to construct and return.

Other than that I left org.apache.sling.hc.api.HealthCheckExecutor.run(ServiceReference) with the service reference for now, as I believe it's still the cleanest way. Having had a closer look, neither the name nor the class name can be used, as they are not unique for factory components like ScriptableHealthCheck (only the service PID would be unique, and that's certainly something that is OSGi specific itself, and a service reference would be needed on the client side to retrieve it).
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852731#comment-13852731 ]

Georg Henzler commented on SLING-3278:

1) Maybe we should rather go with run(String fullyQualifiedClassname) then - although getting the service is easy, it's also easy to get it wrong. And it's just a waste to get the service if you don't need to (the executor does not get the service if there is a cache hit, there are async results or a future is running already).

2) I agree we should get rid of the method setHealthCheckDescriptor(), but adding it as a constructor element is not an easy option (the health check itself is constructing the Result, but the descriptor is set by the executor later). If we add it as a constructor parameter we need a clone constructor Result(resultFromCheckItself, hcDescriptor, finishedDate, elapsedTime). Or we put the class Result behind an implementation class (that would be a nicer option IMHO); the problem with that is that new Result(...) is used directly in the checks and is part of the current API, so the clone constructor is probably still the better solution.

3) The descriptor currently contains the hc name and the tags - this meta information is useful in the UI (it is shown in the web console). Before my patch the service reference was used directly in the web console (using a lot more code, e.g. dealing with the fact that the OSGi array props appear as a simple string if only one element is contained). Now IMHO it is cleanly separated and the HC meta information is available to the UI (without being tied to the OSGi API, also see your first point ;-)). Another option would be to copy name and tags as plain properties to the Result, but IMHO it is cleaner and more extensible to leave the constant meta data (not changing for multiple hc executions) in a separate class (also it is a really useful key class for the executor; if we got rid of it in the API we should at least leave it in the impl.executor package).
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852761#comment-13852761 ] Bertrand Delacretaz commented on SLING-3278:

I'm also for having an executor method that takes a single HealthCheck - getting the service is not expensive, I don't think that's a problem, and that's consistent with how we generally do things in Sling.

> We cannot use a created date in the Result constructor because the instance of the result is created by the implementing class...

It's created when the check is done executing; as the Result is immutable there's no other way. So I think storing the creation timestamp at that point is fine. If a health check starts at T, runs for 2 minutes and has a time to live of 1 minute, you want to kill it at T+3, not T+1. So IMO we should set the Result creation time in its constructor, have a method to set the time to live and an isExpired method that becomes true at T+3 in my example.

In the meantime I've thought about a fluent API that looks better for the executor, and started a thread on our dev list to discuss that ([RT] Fluent API for HealthCheckExecutor). That would have little impact on the core of your patch, but would provide a more flexible API, including per-call execution timeouts and the ability to clear some or all cached results at will.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-19.patch, SLING-3278-hc.webconsole-2013-12-19.patch, hc-it.patch
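As an illustration of the expiry rule in the comment above (creation time taken when the check finishes, expiry at creation time plus time to live), here is a minimal sketch. The class TimedResultSketch, its constructor parameters and the injectable clock are assumptions made for this example; it is not the Result class from the patch, and it takes the time to live in the constructor rather than through a separate setter.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch only; not the Sling Result class. */
final class TimedResultSketch {
    private final long createdAtMsec;   // taken in the constructor, i.e. when the check has finished
    private final long timeToLiveMsec;  // how long the result may be served from a cache
    private final LongSupplier clock;   // injectable clock (assumption) so expiry can be shown without sleeping

    TimedResultSketch(long timeToLiveMsec, LongSupplier clock) {
        if (timeToLiveMsec < 0) {
            throw new IllegalArgumentException("timeToLiveMsec must not be negative");
        }
        this.clock = clock;
        this.createdAtMsec = clock.getAsLong();
        this.timeToLiveMsec = timeToLiveMsec;
    }

    long getTimeToLiveMsec() {
        return timeToLiveMsec;
    }

    /** True once creation time + time to live has been reached. */
    boolean isExpired() {
        return clock.getAsLong() - createdAtMsec >= timeToLiveMsec;
    }
}
```

With a check that starts at T, runs for two minutes and has a time to live of one minute, the object is constructed at T+2 and isExpired() first returns true at T+3.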
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852762#comment-13852762 ] Carsten Ziegeler commented on SLING-3278:

1) The class name is an implementation detail, no one knows it - I can leave it if we go with the health check name, so if I want to execute a hc via the executor, it has to have a name or a tag. Deal?

2) The executor could return an ExecutionResult (or maybe there is a better name), containing the additional information and, as a field, the result.

3) OK, I'm fine with having this descriptor as long as it is a simple data object not containing any complex objects like a ServiceReference. The idea behind all of this is to make it serializable and to be able to serialize it to a remote machine. As long as these are data objects, everything is fine, but a ServiceReference can never be transferred.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-19.patch, SLING-3278-hc.webconsole-2013-12-19.patch, hc-it.patch
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852998#comment-13852998 ] Georg Henzler commented on SLING-3278:

Re Fluent API as well as "The class name is implementation detail, no one knows it": I think we have a disconnect in how we think the health check executor could/should be used. I think implementation projects really only want to be able to add checks and run them via web console or JMX; they would never want to execute the checks in a custom fashion other than configuring timeouts or restricting execution to certain tags (also see mailing list post).

@Carsten / 1): I think you need the run(hc-class) mainly for the JMX module: don't you have access to the class name via the property component.name there? The problem with the name is that it may contain spaces and therefore is not really a nice id to use as a parameter.

2) The ExecutionResult is a good idea - that way all timing data can go there. Also, ExecutionResult should be an interface and ExecutionResultImpl can be hidden in the impl.executor package - that way ExecutionResultImpl does not have to be immutable and the timing data can be collected correctly (also solving Bertrand's concerns).

3) If we create the interface ExecutionResult, we can hide the existence of the HealthCheckDescriptor and move it to impl.executor. The field serviceReference can be marked transient and is therefore excluded from serialization (if that is ever really needed, I think probably not).

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-19.patch, SLING-3278-hc.webconsole-2013-12-19.patch, hc-it.patch
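To make points 2) and 3) of the comment above concrete, the following sketch shows an API-side interface that exposes plain data only, with an implementation class that keeps a service reference in a transient field so that it is skipped by Java serialization. All names here (ExecutionResultSketch, ExecutionResultSketchImpl, FakeServiceReference, SerializationSketch) are made up for this illustration; FakeServiceReference merely stands in for an OSGi ServiceReference and none of this is taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for an OSGi ServiceReference; deliberately not Serializable. */
final class FakeServiceReference {
    final String componentName;

    FakeServiceReference(String componentName) {
        this.componentName = componentName;
    }
}

/** Hypothetical API-side view: simple data only, no OSGi types. */
interface ExecutionResultSketch {
    String getHealthCheckName();
    boolean isOk();
    long getElapsedTimeMsec();
}

/** Hypothetical implementation that would live in an impl package. */
final class ExecutionResultSketchImpl implements ExecutionResultSketch, Serializable {
    private static final long serialVersionUID = 1L;

    private final String healthCheckName;
    private final boolean ok;
    private final long elapsedTimeMsec;
    // transient: skipped by serialization, so it is null after a round trip
    private final transient FakeServiceReference serviceReference;

    ExecutionResultSketchImpl(String healthCheckName, boolean ok, long elapsedTimeMsec,
                              FakeServiceReference serviceReference) {
        this.healthCheckName = healthCheckName;
        this.ok = ok;
        this.elapsedTimeMsec = elapsedTimeMsec;
        this.serviceReference = serviceReference;
    }

    @Override public String getHealthCheckName() { return healthCheckName; }
    @Override public boolean isOk() { return ok; }
    @Override public long getElapsedTimeMsec() { return elapsedTimeMsec; }

    /** Implementation-only accessor, not part of the interface. */
    FakeServiceReference getServiceReference() { return serviceReference; }
}

final class SerializationSketch {
    /** Serializes and deserializes the result in memory, as a transfer to a remote machine would. */
    static ExecutionResultSketchImpl roundTrip(ExecutionResultSketchImpl in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream input =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExecutionResultSketchImpl) input.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After SerializationSketch.roundTrip(...) the name, status and elapsed time survive, while getServiceReference() returns null on the copy.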
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851862#comment-13851862 ] Bertrand Delacretaz commented on SLING-3278:

Competition is good! That's how we get excellent code ;-)

Thanks Georg for your revised patch. I see your point about parallel execution of the HC, agree that it's a good thing and that it's the cause of some additional complexity w.r.t. my variant, but it's worth it. Here are my comments in no particular order.

I agree with the reformatting issue that Carsten mentions.

IMO the Result timing information should be:
* Result creation time, set automatically in constructor
* Optional time to live, can be set with a method but not changed after that
* Optional HC execution elapsed time, can be set with a method but not changed after that
* isExpired() method that uses creation time + time to live

We do not use @author tags, as is customary in Apache projects. You will get credit in the commit message, but we don't want code to appear like it belongs to specific people, especially as this changes over time.

The IT tests fail, I'll attach a patch that fixes them.

I'd still like to remove the async execution from this patch and rediscuss it on list. My use case would be to execute some HC at regular intervals based on their tags, instead of based on individual HC configurations. This can then be implemented in the support bundle.

We need to keep the CompositeHealthCheck as we already released it. I agree that it can lead to executing some health checks several times if you don't choose tags wisely. That's not really a problem, and even less so with this improved execution mechanism as results are cached. It shouldn't be hard to adapt CompositeHealthCheck to use the new executor.

The maven-sling-plugin shouldn't be added to the pom.xml in this patch - the Sling parent pom has an autoInstallBundle profile which does that already.

The SlowHealthCheck demo HC from my patch should be included once we apply your patch, with its config, as that's a useful demo for caching results and execution timeout.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-18.patch, SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-2013-12-18.patch, SLING-3278-hc.webconsole-v0.5.patch
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852213#comment-13852213 ] Georg Henzler commented on SLING-3278:

Competition in general is not that bad, I agree :)

Find attached my patch that addresses the comments from above:
* Removed author tags, the reformatted lines and the sling plugin from the pom.
* I added the method Result run(ServiceReference healthCheckReference) to execute a single health check with caching/timeout checks in place. The parameter is not of type HealthCheck as this would push the responsibility for getting/ungetting the service to the user of the interface.
* I tried to not use HealthCheckDescriptor in the result (and make it private to the bundle by moving it to impl.executor)... however, I think the code gets worse then (an extra map would be needed to keep references by descriptor). In general I think it's good design to have the descriptor: it is a different type of attribute and for the user it is immediately clear that these attributes won't change over time. Also ServiceReferences can safely be cached and reused (as opposed to the service class itself) and HealthCheckDescriptor is immutable.
* Timing information: We cannot use a created date in the Result constructor because the instance of the result is created by the implementing class and there is no guarantee that new Result(..) is called at the beginning of a check (rather it will normally be called at the very end!).

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-18.patch, SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-2013-12-18.patch, SLING-3278-hc.webconsole-v0.5.patch, hc-it.patch
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852634#comment-13852634 ] Carsten Ziegeler commented on SLING-3278:

Thanks for the updated patch. I think we're making good progress! Now, some comments:
- run(ServiceReference) looks convenient but creates an API that is tied to the OSGi API - we should avoid that. Getting a service is fairly easy - and by passing in the service object we ensure that the client is allowed/able to get the service.
- Result must be immutable - that's the contract we had before and we have to keep it. So no setter methods - if the constructor approach gets ugly I suggest a builder-based approach, like createResult(log).name(bla)...build();
- I don't think we need the health check descriptor - so far I don't see a need for client code to have this information. And if there is one, why not simply copy the information from the service reference into an immutable data object? But I would go without this.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-19.patch, SLING-3278-hc.webconsole-2013-12-19.patch, hc-it.patch
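The builder-based approach suggested in the comment above, createResult(log).name(bla)...build(), could look roughly like the sketch below. It only illustrates the pattern: ImmutableResultSketch, its fields and the elapsedTimeMsec(...) step are assumptions for this example, and the log is modelled as a plain list of strings rather than the real log type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable result with a builder; not the Sling Result class. */
final class ImmutableResultSketch {
    private final String name;
    private final List<String> log;
    private final long elapsedTimeMsec;

    private ImmutableResultSketch(Builder builder) {
        this.name = builder.name;
        // defensive copy plus unmodifiable view: no changes after build()
        this.log = Collections.unmodifiableList(new ArrayList<>(builder.log));
        this.elapsedTimeMsec = builder.elapsedTimeMsec;
    }

    /** Entry point mirroring the createResult(log) call from the comment. */
    static Builder createResult(List<String> log) {
        return new Builder(log);
    }

    String getName() { return name; }
    List<String> getLog() { return log; }
    long getElapsedTimeMsec() { return elapsedTimeMsec; }

    /** Mutable only while the result is being assembled. */
    static final class Builder {
        private final List<String> log;
        private String name = "";
        private long elapsedTimeMsec;

        private Builder(List<String> log) {
            this.log = new ArrayList<>(log);  // copy, so later changes by the caller are not seen
        }

        Builder name(String name) {
            this.name = name;
            return this;
        }

        Builder elapsedTimeMsec(long elapsedTimeMsec) {
            this.elapsedTimeMsec = elapsedTimeMsec;
            return this;
        }

        ImmutableResultSketch build() {
            return new ImmutableResultSketch(this);
        }
    }
}
```

The result type itself has no setters; all optional values are supplied through the builder before build() is called.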
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851218#comment-13851218 ] Georg Henzler commented on SLING-3278:

Hi Bertrand, now we have two competing implementations :( I was hoping you would base your work on my patch or just give feedback. So I have an improved version of the patch that makes as few changes as possible (but is still more capable than the version from your patch):
* HealthCheckExecutor.runAllForTags() acts as a facade to the user interface; implementation details (timeouts, hc lookup etc.) are handled in the executor, and the user of the interface does not need to know them. The design is also cleaner: in the web console you don't have to read service reference properties (as the results have the property HealthCheckDescriptor).
* The tests are truly run in parallel and the latest results are returned (as HealthCheckExecutorImpl.runAllForTags(String...) waits for the futures).
* Caching is in place (configurable, 2sec default).
* Async execution can easily be achieved by specifying a property (this could easily be taken out if necessary).

In contrast, your patch has the following disadvantages:
* The checks are somewhat triggered for parallel execution, but usually you receive the result from the last call (if the last call to the check is 2 hours in the past, then the result will be 2h old). I really think the HealthCheckExecutor needs to be the broker for futures in order to be able to achieve the goals as stated in the issue description (e.g. look at HealthCheckExecutorImpl.waitForFuturesRespectingTimeout() for what you cannot achieve with your design).
* The HealthCheckExecutor is only capable of running one check at a time - I believe the main use case for the HC in general is to run all tests and get a current system health as quickly as possible.

No matter what the actual implementation ends up looking like, I would really like to see the signature Set<Result> runAllForTags(String... tags) in the interface HealthCheckExecutor.

My latest patch is attached, and I think it's better to use that version as a base for further work.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-18.patch, SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-2013-12-18.patch, SLING-3278-hc.webconsole-v0.5.patch
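The idea of the executor acting as a broker for futures (run all checks in parallel, wait for them, and time out long-running or stuck ones) can be sketched with java.util.concurrent as below. This is not the code of HealthCheckExecutorImpl.waitForFuturesRespectingTimeout(); the class ParallelRunSketch, the use of Callable<String> for a check and the status strings are assumptions chosen to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of parallel execution with one overall timeout. */
final class ParallelRunSketch {

    /**
     * Submits all checks at once, then waits for each future until a shared deadline.
     * Returns one status string per check name, in submission order.
     */
    static Map<String, String> runAll(Map<String, Callable<String>> checks, long timeoutMsec) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, checks.size()));
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> check : checks.entrySet()) {
                futures.put(check.getKey(), pool.submit(check.getValue()));
            }

            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMsec);
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : futures.entrySet()) {
                // every future gets only the time that is left until the shared deadline
                long remainingNanos = Math.max(0L, deadline - System.nanoTime());
                try {
                    results.put(entry.getKey(), entry.getValue().get(remainingNanos, TimeUnit.NANOSECONDS));
                } catch (TimeoutException e) {
                    entry.getValue().cancel(true);  // interrupt the stuck check
                    results.put(entry.getKey(), "TIMED_OUT");
                } catch (ExecutionException e) {
                    results.put(entry.getKey(), "CRITICAL: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.put(entry.getKey(), "INTERRUPTED");
                }
            }
            return results;
        } finally {
            pool.shutdownNow();  // do not leave worker threads behind
        }
    }
}
```

Because all checks are submitted before the first wait, the total waiting time is bounded by the timeout rather than by the sum of the individual execution times.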
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851397#comment-13851397 ] Carsten Ziegeler commented on SLING-3278:

Hi, thanks for your patch - unfortunately it also reformats a lot of the existing code; could you please redo the patch without reformatting?

I'm missing a method to execute a single specific health check; the advantage of the executor in that case should be that it takes care of caching etc. even for a single check.

The Result has a reference to a new API class HealthCheckDescriptor which in turn holds a service object - we should avoid this and keep the result as a simple data object.

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-bertrand.patch, SLING-3278-hc.core-HealthCheckExecutorService-2013-12-18.patch, SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-2013-12-18.patch, SLING-3278-hc.webconsole-v0.5.patch
[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutor service
[ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846328#comment-13846328 ] Georg Henzler commented on SLING-3278:

The property for async execution can make sense when you want to make sure a check is not called as often as the health check itself (e.g. only twice a day).

I'm pretty much done; only No. 2 of Bertrand's list and the unit tests are missing. If you like, you can have a look at the patches to give feedback before I submit a final one.

Impl notes:
* The main entry method is org.apache.sling.hc.core.executor.HealthCheckExecutor.runAllForTags(String...)
* Results now have a HealthCheckDescriptor that contains meta info for the check (also used in the executor as cache key etc.)
* Async is supported by the attribute hc.async.cronExpression; a service listener is in place for registering/unregistering of jobs (org.apache.sling.hc.core.executor.AsyncHealthCheckExecutor)
* I added a natural order to results (failed tests first, then by name alphabetically) - without it the order would be arbitrary (depending on execution time)
* The result has an additional finishDate and elapsedTime (I think the finish date is more interesting for caching than the start date!)

Other thoughts (not in patch):
* I'm not sure if the CompositeHealthCheck makes sense - is this not a grouping that competes with the tags? It is easy to configure it in a way that some checks are executed twice, especially if you run all checks without giving a tag (and the HealthCheckExecutor cannot prevent it as the CompositeHealthCheck looks like any other check to it)
* Exceptions: The result should be able to carry an exception - I would even go as far as adding throws Exception to the execute() signature (this would not break any existing implementation classes) and generically adding a final critical log entry if the HC happens to throw an exception

Provide a HealthCheckExecutor service - Key: SLING-3278 URL: https://issues.apache.org/jira/browse/SLING-3278 Project: Sling Issue Type: New Feature Components: Health Check Reporter: Georg Henzler Assignee: Georg Henzler Attachments: SLING-3278-hc.core-HealthCheckExecutorService-v0.5.patch, SLING-3278-hc.webconsole-v0.5.patch
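The natural order mentioned in the implementation notes above (failed checks first, then alphabetically by name) can be expressed with a comparator. The sketch below is illustrative only: OrderedResultSketch and its ok flag are assumptions for this example, and "failed" is simplified to "not ok".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical result holder used only to demonstrate the ordering. */
final class OrderedResultSketch {
    final String name;
    final boolean ok;

    OrderedResultSketch(String name, boolean ok) {
        this.name = name;
        this.ok = ok;
    }

    /** Failed results sort first (false before true on the ok flag), ties are broken by name. */
    static final Comparator<OrderedResultSketch> FAILED_FIRST_THEN_NAME =
        Comparator.comparing((OrderedResultSketch r) -> r.ok)
                  .thenComparing(r -> r.name);

    /** Returns the check names in display order without modifying the input list. */
    static List<String> sortedNames(List<OrderedResultSketch> results) {
        List<OrderedResultSketch> copy = new ArrayList<>(results);
        copy.sort(FAILED_FIRST_THEN_NAME);
        List<String> names = new ArrayList<>();
        for (OrderedResultSketch result : copy) {
            names.add(result.name);
        }
        return names;
    }
}
```

With such an order the output no longer depends on which check happened to finish first.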