I'Ve done the suggested changes: - moved jmx stuff into the core - changed the signature to execute(String... tags) - Added service reference to metadata
The only thing I'm unclear with are the different result possibilites. I think there are these cases when execute is called: a) no result in the cache/cached result is obsolete, hc executes within the timeout -> hc is executed, real result is cached and returned b) no result in the cache/cached result is obsolete, hc executes not within the timeout -> hc is executed, fake result is returned c) valid result in the cache -> hc is not executed, cached result is returned I think there is no need for a client to find out whether it's case a) or c) - if interested client code can inspect the getFinishedAt() method. Now, the interesting case still is b) - I see the point that we return WARN in that case - I would assume that getElapsedTimeInMs returns -1 and getFinishedAt() returns null. So both could be used as an indicator for this situation. WDYT? Regards Carsten 2014/1/5 Georg Henzler <sl...@cq-eclipse-plugin.net> > Hi, > > I think this looks pretty good now! Thanks for adding HealthCheckMetadata > :) > > 1. "b) merge the jmx implementation into the core" > I'm happy with that and the signature execute(String ... tags) is my > preferred > > 2. Status NO_RESULT > I think it's probably cleaner to add a property hasTimedOut to the > ExecutionResult - this is also better for the case that we can return an > old result from cache. The main disadvantage of NO_RESULT is that it is not > immediately clear from a caller's perspective, "how bad" the status > NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I > think it really is WARN (as the timeout is configured in a way that should > be sufficient for all checks). Therefore explicitly returning WARN plus > hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a > consumer's point of view. > > 3. Execution Timeouts defined by caller / "executing the HCs tagged with > external ... with longer timeout" > The globally defined timeout is sufficient, I agree with Carsten. The > "external HCs" example is valid and I would define the global timeout to > fit to the external checks, the point is that for the "internal HCs" you > don't need to define a separate timeout because > * they are usually quick - and if they aren't, they should probably be > rewritten! ;-) > * from a conceptual point of view, the maximum time you are prepared to > wait is still the same as for the long running ones (why would you set the > timeout for internal checks to 500ms? If the checks are quick it makes no > difference, if a check requires 800ms I believe you are still better off > getting a proper result after 800ms instead of a timeout one after 500ms). > Anyway, as suggested let's leave it for later if we find a good use case > that justifies the more complex API. > > 4. HealthCheckDescriptor is obsolete? > The current version in SVN uses HealthCheckDescriptor and > HealthCheckMetadata in utils - you probably created it that way to not have > the property serviceReference in HealthCheckMetadata. On the other hand > HealthCheckMetadata is taking a service reference as constructor argument > anyway - why not just merge the two classes in HealthCheckMetadata and get > rid of HealthCheckDescriptor? > > 5. Have you had a look at https://issues.apache.org/jira/browse/SLING-3302? > The tabular result could also include the property hasTimedOut from 2. > > Best Regards > Georg > > Am 03.01.2014 17:29, schrieb Carsten Ziegeler: > > Yes, I would prefer a separate state, parsing the log is too error prone. >> >> Carsten >> >> >> 2014/1/3 Bertrand Delacretaz <bdelacre...@apache.org> >> >> Hi, >>> >>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cziege...@apache.org> >>> wrote: >>> > Bertrand wrote: >>> >> As is you can compute the age of a result, that should already help. >>> >> >>> > Hmm and what about the case, where there is no result yet and the HC >>> takes >>> > longer than the timeout? >>> >>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with >>> a Result that includes a log that explains the problem. Haven't >>> checked what the current implementation does. >>> >>> We could also return a specific NO_RESULT state in this case. >>> >>> -Bertrand >>> >>> > -- Carsten Ziegeler cziege...@apache.org