I'Ve done the suggested changes:
- moved jmx stuff into the core
- changed the signature to execute(String... tags)
- Added service reference to metadata

The only thing I'm unclear with are the different result possibilites. I
think there are these cases when execute is called:
a) no result in the cache/cached result is obsolete, hc executes within the
timeout -> hc is executed, real result is cached and returned
b) no result in the cache/cached result is obsolete, hc executes not within
the timeout -> hc is executed, fake result is returned
c) valid result in the cache -> hc is not executed, cached result is
returned

I think there is no need for a client to find out whether it's case a) or
c) - if interested client code can inspect the getFinishedAt() method.
Now, the interesting case still is b) - I see the point that we return WARN
in that case - I would assume that getElapsedTimeInMs returns -1 and
getFinishedAt() returns null. So both could be used as an indicator for
this situation.

WDYT?

Regards
Carsten


2014/1/5 Georg Henzler <sl...@cq-eclipse-plugin.net>

> Hi,
>
> I think this looks pretty good now! Thanks for adding HealthCheckMetadata
> :)
>
> 1. "b) merge the jmx implementation into the core"
> I'm happy with that and the signature execute(String ... tags) is my
> preferred
>
> 2. Status NO_RESULT
> I think it's probably cleaner to add a property hasTimedOut to the
> ExecutionResult - this is also better for the case that we can return an
> old result from cache. The main disadvantage of NO_RESULT is that it is not
> immediately clear from a caller's perspective, "how bad" the status
> NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I
> think it really is WARN (as the timeout is configured in a way that should
> be sufficient for all checks). Therefore explicitly returning WARN plus
> hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a
> consumer's point of view.
>
> 3. Execution Timeouts defined by caller / "executing the HCs tagged with
> external ... with longer timeout"
> The globally defined timeout is sufficient, I agree with Carsten. The
> "external HCs" example is valid and I would define the global timeout to
> fit to the external checks, the point is that for the "internal HCs" you
> don't need to define a separate timeout because
>  * they are usually quick - and if they aren't, they should probably be
> rewritten! ;-)
>  * from a conceptual point of view, the maximum time you are prepared to
> wait is still the same as for the long running ones (why would you set the
> timeout for internal checks to 500ms? If the checks are quick it makes no
> difference, if a check requires 800ms I believe you are still better off
> getting a proper result after 800ms instead of a timeout one after 500ms).
> Anyway, as suggested let's leave it for later if we find a good use case
> that justifies the more complex API.
>
> 4. HealthCheckDescriptor is obsolete?
> The current version in SVN uses HealthCheckDescriptor and
> HealthCheckMetadata in utils - you probably created it that way to not have
> the property serviceReference in HealthCheckMetadata. On the other hand
> HealthCheckMetadata is taking a service reference as constructor argument
> anyway - why not just merge the two classes in HealthCheckMetadata and get
> rid of HealthCheckDescriptor?
>
> 5. Have you had a look at https://issues.apache.org/jira/browse/SLING-3302? 
> The tabular result could also include the property hasTimedOut from 2.
>
> Best Regards
> Georg
>
> Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
>
>  Yes, I would prefer a separate state, parsing the log is too error prone.
>>
>> Carsten
>>
>>
>> 2014/1/3 Bertrand Delacretaz <bdelacre...@apache.org>
>>
>>  Hi,
>>>
>>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cziege...@apache.org>
>>> wrote:
>>> > Bertrand wrote:
>>> >> As is you can compute the age of a result, that should already help.
>>> >>
>>> > Hmm and what about the case, where there is no result yet and the HC
>>> takes
>>> > longer than the timeout?
>>>
>>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
>>> a Result that includes a log that explains the problem. Haven't
>>> checked what the current implementation does.
>>>
>>> We could also return a specific NO_RESULT state in this case.
>>>
>>> -Bertrand
>>>
>>>
>


-- 
Carsten Ziegeler
cziege...@apache.org

Reply via email to