Re: [hackathon] health checks

Oliver Lietz Fri, 21 Sep 2018 04:42:18 -0700

On Wednesday 19 September 2018 10:36:24 Andrei Dulvac wrote:
> Hi guys.

Hi,


> So first of all I acknowledge that conceptually there is some overlap - the
> concepts of health, readiness, liveness themselves overlap.
> When we wrote systemready, we did know of the Sling HCs (at least I did)
> and how they're used. And that was one of the reasons why we decided not to
> use them.
> 
> They're currently used, as Justin put it, for a much broader scope. A
> system can fail a HC and it doesn't mean it's not ready. In one of
> Bertrand's adapt.to presentations from 2013 [0], a security checklist is
> mentioned explicitly - which we use in AEM. It's one of those things that
> requires manual input to turn healthy. The documentation also mentions a
> lot of stuff, including using them as server-side JUnit tests [1]. And all
> those things are great.
> 
> > Now the system readyness framework was mostly created to have something
> > on Felix level and the capabilities of the Sling Health Checks weren’t
> > known.
> 
> Not entirely accurate. We knew of the sling HCs and initially we wanted to
> donate systemready to sling; but it's definitely good it went into felix.
> 
> > The dependencies of Sling HC to Sling are minimal today already: It’s
> > Sling thread pool (a felix pendant or just a plain java one can be used)
> > and Sling Scheduler (also this can easily be replaced by the standard java
> > mechanism).
> 
> In my opinion, that's A LOT. And they're prefixed by "Sling-". Systemready
> has two dependencies: javax.servlet and the OSGi API. And it can
> technically run on any framework. The deps were another reason why we didn't
> use the HCs. Of course, those might grow as it becomes more mature.

Both Scheduler and Threads are *Commons* modules and can run without Sling.

Regards,
O.

> > What would make sense is a bridge where a subset of health checks could
> > be fed into the readyness framework (i.e. if these X health checks pass,
> > the system is considered "ready" and/or "alive").
> 
> > (you just create two tags for readiness and liveness each).
> 
> These don't seem to contradict each other.
> Stefan, did you mean that the SystemReady checks would also become some
> tagged HCs or the other way around? That some tagged HCs would be fed into
> systemready?
> 
> So I'm game for unifying a bit at the felix level and hopefully we don't go
> overboard. I alone just don't have a solution yet that I can say I love
> 100%.
> 
> BTW, Sorry I couldn't make it to the hackathon, it would have been great to
> be part of the discussion.
> 
> - Andrei
> 
> 
> 
> 
> ---
> [0] https://adapt.to/2013/en/schedule/18_healthcheck.html
> [1] https://sling.apache.org/documentation/bundles/sling-health-check-tool.html#health-checks-as-server-side-junit-tests
> 
> 
> On Wed, Sep 19, 2018 at 1:15 AM Justin Edelson <jus...@justinedelson.com> wrote:
> > Hi Georg,
> > Great. It looks like I misread Stefan's notes as being more dramatic than
> > they actually were intended to be :)
> > 
> > Regards,
> > Justin
> > 
> > On Tue, Sep 18, 2018 at 4:48 PM Georg Henzler <slin...@ghenzler.de> wrote:
> > > Hi Justin,
> > > 
> > > there was quite some discussion at adaptTo() around this topic already. So
> > > as it stands all requirements to run Sling-based applications in Kubernetes
> > > are met already by Sling Health Checks (you just create two tags for
> > > readiness and liveness each). HCs were developed from the first day with
> > > the goal to have them used by load balancers (and not only manual). Also
> > > Sling HCs are more mature in terms of parallel execution, timeout handling,
> > > response customizing and special handling like asynchronous checks.
> > > 
> > > 
> > > Now the system readyness framework was mostly created to have something on
> > > Felix level and the capabilities of the Sling Health Checks weren’t known.
> > > I do agree that it would make sense to have it on Felix level though (more
> > > visible to the non-Sling world, as a low level mechanism maybe best located
> > > at the lowest framework level). The dependencies of Sling HC to Sling are
> > > minimal today already: It’s Sling thread pool (a felix pendant or just a
> > > plain java one can be used) and Sling Scheduler (also this can easily be
> > > replaced by the standard java mechanism).
> > > 
> > > > It might make more sense to invert this and identify what the readyness
> > > > framework does (mostly in its OOTB checks and servlets) and merge that
> > > > functionality into Sling Health Checks and then move Sling Health Checks
> > > > (or solid chunks of it) to Felix.
> > > 
> > > This was the intention, but let’s wait for the feedback from Andrei and
> > > Christian.
> > > 
> > > -Georg
> > > 
> > > Sent from my iPhone
> > > 
> > > > On 18. Sep 2018, at 16:31, Justin Edelson <jus...@justinedelson.com> wrote:
> > > > Hi,
> > > > After reviewing the presentation, this seems like kind of a stretch to me.
> > > > IIUC, the System Readyness Framework is (as its name would suggest) solely
> > > > concerned with "readyness" and "liveness" (as seen in the example use
> > > > cases on slide 3) and the API is explicitly designed for this purpose
> > > > without any opportunity for namespace extension (i.e. you can extend how
> > > > "readyness" and "liveness" are determined but you can't add new
> > > > categories). Sling Health Checks is concerned with a broader concept of
> > > > "health" with no restrictions on namespacing. There are all kinds of
> > > > reasons why a system may be considered "ready" but still fails specific
> > > > health checks. In other words, I'm doubtful that there is an overlap here
> > > > at a framework level. What would make sense is a bridge where a subset of
> > > > health checks could be fed into the readyness framework (i.e. if these X
> > > > health checks pass, the system is considered "ready" and/or "alive"). But
> > > > I'd strongly suggest that the gamut of expression possible with the health
> > > > check framework goes far beyond the scope of what the readyness framework
> > > > is designed to do. It might make more sense to invert this and identify
> > > > what the readyness framework does (mostly in its OOTB checks and servlets)
> > > > and merge that functionality into Sling Health Checks and then move Sling
> > > > Health Checks (or solid chunks of it) to Felix.
> > > > 
> > > > Or perhaps I've misunderstood the intention of this email/F2F discussion.
> > > > But the way this looks is that we are going to take something with a decent
> > > > install base and replace it with something a few months old and a much
> > > > smaller functional scope. Just doesn't make sense to me.
> > > > 
> > > > Regards,
> > > > Justin
> > > > 
> > > > On Thu, Sep 13, 2018 at 1:03 PM Stefan Seifert <sseif...@pro-vision.de> wrote:
> > > >> - currently there is some overlap between sling health checks and the new
> > > >> felix system readyness framework presented [1]
> > > >> - the idea is to bring this together within felix
> > > >> - provide a facade for the sling healthcheck API for backwards
> > > >> compatibility
> > > >> 
> > > >> stefan
> > > >> 
> > > >> [1] https://adapt.to/2018/en/schedule/system-readiness-framework-makes-deployment-automation-a-breeze.html
