On Thu, Oct 11, 2012 at 12:55 PM, Roman Shaposhnik <[email protected]> wrote:
> On Thu, Oct 11, 2012 at 1:34 AM, Steve Loughran <[email protected]> wrote:
>
> >> Anyway... It seems like this monitoring is very Hadoop HA specific,
> >
> > It could actually monitor any service with one or more of
> >   pid
> >   port
> >   URL
> >
> > The Hadoop-ness currently comes from
> > 1. specific probes for HDFS and JT
> > 2. use of hadoop XML config for settings (trivial fix)
> > 3. probe order fixed in source
> > 4. no current support for adding new probes just by putting them on the
> >    classpath and declaring them
> > 5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
> >    classpath and native lib so its hadoop probes are always in sync with
> >    the runtime
> >
> > I'd fix 2 & 3 by having a better config language that lets you specify an
> > order of operations
>
> This sounds pretty interesting. I remember Jos wanting to make his
> monitoring code (I believe based on daemontools) available under:
> https://issues.apache.org/jira/browse/BIGTOP-460
> there's a more generic JIRA as well:
> https://issues.apache.org/jira/browse/BIGTOP-263
>
> Feel free to put more concrete proposals there. Also, if Jos could
> comment -- that'll be terrific!

I'm still working on this, now in the context of HDP. My goal at work is
ultimately to use the notify functionality in daemontools-encore to send
program crash/restart alerts, and to change the various wrapper scripts
(start-dfs.sh, etc.) to use daemontools-encore (and shmux), per the ticket.

Longer term, it would be great to be able to deprecate the init scripts
completely and replace them with hooks for process supervision systems:
either daemontools-encore, which is very portable, or the most popular
process supervision tools out there (Upstart, SMF, launchd). This should not
be hard, as they are all very similar in concept, and it would provide more
value out of the box than the current init scripts do. It would also ease
maintenance, as the process supervisor replaces all the duplicated code in
the init scripts and adds useful extra functionality, such as the ability to
detect crashes immediately, without polling.

Also, right now a major source of trouble is the mess that is the Hadoop
startup/wrapper scripts, with their plethora of global environment variables
and the shell code that sets and reads them. There's also the continuing
issue that the various organizations/vendors can't seem to make up their
minds about the script UIs: there are the hadoop, mapred, hdfs, yarn,
hadoop-daemon.sh and hadoop-daemons.sh commands, all of which source various
config files and use a ton of global variables, and it's unclear which ones
to use so that the right configuration gets applied. So my plan is to use
the lowest-level interface and put all needed environment variables in
/service/foo/env/... so they are easy to find, query and set in a
platform-independent manner (rough sketches below).

TL;DR: init scripts should die.

Jos
-- 
Jos Backus
jos at catnook.com
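
To make that concrete, here is a minimal sketch of what a supervised HDFS
DataNode service directory could look like under daemontools-encore.
Everything below is illustrative rather than a description of any existing
Bigtop/HDP packaging: the service name, the hdfs user, the env file values
and the choice to run `hdfs datanode` in the foreground are all assumptions
on my part.

    # Hypothetical layout, one service directory per daemon:
    #
    #   /service/hdfs-datanode/
    #     run       executed (and re-executed after a crash) by supervise
    #     env/      one file per environment variable, read with envdir
    #     log/run   optional logger (e.g. multilog)

    # The per-service environment is just files, so it is easy to find,
    # query and set regardless of platform or tooling:
    mkdir -p /service/hdfs-datanode/env
    echo /usr/java/default > /service/hdfs-datanode/env/JAVA_HOME
    echo /usr/lib/hadoop   > /service/hdfs-datanode/env/HADOOP_PREFIX
    echo /etc/hadoop/conf  > /service/hdfs-datanode/env/HADOOP_CONF_DIR

and /service/hdfs-datanode/run itself stays tiny (mode 0755):

    #!/bin/sh
    # Send stderr to the same place as stdout (the logger, if any).
    exec 2>&1
    # envdir loads ./env, setuidgid drops to the hdfs user, and the daemon
    # runs in the foreground so supervise notices immediately if it dies.
    exec envdir ./env setuidgid hdfs \
      sh -c 'exec "$HADOOP_PREFIX/bin/hdfs" datanode'

Starting and stopping then become svc -u /service/hdfs-datanode and
svc -d /service/hdfs-datanode, and svstat replaces the ad-hoc pid-file
checks scattered through the init scripts.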

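As for the crash/restart alerts: the idea is to hang the alerting off the
notify hook, so the supervisor itself reports the state change instead of
something polling pid files. A sketch of /service/hdfs-datanode/notify, with
the caveat that I'm not pinning down the exact arguments daemontools-encore
passes to the hook here (check its docs); this version just forwards
whatever it receives:

    #!/bin/sh
    # Sketch only: assumes the hook runs with the service directory as its
    # working directory and that any event details arrive as arguments.
    SERVICE=$(basename "$(pwd)")

    # Log the event to syslog; swap in mail, SNMP traps, or a pager call.
    logger -t supervise "state change for ${SERVICE}: $*"

That one small hook is what makes crashes visible immediately: the
supervisor already knows the child died, so there is nothing to poll.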