On Thu, Oct 11, 2012 at 12:55 PM, Roman Shaposhnik <[email protected]> wrote:
> On Thu, Oct 11, 2012 at 1:34 AM, Steve Loughran <[email protected]> wrote:
>
> >> Anyway... It seems like this monitoring is very Hadoop HA specific,
> >
> > It could actually monitor any service with one or more of
> >   pid
> >   port
> >   URL
> >
> > The Hadoop-ness currently comes from
> > 1. specific probes for HDFS and JT
> > 2. use of hadoop XML config for settings (trivial fix)
> > 3. probe order fixed in source
> > 4. no current support for adding new probes just by putting them on the
> >    classpath and declaring them
> > 5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
> >    classpath and native lib so its hadoop probes are always in sync with
> >    the runtime
> >
> > I'd fix 2 & 3 by having a better config language that lets you specify an
> > order of operations
>
> This sounds pretty interesting. I remember Jos wanting to make his
> monitoring code (I believe based on daemontools) available under:
> https://issues.apache.org/jira/browse/BIGTOP-460
> there's a more generic JIRA as well:
> https://issues.apache.org/jira/browse/BIGTOP-263
>
> Feel free to put more concrete proposals there. Also, if Jos could
> comment -- that'll be terrific!

I'm still working on this, now in the context of HDP. My goal at work is
ultimately to use the notify functionality in daemontools-encore to send
program crash/restart alerts, and to change the various wrapper scripts
(start-dfs.sh, etc.) to use daemontools-encore (and shmux), per the ticket.

Longer term, it would be great to be able to deprecate the init scripts
completely and replace them with hooks for process supervision systems:
either daemontools-encore, which is very portable, or the most popular
process supervision tools out there (Upstart, SMF, launchd). This should not
be hard, as they are all very similar in concept, and it would provide more
value out of the box than the current init scripts do. It would also ease
maintenance, as the process supervisor replaces all the duplicated code in
the init scripts and adds useful extra functionality, such as the ability to
detect crashes immediately, without polling.

Also, right now a major source of trouble is the mess that is the Hadoop
startup/wrapper scripts, with their plethora of global environment variables
and the shell code that sets and reads them. There's also the continuing
issue that the various organizations/vendors can't seem to make up their
minds about the script UIs: there are the hadoop, mapred, hdfs, yarn,
hadoop-daemon.sh and hadoop-daemons.sh commands, all of which source various
config files and use a ton of global variables, and it's unclear which ones
to use so that the right configuration gets applied. So my plan is to use
the lowest-level interface and put all needed environment variables in
/service/foo/env/... so they are easy to find, query and set in a
platform-independent manner (rough sketches below).

TL;DR: init scripts should die.

Jos
-- 
Jos Backus
jos at catnook.com
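
To make that concrete, here is a minimal sketch of what a supervised HDFS
DataNode service directory could look like under daemontools-encore.
Everything below is illustrative rather than a description of any existing
Bigtop/HDP packaging: the service name, the hdfs user, the env file values
and the choice to run `hdfs datanode` in the foreground are all assumptions
on my part.

    # Hypothetical layout, one service directory per daemon:
    #
    #   /service/hdfs-datanode/
    #     run       executed (and re-executed after a crash) by supervise
    #     env/      one file per environment variable, read with envdir
    #     log/run   optional logger (e.g. multilog)

    # The per-service environment is just files, so it is easy to find,
    # query and set regardless of platform or tooling:
    mkdir -p /service/hdfs-datanode/env
    echo /usr/java/default > /service/hdfs-datanode/env/JAVA_HOME
    echo /usr/lib/hadoop   > /service/hdfs-datanode/env/HADOOP_PREFIX
    echo /etc/hadoop/conf  > /service/hdfs-datanode/env/HADOOP_CONF_DIR

and /service/hdfs-datanode/run itself stays tiny (mode 0755):

    #!/bin/sh
    # Send stderr to the same place as stdout (the logger, if any).
    exec 2>&1
    # envdir loads ./env, setuidgid drops to the hdfs user, and the daemon
    # runs in the foreground so supervise notices immediately if it dies.
    exec envdir ./env setuidgid hdfs \
      sh -c 'exec "$HADOOP_PREFIX/bin/hdfs" datanode'

Starting and stopping then become svc -u /service/hdfs-datanode and
svc -d /service/hdfs-datanode, and svstat replaces the ad-hoc pid-file
checks scattered through the init scripts.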

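As for the crash/restart alerts: the idea is to hang the alerting off the
notify hook, so the supervisor itself reports the state change instead of
something polling pid files. A sketch of /service/hdfs-datanode/notify, with
the caveat that I'm not pinning down the exact arguments daemontools-encore
passes to the hook here (check its docs); this version just forwards
whatever it receives:

    #!/bin/sh
    # Sketch only: assumes the hook runs with the service directory as its
    # working directory and that any event details arrive as arguments.
    SERVICE=$(basename "$(pwd)")

    # Log the event to syslog; swap in mail, SNMP traps, or a pager call.
    logger -t supervise "state change for ${SERVICE}: $*"

That one small hook is what makes crashes visible immediately: the
supervisor already knows the child died, so there is nothing to poll.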