Add hadoop health check/diagnostics to run from command line, JSP pages, other
tools
------------------------------------------------------------------------------------
Key: HADOOP-3893
URL: https://issues.apache.org/jira/browse/HADOOP-3893
Project: Hadoop Core
Issue Type: New Feature
Components: dfs, mapred
Affects Versions: 0.19.0
Reporter: Steve Loughran
Priority: Minor
If the lifecycle ping() is for short-duration "are we still alive" checks,
Hadoop still needs something bigger to check the overall system health. This
would be for end users, but also for automated cluster deployment: a complete
validation of the cluster.
It could be a command line tool, and something that runs on different nodes,
checked via IPC or JSP. The idea would be to do thorough checks with good
diagnostics. Oh, and they should be executable through JUnit too.
For example:
-if running on Windows, check that Cygwin is on the path; fail with a pointer
to a wiki issue if not
-datanodes should check that they can create locks on the filesystem, create
files, and that timestamps are (roughly) aligned with local time
-namenodes should try to create files/locks in the filesystem
-task trackers should try to exec() something
-run through the classpath and look for problems: duplicate JARs, unsupported
Java or Xerces versions, etc.
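As a rough illustration of the classpath check, here is a minimal sketch in
Java. The class name ClasspathCheck and its methods are hypothetical, not
existing Hadoop code, and it assumes a Java version with lambdas. It groups
the JARs on a classpath by artifact name, using a simple heuristic that drops
everything from the first "-<digit>" onwards, and reports artifacts that
appear more than once.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of Hadoop.
public class ClasspathCheck {

    /** Heuristic: drop ".jar" and anything from the first "-<digit>" onwards. */
    public static String artifactName(String fileName) {
        String base = fileName.endsWith(".jar")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        return base.replaceFirst("-\\d[\\w.\\-]*$", "");
    }

    /** Maps artifact name to classpath entries, keeping only artifacts seen more than once. */
    public static Map<String, List<String>> findDuplicates(String classpath) {
        Map<String, List<String>> byArtifact = new LinkedHashMap<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.endsWith(".jar")) {
                continue; // directories and empty entries are ignored
            }
            String name = artifactName(new File(entry).getName());
            byArtifact.computeIfAbsent(name, k -> new ArrayList<>()).add(entry);
        }
        byArtifact.values().removeIf(entries -> entries.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dups =
                findDuplicates(System.getProperty("java.class.path"));
        for (Map.Entry<String, List<String>> e : dups.entrySet()) {
            System.out.println("DUPLICATE " + e.getKey() + ": " + e.getValue());
        }
        // Reported for diagnostics only; which versions count as unsupported is left open.
        System.out.println("Java version: " + System.getProperty("java.version"));
    }
}
```

The version-stripping heuristic is deliberately crude: a name such as
hadoop-0.19.0-core.jar collapses to "hadoop", so a real check would need a
better way of identifying artifacts.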
* The set of tests should be extensible: rather than one single class with
all the tests, there'd be something separate for name, task, data, and job
tracker nodes.
* They can't be in the nodes themselves, as they should be executable even if
the nodes don't come up.
* Output could be human-readable text or HTML, or a form that could be
processed through Hadoop itself in future.
* These tests could have side effects, such as actually trying to submit work
to a cluster.
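One possible shape for such an extensible set of checks is sketched below in
Java. Everything here is an assumption for illustration: the HealthCheck
interface, the Result type, and the WritableDirCheck example are hypothetical
names, not existing Hadoop classes. The checks live outside the node classes,
each returns a diagnostic message instead of throwing, and the results are
rendered as plain text with an exit code that scripts can test.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not part of Hadoop.
public class HealthChecks {

    /** Outcome of a single check: pass/fail plus a diagnostic message. */
    public static final class Result {
        public final String name;
        public final boolean ok;
        public final String message;

        public Result(String name, boolean ok, String message) {
            this.name = name;
            this.ok = ok;
            this.message = message;
        }
    }

    /** Hypothetical extension point: each node type contributes its own checks. */
    public interface HealthCheck {
        String name();
        Result run();
    }

    /** Example check: can a file be created and deleted in the given directory? */
    public static final class WritableDirCheck implements HealthCheck {
        private final File dir;

        public WritableDirCheck(File dir) {
            this.dir = dir;
        }

        @Override
        public String name() {
            return "writable-dir " + dir;
        }

        @Override
        public Result run() {
            try {
                File probe = File.createTempFile("healthcheck", ".tmp", dir);
                if (!probe.delete()) {
                    return new Result(name(), false, "created but could not delete " + probe);
                }
                return new Result(name(), true, "ok");
            } catch (IOException e) {
                return new Result(name(), false, "cannot create file: " + e.getMessage());
            }
        }
    }

    /** Runs every check; an unexpected exception becomes a failure rather than propagating. */
    public static List<Result> runAll(List<HealthCheck> checks) {
        List<Result> results = new ArrayList<>();
        for (HealthCheck check : checks) {
            try {
                results.add(check.run());
            } catch (RuntimeException e) {
                results.add(new Result(check.name(), false, "check threw " + e));
            }
        }
        return results;
    }

    /** Renders results as plain text, one line per check. */
    public static String toText(List<Result> results) {
        StringBuilder sb = new StringBuilder();
        for (Result r : results) {
            sb.append(r.ok ? "PASS " : "FAIL ").append(r.name)
              .append(": ").append(r.message).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        List<HealthCheck> checks = Arrays.<HealthCheck>asList(new WritableDirCheck(dir));
        List<Result> results = runAll(checks);
        System.out.print(toText(results));
        // A non-zero exit code lets scripts and deployment tools detect failure.
        for (Result r : results) {
            if (!r.ok) {
                System.exit(1);
            }
        }
    }
}
```

Because runAll() and the individual checks are plain methods, the same checks
could in principle be called from a JUnit test or a JSP page, and an HTML
renderer could sit alongside toText().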