[ https://issues.apache.org/jira/browse/HADOOP-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated HADOOP-6473:
-------------------------------------
Labels: ipv6 (was: )
> Add hadoop health check/diagnostics to run from command line, JSP pages,
> other tools
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-6473
> URL: https://issues.apache.org/jira/browse/HADOOP-6473
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Steve Loughran
> Priority: Minor
> Labels: ipv6
>
> If the lifecycle ping() is for short-duration "are we still alive" checks,
> Hadoop still needs something bigger to check overall system health. This
> would be for end users, but also for automated cluster deployment: a complete
> validation of the cluster.
> It could be a command line tool, and something that runs on different nodes,
> checked via IPC or JSP. The idea would be to do thorough checks with good
> diagnostics. Oh, and they should be executable through JUnit too.
> For example:
> - if running on Windows, check that Cygwin is on the path; fail with a
>   pointer to a wiki issue if not
> - datanodes should check that they can create locks on the filesystem, create
>   files, and that timestamps are (roughly) aligned with local time
> - namenodes should try to create files/locks in the filesystem
> - task trackers should try to exec() something
> - run through the classpath and look for problems: duplicate JARs,
>   unsupported Java, Xerces versions, etc.
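
Two of the example checks above (filesystem file/lock/timestamp probing and
the duplicate-JAR classpath scan) could be sketched roughly as follows. This
is only an illustration under assumptions: the class NodeChecks, its method
names, and the maxSkewMillis parameter are hypothetical and are not part of
Hadoop; only standard java.nio calls are used.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Hadoop.
final class NodeChecks {
    private NodeChecks() {}

    /**
     * Tries to create, lock, and timestamp-check a probe file in dir.
     * Returns a list of problems; an empty list means the directory passed.
     */
    static List<String> checkDirectory(Path dir, long maxSkewMillis) {
        List<String> problems = new ArrayList<>();
        Path probe = null;
        try {
            probe = Files.createTempFile(dir, "healthcheck", ".tmp");
            // tryLock() returns null when another process holds the lock;
            // try-with-resources skips closing a null resource.
            try (FileChannel channel = FileChannel.open(probe, StandardOpenOption.WRITE);
                 FileLock lock = channel.tryLock()) {
                if (lock == null) {
                    problems.add("could not lock " + probe);
                }
            }
            long fileTime = Files.getLastModifiedTime(probe).toMillis();
            long skew = Math.abs(fileTime - System.currentTimeMillis());
            if (skew > maxSkewMillis) {
                problems.add("timestamp skew of " + skew + " ms in " + dir);
            }
        } catch (IOException e) {
            problems.add("I/O failure in " + dir + ": " + e);
        } finally {
            if (probe != null) {
                try {
                    Files.deleteIfExists(probe);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the probe file.
                }
            }
        }
        return problems;
    }

    /** Returns JAR file names that appear more than once on the classpath. */
    static List<String> duplicateJarNames(String classpath, String separator) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String entry : classpath.split(java.util.regex.Pattern.quote(separator))) {
            if (!entry.endsWith(".jar")) {
                continue; // directories and other entries are ignored here
            }
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String name = entry.substring(slash + 1);
            if (!seen.add(name) && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

A caller might pass System.getProperty("java.class.path") and
File.pathSeparator to duplicateJarNames. Comparing by file name alone would
miss the same library shipped under different version suffixes, so a real
check would likely need something stricter.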
> * The set of tests should be extensible: rather than one single class with
> all the tests, there'd be something separate for name, task, data, and job
> tracker nodes
> * They can't be in the nodes themselves, as they should be executable even if
> the nodes don't come up.
> * Output could be in human-readable text or HTML, and a form that could be
> processed through Hadoop itself in future
> * These tests could have side effects, such as actually trying to submit work
> to a cluster
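
One possible shape for the extensibility and output requirements above is a
small registry of checks grouped by node role, kept outside the daemons and
rendered as text or HTML. This is a sketch under assumptions, not a proposed
API: HealthChecks, Check, the role strings, and the "null means success"
convention are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; these types are illustrative and not part of Hadoop.
final class HealthChecks {
    /** One diagnostic; implementations live outside the daemons they test. */
    interface Check {
        String name();

        /** Returns null on success, or a diagnostic message on failure. */
        String run();
    }

    /** Convenience factory so a check can be written as a lambda. */
    static Check check(final String name, final Supplier<String> body) {
        return new Check() {
            @Override public String name() { return name; }
            @Override public String run() { return body.get(); }
        };
    }

    private final Map<String, List<Check>> byRole = new LinkedHashMap<>();

    /** Registers a check under a role such as "namenode" or "datanode". */
    void register(String role, Check check) {
        byRole.computeIfAbsent(role, r -> new ArrayList<>()).add(check);
    }

    /** Runs every check for a role; maps check name to "OK" or "FAILED: ...". */
    Map<String, String> run(String role) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Check c : byRole.getOrDefault(role, new ArrayList<Check>())) {
            String outcome;
            try {
                String problem = c.run();
                outcome = (problem == null) ? "OK" : "FAILED: " + problem;
            } catch (RuntimeException e) {
                // A crashing check is reported rather than aborting the run.
                outcome = "FAILED: " + e;
            }
            results.put(c.name(), outcome);
        }
        return results;
    }

    /** Human-readable report, one check per line. */
    static String toText(Map<String, String> results) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : results.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Minimal HTML report with the three basic characters escaped. */
    static String toHtml(Map<String, String> results) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (Map.Entry<String, String> e : results.entrySet()) {
            sb.append("<li>").append(escape(e.getKey())).append(": ")
              .append(escape(e.getValue())).append("</li>");
        }
        return sb.append("</ul>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because run() only needs the registry and not a live daemon, the same checks
could be driven from a command line entry point, a JSP page, or a JUnit test
that asserts every outcome is "OK".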