[
https://issues.apache.org/jira/browse/HADOOP-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611812#action_12611812
]
Ioannis Koltsidas commented on HADOOP-3585:
-------------------------------------------
Thanks very much for your input!
Regarding Steve's comments:
- My limited experience with smartmontools on different kinds of disks shows
that the smartctl output format differs between SCSI and SATA disks. I have
tested it with 2 different IBM SCSI disks and 2 Hitachi SATA disks; that's why
I mention those brands in the comments. I believe these two formats cover many
different brands, although the attributes that appear in the smartctl output
will vary among brands and models. I can remove the brand names from the
comments to make them clearer (or I would be happy to list all the disk models
it has been tested with, if people submit their 'smartctl -A /dev/xxx' output
to me for other models). I can extend it to other smartctl output formats
fairly easily as well, provided that I get a sample of them.
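For illustration, a rough sketch of how the two output formats could be told
apart follows. The class, method and column choices here are mine and not the
actual FailMon code; the attribute names themselves vary per model.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.util.HashMap;
  import java.util.Map;

  // Illustrative only: runs "smartctl -A" and keys attributes by name.
  public class SmartAttributeReader {
    public static Map<String, String> read(String device) throws Exception {
      Process p = Runtime.getRuntime().exec(
          new String[] {"smartctl", "-A", device});
      BufferedReader in =
          new BufferedReader(new InputStreamReader(p.getInputStream()));
      Map<String, String> attrs = new HashMap<String, String>();
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        if (line.matches("^\\d+\\s+\\S+\\s+0x[0-9a-fA-F]+.*")) {
          // SATA/ATA format: table rows starting with the attribute id,
          // e.g. "194 Temperature_Celsius 0x0002 253 ... 36"
          String[] f = line.split("\\s+");
          attrs.put(f[1], f[f.length - 1]);   // attribute name -> raw value
        } else if (line.matches("^[A-Za-z][^:]*:\\s+.*")) {
          // SCSI format: "name: value" pairs,
          // e.g. "Current Drive Temperature: 35 C"
          String[] kv = line.split(":", 2);
          attrs.put(kv[0].trim(), kv[1].trim());
        }
      }
      in.close();
      p.waitFor();
      return attrs;
    }
  }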
- We plan to use the Commons Logging API for logging messages. I'm working on
that.
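As a minimal sketch of what that would look like (the class below is just a
placeholder, not an actual FailMon class):

  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;

  public class MonitorBase {                 // placeholder name
    public static final Log LOG = LogFactory.getLog(MonitorBase.class);

    void reportParseError(String file) {
      // replaces ad-hoc System.out/System.err messages
      LOG.warn("Could not parse log file " + file);
    }
  }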
- Since it will take a considerable amount of work to make the package
portable to non-Unix systems, I think it would be better to stick to Linux for
now. Therefore, the Executor thread will read the os.name system property and
will only start if it is running on a Linux system (currently this does not
happen; a Linux system is assumed and I'm not sure how it will behave on other
types of systems).
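Roughly, the planned check would look like this (names are illustrative):

  public class Executor implements Runnable {
    public void run() {
      String os = System.getProperty("os.name");
      if (os == null || !os.toLowerCase().contains("linux")) {
        return;   // not a Linux system: do not start any monitors
      }
      // ... start the monitors and enter the polling loop ...
    }
  }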
- I haven't really looked into how testing of the package should be done. But
considering that failures cannot easily be injected or simulated from within
Java, and especially from user space, I guess that what you suggest is
probably the best way to go.
- Each monitor runs in the Executor thread. An Executor is started for each
NameNode and DataNode instance (in the constructor of the NameNode and
DataNode classes) and is terminated when that node is terminated (we plan to
do the same for JobTrackers and TaskTrackers as well). Other than that, no
startup or shutdown code is required.
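In other words, the only integration a node needs is something along these
lines (a sketch; the actual hook points in NameNode/DataNode may differ):

  // Started from the node's constructor, stopped when the node shuts down.
  public class MonitorLifecycle {            // illustrative helper name
    private Thread executorThread;

    public void start() {
      executorThread = new Thread(new Executor(), "FailMon-Executor");
      executorThread.setDaemon(true);        // never keeps the JVM alive
      executorThread.start();
    }

    public void stop() {
      if (executorThread != null) {
        executorThread.interrupt();
      }
    }
  }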
- One should not start a monitor outside of an Executor (unless they know
exactly what they are doing). An Executor thread runs for each and every
NameNode and DataNode instance on a given machine. However, if more than one
Executor runs on the same machine (i.e., the machine is both a NameNode and a
DataNode, or a DataNode and a TaskTracker), then the Executor that was spawned
first will monitor all system metrics (the system log and the output of
utilities such as ifconfig, smartctl etc.) as well as the Hadoop log of the
object by which it was spawned (i.e., the NameNode log for a NameNode, the
DataNode log for a DataNode etc.). All Executors started on the same machine
after this one will monitor no system-related metrics; they will only monitor
the Hadoop-related logs of the object that spawned them. Note that if more
than one Hadoop/HDFS instance is running on the same machine, you have to
replace "machine" with "Hadoop/HDFS instance" in the above.
Regarding Dhruba's comments:
1. I agree with your idea, but I'm not sure how feasible it is. Some concerns
about this approach:
- This map-reduce job needs to run on all machines, i.e. all
TaskTrackers and the JobTracker. I'm not sure how easy it is to force this to
happen. Furthermore, if some DataNodes are not TaskTrackers, then how would we
collect the data from those? If you think that forcing a map-reduce job to run
on all nodes is feasible, then we could go for it.
- I think this is OK for parsing the logs and uploading the collected
records, but I am not sure how appropriate it is for reading the output of
system utilities. I suppose that an administrator would like to run the
log-parsing monitors infrequently (e.g. once a day), as they might take
non-negligible time to complete. On the other hand, they are more likely to
want to read the output of system utilities at smaller intervals (e.g. for
ifconfig, SMART attributes and temperature sensors); this interval could be an
hour or less. So, if a map-reduce job would need to be created for these every
hour, a substantial overhead might be introduced (especially if map-reduce
jobs are to be run on all nodes).
2. I believe we can do that. I'll look into it.
3. I would be happy to change the name to whatever people think is more
representative of the contents of the package. Maybe we can have a
logcollector package and a failure-monitoring subpackage (to capture the fact
that the output of system utilities is also read, and to hold the failure
identification code).
4. The filename of the uploaded HDFS file has the form
failmon<hostname><timestamp>.zip, so filenames are expected to be unique. In
the same context, the best thing to do, in my opinion, would be to append all
locally gathered records to a single HDFS file, provided that the upload can
be in a compressed form. I'm not very familiar with the append API yet, and I
am also not sure whether the communication can be compressed, but if it is
feasible I think it would be the best way to go. In the current approach, if
very small files are uploaded, a lot of space will be wasted (since the block
size is large).
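For reference, the upload of a node's records under a unique name could look
roughly like this (directory layout and class name are illustrative):

  import java.net.InetAddress;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RecordUploader {              // illustrative name
    public static void upload(String localZip, String hdfsDir) throws Exception {
      String host = InetAddress.getLocalHost().getHostName();
      String name = "failmon" + host + System.currentTimeMillis() + ".zip";

      FileSystem fs = FileSystem.get(new Configuration());
      // The name embeds the hostname and a timestamp, so collisions between
      // nodes (and between successive uploads of one node) are not expected.
      fs.copyFromLocalFile(new Path(localZip), new Path(hdfsDir, name));
    }
  }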
> Hardware Failure Monitoring in large clusters running Hadoop/HDFS
> -----------------------------------------------------------------
>
> Key: HADOOP-3585
> URL: https://issues.apache.org/jira/browse/HADOOP-3585
> Project: Hadoop Core
> Issue Type: New Feature
> Environment: Linux
> Reporter: Ioannis Koltsidas
> Priority: Minor
> Attachments: FailMon-standalone.zip, failmon.pdf,
> FailMon_Package_descrip.html, HADOOP-3585.patch
>
> Original Estimate: 480h
> Remaining Estimate: 480h
>
> At IBM we're interested in identifying hardware failures on large clusters
> running Hadoop/HDFS. We are working on a framework that will enable nodes to
> identify failures on their hardware using the Hadoop log, the system log and
> various OS hardware diagnosing utilities. The implementation details are not
> very clear, but you can see a draft of our design in the attached document.
> We are pretty interested in Hadoop and system logs from failed machines, so
> if you are in possession of such, you are very welcome to contribute them;
> they would be of great value for hardware failure diagnosing.
> Some details about our design can be found in the attached document
> failmon.doc. More details will follow in a later post.