[ https://issues.apache.org/jira/browse/HADOOP-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611812#action_12611812 ]

Ioannis Koltsidas commented on HADOOP-3585:
-------------------------------------------

Thanks very much for your input!

Regarding Steve's comments:

- My limited experience with smartmontools on different kinds of disks shows 
that the smartctl output format differs between SCSI and SATA disks. I have 
tested it with two different IBM SCSI disks and two Hitachi SATA disks; that's 
why I mention those brands in the comments. I believe these two formats cover 
many different brands, although the attributes that appear in the smartctl 
output will vary among brands and models. I can remove the brand names from 
the comments to make them clearer (or I would be happy to list all the disk 
models it has been tested with, if people submit their 'smartctl -A /dev/xxx' 
output to me for other models). I can extend it to other smartctl output 
formats fairly easily as well, provided that I get a sample of them.
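
For illustration, here is a minimal sketch of how the two formats can be 
told apart. The header strings come from the disks I tested, so treat them 
as assumptions rather than a complete parser:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class SmartctlProbe {
        public static void main(String[] args) throws Exception {
            String device = args.length > 0 ? args[0] : "/dev/sda";
            Process p = Runtime.getRuntime().exec(
                new String[] { "smartctl", "-A", device });
            BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                // ATA/SATA output contains an attribute table whose header
                // starts with "ID#" (e.g. "ID# ATTRIBUTE_NAME FLAG ...").
                if (line.startsWith("ID#")) {
                    System.out.println("ATA/SATA attribute table detected");
                }
                // SCSI output instead reports counters as "key: value"
                // lines, e.g. "Non-medium error count: 0".
            }
            p.waitFor();
        }
    }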

- We plan to use the commons logging API for logging messages. I'm working on 
that.
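
For reference, this is the standard commons-logging idiom used elsewhere in 
Hadoop (the class and method names here are placeholders):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class MonitorBase {
        // one static logger per class, as in the rest of Hadoop
        private static final Log LOG = LogFactory.getLog(MonitorBase.class);

        protected void reportAnomaly(String source, String detail) {
            LOG.warn("Possible hardware problem in " + source + ": " + detail);
        }
    }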

- Since it will take a considerable amount of work to make the package 
portable to non-UNIX systems, I think it would be better to stick to Linux 
for now. Therefore, the Executor thread will read the os.name system property 
and will only start on a Linux system. (Currently this does not happen; a 
Linux system is assumed, and I'm not sure how it will behave on other types 
of systems.)
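
A sketch of the planned guard follows; os.name is a standard JVM system 
property, but the surrounding class is only illustrative:

    public class ExecutorGuard {
        public static boolean isLinux() {
            String os = System.getProperty("os.name");
            return os != null && os.toLowerCase().indexOf("linux") >= 0;
        }

        public static void maybeStart(Thread executorThread) {
            if (isLinux()) {
                executorThread.start(); // only monitor on Linux for now
            }
        }
    }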

- I haven't really looked into how testing of the package should be done. But 
considering that failures cannot easily be injected or simulated from within 
Java, especially from user space, I think what you suggest is probably the 
best way to go.

- Each monitor runs in the Executor thread. An Executor is started for each 
NameNode and DataNode instance (in the constructors of the NameNode and 
DataNode classes) and is terminated when that node is terminated (we plan to 
do the same for JobTrackers and TaskTrackers as well). Other than that, no 
startup or shutdown code is required.
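
In outline, the lifecycle looks like the sketch below; the class and field 
names are illustrative, not the actual NameNode/DataNode code:

    class Executor implements Runnable {
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                // run the configured monitors, then sleep until next round
                try { Thread.sleep(60000L); }
                catch (InterruptedException e) { return; }
            }
        }
    }

    public class DataNode {
        private final Thread monitorThread;

        public DataNode() {
            monitorThread = new Thread(new Executor(), "FailMon-Executor");
            monitorThread.setDaemon(true); // dies with the node process
            monitorThread.start();
        }

        public void shutdown() {
            monitorThread.interrupt(); // stop monitoring with the node
        }
    }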

- One should not start a monitor outside of an Executor (unless they know 
exactly what they are doing). Normally, an Executor thread runs for each and 
every NameNode and DataNode instance on a given machine. However, if more 
than one Executor runs on the same machine (i.e., the machine is both a 
NameNode and a DataNode, or a DataNode and a TaskTracker), then the Executor 
that was spawned first will monitor all system metrics (the system log and 
the output of utilities such as ifconfig, smartctl, etc.) as well as the 
Hadoop log for the object by which it was spawned (i.e., the NameNode log for 
a NameNode, the DataNode log for a DataNode, etc.). All Executors started on 
the same machine after this one will monitor no system-related metrics; they 
will only monitor the Hadoop-related logs for the object that spawned them. 
Note that if more than one Hadoop/HDFS instance is running on the same 
machine, you have to replace "machine" with "Hadoop/HDFS instance" in the 
above.


Regarding Dhruba's comments:

1. I agree with your idea, but I'm not sure how feasible it is. Some concerns 
about this approach:
    - This map-reduce job would need to run on all machines, i.e. on all 
TaskTrackers and the JobTracker, and I'm not sure how easy it is to force 
that to happen. Furthermore, if some DataNodes are not TaskTrackers, how 
would we collect the data from those? If you think that forcing a map-reduce 
job to run on all nodes is feasible, then we could go for it.
    - I think this is OK for parsing the logs and uploading the collected 
records, but I am not sure how appropriate it is for reading the output of 
system utilities. I suppose an administrator would want to run the 
log-parsing monitors infrequently (e.g., once a day), as they might take 
non-negligible time to complete. On the other hand, they are more likely to 
want to read the output of system utilities at smaller intervals (e.g., for 
ifconfig, SMART attributes and temperature sensors); this interval could be 
an hour or less. So, if a map-reduce job had to be created for these every 
hour, substantial overhead might be introduced (especially if map-reduce 
jobs are to run on all nodes); see the scheduling sketch below.
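
For comparison, scheduling both cadences inside a long-running Executor is 
cheap; a sketch with java.util.concurrent (the intervals and task bodies are 
illustrative):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class MonitorSchedule {
        public static void main(String[] args) {
            ScheduledExecutorService sched =
                Executors.newScheduledThreadPool(2);
            // lightweight probes (ifconfig, smartctl, sensors): every hour
            sched.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    System.out.println("polling system utilities");
                }
            }, 0, 1, TimeUnit.HOURS);
            // heavyweight log parsing: once a day
            sched.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    System.out.println("parsing logs");
                }
            }, 0, 24, TimeUnit.HOURS);
        }
    }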

2. I believe we can do that. I'll look into it.

3. I would be happy to change the name to whatever people think is more 
representative of the contents of the package. Maybe we can have a 
logcollector package and a failure-monitoring subpackage (to capture the 
fact that system utilities are also read, and to hold the 
failure-identification code).

4. The filename of the uploaded HDFS file has the form 
failmon<hostname><timestamp>.zip, so filenames are expected to be unique. In 
the same context, the best thing to do, in my opinion, would be to append 
all locally gathered records to an HDFS file, provided that the upload can 
be compressed. I'm not very familiar with the append API yet, and I am also 
not sure whether the communication can be compressed, but if it is feasible 
I think it would be the best way to go. With the current approach, if very 
small files are uploaded, a lot of space will be wasted (since the block 
size is large).
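
For reference, the unique name can be built like this (the exact 
concatenation and any separators in the patch may differ):

    import java.net.InetAddress;

    public class UploadName {
        public static String filename() throws Exception {
            String host = InetAddress.getLocalHost().getHostName();
            long ts = System.currentTimeMillis();
            return "failmon" + host + ts + ".zip"; // unique per host/time
        }
    }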

> Hardware Failure Monitoring in large clusters running Hadoop/HDFS
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3585
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3585
>             Project: Hadoop Core
>          Issue Type: New Feature
>         Environment: Linux
>            Reporter: Ioannis Koltsidas
>            Priority: Minor
>         Attachments: FailMon-standalone.zip, failmon.pdf, 
> FailMon_Package_descrip.html, HADOOP-3585.patch
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> At IBM we're interested in identifying hardware failures on large clusters 
> running Hadoop/HDFS. We are working on a framework that will enable nodes to 
> identify failures on their hardware using the Hadoop log, the system log and 
> various OS hardware diagnosing utilities. The implementation details are not 
> very clear, but you can see a draft of our design in the attached document. 
> We are pretty interested in Hadoop and system logs from failed machines, so 
> if you are in possession of such, you are very welcome to contribute them; 
> they would be of great value for hardware failure diagnosing.
> Some details about our design can be found in the attached document 
> failmon.doc. More details will follow in a later post.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
