[jira] [Commented] (HDFS-5022) Add explicit error message in log when datanode went out of service because of low disk space

Jim Huang (JIRA) Tue, 23 Jul 2013 17:33:38 -0700

    [ https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717826#comment-13717826 ]


Jim Huang commented on HDFS-5022:
---------------------------------

Can you please add the affected version(s)?  

In the Hadoop 1.x code base, checkDiskError() in DataNode.java checks for this 
condition and throws DiskOutOfSpaceException("No space left on device"), so you 
should see this exception in the datanode log.  
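
To confirm whether a datanode actually hit this condition, one option is to 
scan its log for the exception name. The following is only a minimal sketch: 
the helper name find_disk_full_lines is hypothetical, and it assumes nothing 
about the log layout beyond the exception class name appearing on the line.

```python
import re

# The exception name comes from the comment above. The rest of the log line
# layout depends on the log4j configuration, so only the name is matched.
DISK_FULL = re.compile(r"DiskOutOfSpaceException")


def find_disk_full_lines(lines):
    """Return (line_number, text) pairs for lines that mention the exception."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if DISK_FULL.search(line)
    ]
```

Passing an open file object for the datanode log (its location depends on the 
installation) yields the matching lines with their line numbers.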

In Hadoop 0.23, HDFS-1332 already implemented the enhancement to log the 
reasons why replicas could not be placed.  

If you are talking about enhancing the writing client to handle the datanode's 
DiskOutOfSpaceException, then this is similar to HDFS-264 (related to 
HDFS-483 and HADOOP-4679).  

HDFS-373 covers the improvement for the NameNode when it is struggling with 
replication.  However, a typical Hadoop administrator can obtain the HDFS 
"Free", "Used", and "Total" metrics through various means and set up alerting 
with programs like Nagios.  If that is overkill for a one-node cluster, then 
"hadoop dfsadmin -report" or the NameNode Web UI will provide this information 
as well.  
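
As an illustration of the alerting idea, here is a rough sketch of a 
Nagios-style check that works on the text printed by "hadoop dfsadmin -report". 
It assumes the report contains a summary line of the form "DFS Used%: 17.03%", 
which should be verified against the Hadoop version in use. The function names 
and the 80/90 thresholds are hypothetical choices, not part of Hadoop or Nagios.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def dfs_used_percent(report_text):
    """Return the first 'DFS Used%' value in the report text, or None.

    Assumption: the cluster-wide summary appears before any per-datanode
    sections, so the first match is the cluster total.
    """
    match = re.search(r"^DFS Used%:\s*([0-9]+(?:\.[0-9]+)?)\s*%",
                      report_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def nagios_status(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios exit code (thresholds are examples)."""
    if used_pct is None:
        return UNKNOWN  # report could not be parsed
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK
```

A wrapper script could capture the command's output, pass it to 
dfs_used_percent(), and exit with the code returned by nagios_status() so that 
Nagios raises an alert before the disks fill up.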


                
> Add explicit error message in log when datanode went out of service because 
> of low disk space
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5022
>                 URL: https://issues.apache.org/jira/browse/HDFS-5022
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>
> Currently, if a datanode runs out of configured disk space, it silently goes 
> out of service, and there is no way for the user to analyze what happened to 
> the datanode. In fact, the user won't even notice that the datanode is out of 
> service, since there is no warning message in either the namenode or datanode log.
> One example: if there is only a single datanode and we are running an MR 
> job that writes a huge amount of data into HDFS, then when the disk is full we 
> can only observe an error message like: 
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and we don't know what happened or how to resolve the issue.
> We need to improve this by adding a more explicit error message to both the 
> datanode log and the message given to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
