Maybe grep for

2011-02-25 18:47:05,564 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Decommission complete for 
node 102.1.1.1:50010

in the namenode log to see whether the decommission has completed?
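
For example (the log path below is an assumption; adjust it for your install):

  grep "Decommission complete" /var/log/hadoop/*namenode*.log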

I remember a similar problem was reported just a few days ago (in attachment)
by James Litton. According to James, no blocks were missing after the node was
removed; however, it was unclear when/if the decommission process had finished.
From: Rita [mailto:rmorgan...@gmail.com]
Sent: Thursday, February 24, 2011 5:59 AM
To: hdfs-user@hadoop.apache.org
Cc: Harsh J
Subject: Re: datanode down alert

Thanks for the response.

I am asking because of the following issue, 
https://issues.apache.org/jira/browse/HDFS-694

When I decommission a datanode it shows up in the "Dead" list on the web GUI,
but at the same time it also shows up in the "Live" nodes.


I want to make sure this node is fully decommissioned before I remove it from 
the cluster.


On Tue, Feb 15, 2011 at 9:13 AM, Harsh J 
<qwertyman...@gmail.com> wrote:
I know of a way but I do not know for sure if that is what you're looking for:

DFSClient.datanodeReport(DataNodeReportType.DEAD) should give you a
list of all DEAD data nodes as per the NameNode.

I believe these reports cost a lot, though, so do not do it often (each call is an RPC to the NN).
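
A minimal sketch of that call (against the 0.20-era API; the exact class and
enum names vary across Hadoop versions, so treat them as assumptions):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.DFSClient;
  import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
  import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;

  public class DeadNodeCheck {
    public static void main(String[] args) throws Exception {
      DFSClient client = new DFSClient(new Configuration());
      // A single RPC to the NameNode; avoid polling this in a tight loop.
      DatanodeInfo[] dead = client.datanodeReport(DatanodeReportType.DEAD);
      for (DatanodeInfo node : dead) {
        System.out.println("DEAD: " + node.getName());
      }
      client.close();
    }
  }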

On Tue, Feb 15, 2011 at 6:51 PM, Rita 
<rmorgan...@gmail.com> wrote:
> Is there a programmatic way to determine if a datanode is down?
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>


--
Harsh J
www.harshj.com



--
--- Get your facts first, then you can distort them as you please.--
--- Begin Message ---
Tanping,

 Thank you for the reply. The nodes were marked as “decommissioning in
progress.” My concern was that they never reached a decommissioned state. I
have since begun taking the nodes down and have not had any data blocks
missing, so I suspect the process worked. It was just unclear when the process
was complete.

James


On 2/17/11 12:59 PM, "Tanping Wang" <tanp...@yahoo-inc.com> wrote:



James,
After issuing a command to decommission a node, you should at least be able to 
see the following log messages in the namenode logs

Setting the excludes files to some_file_contains_decommissioning_hostname
Refreshing hosts (include/exclude) list

If you do not see these log messages, you may want to check:

1)     Whether you have set

<property>
  <name>dfs.hosts.exclude</name>
  <value>some_file_contains_decommissioning_hostname</value>
</property>

in hdfs-site.xml.

2)     Whether this decommissioning hostname file is actually in place (see the refresh step below).
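
Once both are in place, the namenode has to be told to re-read the
include/exclude lists, e.g. with the dfsadmin command:

  hadoop dfsadmin -refreshNodes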

Regards,
Tanping

From: James Litton [mailto:james.lit...@chacha.com]
Sent: Friday, February 11, 2011 1:10 PM
To: hdfs-user@hadoop.apache.org
Subject: Decommissioning Nodes

While decommissioning nodes I am seeing the following in my namenode logs:
2011-02-11 21:05:16,290 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough 
replicas, still in need of 5

I haven’t seen any decommissioning progress in several days. I have 12 nodes
total, with 6 being decommissioned and a replication factor of 3. How long
should I expect this to take? Is there a way to force it to move forward?

Thank you.




--- End Message ---
