[ https://issues.apache.org/jira/browse/HADOOP-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590547#action_12590547 ]
Raghu Angadi commented on HADOOP-3232:
--------------------------------------
Yes, it looks like DU also walks the "previous" directory. So instead of both
the block report and DU causing problems, now it is only DU.
Can you clarify if "datanodes lose contact" means NameNode actually marks them
"dead"?
> Although having the datanodes lose contact with the namenode because it's
> checking disk usage seems like quite a serious bug to me.
I agree. Doing these in the background without blocking normal DataNode
functions takes a little bit of restructuring. We should keep this jira open.
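
A minimal sketch of what that restructuring could look like, assuming the
slow disk-usage scan can be wrapped in a single call. The class and method
names below (CachedDiskUsage, refreshNow, getUsed) are hypothetical and are
not Hadoop's actual DU code, and the sketch uses current Java rather than
what 0.16 was built with. The idea is that a background thread pays for the
scan, while the heartbeat path only reads the last cached value and so
never blocks on disk I/O.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration only; not Hadoop's DU class.
class CachedDiskUsage {
    private final LongSupplier measure;   // the slow scan, e.g. a wrapper around `du`
    private final long intervalMillis;    // pause between the end of one scan and the next
    private final AtomicLong used = new AtomicLong(0L);

    // Single daemon thread, so a long scan never delays other work
    // and never keeps the JVM alive on shutdown.
    private final ScheduledExecutorService refresher =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "du-refresher");
            t.setDaemon(true);
            return t;
        });

    CachedDiskUsage(LongSupplier measure, long intervalMillis) {
        this.measure = measure;
        this.intervalMillis = intervalMillis;
    }

    // Start periodic refreshes in the background.
    void start() {
        refresher.scheduleWithFixedDelay(
            this::refreshNow, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Run one scan and publish the result. On failure the previous value
    // is kept, and the exception is swallowed so later runs still happen.
    void refreshNow() {
        try {
            used.set(measure.getAsLong());
        } catch (RuntimeException e) {
            // keep the last known value
        }
    }

    // Cheap, non-blocking read for the heartbeat path.
    long getUsed() {
        return used.get();
    }

    void shutdown() {
        refresher.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated slow scan: sleeps, then reports a made-up byte count.
        CachedDiskUsage du = new CachedDiskUsage(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 1234L;
        }, 1000);
        du.start();
        System.out.println("read while scan is running: " + du.getUsed());
        Thread.sleep(800);
        System.out.println("read after scan finished: " + du.getUsed());
        du.shutdown();
    }
}
```

The trade-off is staleness: the value sent with a heartbeat can lag the
real usage by up to one scan plus one interval, which seems acceptable for
a number that is only reported to the NameNode.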
> not sure why sda is more busy, although that is where the logs are located
This might help your situation. If you find more info, please inform us.
> Datanodes time out
> ------------------
>
> Key: HADOOP-3232
> URL: https://issues.apache.org/jira/browse/HADOOP-3232
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.2
> Environment: 10 node cluster + 1 namenode
> Reporter: Johan Oskarsson
> Priority: Critical
> Fix For: 0.18.0
>
> Attachments: hadoop-hadoop-datanode-new.log,
> hadoop-hadoop-datanode-new.out, hadoop-hadoop-datanode.out,
> hadoop-hadoop-namenode-master2.out
>
>
> I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
> Unfortunately we're seeing datanode timeout issues. In previous versions
> we've often seen in the nn webui that one or two datanodes' "last contact"
> goes from the usual 0-3 sec to ~200-300 before it drops down to 0 again.
> This causes mild discomfort but the big problems appear when all nodes do
> this at once, as happened a few times after the upgrade.
> It was suggested that this could be due to namenode garbage collection, but
> looking at the gc log output it doesn't seem to be the case.