Datanodes time out
------------------
Key: HADOOP-3232
URL: https://issues.apache.org/jira/browse/HADOOP-3232
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.16.2
Environment: 10 node cluster + 1 namenode
Reporter: Johan Oskarsson
Priority: Critical
Fix For: 0.16.3
I recently upgraded our 10-node cluster from 0.15.2 to 0.16.2.
Unfortunately, we are now seeing datanode timeouts. In previous versions we
often saw in the namenode web UI that the "last contact" value for one or two
datanodes would climb from the usual 0-3 sec to ~200-300 sec before dropping
back to 0.
On its own that is only a minor nuisance, but serious problems appear when all
nodes do this at once, which has happened a few times since the upgrade.
It was suggested that this could be caused by namenode garbage collection, but
the GC log output does not seem to support that.