Hi Hadoopers,
We are experiencing a lot of "Could not obtain block" / "Could not get
block locations" IOExceptions when processing a 400 GB MapReduce job on
our six-node DFS & MapReduce cluster (v0.16.4). Each node is equipped
with a 400 GB SATA HDD and runs SUSE Linux Enterprise. While this
"huge" job is running, the namenode seems to stop receiving heartbeats
from some datanodes for up to a couple of minutes and therefore marks
those nodes as dead, even though, according to their own logs, they are
still alive and serving blocks. We first suspected network congestion,
but measuring inter-node bandwidth with scp gives throughputs of about
30 MB/s, which seems to rule that out. CPU utilization is close to 100%
while the job runs; still, should busy tasktracker instances really be
able to starve the datanodes badly enough to cause such drop-outs?
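
If sheer load is the culprit, we are considering the hadoop-site.xml
overrides below to lighten each node and to give the namenode more
slack before it declares a datanode dead. This is just a sketch; the
property names (heartbeat.recheck.interval and the per-tasktracker task
maxima) are our best reading of the 0.16 defaults, so please correct us
if they are wrong:

   <!-- hadoop-site.xml sketch; 0.16.x property names assumed -->
   <property>
     <name>mapred.tasktracker.map.tasks.maximum</name>
     <!-- fewer concurrent map tasks per node to ease CPU pressure -->
     <value>1</value>
   </property>
   <property>
     <name>mapred.tasktracker.reduce.tasks.maximum</name>
     <value>1</value>
   </property>
   <property>
     <name>heartbeat.recheck.interval</name>
     <!-- milliseconds; stretches the window before nodes are marked dead -->
     <value>600000</value>
   </property>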
In the datanode logs we also see a lot of errors like:

   java.io.IOException: Block blk_-7943096461180653598 is valid, and cannot be written to.
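
Before the next run we also plan the sanity checks below (a sketch;
paths are relative to the Hadoop install dir, and the remark about the
file-descriptor limit is just a hunch on our part):

   # per-process open-file limit on each datanode; the usual 1024
   # default is often said to be too low for busy DFS nodes
   ulimit -n

   # the namenode's view of which datanodes are live or dead
   bin/hadoop dfsadmin -report

   # scan for missing, corrupt or under-replicated blocks
   bin/hadoop fsck / -files -blocks -locations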
Any ideas? Thanks in advance.
Cu on the 'net,
Bye - bye,
<<<<< André <<<< >>>> èrbnA >>>>>