It looks like you are running with only 1 replica, so my first guess is that you have only 1 datanode and it's writing to /tmp, which was cleaned at some point.
Hi, J-D! On the last iteration of setting up this cluster, I used 2 replicas and saw similar corruption, so I set up this cluster with the default 3 replicas (on the assumption that unusual replica values might expose unusual bugs). I can't find the command-line interface to get replica information for the file, but I was able to browse to it through the web interface. Here's what I see:

Contents of directory /user/hive/warehouse/player_game_stat/2011-01-15
<http://hadooptest5:50075/browseDirectory.jsp?dir=/user/hive/warehouse/player_game_stat&namenodeInfoPort=50070&delegation=null>

Name      Type  Size       Replication  Block Size  Modification Time  Permission  Owner  Group
datafile  file  231.12 MB  3            64 MB       2011-05-06 21:13   rw-r--r--   hdfs   supergroup

I'm assuming that means the 1-replica hypothesis is incorrect. I'll follow up on the suggestion about the datanodes writing into /tmp. I had a similar problem with the prior iteration of this cluster (dfs.name.dir wasn't defined, so NameNode metadata(?) was going into /tmp).

I now have a metaquestion: is there a default Hadoop configuration out there somewhere that has all critical parameters at least listed, if not filled out with some sane defaults? I keep discovering undefined parameters via unusual and difficult-to-troubleshoot cluster behaviour.

--
Tim Ellis
Riot Games
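In case it helps with the /tmp theory: by default, dfs.data.dir falls back to a subdirectory of hadoop.tmp.dir, which itself defaults to a path under /tmp, so undefined parameters really can put block data there. A minimal hdfs-site.xml fragment (property names from the 0.20-era configuration; the /data paths are hypothetical, pick your own persistent disks) that pins both NameNode metadata and DataNode blocks away from /tmp:

```
<configuration>
  <!-- Where the NameNode stores fsimage and edits; must be persistent. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <!-- Where DataNodes store block files; must also be persistent. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data</value>
  </property>
</configuration>
```

On the metaquestion: the distribution ships hdfs-default.xml (and core-default.xml, mapred-default.xml) inside the jars, and the same files are published in the Hadoop docs; they list every parameter with its default value and a description, which is a decent checklist for spotting anything left undefined.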
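For anyone finding this thread later: if I understand the stock Hadoop CLI correctly, per-file replication can also be checked from the command line rather than the web UI (paths below are the ones from this cluster; adjust to taste):

```shell
# List the file; the column after the permissions is the replication factor.
hadoop fs -ls /user/hive/warehouse/player_game_stat/2011-01-15

# Ask the NameNode for per-file block health, replica counts, and which
# datanodes hold each replica.
hadoop fsck /user/hive/warehouse/player_game_stat/2011-01-15 \
    -files -blocks -locations
```

fsck also summarizes under-replicated, mis-replicated, and corrupt blocks, which should confirm or rule out the blocks-vanishing-from-/tmp theory directly.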