It looks like you are running with only 1 replica, so my first guess is that you have only 1 datanode and it's writing to /tmp, which was cleaned at some point.
Hi, J-D! On the last iteration of setting up this cluster, I used 2 replicas and saw similar corruption, so I set up this cluster with the default 3 replicas (on the assumption that unusual replica values might expose unusual bugs). I can't find the command-line interface to get replica information for the file, but I was able to browse to it through the web interface. Here's what I see:

Contents of directory /user/hive/warehouse/player_game_stat/2011-01-15
<http://hadooptest5:50075/browseDirectory.jsp?dir=/user/hive/warehouse/player_game_stat&namenodeInfoPort=50070&delegation=null>

Name      Type  Size       Replication  Block Size  Modification Time  Permission  Owner  Group
datafile  file  231.12 MB  3            64 MB       2011-05-06 21:13   rw-r--r--   hdfs   supergroup

I'm assuming that means the 1-replica hypothesis is incorrect. I'll follow up on the suggestion about the datanodes writing into /tmp. I had a similar problem with the prior iteration of this cluster (dfs.name.dir wasn't defined, so NameNode metadata(?) was going into /tmp).

I now have a metaquestion: is there a default Hadoop configuration out there somewhere that has all critical parameters at least listed, if not filled out with some sane defaults? I keep discovering undefined parameters via unusual and difficult-to-troubleshoot cluster behaviour.

--
Tim Ellis
Riot Games
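In case it helps with the /tmp theory: by default, dfs.data.dir falls back to a subdirectory of hadoop.tmp.dir, which itself defaults to a path under /tmp, so undefined parameters really can put block data there. A minimal hdfs-site.xml fragment (property names from the 0.20-era configuration; the /data paths are hypothetical, pick your own persistent disks) that pins both NameNode metadata and DataNode blocks away from /tmp:

```
<configuration>
  <!-- Where the NameNode stores fsimage and edits; must be persistent. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <!-- Where DataNodes store block files; must also be persistent. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data</value>
  </property>
</configuration>
```

On the metaquestion: the distribution ships hdfs-default.xml (and core-default.xml, mapred-default.xml) inside the jars, and the same files are published in the Hadoop docs; they list every parameter with its default value and a description, which is a decent checklist for spotting anything left undefined.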
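For anyone finding this thread later: if I understand the stock Hadoop CLI correctly, per-file replication can also be checked from the command line rather than the web UI (paths below are the ones from this cluster; adjust to taste):

```shell
# List the file; the column after the permissions is the replication factor.
hadoop fs -ls /user/hive/warehouse/player_game_stat/2011-01-15

# Ask the NameNode for per-file block health, replica counts, and which
# datanodes hold each replica.
hadoop fsck /user/hive/warehouse/player_game_stat/2011-01-15 \
    -files -blocks -locations
```

fsck also summarizes under-replicated, mis-replicated, and corrupt blocks, which should confirm or rule out the blocks-vanishing-from-/tmp theory directly.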