I loaded data into HDFS last week, and this morning I was greeted with this on the web interface: "WARNING : There are about 32 missing blocks. Please check the log or run fsck."
I ran fsck and see several missing and corrupt blocks. The output is verbose, so here's a small sample:

    /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar: CORRUPT block blk_-5745991833770623132
    /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar: MISSING 1 blocks of total size 2945889 B........
    /user/hive/warehouse/player_game_stat/2011-01-15/datafile: CORRUPT block blk_1642129438978395720
    /user/hive/warehouse/player_game_stat/2011-01-15/datafile: MISSING 1 blocks of total size 67108864 B................

Sometimes the number of dots after the B is quite large (several lines long).

Some of these are tmp files, but many are important. If this cluster were prod, I'd have some splaining to do. I need to determine what caused this corruption.

Questions:

1. What are the dots after the B? What is the significance of the number of them?
2. Does anyone have suggestions on where to start?
3. Are there typical misconfigurations or issues that cause corruption and missing files?
4. What is "the log" that the NameNode web interface refers to?

Thanks for any info! I'm... nervous. :)

--
Tim Ellis
Riot Games
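
P.S. Since the raw fsck output is so verbose, here is a minimal sketch of how the per-file lines could be tallied to see which paths are affected. The two line patterns are assumptions taken only from the sample above (they may not cover every line fsck prints, and paths containing spaces would not match), and the function name summarize_fsck is just a hypothetical helper, not part of Hadoop.

```python
import re
from collections import defaultdict

# Both patterns are assumptions modeled on the sample lines quoted above.
CORRUPT_RE = re.compile(r"^(?P<path>/\S+): CORRUPT block (?P<block>blk_-?\d+)")
MISSING_RE = re.compile(
    r"^(?P<path>/\S+): MISSING (?P<count>\d+) blocks of total size (?P<size>\d+) B"
)


def summarize_fsck(lines):
    """Group CORRUPT/MISSING fsck lines by file path.

    Returns a dict mapping each path to its corrupt block IDs, the
    number of missing blocks, and the total missing bytes. Lines that
    match neither pattern (including runs of dots) are skipped.
    """
    summary = defaultdict(
        lambda: {"corrupt_blocks": [], "missing_blocks": 0, "missing_bytes": 0}
    )
    for raw in lines:
        # Drop surrounding whitespace and any dots that precede the path.
        line = raw.strip().lstrip(".")
        match = CORRUPT_RE.match(line)
        if match:
            summary[match.group("path")]["corrupt_blocks"].append(match.group("block"))
            continue
        match = MISSING_RE.match(line)
        if match:
            entry = summary[match.group("path")]
            entry["missing_blocks"] += int(match.group("count"))
            entry["missing_bytes"] += int(match.group("size"))
    return dict(summary)
```

Assuming the fsck output was saved to a file (say fsck.out, a made-up name), something like summarize_fsck(open("fsck.out")) would give one entry per damaged file, which makes it easier to separate the tmp/staging files from the warehouse data.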