Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?

Jean-Daniel Cryans Tue, 17 May 2011 17:17:15 -0700

Hey Tim,

It looks like you are running with only 1 replica, so my first guess is
that you only have 1 datanode and that it's writing to /tmp, which was
cleaned at some point.
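
One way to test that guess is to look at the effective values of
dfs.replication and dfs.data.dir. Below is a minimal Python sketch of
such a check. It assumes the 0.20-era property names and shipped
defaults (dfs.data.dir falling back to ${hadoop.tmp.dir}/dfs/data, and
hadoop.tmp.dir to /tmp/hadoop-${user.name}); newer releases rename some
of these. The helper names and the "hdfs" user are only illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed defaults for a Hadoop 0.20-era install; verify against the
# *-default.xml files shipped with your own version.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    "dfs.replication": "3",
}


def load_props(*xml_texts):
    """Overlay <property> entries from config XML strings onto the defaults."""
    props = dict(DEFAULTS)
    for text in xml_texts:
        root = ET.fromstring(text)
        for prop in root.findall("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name and value is not None:
                props[name.strip()] = value.strip()
    return props


def resolve(props, key, user="hdfs"):
    """Expand ${var} references in a property value (bounded number of passes)."""
    value = props[key]
    for _ in range(10):
        expanded = value.replace("${user.name}", user)
        for k, v in props.items():
            expanded = expanded.replace("${" + k + "}", v)
        if expanded == value:
            break
        value = expanded
    return value


def risky_settings(props, user="hdfs"):
    """Return warnings for block storage under /tmp and for replication < 2."""
    warnings = []
    for d in resolve(props, "dfs.data.dir", user).split(","):
        d = d.strip()
        if d == "/tmp" or d.startswith("/tmp/"):
            warnings.append("dfs.data.dir is under /tmp: " + d)
    if int(resolve(props, "dfs.replication", user)) < 2:
        warnings.append("dfs.replication is below 2")
    return warnings
```

Feed load_props() the contents of core-site.xml and hdfs-site.xml, in
that order. If risky_settings() returns anything, the datanode's blocks
may be sitting somewhere a tmp cleaner can reach, with no second copy
to recover from.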


J-D

On Tue, May 17, 2011 at 5:13 PM, Time Less <timelessn...@gmail.com> wrote:
> I loaded data into HDFS last week, and this morning I was greeted with this
> on the web interface: "WARNING : There are about 32 missing blocks. Please
> check the log or run fsck."
>
> I ran fsck and see several missing and corrupt blocks. The output is
> verbose, so here's a small sample:
>
> /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
> CORRUPT block blk_-5745991833770623132
> /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
> MISSING 1 blocks of total size 2945889 B........
> /user/hive/warehouse/player_game_stat/2011-01-15/datafile: CORRUPT block
> blk_1642129438978395720
> /user/hive/warehouse/player_game_stat/2011-01-15/datafile: MISSING 1 blocks
> of total size 67108864 B................
>
> Sometimes the number of dots after the B is quite large (several lines
> long). Some of these are tmp files, but many are important. If this cluster
> were prod, I'd have some splaining to do. I need to determine what caused
> this corruption.
>
> Questions:
>
> What are the dots after the B? What is the significance of the number of
> them?
> Does anyone have suggestions where to start?
> Are there typical misconfigurations or issues that cause corruption &
> missing files?
> What is "the log" that the NameNode web interface refers to?
>
> Thanks for any infos! I'm... nervous. :)
> --
> Tim Ellis
> Riot Games
>
>
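
Since the fsck output is verbose, it can help to boil it down to one
entry per affected file before deciding what matters. Here is a rough
Python sketch that does this. It assumes the raw (unwrapped) output
uses exactly the two line shapes quoted above, "<path>: CORRUPT block
<id>" and "<path>: MISSING <n> blocks of total size <bytes> B", and it
ignores anything that follows the B. Other fsck line types are skipped,
and the function name is only illustrative.

```python
import re
from collections import defaultdict

# Patterns modelled on the two sample lines in the message; paths that
# contain whitespace would not match and are out of scope for this sketch.
CORRUPT = re.compile(r"^(?P<path>/\S+): CORRUPT block (?P<block>blk_-?\d+)")
MISSING = re.compile(
    r"^(?P<path>/\S+): MISSING (?P<count>\d+) blocks of total size (?P<size>\d+) B"
)


def summarize(fsck_text):
    """Group CORRUPT and MISSING fsck lines by HDFS path."""
    files = defaultdict(
        lambda: {"corrupt": [], "missing_blocks": 0, "missing_bytes": 0}
    )
    for line in fsck_text.splitlines():
        m = CORRUPT.match(line)
        if m:
            files[m["path"]]["corrupt"].append(m["block"])
            continue
        m = MISSING.match(line)
        if m:
            entry = files[m["path"]]
            entry["missing_blocks"] += int(m["count"])
            entry["missing_bytes"] += int(m["size"])
    return dict(files)
```

Sorting the result by path makes it easy to separate the staging files
under /tmp from the warehouse data that actually needs to be reloaded.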
