Never mind, I got it. Files are disappearing because the dfs dir is under /tmp (yes, this is a dev cluster). When resources run low on this box, space under /tmp is reclaimed and the block files go poof.
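
For anyone hitting the same thing, here is a rough sketch of the check, not a definitive diagnostic. As far as I know, in Hadoop releases of this era dfs.data.dir and dfs.name.dir default to paths under hadoop.tmp.dir, which itself defaults to /tmp/hadoop-${user.name}, so an unconfigured cluster ends up exactly here. The helper name is_under_tmp is made up for illustration; the data dir path is the one from the ls command in the quoted message.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) if the given path is /tmp
# itself or lives somewhere below it.
is_under_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Data directory taken from the ls command in the quoted message.
DATA_DIR="/tmp/hadoop-chrish/dfs/data"

if is_under_tmp "$DATA_DIR"; then
    echo "WARNING: $DATA_DIR is under /tmp and may be purged by tmp cleanup"
fi

# On a live cluster, fsck reports which files have missing blocks:
#   bin/hadoop fsck /user/hive/warehouse/apiusage -files -blocks -locations
```

The likely fix is to point dfs.name.dir and dfs.data.dir (or hadoop.tmp.dir) at a location outside /tmp in the site config and restart; blocks that have already been purged are not recoverable that way.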
On Fri, Jun 12, 2009 at 12:54 PM, Bill Graham <[email protected]> wrote:
> Hi,
>
> For the second time in two weeks I'm getting errors that blocks that once
> existed have gone missing from HDFS and I'm baffled as to the cause, or even
> how to troubleshoot the issue. Any help would be appreciated.
>
> From the hive shell when I run a select on a table that used to work fine,
> I get the following error:
>
> Failed with exception Could not obtain block: blk_2102982369652986130_1284
> file=/user/hive/warehouse/apiusage/part-00000
>
> When I look at the web ui for the name node I see this part file listed,
> but when I click on it, it says "Empty file". Some of the parts in that
> directory show their content, but more than half return "Empty file" so they
> clearly still exist in the namenode metadata. Just the blocks are missing.
>
> Grepping the logs I can see when the part was written and then accessed
> multiple times after, but that's it. Looking on the slaves I no longer see
> references to one of the bad blocks, so they're definitely gone. This
> command returns nada:
>
> bin/slaves.sh ls -l /tmp/hadoop-chrish/dfs/data/current/*/ | grep blk_2102982369652986130
>
> Any ideas what could cause this, or where else I should look for clues?
> This behavior is troubling. It happens after the files have been there for a
> week or two.
>
> thanks,
> Bill
