For me the cause of this problem turned out to be a bug in Linux 2.6.21,
which is used in the default Elastic MapReduce AMI we run on c1-mediums.

What was going on with the files I uploaded is that in one particular
directory containing some 15,000 files, some of the files were appearing
TWICE in the output of filesystem commands like find and ls. Really weird.
So when Hadoop tried to copy those files into HDFS, it quite rightly
complained that it had already seen them before.

Even though all my filenames are unique.
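
If you suspect you're hitting the same thing, a quick check is to look for
duplicate entries in the listing of the source directory. A rough sketch
(/mnt/upload is a placeholder for wherever your files live locally):

  # list every file, then print any path that appears more than once
  find /mnt/upload -type f | sort | uniq -d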

So watch out for that one, folks. It's a doozy, and while it's not a Hadoop
bug, it might still bite you.

-b

On Mon, Jan 25, 2010 at 11:08 AM, Mark Kerzner <[email protected]> wrote:

> I hit this error in -copyFromLocal, or a similar one, all the time. It also
> shows up in 0.19 and 0.20.
>
> You can work around it manually: copy the file to a different place in HDFS,
> remove the offending file in HDFS, then rename your copy to the problematic
> name. This works, and after that I have no problem.
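>
> For example, roughly (the path /data/job.prop is a placeholder, not from my
> actual job):
>
>   # upload under a temporary name instead of the name that fails
>   hadoop fs -copyFromLocal job.prop /data/job.prop.tmp
>   # remove whatever HDFS thinks is already at the target path
>   hadoop fs -rm /data/job.prop
>   # rename the temporary copy to the intended name
>   hadoop fs -mv /data/job.prop.tmp /data/job.prop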
>
> The funny thing is that it happens for specific file names, only a few. For
> example, job.prop always gives a problem, whereas job.properties does not.
>
> If I were a good boy, I would debug it with the "job.prop" file, but of
> course I just found a workaround and forgot about it.
>
> Sincerely,
> Mark
>
> On Mon, Jan 25, 2010 at 1:01 PM, Ben Hardy <[email protected]> wrote:
>
> > Hey folks,
> >
> > We're running a 100-node cluster on Hadoop 0.18.3 using Amazon Elastic
> > MapReduce.
> >
> > We've been uploading data to this cluster via SCP and using hadoop fs
> > -copyFromLocal to get it into HDFS.
> >
> > Generally this works fine, but our last run saw a failure in this operation
> > which only said "RuntimeError".
> >
> > So we blew away the destination directory in HDFS and tried the
> > copyFromLocal again.
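> >
> > Roughly, that was something like (the paths are placeholders for our actual
> > directories):
> >
> >   # remove the destination directory and everything under it
> >   hadoop fs -rmr /data/input
> >   # retry the upload from the local staging directory
> >   hadoop fs -copyFromLocal /mnt/upload/input /data/input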
> >
> > This time it failed because it thinks one of the files it's trying to copy
> > to HDFS is already in HDFS. However, I don't get how this is possible if we
> > just blew away the destination's parent directory. Subsequent attempts
> > produce identical results.
> >
> > hadoop fsck reports a HEALTHY filesystem.
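> >
> > (That is, a plain check along the lines of:
> >
> >   hadoop fsck /
> >
> > comes back reporting the filesystem as healthy.)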
> >
> > We do see a lot of entries like those below in the namenode log. Are these
> > normal, or perhaps related to the problem described above?
> >
> > Would appreciate any advice or suggestions.
> >
> > b
> >
> > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.245.103.240:9200
> > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.242.25.206:9200
> > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.242.15.111:9200
> > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.244.107.18:9200
> >
>
