You may be facing the other well-known problem in Hadoop - too many small files:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/

On Mon, Jan 25, 2010 at 7:38 PM, Ben Hardy <[email protected]> wrote:

> For me the cause of this problem turned out to be a bug in Linux 2.6.21, which is used in the default Elastic MapReduce AMI we run on c1-mediums.
>
> What was going on in the files that I uploaded is that in one particular directory with 15,000-odd files in it, some of the files were appearing in the output of filesystem commands like find and ls TWICE. Really weird. So when hadoop tried to copy these files into HDFS, it quite rightly complained that it had seen that file before.
>
> Even though all my filenames are unique.
>
> So watch out for that one folks, it's a doozy, and it's not a Hadoop bug, but it still might bite you.
>
> -b
>
> On Mon, Jan 25, 2010 at 11:08 AM, Mark Kerzner <[email protected]> wrote:
>
> > I hit this error in -copyFromLocal, or a similar one, all the time. It is also found in .19 and .20.
> >
> > One can work around it manually. For example, copy the file to a different place in HDFS, remove the offending file in HDFS, and rename your file into the problem one. This works, and after this I have no problem.
> >
> > The funny thing is that it happens for specific file names, only a few. For example, job.prop always gives a problem, whereas job.properties does not.
> >
> > If I were a good boy, I would debug it with the "job.prop" file, but of course I just found a workaround and forgot about it.
> >
> > Sincerely,
> > Mark
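(For anyone landing on this thread: a rough sketch of the manual workaround Mark describes above, using the standard hadoop fs commands. The /tmp and /data/input paths and the job.prop name are placeholders for illustration only - substitute your own locations.)

  # stage the local file under a temporary HDFS path instead of the problem one
  hadoop fs -copyFromLocal job.prop /tmp/job.prop.staged
  # remove the HDFS path that copyFromLocal keeps complaining about
  hadoop fs -rm /data/input/job.prop
  # rename the staged copy into the intended location
  hadoop fs -mv /tmp/job.prop.staged /data/input/job.prop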
> >
> > On Mon, Jan 25, 2010 at 1:01 PM, Ben Hardy <[email protected]> wrote:
> >
> > > Hey folks,
> > >
> > > We're running a 100 node cluster on Hadoop 0.18.3 using Amazon Elastic MapReduce.
> > >
> > > We've been uploading data to this cluster via SCP and using hadoop fs -copyFromLocal to get it into HDFS.
> > >
> > > Generally this works fine, but our last run saw a failure in this operation which only said "RuntimeError".
> > >
> > > So we blew away the destination directory in HDFS and tried the copyFromLocal again.
> > >
> > > This time it failed because it thinks one of the files it's trying to copy to HDFS is already in HDFS; however, I don't get how this is possible if we just blew away the destination's parent directory. Subsequent attempts produce identical results.
> > >
> > > hadoop fsck reports a HEALTHY filesystem.
> > >
> > > We do see a lot of errors like those below in the namenode log. Are these normal, or perhaps related to the problem described above?
> > >
> > > Would appreciate any advice or suggestions.
> > >
> > > b
> > >
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.245.103.240:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.242.25.206:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.242.15.111:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.244.107.18:9200
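(Also, a quick sanity check for the duplicate directory entries Ben describes further up the thread, before blaming Hadoop. The /mnt/upload path is just a placeholder for whatever local directory you are copying from.)

  # list every file under the upload directory and print any path that appears more than once
  find /mnt/upload -type f | sort | uniq -d

If that prints anything for a directory where every filename should be unique, the local listing itself is suspect (as with the Linux 2.6.21 issue mentioned above), and copyFromLocal will end up seeing the same source file twice.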
