I even tried to lower the number of parallel jobs even further, but I still get these errors. Any suggestions on how to troubleshoot this issue would be very helpful. Should I run hadoop fsck? How do people troubleshoot such issues? Does it sound like a bug?
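If fsck is the right starting point, I was thinking of something like the commands below to check block health and datanode status. This is just my first guess at a diagnostic pass (the target path / and the options are assumptions on my part, not something I have already run against this failure):

    # look for missing/corrupt blocks and see where each replica lives
    hadoop fsck / -files -blocks -locations

    # confirm every datanode is alive and reporting to the namenode
    hadoop dfsadmin -report

    # from the client box, check that a datanode's data-transfer port is
    # actually reachable (IP taken from the errors below)
    nc -zv 125.18.62.199 50010

Here are the errors again from the latest run: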
2012-04-27 14:37:42,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-04-27 14:37:42,931 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.199:50010 java.io.EOFException
2012-04-27 14:37:42,932 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_6343044536824463287_24619
2012-04-27 14:37:42,932 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.199:50010
2012-04-27 14:37:42,935 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:42,935 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_2837215798109471362_24620
2012-04-27 14:37:42,936 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.204:50010
2012-04-27 14:37:42,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-04-27 14:37:42,939 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:42,939 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_2223489090936415027_24620
2012-04-27 14:37:42,940 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:42,943 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.197:50010 java.io.EOFException
2012-04-27 14:37:42,943 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_1265169201875643059_24620
2012-04-27 14:37:42,944 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.197:50010
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3446)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2627)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2822)
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - Error Recovery for block blk_1265169201875643059_24620 bad datanode[0] nodes == null
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - Could not get block locations. Source file "/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201204261707_0411/job.jar" - Aborting...
2012-04-27 14:37:42,945 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://dsdb1:54310/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201204261707_0411
2012-04-27 14:37:42,945 [Thread-4] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.EOFException
2012-04-27 14:37:42,996 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.200:50010 java.io.IOException: Bad connect ack with firstBadLink as 125.18.62.198:50010
2012-04-27 14:37:42,996 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-7583284266913502018_24621
2012-04-27 14:37:42,997 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:42,997 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4207260385919079785_24622
2012-04-27 14:37:42,998 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,000 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,002 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.197:50010 java.io.EOFException
2012-04-27 14:37:43,002 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-2859304645525022496_24624
2012-04-27 14:37:43,003 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.197:50010
2012-04-27 14:37:43,003 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-5091361633954135154_24622
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.199:50010 java.io.EOFException
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-1445223397912067500_24624
2012-04-27 14:37:43,005 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,005 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.199:50010
2012-04-27 14:37:43,006 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:43,006 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4137744363907213546_24624
2012-04-27 14:37:43,007 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.204:50010
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4553692535678376597_24624
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.197:50010 java.io.EOFException
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-7407489373889053706_24624

On Fri, Apr 27, 2012 at 3:45 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> After all the jobs fail I can't run anything.
> Once I restart the cluster I am able to run other jobs with no problems;
> hadoop fs and other IO-intensive jobs run just fine.
>
> On Fri, Apr 27, 2012 at 3:12 PM, John George <john...@yahoo-inc.com> wrote:
>
>> Can you run a regular 'hadoop fs' (put or ls or get) command?
>> If yes, how about a wordcount example?
>> '<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'
>>
>> -----Original Message-----
>> From: Mohit Anchlia <mohitanch...@gmail.com>
>> Reply-To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
>> Date: Fri, 27 Apr 2012 14:36:49 -0700
>> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
>> Subject: Re: DFSClient error
>>
>> > I even tried to reduce the number of jobs, but it didn't help. This is what I see:
>> >
>> > datanode logs:
>> >
>> > Initializing secure datanode resources
>> > Successfully obtained privileged resources (streaming port = ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port = sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
>> > Starting regular datanode initialization
>> > 26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value of 143
>> >
>> > userlogs:
>> >
>> > 2012-04-26 19:35:22,801 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
>> > 2012-04-26 19:35:22,801 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
>> > 2012-04-26 19:35:22,808 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>> > 2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.197:50010, add to deadNodes and continue
>> > java.io.EOFException
>> >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
>> >         at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
>> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
>> >         at java.io.DataInputStream.read(DataInputStream.java:132)
>> >         at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
>> >         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
>> >         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
>> >         at java.io.InputStream.read(InputStream.java:85)
>> >         at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
>> >         at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
>> >         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
>> >         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
>> >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>> >         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>> >         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>> >         at java.security.AccessController.doPrivileged(Native Method)
>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>> >         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>> > 2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.204:50010, add to deadNodes and continue
>> > java.io.EOFException
>> >
>> > namenode logs:
>> >
>> > 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job job_201204261140_0244 added successfully for user 'hadoop' to queue 'default'
>> > 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201204261140_0244
>> > 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger: USER=hadoop IP=125.18.62.196 OPERATION=SUBMIT_JOB TARGET=job_201204261140_0244 RESULT=SUCCESS
>> > 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201204261140_0244
>> > 2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad connect ack with firstBadLink as 125.18.62.197:50010
>> > 2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2499580289951080275_22499
>> > 2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 125.18.62.197:50010
>> > 2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /data/hadoop/mapreduce/job_201204261140_0244/jobToken
>> > 2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201204261140_0244 = 73808305. Number of splits = 1
>> > 2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb4.corp.intuit.net
>> > 2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb5.corp.intuit.net
>> > 2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: job_201204261140_0244 LOCALITY_WAIT_FACTOR=0.4
>> > 2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201204261140_0244 initialized successfully with 1 map tasks and 0 reduce tasks.
>> >
>> > On Fri, Apr 27, 2012 at 7:50 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> >
>> >> On Thu, Apr 26, 2012 at 10:24 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >>> Is only the same IP printed in all such messages? Can you check the DN
>> >>> log in that machine to see if it reports any form of issues?
>> >>
>> >> All IPs were logged with this message.
>> >>
>> >>> Also, did your jobs fail or keep going despite these hiccups? I notice
>> >>> you're threading your clients though (?), but I can't tell if that may
>> >>> cause this without further information.
>> >>
>> >> It started with this error message and slowly all the jobs died with
>> >> "shortRead" errors. I am not sure about threading.
>> >> I am using a Pig script to read .gz files.
>> >>
>> >>> On Fri, Apr 27, 2012 at 5:19 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> >>> > I had 20 mappers in parallel reading 20 .gz files, each around 30-40MB,
>> >>> > over 5 Hadoop nodes, and then writing to the analytics database.
>> >>> > Almost midway it started to get this error:
>> >>> >
>> >>> > 2012-04-26 16:13:53,723 [Thread-8] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 17.18.62.192:50010 java.io.IOException: Bad connect ack with firstBadLink as 17.18.62.191:50010
>> >>> >
>> >>> > I am trying to look at the logs, but they don't say much. What could be
>> >>> > the reason? We are on a pretty closed, reliable network and all machines
>> >>> > are up.
>> >>>
>> >>> --
>> >>> Harsh J