Weird! This looks like some other problem, one that happened while merging the outputs at the Reduce task — the copying stage went through fine. This needs some more analysis.
> -----Original Message-----
> From: Mike Smith [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, March 01, 2007 3:44 AM
> To: [email protected]
> Subject: Re: some reducers stuck in copying stage
>
> Devaraj,
>
> After applying patch 1043 the copying problem is solved. But I am
> getting new exceptions. The tasks do finish after being reassigned to
> another tasktracker, so the job gets done eventually. Still, I never had
> this exception before applying this patch (or could it be because of
> changing the back-off time to 5 sec?):
>
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>     at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
>     at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
>     at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
>     at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
>     at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
>
> On 2/28/07, Mike Smith <[EMAIL PROTECTED]> wrote:
> >
> > Thanks Devaraj, patch 1042 seems to be already committed. Also, the
> > system never recovered even after 1 min, 300 sec; it stayed stuck
> > there for hours. I will try patch 1043 and also decrease the back-off
> > time to see if those help.
> >
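For what it's worth, the trace above fails inside FSDataInputStream$Buffer.seek(), which suggests the seek dereferenced something that had already been set to null. The following is a speculative, self-contained sketch of that general failure mode — the class and field names are hypothetical, not the actual Hadoop source: a stream wrapper releases its internal buffer on close(), so a seek() issued after (or racing with) close() hits a NullPointerException just like the one in the trace.

```java
// Hypothetical sketch -- NOT the Hadoop implementation. It only
// illustrates how a seek() against a wrapper whose buffer was already
// released can produce an NPE of the kind seen in the stack trace.
public class BufferedSeekable {
    private byte[] buf = new byte[4096];
    private long pos = 0;

    // Releasing the buffer models what a premature close() or a
    // close()/reuse race on the underlying stream could do.
    public void close() {
        buf = null;
    }

    public void seek(long newPos) {
        // Dereferencing buf after close() throws NullPointerException.
        if (newPos < 0 || newPos > buf.length) {
            throw new IllegalArgumentException("seek out of range: " + newPos);
        }
        pos = newPos;
    }

    public long getPos() {
        return pos;
    }

    public static void main(String[] args) {
        BufferedSeekable s = new BufferedSeekable();
        s.seek(100);               // fine while the buffer is live
        s.close();
        try {
            s.seek(200);           // NPE here, analogous to the trace
        } catch (NullPointerException e) {
            System.out.println("NPE after close, as in the reported trace");
        }
    }
}
```

If something along these lines is what is happening, it would explain why the task succeeds on re-execution: a fresh attempt opens the segment files anew, so no stale/closed stream is involved.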
