Jason,

Quite possibly. Here's what I did: I upped "dfs.datanode.max.xcievers" to
512 (double what it was), and the full set of output files is now created
correctly.
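For anyone else who hits this, the setting goes in hdfs-site.xml on each datanode (a minimal sketch; 512 is just the value that worked here, and the datanodes need a restart for the change to take effect):

```xml
<!-- hdfs-site.xml on each datanode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <!-- Caps concurrent block senders/receivers per datanode;
       doubled here so MultipleTextOutputFormat's many open
       output files don't exhaust the limit. -->
  <value>512</value>
</property>
```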

Thanks for responding.

Learning, learning the ins and outs of Hadoop.

On Thu, Oct 8, 2009 at 6:01 AM, Jason Venner <jason.had...@gmail.com> wrote:

> Are you perhaps creating large numbers of files and running out of file
> descriptors in your tasks?
>
>
> On Wed, Oct 7, 2009 at 1:52 PM, Geoffry Roberts <geoffry.robe...@gmail.com
> > wrote:
>
>> All,
>>
>> I have a MapRed job that ceases to produce output about halfway through.
>> The obvious question is why?
>>
>> This job reads a file and uses MultipleTextOutputFormat to generate output
>> files named with the output key. At about the halfway point, the job
>> continues to create files, but they are all zero length. I've worked
>> with this input file extensively; I know it contains the required data
>> and that it is clean, or at least it was when I copied it in.
>>
>> My first impulse was to check for a full disk, but there seems to be ample
>> free space.
>>
>> This doesn't appear to have anything to do with my code.
>>
>> stderr is full of the following entry:
>>
>> java.io.EOFException
>>      at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>      at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>>      at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>>      at org.apache.hadoop.io.Text.readString(Text.java:400)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2837)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2762)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>>
>>
>> syslog for the reducer starts filling up with the following at what could
>> indeed be the halfway point:
>>
>> 2009-10-07 11:27:50,874 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:27:50,916 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1693260904457793456_3495
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7536254999085848659_3495
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7513223558440754487_3495
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2580888829875117043_3495
>> 2009-10-07 11:28:14,965 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2781)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>>      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
