Please ignore my stupidity. I had an error in my cleanup method: it shut everything down too early, so the threads could never write to the context.
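
For the archives, here is roughly the pattern that works for me now. This is a trimmed sketch, not the actual UpdateHostDb code; the class name, pool size and lookup() method are illustrative. The two points that matter: context.write() called from multiple threads needs to be synchronized on the context, and cleanup() must shut the pool down and await termination so the framework doesn't close the output writer underneath the still-running workers.

import java.io.IOException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LookupReducer extends Reducer<Text, Text, Text, Text> {

  private ThreadPoolExecutor pool;

  @Override
  protected void setup(Context context) {
    // SynchronousQueue plus CallerRunsPolicy: when all workers are busy,
    // reduce() runs the task itself, which gives natural backpressure.
    pool = new ThreadPoolExecutor(16, 16, 60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, final Context context)
      throws IOException, InterruptedException {
    // Hadoop reuses the key object between reduce() calls, so take a copy
    // before handing it to a worker thread.
    final Text host = new Text(key);
    pool.execute(new Runnable() {
      public void run() {
        try {
          Text result = lookup(host);  // stand-in for the external lookup
          // context.write() is not thread-safe; serialize access to it.
          synchronized (context) {
            context.write(host, result);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    });
  }

  @Override
  protected void cleanup(Context context) throws InterruptedException {
    // This is where my bug was: tearing the pool down too early lets the
    // framework close the SequenceFile writer while workers are still
    // calling context.write(), which surfaces as the NPE quoted below.
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }

  private Text lookup(Text key) {
    return new Text("resolved:" + key.toString());  // illustrative only
  }
}

Note that cleanup() runs after the last reduce() call but before the framework closes the RecordWriter, so blocking in awaitTermination() there is exactly what keeps the writer alive until the last worker has written.
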
On Monday 16 January 2012 16:23:48 Markus Jelsma wrote:
> Hi,
>
> We have a job that is IO bound. The mapper aggregates the keys and the
> reducer has to look up the incoming keys externally. If this runs serially
> with 15 reducers it takes many days, so we are using threads to look them
> up.
>
> We offer the keys to a SynchronousQueue and use a ThreadPoolExecutor for
> handling the worker threads. In those threads we need to write the k/v
> pair using the new MapReduce API in Hadoop 1.0.0, but we get an NPE:
>
> java.lang.NullPointerException
>   at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:975)
>   at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1017)
>   at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
>   at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>   at org.apache.nutch.util.hostdb.UpdateHostDb$UpdateHostDbReducer$ResolverThread.run(UpdateHostDb.java:188)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
>
> My question: what is the recommended method of writing from a thread pool
> inside a reducer? I've looked up the NPE but only see references to HBase
> issues which do not seem to apply to this situation.
>
> Any hints to offer?
> Thanks!

-- 
Markus Jelsma - CTO - Openindex