We have relatively heavyweight objects that we pass around the cluster
for our map/reduce tasks.
We have noticed that when we are using the multithreaded mapper, we
don't get very high CPU or disk utilization.
On investigating, we discovered that the entirety of next(key, value)
and the entirety of write(key, value) are synchronized on the file
object.
This causes all the threads to back up on serialization/deserialization.
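To illustrate, the write path has roughly this shape (a simplified
sketch of the pattern we're seeing, not the actual Hadoop source; the
class and field names are made up):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.hadoop.io.Writable;

    // Simplified sketch of the shape we are seeing -- not the actual
    // Hadoop source.  The lock spans the expensive serialization, so
    // only one thread makes progress at a time.
    public class WholeCallLocked {
        private final DataOutputStream file;  // the shared file object

        public WholeCallLocked(OutputStream raw) {
            this.file = new DataOutputStream(raw);
        }

        public void write(Writable key, Writable value) throws IOException {
            synchronized (file) {  // serialization runs under the lock
                key.write(file);
                value.write(file);
            }
        }
    }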
Before we start coding, are there any current patches floating around
to shrink this critical section? It is pretty straightforward for
write, but not so simple for next.
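For write, what we have in mind is roughly the following (a rough
sketch under our assumptions; NarrowLockWriter and appendRaw are
hypothetical names, not existing Hadoop API): serialize into
per-thread buffers with no lock held, then take the lock only for the
raw byte copy.

    import java.io.IOException;

    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;

    // Hypothetical sketch of the write-side fix: serialize each record
    // into a per-thread buffer outside the lock, then hold the lock
    // only long enough to copy raw bytes to the file.
    public class NarrowLockWriter {

        // Per-thread scratch buffer so serialization needs no shared state.
        private static final ThreadLocal<DataOutputBuffer> BUF =
            new ThreadLocal<DataOutputBuffer>() {
                protected DataOutputBuffer initialValue() {
                    return new DataOutputBuffer();
                }
            };

        private final Object file = new Object();  // stands in for the file object

        public void write(Writable key, Writable value) throws IOException {
            DataOutputBuffer buf = BUF.get();
            buf.reset();
            key.write(buf);                 // expensive part, no lock held
            int keyLen = buf.getLength();
            value.write(buf);

            synchronized (file) {           // critical section is just a byte copy
                appendRaw(buf.getData(), keyLen, buf.getLength() - keyLen);
            }
        }

        // Hypothetical hook that appends pre-serialized key/value bytes
        // (plus length prefixes) to the file; a patch would have to add
        // something like this to the writer.
        private void appendRaw(byte[] data, int keyLen, int valLen)
                throws IOException {
            // write length prefixes + raw bytes to the underlying stream
        }
    }

For next, the analogous trick would mean reading the raw record bytes
under the lock and deserializing them outside it, which requires
knowing the record boundaries before deserialization -- that is where
it gets less simple.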
We run multithreaded mappers because we have more CPUs than disk arms
on our cluster machines, and some of our tasks are inherently
threaded, so we can't just raise the maximum number of tasks per node
instead.
Thanks -- Jason