As I said in the original message, bad partitioning was my first theory; I
have had issues with it in the past and am careful with my partitioner. It
was the first thing I looked for, but I see no evidence that the slower
tasks are handling significantly more data than the faster ones, and
certainly not enough to justify a radically different running time.
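(For anyone checking the same thing: skew of this kind can be spotted by counting records per partition under the default hash scheme. A toy Python sketch, not Hadoop code; the key stream and names are made up for illustration, with Python's hash() standing in for Java's hashCode():)

```python
from collections import Counter

def partition(key: str, num_reducers: int) -> int:
    # Mirrors the modular scheme of Hadoop's default HashPartitioner,
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, with
    # Python's hash() standing in for Java's hashCode().
    return (hash(key) & 0x7FFFFFFF) % num_reducers

def records_per_reducer(keys, num_reducers):
    """Count how many records each reducer would receive."""
    return Counter(partition(k, num_reducers) for k in keys)

# Hypothetical key stream: one hot key carries most of the records.
keys = ["user-1"] * 50_000 + [f"user-{i}" for i in range(2, 1000)]
counts = records_per_reducer(keys, 10)
reducer, n = counts.most_common(1)[0]
print(f"busiest reducer {reducer} gets {n} of {len(keys)} records")
```

If the per-reducer counts come out roughly flat, as they apparently do here, the slowdown is likely something other than key skew.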


On Thu, Aug 29, 2013 at 9:29 AM, Charles Baker <cba...@sdl.com> wrote:

>  Hi Steve. Sounds like a classic case of uneven data distribution among
> the reducers. Most of your data is probably going to those 10 reducers
> that are taking many hours. You may want to adjust your key and/or
> partitioning strategy to better distribute the data amongst the reducers.
> If you're using a hashing type of partitioning strategy, consider using a
> prime number of reducers; primes tend to give a more even distribution
> with a hash-based strategy, and this alone may get you pretty far. I have
> no idea what your workflow or cluster configuration is like, but 300
> reducers for 300 mappers doesn't sound right. Try using a (prime) number
> of reducers that's roughly equal to 95% of the total reducer slots
> allocated on the cluster and go from there. Usually, the cluster should
> be configured for fewer reducers than mappers. If you have 12 cores per
> node (HT off), try 8 mappers and 3 reducers per node.
>
> Good luck!
>
> Chuck
>
> *From:* Steve Lewis [mailto:lordjoe2...@gmail.com]
> *Sent:* Wednesday, August 28, 2013 7:48 PM
> *To:* mapreduce-user
> *Subject:* Some jobs seem to run forever
>
> I am running a Hadoop job on a 40-node cluster with about 300 map tasks
> and about 300 reduce tasks. Most tasks complete within 20 minutes, but a
> few, typically fewer than 10, run for many hours.
>
> If they complete, I see nothing to suggest that the number of bytes read
> or written, or the number of records read or written, is significantly
> different from tasks that run much faster. I sometimes see multiple
> attempts, usually only two, and the cluster is doing nothing else.
>
> Any suggested tuning?
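(Chuck's tip about prime reducer counts is easy to see in a toy simulation. Python again, not Hadoop; the structured key stream, every key a multiple of 10, is invented for illustration, and in Python hash(n) == n for small non-negative ints, which makes the effect visible:)

```python
def partition(key: int, num_reducers: int) -> int:
    # Same modular scheme as Hadoop's default HashPartitioner:
    # (hash & Integer.MAX_VALUE) % numReduceTasks.
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Hypothetical structured keys: every key is a multiple of 10
# (think timestamps rounded to 10 s, or IDs issued in strides).
keys = [i * 10 for i in range(100_000)]

used_300 = len({partition(k, 300) for k in keys})  # 300 shares a factor with 10
used_293 = len({partition(k, 293) for k in keys})  # 293 is prime

print(f"300 reducers: only {used_300} ever receive data")  # → 30
print(f"293 reducers: all {used_293} receive data")        # → 293
```

When the reducer count shares a factor with the stride in the key hashes, only a fraction of the reducers ever see data; a prime count cannot share such a factor. Note this only explains skew from structured keys; a handful of genuinely hot keys will overload one reducer regardless of the modulus.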



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
