Re: Patch

Andreas Kostyrka Fri, 13 Jun 2008 01:05:14 -0700

Sorry for replying to a private email on the mailing list, but I strongly
believe in leaving the next guy something to google ;)


Anyway, as you seem to be knowledgeable about sorting, one question:

Does Hadoop pass all key/value tuples for a given key to the reducer in one
batch, or not?

TIA,

Andreas

On Friday 13 June 2008 02:48:52 you wrote:
> Great deal; thanks for sending it to me.
>
> This has exactly the same pattern described in the JIRA
> (HADOOP-3442); the partition that fails is nearly sorted and it's
> selected one of the largest values as its pivot.
>
> The fix is checked into the 0.17 branch; if you check it out and
> deploy it, your jobs should finish without causing the
> StackOverflowError. If you're noticing inordinately long sort times
> for your job (i.e. this is a common pattern for your data), then you
> might consider applying HADOOP-3308 and HADOOP-3442 (the former so
> the latter applies cleanly). Really sorry you hit this; let me know
> if the sort times with the 0.17.1 branch are inordinately long, so
> this can get another iteration if it needs it. -C
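
For anyone who finds this thread later: the failure pattern described in the
quoted message (nearly sorted input, pivot chosen among the largest values) can
be reproduced with a generic quicksort. The sketch below is only an
illustration and is not Hadoop's sort code; the function names and the two
pivot strategies (last_element, median_of_three) are made up for this example.
It counts how deep the recursion goes for each strategy on the same nearly
sorted list.

```python
def quicksort_depth(values, pick_pivot):
    """Sort a copy of `values`; return (sorted_list, max_recursion_depth)."""
    data = list(values)
    max_depth = 0

    def sort(lo, hi, depth):
        nonlocal max_depth
        if lo >= hi:
            return
        max_depth = max(max_depth, depth)
        # Move the chosen pivot to the end, then do a Lomuto partition.
        p = pick_pivot(data, lo, hi)
        data[p], data[hi] = data[hi], data[p]
        pivot = data[hi]
        store = lo
        for i in range(lo, hi):
            if data[i] < pivot:
                data[i], data[store] = data[store], data[i]
                store += 1
        data[store], data[hi] = data[hi], data[store]
        sort(lo, store - 1, depth + 1)
        sort(store + 1, hi, depth + 1)

    sort(0, len(data) - 1, 1)
    return data, max_depth


def last_element(data, lo, hi):
    # Hypothetical "bad" strategy: on sorted input this is always the maximum.
    return hi


def median_of_three(data, lo, hi):
    # Hypothetical alternative: index of the median of first, middle, last.
    mid = (lo + hi) // 2
    return sorted([lo, mid, hi], key=lambda i: data[i])[1]


# Nearly sorted input: 0..299 with one adjacent pair swapped.
nearly_sorted = list(range(300))
nearly_sorted[10], nearly_sorted[11] = nearly_sorted[11], nearly_sorted[10]

sorted_bad, depth_bad = quicksort_depth(nearly_sorted, last_element)
sorted_good, depth_good = quicksort_depth(nearly_sorted, median_of_three)
print("last-element pivot depth:", depth_bad)
print("median-of-three pivot depth:", depth_good)
```

With the last-element pivot the recursion depth grows roughly linearly with the
input size, which is the kind of growth that can end in a StackOverflowError on
a large partition; with the median-of-three pivot it stays roughly logarithmic
on this input.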
