Sorry for sending my reply to your private email to the mailing list, but I strongly believe in leaving the next guy something to google ;)
Anyway, as you seem to be knowledgeable about sorting, one question: does Hadoop provide all key/value tuples for a given key to the reducer in one batch, or not?

TIA,
Andreas

On Friday 13 June 2008 02:48:52 you wrote:
> Great deal; thanks for sending it to me.
>
> This has exactly the same pattern described in the JIRA
> (HADOOP-3442); the partition that fails is nearly sorted and it's
> selected one of the largest values as its pivot.
>
> The fix is checked into the 0.17 branch; if you check it out and
> deploy it, your jobs should finish without causing the
> StackOverflowError. If you're noticing inordinately long sort times
> for your job (i.e. this is a common pattern for your data), then you
> might consider applying HADOOP-3308 and HADOOP-3442 (the former so
> the latter applies cleanly). Really sorry you hit this; let me know
> if the sort times with the 0.17.1 branch are inordinately long, so
> this can get another iteration if it needs it.
>
> -C
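For the next guy googling this: the failure mode in the quoted reply (a nearly sorted partition whose pivot is one of its largest values) can be reproduced with a textbook quicksort. The Java sketch below is a hypothetical illustration, not Hadoop's sort code and not a claim about what the HADOOP-3442 patch changed; the class and method names are made up. `naiveSort` uses the last element as pivot and recurses into both sides, so on sorted input each call peels off a single element and the recursion depth grows linearly with the partition size, which on a big enough partition exhausts the thread stack. `boundedSort` shows one common mitigation: recurse only into the smaller side and loop on the larger one, which keeps the depth around log2(n) even with bad pivots (the sort itself can still take quadratic time).

```java
// Hypothetical illustration of pivot-induced recursion depth in quicksort.
// Not Hadoop's QuickSort implementation.
public class PivotDepthDemo {
    // Deepest recursion level reached by the most recent sort call.
    static int maxDepth;

    // Naive quicksort: last element as pivot, recurse into both partitions.
    // On sorted input the pivot is always the maximum, so depth ~ n.
    static void naiveSort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        naiveSort(a, lo, p - 1, depth + 1);
        naiveSort(a, p + 1, hi, depth + 1);
    }

    // Same partitioning, but recurse only into the smaller side and iterate
    // over the larger one, bounding stack depth by about log2(n).
    static void boundedSort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                boundedSort(a, lo, p - 1, depth + 1);
                lo = p + 1;
            } else {
                boundedSort(a, p + 1, hi, depth + 1);
                hi = p - 1;
            }
        }
    }

    // Lomuto partition with a[hi] as pivot; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;

        int[] a = sorted.clone();
        maxDepth = 0;
        naiveSort(a, 0, n - 1, 0);
        System.out.println("naive max depth: " + maxDepth);

        int[] b = sorted.clone();
        maxDepth = 0;
        boundedSort(b, 0, n - 1, 0);
        System.out.println("bounded max depth: " + maxDepth);
    }
}
```

Running `main` prints `naive max depth: 1999` and `bounded max depth: 1` for the 2,000-element sorted input; the small input size is deliberate so the naive version finishes instead of overflowing the default stack.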
