On Aug 18, 2011, at 12:28 AM, Owen O'Malley wrote:
>
> This vote is still running with no votes other than mine.
>
> I've tested with and without security on a 60 node cluster and I'm seeing
> some failures, but not that many. On a terasort with 15,000 maps and 200
> reduces, I ran the following cases:
>
> security + linux task controller : 2 failures (both mr-2651)
>
> no security + default task controller : 6-7 failures (seems to be a race
> condition in clean up)
>
> Even in the no security case, it is only losing 0.05% of the time.
We're seeing much, much higher failure rates, in the 5-10% range. It
might very well be because we have more cores/faster boxes.
> It isn't perfect, but this is the code that Yahoo is currently running. I
> think we should release it.
Y! can afford the task failures. The rest of us can't. So -1.