On Aug 18, 2011, at 12:28 AM, Owen O'Malley wrote:

> This vote is still running with no votes other than mine.
>
> I've tested with and without security on a 60 node cluster and I'm seeing
> some failures, but not that many. On a terasort with 15,000 maps and 200
> reduces, I ran the following cases:
>
> security + linux task controller : 2 failures (both mr-2651)
>
> no security + default task controller : 6-7 failures (seems to be a race
> condition in clean up)
>
> Even in the no security case, it is only losing 0.05% of the time.
We're seeing much, much higher failure rates, in the 5-10% range. It might very well be because we have more cores/faster boxes.

> It isn't perfect, but this is the code that Yahoo is currently running. I
> think we should release it.

Y! can afford the task failures. The rest of us can't. So -1.