Hi,

On Thu, Aug 12, 2010 at 10:31 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
> Hi,
>
> On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett
> <bdennett+softw...@gmail.com> wrote:
>> From what I've read/seen, it appears that, if not the "default"
>> scheduler, most installations are using Hadoop's Fair Scheduler. Based
>> on features and our requirements, we're leaning towards using the
>> Capacity Scheduler; however, there is some concern that it may not be
>> as "stable" as there doesn't appear to be as much talk about it,
>> compared to the Fair Scheduler.
>>
>> Has anyone hit any nasty issues with regards to the Capacity Scheduler
>> and, in general, are there any "gotchas" to look out for with either
>> scheduler?
>>
>> We're ramping up the number of users on our Hadoop clusters,
>> particularly in regards to Hive. Our goal is to ensure that production
>> processes continue to run with a majority of the cluster during peak
>> usage times, while personal users share the remaining capacity. The
>> Capacity Scheduler's support of queues and for memory-intensive jobs
>> is appealing but we are curious about drawbacks and/or potential
>> issues.
>
> FWIW, Yahoo! is running capacity scheduler for a reasonably long time
> now. However, there have been many patches on top of the base Hadoop
> 0.20.2 version to capacity scheduler that make it 'stable' and work at
> large scale effectively. Looking at the change log of the yahoo hadoop
> distribution could possibly give an idea of which patches are useful
> to pick up and apply to an older version. The good news is that most
> of these patches have 0.20 versions that are available on JIRA and
> would apply reasonably cleanly.
>
Allen cautions that the part about the patches applying cleanly to 0.20 might not be entirely true. Thanks for that heads-up, Allen!

Thanks,
Hemanth
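
PS: For anyone setting up the kind of split Bobby described (a production queue holding most of the cluster, with ad-hoc Hive users sharing the rest), a rough sketch of what that could look like on 0.20 is below. The queue names ('production', 'adhoc') and the 90/10 split are made-up examples for illustration, not a recommendation; please double-check the property names against the capacity scheduler docs for your version.

In mapred-site.xml:

  <!-- tell the JobTracker to use the capacity scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>

  <!-- declare the queues -->
  <property>
    <name>mapred.queue.names</name>
    <value>production,adhoc</value>
  </property>

In capacity-scheduler.xml:

  <!-- production gets 90% of the cluster's slots as guaranteed capacity -->
  <property>
    <name>mapred.capacity-scheduler.queue.production.capacity</name>
    <value>90</value>
  </property>

  <!-- ad-hoc/Hive users share the remaining 10% -->
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>10</value>
  </property>

  <!-- keep any single ad-hoc user from taking over the whole queue -->
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.minimum-user-limit-percent</name>
    <value>25</value>
  </property>

Jobs then pick a queue via mapred.job.queue.name (for Hive, 'set mapred.job.queue.name=adhoc;' before running the query).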