Arun,

How long do you think it would take to get a version of 0.20 with the security features ready? In a different thread, Owen has started discussing plans around 0.22. Do you think this effort would affect the 0.22 release?
I do agree that this would be very useful for folks who want security sooner. And the fact that Yahoo! has been running it at scale for a good while now is also reassuring.

Thanks
hemanth

On Tue, Aug 24, 2010 at 5:57 AM, Arun C Murthy wrote:
> Even with the work on hadoop-0.22 (trunk) starting in earnest, it is fairly
> obvious, given our past history, that it will take a while for us to get it
> stable and deployable; e.g. it took us nearly 6 months to deploy
> hadoop-0.20.
>
> In the interim I'd like to propose we push a hadoop-0.20-security release
> off the Yahoo! patchset (http://github.com/yahoo/hadoop-common). This will
> ensure the community benefits from all the work done at Yahoo! for over 12
> months *now*, and ensures that we do not have to wait until hadoop-0.22,
> which has all of these patches.
>
> Some salient aspects:
> a) Full-fledged security implementation deployed at scale (4000 nodes) in
> production.
> b) Lots of work on stabilizing and optimizing the NameNode and
> JobTracker for over 12 months. This has been critical in deploying Hadoop at
> scale, i.e. clusters of 4000 nodes. E.g. we have a 50% improvement in CPU
> utilization on the JobTracker vis-a-vis the hadoop-0.20.2 release.
> c) Several new features in the scheduler (CapacityScheduler), the Map-Reduce
> framework, better support for multi-tenancy etc.
> d) Several performance and stability improvements to the system, e.g.
> iterative ls, robustness against rogue clients/jobs/users etc.
>
> Also, given the huge number of features and enhancements, I'd like to propose
> we create a new 0.20-security branch and commit the Yahoo! patchset there for
> the release.
>
> This has been proposed earlier by Doug and did not get far due to concerns
> about the effect this would have on development on trunk. However, I
> believe we have a case for demonstrable progress on trunk now, and it would
> be useful to have an interim, fully-tested Apache Hadoop release available
> to the community.
>
> Conceivably, one could imagine a Hadoop Security + Append release soon
> after. At this point a Hadoop Security release alone would add tremendous
> value for the reasons above. Presently we would like to get this release out
> quickly to focus the majority of our efforts on trunk.
>
> Thoughts?
>
> Arun
