HDFS-918 and HDFS-347 are absolutely critical for random read performance. The smarter sites are already running HDFS-347 (I guess they aren't running "Hadoop" then?), and soon they will be testing and running HDFS-918 as well. Opening a new socket for every read just isn't scalable.
-ryan

On Fri, Jun 17, 2011 at 12:17 AM, Eric Baldeschwieler <[email protected]> wrote:
> Hi Folks,
>
> I'd like to start a conversation on mainline planning and the next release of
> Apache Hadoop beyond 0.22.
>
> The Yahoo! Hadoop team has been working hard to complete several big Hadoop
> projects, including:
>
> - HDFS Federation [HDFS-1052]
>   - Already merged into trunk
>
> - Next Generation Map-Reduce [MR-279]
>   - Passing most tests now and discussing merging into trunk
>
> - The merging of our previous work on Hadoop with security into mainline
>   [http://yhoo.it/i9Ww8W]
>   - This is mostly done, but owen and others are doing a scrub to close out
>     the remaining issues
>
> All of these projects are now reaching a place where we would like to combine
> them with the good work already in 0.22 and put out a new apache release,
> perhaps 0.23. We think the best way to accomplish that is to finish the
> merge in the next few weeks and then cut a release from trunk.
>
> Yahoo stands ready to help us (the Apache Hadoop Community) turn this new
> release into a stable release by running it through its 9 month test and burn
> in process. The result of that will be another stable release such as 0.18,
> 0.20 or 0.20.203 (hadoop with security). We have Yahoo!s support for this
> substantial investment because this new release will have a great combination
> of new features for small and very large sites alike:
> - New Write Pipeline - HBase support [also in 0.21 & 0.22]
> - Federation - Scale up to larger clusters and the ability to experiment
>   with new namenode approaches
> - Next Gen MapReduce - Scaleup, performance improvements, ability to
>   experiment with new processing frameworks
>
> I think this effort will produce a great new Apache Hadoop release for the
> community. I'm starting this thread to collect feedback and hopefully folks'
> endorsement for merging in MR-279 and putting together this new release.
> Feedback please?
>
> Thanks,
>
> E14
