+1 Great addition to HDFS. Thanks to all contributors for the nice work.

Regards,
Uma

On 9/22/15, 3:40 PM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:

>Hi,
>
>I'd like to propose a vote to merge the HDFS-7285 feature branch back to
>trunk. Since November 2014 we have been designing and developing this
>feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and have
>committed approximately 210 patches.
>
>The HDFS-7285 feature branch was created to support the first phase of
>HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to
>significantly reduce storage space usage in HDFS clusters. Instead of
>always creating 3 replicas of each block with 200% storage space
>overhead, HDFS-EC provides data durability through parity data blocks.
>With most EC configurations, the storage overhead is no more than 50%.
>Based on profiling results of production clusters, we decided to support
>EC with the striped block layout in the first phase, so that small files
>can be better handled. This means dividing each logical HDFS file block
>into smaller units (striping cells) and spreading them on a set of
>DataNodes in round-robin fashion. Parity cells are generated for each
>stripe of original data cells. We have made changes to NameNode, client,
>and DataNode to generalize the block concept and handle the mapping
>between a logical file block and its internal storage blocks. For
>further details please see the design doc on HDFS-7285.
>HADOOP-11264 focuses on providing flexible and high-performance codec
>calculation support.
>
>The nightly Jenkins job of the branch has reported several successful
>runs, and doesn't show new flaky tests compared with trunk. We have
>posted several versions of the test plan including both unit testing and
>cluster testing, and have executed most tests in the plan. The most
>basic functionalities have been extensively tested and verified in
>several real clusters with different hardware configurations; results
>have been very stable. We have created follow-on tasks for more advanced
>error handling and optimization under the umbrella HDFS-8031. We also
>plan to implement or harden the integration of EC with existing features
>such as WebHDFS, snapshot, append, truncate, hflush, hsync, and so forth.
>
>Development of this feature has been a collaboration across many
>companies and institutions. I'd like to thank J. Andreina, Takanobu
>Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, Rui
>Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai Sasaki,
>Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang, Jing Zhao, Hui
>Zheng and Kai Zheng for their code contributions and reviews. Andrew and
>Kai Zheng also made fundamental contributions to the initial design. Rui
>Li, Gao Rui, Kai Sasaki, Kai Zheng and many other contributors have made
>great efforts in system testing. Many thanks go to Weihua Jiang for
>proposing the JIRA, and ATM, Todd Lipcon, Silvius Rus, Suresh, as well as
>many others for providing helpful feedbacks.
>
>Following the community convention, this vote will last for 7 days
>(ending September 29th). Votes from Hadoop committers are binding but
>non-binding votes are very welcome as well. And here's my non-binding +1.
>
>Thanks,
>---
>Zhe Zhang
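
For readers new to the striped layout described in the quoted proposal, here is a rough sketch of the overhead arithmetic and the round-robin cell placement. It is only an illustration, not the branch's code: the function names are made up, the 6 data + 3 parity split is an assumed example configuration, and the parity uses plain XOR as a stand-in for the real codec work tracked in HADOOP-11264.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the HDFS-7285 branch.

def replication_overhead(replicas: int) -> float:
    """Extra storage as a fraction of the data size for n-way replication."""
    return float(replicas - 1)


def ec_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage as a fraction of the data size for an EC schema."""
    return parity_units / data_units


def stripe_cells(data: bytes, cell_size: int, data_units: int) -> list:
    """Cut data into cells and place them round-robin on internal blocks."""
    blocks = [[] for _ in range(data_units)]
    cells = [data[i:i + cell_size] for i in range(0, len(data), cell_size)]
    for index, cell in enumerate(cells):
        # Cell 0 goes to block 0, cell 1 to block 1, ... then wraps around.
        blocks[index % data_units].append(cell)
    return blocks


def xor_parity(cells: list, cell_size: int) -> bytes:
    """Single XOR parity cell for one stripe (simplified stand-in codec)."""
    parity = bytearray(cell_size)
    for cell in cells:
        for i, byte in enumerate(cell):
            parity[i] ^= byte
    return bytes(parity)


if __name__ == "__main__":
    print(replication_overhead(3))        # 3 replicas: 200% overhead
    print(ec_overhead(6, 3))              # assumed 6+3 schema: 50% overhead
    print(stripe_cells(b"abcdefgh", 2, 3))
```

With these assumptions, 3-way replication costs 2.0x extra storage while a 6+3 schema costs 0.5x, which matches the 200% versus 50% figures in the proposal.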