[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333677#comment-14333677 ]
Zhe Zhang commented on HDFS-7285:
---------------------------------
I'm seeing a lot of conflicts when rebasing against trunk. Somehow git decides
to re-apply HDFS-7723. Below is the output of {{git rebase -i apache/trunk}}.
{code}
1 pick 5c27789 HDFS-7347. Configurable erasure coding policy for individual files and directories ( Contributed by Zhe Zhang )
2 pick ae4e4d4 HDFS-7339. Allocating and persisting block groups in NameNode. Contributed by Zhe Zhang
3 pick eb3132b HDFS-7652. Process block reports for erasure coded blocks. Contributed by Zhe Zhang
4 pick 2477b02 Fix Compilation Error in TestAddBlockgroup.java after the merge
5 pick 0ae52c8 HADOOP-11514. Raw Erasure Coder API for concrete encoding and decoding (Kai Zheng via umamahesh)
6 pick f9e1cc2 HADOOP-11534. Minor improvements for raw erasure coders ( Contributed by Kai Zheng )
7 pick c36a7a9 HADOOP-11541. Raw XOR coder
8 pick 93fc299 Added the missed entry for commit of HADOOP-11541
9 pick 2516efd HDFS-7716. Erasure Coding: extend BlockInfo to handle EC info. Contributed by Jing Zhao.
10 pick e746443 HADOOP-11542. Raw Reed-Solomon coder in pure Java. Contributed by Kai Zheng
11 pick 1611bb2 HDFS-7723. Quota By Storage Type namenode implemenation. (Contributed by Xiaoyu Yao)
{code}
I'll just re-apply commits 1 to 10 on top of the current trunk, leaving out HDFS-7723 (entry 11).
> Erasure Coding Support inside HDFS
> ----------------------------------
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Weihua Jiang
> Assignee: Zhe Zhang
> Attachments: ECAnalyzer.py, ECParser.py,
> HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf,
> HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf,
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing
> data reliability, compared to the existing HDFS 3-replica approach. For
> example, with a 10+4 Reed-Solomon code we can tolerate the loss of any 4 of
> the 14 blocks in a group, with a storage overhead of only 40%. This makes EC
> a very attractive alternative for big data storage, particularly for cold
> data.
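To make the 40% figure concrete: with 3-way replication every block is stored three times, which is 200% extra storage, and up to 2 of the 3 copies can be lost. A 10+4 Reed-Solomon group stores 4 parity blocks for every 10 data blocks, so the extra storage is 4/10 = 40%, and because Reed-Solomon is an MDS code, any 4 of the 14 blocks can be lost. The Python sketch below only does this arithmetic; the `Layout` class is a hypothetical helper for illustration, not part of HDFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of how one logical unit of data is stored."""
    name: str
    data_units: int       # blocks (or the single original copy) holding the data
    redundant_units: int  # extra replicas or parity blocks

    def storage_overhead(self) -> float:
        """Extra raw storage as a fraction of the logical data size."""
        return self.redundant_units / self.data_units

    def tolerated_losses(self) -> int:
        """Units that can be lost without data loss (full replicas or an MDS code)."""
        return self.redundant_units


replication = Layout("3-replica", data_units=1, redundant_units=2)
rs_10_4 = Layout("RS(10,4)", data_units=10, redundant_units=4)

for layout in (replication, rs_10_4):
    print(f"{layout.name}: overhead {layout.storage_overhead():.0%}, "
          f"tolerates {layout.tolerated_losses()} lost units")
```

Running it prints an overhead of 200% for 3-replica and 40% for RS(10,4), while the Reed-Solomon layout tolerates more losses (4 versus 2).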
> Facebook had a related open source project called HDFS-RAID. It used to be
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for
> maintenance reasons. Its drawbacks are: 1) it is built on top of HDFS and
> depends on MapReduce for encoding and decoding; 2) it can only be used for
> cold files that will no longer be appended to; 3) its pure Java EC
> implementation is extremely slow in practice. For these reasons, simply
> bringing HDFS-RAID back may not be a good idea.
> We (Intel and Cloudera) are working on a design that builds EC into HDFS
> without any external dependencies, so that it is self-contained and
> independently maintained. The design builds the EC feature on top of HDFS
> storage type support and aims to be compatible with existing HDFS features
> such as caching, snapshots, encryption, and high availability. It will also
> support different EC coding schemes, implementations, and policies for
> different deployment scenarios. By using advanced libraries (e.g., the Intel
> ISA-L library), an implementation can greatly improve EC encoding/decoding
> performance, making the EC solution even more attractive. We will post the
> design document soon.
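Supporting different coding schemes and implementations suggests a pluggable coder abstraction, in the spirit of the Raw Erasure Coder API (HADOOP-11514) and Raw XOR coder (HADOOP-11541) commits in the rebase list above. The Python sketch below is a minimal illustration under that assumption. `ErasureCoder`, `XorCoder`, `encode` and `decode` are hypothetical names, not the actual Hadoop interfaces. XOR parity is used only because it is the simplest scheme: it tolerates one lost unit, whereas a Reed-Solomon coder behind the same interface could tolerate several.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ErasureCoder(ABC):
    """Hypothetical pluggable coder interface (illustrative, not Hadoop's API)."""

    @abstractmethod
    def encode(self, data: List[bytes]) -> List[bytes]:
        """Return parity units computed from equal-length data units."""

    @abstractmethod
    def decode(self, units: List[Optional[bytes]]) -> List[Optional[bytes]]:
        """Rebuild missing units (None) given data units followed by parity units."""


class XorCoder(ErasureCoder):
    """Single-parity XOR scheme: can recover any one lost unit."""

    @staticmethod
    def _xor(units: List[bytes]) -> bytes:
        # Byte-wise XOR of all units; assumes they all have the same length.
        out = bytearray(len(units[0]))
        for unit in units:
            for i, b in enumerate(unit):
                out[i] ^= b
        return bytes(out)

    def encode(self, data: List[bytes]) -> List[bytes]:
        if len({len(d) for d in data}) != 1:
            raise ValueError("data units must have equal length")
        return [self._xor(data)]

    def decode(self, units: List[Optional[bytes]]) -> List[Optional[bytes]]:
        missing = [i for i, u in enumerate(units) if u is None]
        if len(missing) > 1:
            raise ValueError("XOR parity can recover at most one lost unit")
        rebuilt = list(units)
        if missing:
            # XOR of all surviving units equals the single missing unit.
            rebuilt[missing[0]] = self._xor([u for u in units if u is not None])
        return rebuilt


coder = XorCoder()
data = [b"abcd", b"efgh"]
stripe: List[Optional[bytes]] = data + coder.encode(data)
stripe[1] = None  # simulate losing the second data block
recovered = coder.decode(stripe)
print(recovered[1])  # prints b'efgh'
```

Swapping in another scheme means adding another `ErasureCoder` subclass; callers that pick a coder by policy do not need to change.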
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)