[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391207#comment-14391207 ]

Zhe Zhang commented on HDFS-7285:
---------------------------------

Yesterday we had another offline meetup. I think the discussion was very 
productive. Below please find the summary:
*Attendees*: Nicholas, Jing, Zhe

*Project phasing*
We went over the list of subtasks under this JIRA and separated them into 3 
categories:
# Basic EC functionalities under the striping layout. Those subtasks were kept 
under this umbrella JIRA. The goal is for the HDFS-7285 branch to be ready for 
merging into trunk upon their completion.
# Follow-on tasks for EC+striping (including code and performance optimization, 
as well as support for advanced HDFS features). Those subtasks were moved under 
HDFS-8031. Following common practice, those follow-on tasks are targeted 
for trunk, after HDFS-7285 is merged.
# EC with non-striping / contiguous block layout. Those subtasks were moved to 
HDFS-8030, which represents the 2nd phase of the erasure coding project.

Extending from the initial [PoC prototype | 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14339006&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339006],
 the following _basic EC functionalities_ will be finished under this JIRA 
([~szetszwo] please let me know if I missed anything from your list):
* A striped block group is distributed evenly on racks
* NN handles striped block groups in the existing block management logic:
** Missing and corrupted blocks
** To-invalidate blocks
** Lease recovery
** DN decommissioning
* NN periodically distributes tasks to DN to reconstruct missing striped blocks
* DN executes the reconstruction task by pulling data from peer DNs
* Client can read a striped block group even if some blocks are missing, 
through decoding
* Client should handle DN failures during writing
* Basic command for directory-level EC configuration (similar to a zone)
* Correctly handle striped block groups in file system statistics and metrics
* Documentation
* More comprehensive testing
* _Optional_: instead of hard-coding, incorporate the {{ECSchema}} class with 
1 or 2 schemas
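
As a rough illustration of the striping layout the list above refers to (this is a reader's sketch, not HDFS code; the (6, 3) data/parity schema here is an assumption for illustration, and the 1 MB cell size is the default discussed later in this comment), a logical file offset maps to a cell within a striped block group like this:

```python
# Illustrative sketch only: map a logical file offset to its position
# inside a striped block group. The (6, 3) schema is assumed for the
# example; real schemas are meant to be configurable via ECSchema.

CELL_SIZE = 1024 * 1024   # 1 MB striping cell (the current default)
DATA_BLOCKS = 6           # data blocks per block group (assumption)

def locate(offset):
    """Return (stripe index, data-block index, offset within the cell)."""
    cell_index = offset // CELL_SIZE
    stripe = cell_index // DATA_BLOCKS
    block = cell_index % DATA_BLOCKS   # cells go round-robin across data blocks
    within = offset % CELL_SIZE
    return stripe, block, within
```

Consecutive cells land on different data blocks (and hence different DNs), which is what makes parallel reads and rack-aware placement of a block group possible.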

*Key remaining tasks*
We think the following remaining tasks are _key_ in terms of complexity and 
amount of work:
# Client writing: the basic striped writing logic is close to complete (patch 
available under HDFS-7889), but it's challenging to handle failures during 
writing in an elegant way. 
# Client reading: the logic isn't too complex, but the amount of work is 
non-trivial
# DN reconstruction: the logic is clean, but work has not started yet

*Client design*
We also dug into more details of the design of the client read/write paths, 
and are in sync on the overall approach. A few points were raised and will be 
addressed:
# The striping cell size currently defaults to 1 MB. We should study its 
impact more carefully. Intuitively, a smaller value (like 128 KB) might be 
more suitable.
# Pread in striping format should always try to fetch data in parallel, when 
the requested range spans multiple striping cells.
# Stateful read in striping format should maintain multiple block readers to 
minimize overhead of creating new readers.
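
Point 2 above can be sketched roughly as follows (a reader's illustration, not the HDFS-7285 implementation; the cell size and schema width are assumptions carried over from the earlier sketch): a pread first computes which cells the requested range touches, then drives one reader per cell concurrently, since each cell lives on a different DN.

```python
# Illustrative sketch only: determine which striping cells a
# pread(offset, length) touches, and fetch them in parallel.
from concurrent.futures import ThreadPoolExecutor

CELL_SIZE = 1024 * 1024   # 1 MB striping cell (the current default)
DATA_BLOCKS = 6           # data blocks per block group (assumption)

def cells_for_range(offset, length):
    """Cells, as (stripe, data-block) pairs, covered by the byte range."""
    first = offset // CELL_SIZE
    last = (offset + length - 1) // CELL_SIZE
    return [(c // DATA_BLOCKS, c % DATA_BLOCKS) for c in range(first, last + 1)]

def parallel_pread(offset, length, fetch_cell):
    """Fetch every touched cell concurrently.

    fetch_cell is a stand-in for a per-DN block reader; results come
    back in cell order, ready to be stitched into the user buffer.
    """
    cells = cells_for_range(offset, length)
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        return list(pool.map(fetch_cell, cells))
```

A range that crosses a cell boundary, however small, touches two cells on two different DNs, which is why the parallel path matters even for modest read sizes.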

> Erasure Coding Support inside HDFS
> ----------------------------------
>
>                 Key: HDFS-7285
>                 URL: https://issues.apache.org/jira/browse/HDFS-7285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Weihua Jiang
>            Assignee: Zhe Zhang
>         Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, 
> HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
> blocks with a storage overhead of only 40%. This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contributed packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to run encoding and decoding tasks; 2) it can only be used for 
> cold files that will no longer be appended to; 3) the pure Java EC coding 
> implementation is extremely slow in practical use. For these reasons, it 
> might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> removes external dependencies, making it self-contained and independently 
> maintainable. This design builds the EC feature on top of storage type 
> support and is designed to be compatible with existing HDFS features such as 
> caching, snapshots, encryption, and high availability. It will also support 
> different EC coding schemes, implementations, and policies for different 
> deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L 
> library), an implementation can greatly improve the performance of EC 
> encoding/decoding, making the EC solution even more attractive. We will 
> post the design document soon. 
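
The overhead figures in the quoted description follow from simple arithmetic (a quick illustrative check by the reader, not project code): RS(10, 4) adds 4 parity blocks per 10 data blocks, while 3-replication adds 2 full copies per block.

```python
# Quick check of the overhead figures quoted above.

def storage_overhead(data_units, redundancy_units):
    """Extra storage as a fraction of the raw data size."""
    return redundancy_units / data_units

rs_10_4 = storage_overhead(10, 4)        # 4 parity blocks per 10 data blocks
three_replica = storage_overhead(1, 2)   # 2 extra full copies per block
```

So RS(10, 4) stores 1.4x the data (40% overhead) and tolerates any 4 lost blocks, while 3-replication stores 3x the data (200% overhead) and tolerates only 2.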



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
