[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600737#comment-14600737 ]
Zhe Zhang commented on HDFS-7285:
---------------------------------
Thanks Jing and Walter for the helpful discussion!
bq. Then why not bring the BlockInfoXXX - BlockInfoXXXUC inheritance back
and just make the inheritance structure like the HDFS-7285 branch?
This is because several places in trunk rely on the {{BlockInfo}} -
{{BlockInfoUC}} inheritance. As discussed under HDFS-8499, this
multi-inheritance problem is fundamentally hard: a block varies along two
dimensions (contiguous vs. striped, and complete vs. UC), but a Java class can
extend only one parent. The HDFS-8499 patch keeps the {{BlockInfo}} -
{{BlockInfoUC}} inheritance to minimize changes to trunk. This structure also
makes it easier to share common code, because the code difference along the
contiguous-striped dimension is smaller than along the UC dimension.
I'm open to revisiting the {{BlockInfo}} structure based on discussion here.
With either structure discussed above, I think we should solve the
{{BlockInfo}} multi-inheritance problem more completely as a follow-on.
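To make the multi-inheritance problem concrete: a block varies along two independent dimensions, layout (contiguous vs. striped) and state (complete vs. UC), but a Java class can extend only one parent. The sketch below is a hypothetical, heavily simplified model, not the actual HDFS classes. It keeps a {{BlockInfo}} - {{BlockInfoUC}} inheritance for the UC dimension and moves the smaller contiguous/striped difference into a delegate object, so layout code is written once and shared by both states. The names {{LayoutOp}}, {{ContiguousOp}}, {{StripedOp}}, {{capacity}} and {{convertToComplete}} are assumptions made for illustration.

```java
// Hypothetical, heavily simplified model of the two block dimensions.
// These are NOT the real HDFS classes; names and fields are illustrative.

/** Layout-specific behaviour (contiguous vs. striped), shared by delegation. */
interface LayoutOp {
    /** Number of storage slots tracked: replicas, or data + parity internal blocks. */
    int capacity();

    boolean isStriped();
}

/** Contiguous layout: one slot per replica. */
final class ContiguousOp implements LayoutOp {
    private final short replication;

    ContiguousOp(short replication) { this.replication = replication; }

    public int capacity() { return replication; }
    public boolean isStriped() { return false; }
}

/** Striped layout: one slot per data or parity internal block. */
final class StripedOp implements LayoutOp {
    private final short dataBlocks;
    private final short parityBlocks;

    StripedOp(short dataBlocks, short parityBlocks) {
        this.dataBlocks = dataBlocks;
        this.parityBlocks = parityBlocks;
    }

    public int capacity() { return dataBlocks + parityBlocks; }
    public boolean isStriped() { return true; }
}

/** Complete block: root of the class hierarchy (the UC dimension uses inheritance). */
class BlockInfo {
    protected final LayoutOp layout;

    BlockInfo(LayoutOp layout) { this.layout = layout; }

    boolean isStriped() { return layout.isStriped(); }
    int capacity() { return layout.capacity(); }
    boolean isComplete() { return true; }
}

/** Under-construction block: inherits from BlockInfo and reuses the same layout delegate. */
class BlockInfoUC extends BlockInfo {
    BlockInfoUC(LayoutOp layout) { super(layout); }

    @Override
    boolean isComplete() { return false; }

    /** Completing a block changes only the UC dimension; the layout is carried over. */
    BlockInfo convertToComplete() { return new BlockInfo(layout); }
}
```

For example, {{new BlockInfoUC(new StripedOp((short) 10, (short) 4))}} reports {{isStriped()}} as true and a capacity of 14, and {{convertToComplete()}} keeps the same layout while flipping {{isComplete()}}. All four combinations are covered by two classes plus two small delegates, rather than four full subclasses.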
bq. We can use BlockInfo as an abstraction for complete and UC blocks. We need
to change FileWithStripedBlocksFeature#blocks to BlockInfo as well.
The PoC patch already does that. As Jing and I commented under HDFS-8058,
the downside is weaker type safety. For example, at the API level,
{{setBlocks}} lets other methods assign an array of {{BlockInfo}} to the
INode, and it's not easy to verify that the array doesn't mix contiguous and
striped blocks. My current thought is that we can create an abstraction,
{{BlocksInAFile}}, holding a block type and an array of {{BlockInfo}}. This
would serve as a central place to enforce type safety. I'll post a patch under
HDFS-8058 to demonstrate the idea.
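A minimal sketch of the {{BlocksInAFile}} idea, assuming a stand-in {{BlockInfo}} that only knows its ID and whether it is striped. It is not the patch to be posted under HDFS-8058. The file records its block type once, and every mutation goes through {{setBlocks}} or {{addBlock}}, which reject a block of the other type, so a mixed array can never be installed. The enum {{Type}}, {{addBlock}} and {{checkType}} are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch of the BlocksInAFile idea; not the HDFS-8058 patch.
// BlockInfo is a minimal stand-in that only knows its ID and layout.
class BlockInfo {
    private final long id;
    private final boolean striped;

    BlockInfo(long id, boolean striped) {
        this.id = id;
        this.striped = striped;
    }

    long getId() { return id; }
    boolean isStriped() { return striped; }
}

final class BlocksInAFile {
    enum Type { CONTIGUOUS, STRIPED }

    private final Type type;          // fixed for the lifetime of the file
    private BlockInfo[] blocks = new BlockInfo[0];

    BlocksInAFile(Type type) { this.type = type; }

    Type getType() { return type; }

    /** Returns a copy so callers cannot bypass the type check by writing into the array. */
    BlockInfo[] getBlocks() { return blocks.clone(); }

    /** The single place where a file's block array is replaced; rejects mixed types. */
    void setBlocks(BlockInfo[] newBlocks) {
        for (BlockInfo b : newBlocks) {
            checkType(b);
        }
        blocks = newBlocks.clone();
    }

    /** Appends one block after checking that it matches the file's block type. */
    void addBlock(BlockInfo b) {
        checkType(b);
        blocks = Arrays.copyOf(blocks, blocks.length + 1);
        blocks[blocks.length - 1] = b;
    }

    private void checkType(BlockInfo b) {
        boolean expectStriped = type == Type.STRIPED;
        if (b.isStriped() != expectStriped) {
            throw new IllegalArgumentException(
                "Block " + b.getId() + " does not match file block type " + type);
        }
    }
}
```

Returning a copy from {{getBlocks}} is a deliberate choice: handing out the internal array would let a caller store a block of the wrong type in it and bypass the check.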
bq. I saw you cast BlockInfo to BlockInfoStriped multiple times in BlockManager
in the GitHub branch. They can be eliminated.
This is a good point. We can use {{isStriped}} and {{getStripedBlockStorageOp}}
instead of casting.
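A hypothetical sketch of what that looks like at a call site. The real signature of {{getStripedBlockStorageOp}} on the branch is not given in this discussion, so this assumes it returns a helper holding the data and parity block counts, and it models {{BlockInfo}} as a single class with a nullable helper purely to keep the example short. {{BlockManagerSketch}}, {{expectedStorages}} and {{minStoragesToRead}} are illustrative names; the read threshold assumes an MDS code such as Reed-Solomon, where any k of the k+m internal blocks suffice.

```java
// Hypothetical sketch: replacing (BlockInfoStriped) casts in BlockManager-style code
// with isStriped() plus a striped helper. Names follow the comment; the real HDFS
// signatures may differ.

/** Striped-only metadata, reachable without a downcast. */
final class StripedBlockStorageOp {
    private final short dataBlockNum;
    private final short parityBlockNum;

    StripedBlockStorageOp(short dataBlockNum, short parityBlockNum) {
        this.dataBlockNum = dataBlockNum;
        this.parityBlockNum = parityBlockNum;
    }

    short getDataBlockNum() { return dataBlockNum; }
    short getParityBlockNum() { return parityBlockNum; }
    int getTotalBlockNum() { return dataBlockNum + parityBlockNum; }
}

class BlockInfo {
    private final short replication;               // meaningful for contiguous blocks only
    private final StripedBlockStorageOp stripedOp; // null for contiguous blocks

    private BlockInfo(short replication, StripedBlockStorageOp stripedOp) {
        this.replication = replication;
        this.stripedOp = stripedOp;
    }

    static BlockInfo contiguous(short replication) {
        return new BlockInfo(replication, null);
    }

    static BlockInfo striped(short dataBlocks, short parityBlocks) {
        return new BlockInfo((short) 0, new StripedBlockStorageOp(dataBlocks, parityBlocks));
    }

    boolean isStriped() { return stripedOp != null; }

    short getReplication() { return replication; }

    /** Fails fast with a clear message instead of a ClassCastException on contiguous blocks. */
    StripedBlockStorageOp getStripedBlockStorageOp() {
        if (stripedOp == null) {
            throw new IllegalStateException("not a striped block");
        }
        return stripedOp;
    }
}

final class BlockManagerSketch {
    /** Storages a block should have: its replicas, or its data + parity internal blocks. */
    static int expectedStorages(BlockInfo block) {
        if (block.isStriped()) {
            return block.getStripedBlockStorageOp().getTotalBlockNum();
        }
        return block.getReplication();
    }

    /** Minimum live storages needed to read: one replica, or any k of the k + m blocks. */
    static int minStoragesToRead(BlockInfo block) {
        return block.isStriped() ? block.getStripedBlockStorageOp().getDataBlockNum() : 1;
    }
}
```

The branch on {{isStriped()}} keeps every striped-specific access behind one accessor, so a wrong call surfaces as an explicit error rather than a cast failure scattered across {{BlockManager}}.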
> Erasure Coding Support inside HDFS
> ----------------------------------
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Weihua Jiang
> Assignee: Zhe Zhang
> Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch,
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-bistriped.patch,
> HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf,
> HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf,
> HDFSErasureCodingPhaseITestPlan.pdf, fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing
> data reliability, compared to the existing HDFS 3-replica approach. For
> example, with 10+4 Reed-Solomon coding, we can tolerate the loss of any 4
> blocks with a storage overhead of only 40% (versus 200% for 3-replica). This
> makes EC a quite attractive alternative for big data storage, particularly for
> cold data.
> Facebook had a related open source project called HDFS-RAID. It used to be
> one of the contrib packages in HDFS but was removed as of Hadoop 2.0 for
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for
> cold files that will not be appended to anymore; 3) the pure Java EC coding
> implementation is extremely slow in practical use. For these reasons, it
> might not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that
> gets rid of any external dependencies, making it self-contained and
> independently maintained. This design builds the EC feature on top of storage
> type support and is intended to be compatible with existing HDFS features such
> as caching, snapshots, encryption, and high availability. This design will also
> support different EC coding schemes, implementations and policies for
> different deployment scenarios. By utilizing advanced libraries (e.g. the Intel
> ISA-L library), an implementation can greatly improve the performance of EC
> encoding/decoding and make the EC solution even more attractive. We will
> post the design document soon.
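The storage figures in the description follow from simple arithmetic: r-way replication stores r - 1 extra copies per byte of data, while a k+m code stores m parity blocks per k data blocks, an overhead of m/k. A short sketch comparing the two for the 3-replica and 10+4 Reed-Solomon cases (the class and method names are illustrative):

```java
// Storage-overhead arithmetic: replication vs. a k+m erasure code.
// Overhead = extra bytes stored per byte of user data.
final class StorageOverhead {
    /** r-way replication stores r full copies, so the overhead is r - 1. */
    static double replicationOverhead(int replicas) {
        return replicas - 1;
    }

    /** A k+m code stores m parity blocks for every k data blocks, so the overhead is m / k. */
    static double erasureCodingOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks;
    }

    public static void main(String[] args) {
        System.out.printf("3-replica: %.0f%% overhead, tolerates %d lost copies%n",
            100 * replicationOverhead(3), 3 - 1);
        System.out.printf("RS 10+4: %.0f%% overhead, tolerates %d lost blocks%n",
            100 * erasureCodingOverhead(10, 4), 4);
    }
}
```

Running {{main}} prints 200% overhead for 3-replica (tolerating 2 lost copies) and 40% for RS 10+4 (tolerating any 4 lost blocks). The RS tolerance figure relies on Reed-Solomon being maximum distance separable, which is what allows any m of the k+m blocks to be lost.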
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)