[
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272010#comment-14272010
]
Kai Zheng commented on HDFS-7337:
---------------------------------
Continuing to address Zhe's comments above.
bq. I guess ECBlock is for testing purpose? An erasure coded block should have
all properties of a regular block. I think we can just add a couple of flags to
the Block class.
You're right in a sense: the ECBlock class isn't finalized, since the whole
bundle of code was attached just for review and discussion. I'm working on this
and will decouple ECBlock from the HDFS block. That's possible because the
codec framework already has a nice arrangement to delegate how chunks (ECChunk)
are pulled/extracted from an ECBlock; it's the caller's (ECWorker or ECClient)
responsibility to extract/collect byte chunks from an actual HDFS block. Once
decoupled, the ECBlock (or something similar) will be very lightweight and
won't need many fields at all. I will post new code for further discussion.
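To give a rough idea, a decoupled ECBlock might be as lightweight as the sketch
below; the class shape and field names are just my assumption for illustration
and will be settled in the updated patch:
{code:java}
// A minimal sketch, assuming the decoupled design above; the real class may
// differ in the new code.
public class ECBlock {
  private final boolean isParity; // parity block vs. data block in the block group
  private final boolean isErased; // marks a missing/corrupt block to be recovered

  public ECBlock(boolean isParity, boolean isErased) {
    this.isParity = isParity;
    this.isErased = isErased;
  }

  public boolean isParity() {
    return isParity;
  }

  public boolean isErased() {
    return isErased;
  }
}
{code}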
bq. It's not quite clear to me why we need ErasureCoderCallback. Is it for
async codec calculation? If codec calculations are done on small packets, I
think sync operations are fine.
ErasureCoderCallback could probably be better named to avoid such confusion.
It isn't about sync vs. async. It's basically how the codec caller (ECWorker or
ECClient) controls how chunks are obtained from blocks: the codec calls it to
pull chunks from blocks, so it can be regarded as a data source provider. In
the ECWorker transforming case, many chunks can be pulled from the blocks being
transformed, so the underlying byte-level encode() or decode() of the raw coder
can be called many times in a while loop. In the ECClient striping case it's
similar, until the application finishes writing/reading data for a BlockGroup.
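As a rough illustration of the intent only (the method names and signatures
here are my assumption, not necessarily what's in the attached prototype), the
callback is essentially a chunk source/sink driven by the coder:
{code:java}
// Sketch only: how a caller (ECWorker/ECClient) could feed chunks to the coder
// and receive coded results back, keeping block access on the caller's side.
public interface ErasureCoderCallback {
  /** Does the caller still have input chunks to feed for this block group? */
  boolean hasNextInputs();

  /** Pull the next batch of input chunks extracted from the underlying blocks. */
  ECChunk[] getNextInputChunks();

  /** Hand back the output chunks produced by a raw encode()/decode() call. */
  void withCoded(ECChunk[] inputChunks, ECChunk[] outputChunks);

  /** Notified once the whole block group has been processed. */
  void postCoding();
}
{code}
So the coder just loops: while the callback has more inputs, pull the chunks,
run the byte-level encode()/decode() of the raw coder, and hand the results
back through the callback.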
> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Kai Zheng
> Attachments: HDFS-7337-prototype-v1.patch,
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this task considers supporting multiple
> erasure codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resulting codec schemas can then be specified via a command tool for different
> file folders. While designing and implementing such a pluggable framework, a
> concrete default codec (Reed-Solomon) will also be implemented to prove the
> framework is useful and workable. A separate JIRA could be opened for the RS
> codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation
> to make concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on the high-level parts that interact with configuration, schemas, etc.