[
https://issues.apache.org/jira/browse/HDFS-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577545#comment-14577545
]
Zhe Zhang commented on HDFS-8494:
---------------------------------
Thanks for the discussion, folks.
I remember discussing this in an offline meetup a while ago with [~jingzhao]
and [~szetszwo]. The conclusion at that time was that we should avoid storing
EC schema for each block.
But I think the question is worth discussing again. A medium-sized cluster could
have several hundred million blocks. At 4 bytes per block, that translates to a
few GB of memory overhead.
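As a rough check of that estimate, here is a minimal sketch of the arithmetic. The block counts are illustrative assumptions, not measurements from a real cluster, and the class and method names are made up for this example. It also ignores JVM object alignment, so the real per-block cost of an extra field could differ.

```java
public class EcBlockOverhead {
    /** Extra heap, in bytes, if every block stored one additional field. */
    static long overheadBytes(long blockCount, int bytesPerBlock) {
        return blockCount * bytesPerBlock;
    }

    public static void main(String[] args) {
        int bytesPerBlock = 4; // one int-sized field per block
        // Assumed cluster sizes in the "several hundred million blocks" range.
        long[] blockCounts = {300_000_000L, 500_000_000L, 900_000_000L};
        for (long n : blockCounts) {
            double gib = overheadBytes(n, bytesPerBlock) / (1024.0 * 1024 * 1024);
            System.out.printf("%d blocks -> %.2f GiB%n", n, gib);
        }
    }
}
```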
On the other side of the tradeoff, as Walter analyzed, we need to go through a
few hops to look up the {{cellSize}}. The most expensive part is probably the
path traversal and {{getECZone}}. So if we don't need to get {{cellSize}}
frequently, I'd suggest keeping it in the zone. We can certainly add a public
method in {{BlockInfoStriped}} to simplify the code.
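To illustrate the idea of keeping {{cellSize}} in the zone while hiding the hops behind one public method, here is a self-contained toy sketch. None of these classes are the real HDFS ones: Zone, ZoneManager, and StripedBlock are hypothetical stand-ins, and the real lookup path in the NameNode is more involved.

```java
import java.util.HashMap;
import java.util.Map;

public class CellSizeLookupSketch {
    /** Hypothetical stand-in for an EC zone: a directory root and its cell size. */
    static class Zone {
        final String root;
        final int cellSize;
        Zone(String root, int cellSize) { this.root = root; this.cellSize = cellSize; }
    }

    /** Hypothetical stand-in for the zone lookup, which walks up the path. */
    static class ZoneManager {
        private final Map<String, Zone> zonesByRoot = new HashMap<>();

        void addZone(Zone z) { zonesByRoot.put(z.root, z); }

        /** Path traversal: check the path, then each ancestor (root "/" not handled). */
        Zone findZone(String path) {
            for (String p = path; p != null; p = parent(p)) {
                Zone z = zonesByRoot.get(p);
                if (z != null) return z;
            }
            return null;
        }

        private static String parent(String p) {
            int i = p.lastIndexOf('/');
            return i <= 0 ? null : p.substring(0, i);
        }
    }

    /** Hypothetical striped block that stores no cell size of its own. */
    static class StripedBlock {
        private final String filePath;
        private final ZoneManager zones;
        StripedBlock(String filePath, ZoneManager zones) {
            this.filePath = filePath;
            this.zones = zones;
        }

        /** Single public entry point, so callers do not repeat the hops. */
        int getCellSize() {
            Zone z = zones.findZone(filePath);
            if (z == null) throw new IllegalStateException("no EC zone for " + filePath);
            return z.cellSize;
        }
    }

    public static void main(String[] args) {
        ZoneManager zones = new ZoneManager();
        zones.addZone(new Zone("/ec", 64 * 1024)); // assumed 64 KB cell size
        StripedBlock block = new StripedBlock("/ec/dir/file", zones);
        System.out.println(block.getCellSize());
    }
}
```

The per-block memory cost stays at zero in this arrangement. The price is a path walk on every call, which is acceptable only if {{cellSize}} is not needed frequently.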
> Remove hard-coded chunk size in favor of ECZone
> -----------------------------------------------
>
> Key: HDFS-8494
> URL: https://issues.apache.org/jira/browse/HDFS-8494
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7285
> Reporter: Kai Sasaki
> Assignee: Kai Sasaki
> Fix For: HDFS-7285
>
> Attachments: HDFS-8494-HDFS-7285-01.patch,
> HDFS-8494-HDFS-7285-02.patch
>
>
> It is necessary to remove hard-coded values inside NameNode configured in
> {{HdfsConstants}}. In this JIRA, we can remove {{chunkSize}} gracefully in
> favor of HDFS-8375.
> Because {{cellSize}} is now stored only in {{ErasureCodingZone}},
> {{BlockInfoStriped}} can receive {{cellSize}} in addition to {{ECSchema}}
> when it is initialized.