[ 
https://issues.apache.org/jira/browse/HDFS-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577545#comment-14577545
 ] 

Zhe Zhang commented on HDFS-8494:
---------------------------------

Thanks for the discussion, folks.

I remember discussing this in an offline meetup a while ago with [~jingzhao] 
and [~szetszwo]. The conclusion at that time was that we should avoid storing 
EC schema for each block. 

But I think the question is worth discussing again. A medium-sized cluster could 
have several hundred million blocks. At 4 bytes per block, that translates to a 
few GB of memory overhead. 
On the other side of the tradeoff, as Walter analyzed, we need to go through a 
few hops to look up the {{cellSize}}. The most expensive part is probably the 
path traversal and {{getECZone}}. So if we don't need to get {{cellSize}} 
frequently, I'd suggest keeping it in the zone. We can certainly add a public 
method in {{BlockInfoStriped}} to simplify the code.
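
To make the memory side of the tradeoff concrete, here is a back-of-the-envelope sketch of the arithmetic. It assumes a 4-byte {{int}} field per block and ignores JVM object alignment, which could make the real cost differ. The class name, method names, and block counts are hypothetical illustrations, not HDFS code or measurements.

```java
// Rough estimate of the extra NameNode heap needed if every block stored
// its own cellSize. Assumption: one 4-byte int per block, no JVM padding.
public class CellSizeOverhead {
    // Assumed size of the per-block cellSize field, in bytes.
    static final int BYTES_PER_CELL_SIZE_FIELD = 4;

    // Total extra bytes for the given number of blocks.
    static long overheadBytes(long blockCount) {
        return blockCount * BYTES_PER_CELL_SIZE_FIELD;
    }

    // Convert a byte count to GiB (2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Illustrative block counts for "several hundred million" blocks.
        long[] blockCounts = {100_000_000L, 300_000_000L, 500_000_000L};
        for (long n : blockCounts) {
            System.out.printf("%,d blocks -> %.2f GiB%n", n, toGiB(overheadBytes(n)));
        }
    }
}
```

Under these assumptions the overhead grows linearly with the block count, so the estimate only reaches the "few GB" range toward the upper end of several hundred million blocks.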

> Remove hard-coded chunk size in favor of ECZone
> -----------------------------------------------
>
>                 Key: HDFS-8494
>                 URL: https://issues.apache.org/jira/browse/HDFS-8494
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Kai Sasaki
>            Assignee: Kai Sasaki
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8494-HDFS-7285-01.patch, 
> HDFS-8494-HDFS-7285-02.patch
>
>
> It is necessary to remove the hard-coded values inside the NameNode that are 
> configured in {{HdfsConstants}}. In this JIRA, we can remove {{chunkSize}} 
> gracefully in favor of HDFS-8375.
> Because {{cellSize}} is now stored only in {{ErasureCodingZone}}, 
> {{BlockInfoStriped}} can receive {{cellSize}} in addition to {{ECSchema}} 
> during its initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
