[
https://issues.apache.org/jira/browse/HDFS-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230738#comment-14230738
]
Haohui Mai commented on HDFS-7437:
----------------------------------
In the current implementation, there is an implicit dependency between
{{INodeFile}} and the block management layer. An {{INodeFile}} instance
contains a list of {{BlockInfo}} objects that identify the blocks that the
file contains. These {{BlockInfo}} objects also contain information about (1) the
locations of the blocks on DNs, and (2) the pipeline-related state of the block
(e.g., {{BlockInfoUnderConstruction}}).
The v8 patch is a combined patch that breaks the implicit dependency between
{{INodeFile}} and the block management layer. This effort is a prerequisite
for further changes to the block management layer, such as a standalone block
manager (HDFS-5477) and off-heap data structures for block management
(HDFS-7244).
The scope of the changes is the following:
* A {{BlockInfo}} object contains the inode id of the {{INodeFile}} instead of
a direct reference to the {{INodeFile}}. The object also stores the
replication factor, while in the current implementation it is available through
{{BlockCollection#getReplication()}}.
* An {{INodeFile}} object stores the {{Block}} objects instead of {{BlockInfo}}
objects. A {{Block}} object only contains the block id, size and the generation
stamp of the block.
* When operations need information that was previously available from the
{{BlockInfo}} objects stored in {{INodeFile}}, they have to look up the
information by calling {{BlockManager#getStoredBlock()}}.
* Information stored in corresponding {{Block}} / {{BlockInfo}} pairs, such as
the block size and generation stamp, is updated consistently.
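The shape of these changes can be illustrated with a minimal, self-contained
sketch. It is not code from the patch: every class below ({{SketchBlock}},
{{SketchBlockInfo}}, {{SketchBlockManager}}, {{SketchINodeFile}},
{{DecouplingSketch}}) is a hypothetical, simplified stand-in for the real HDFS
class, and the map-based lookup and the {{updateLength}} helper are assumptions
made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo driver; none of these classes are the real HDFS classes.
class DecouplingSketch {
    // Assumed helper: keeps a Block / BlockInfo pair consistent on a size change.
    static void updateLength(SketchBlockManager bm, SketchBlock b, long newLength) {
        b.numBytes = newLength;
        SketchBlockInfo info = bm.getStoredBlock(b);
        if (info != null) {
            info.numBytes = newLength;
        }
    }

    public static void main(String[] args) {
        SketchBlockManager bm = new SketchBlockManager();
        SketchINodeFile file = new SketchINodeFile(1001L);

        SketchBlock b = new SketchBlock(1L, 0L, 1L);
        file.blocks.add(b);                        // namespace side keeps only the Block
        bm.addBlock(file.inodeId, b, (short) 3);   // block manager side keeps the BlockInfo

        updateLength(bm, b, 4096L);

        // Location and replication details now require a lookup.
        SketchBlockInfo info = bm.getStoredBlock(b);
        System.out.println("inode=" + info.inodeId
            + " replication=" + info.replication
            + " bytes=" + info.numBytes);
    }
}

// Stand-in for Block: only the block id, size, and generation stamp.
class SketchBlock {
    final long blockId;
    long numBytes;
    long generationStamp;

    SketchBlock(long blockId, long numBytes, long generationStamp) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.generationStamp = generationStamp;
    }
}

// Stand-in for BlockInfo: holds the inode id (not an object reference)
// and the replication factor, plus block manager state such as locations.
class SketchBlockInfo {
    final long blockId;
    final long inodeId;
    short replication;
    long numBytes;
    long generationStamp;
    final List<String> datanodes = new ArrayList<>();

    SketchBlockInfo(SketchBlock b, long inodeId, short replication) {
        this.blockId = b.blockId;
        this.numBytes = b.numBytes;
        this.generationStamp = b.generationStamp;
        this.inodeId = inodeId;
        this.replication = replication;
    }
}

// Stand-in for the block manager: owns all BlockInfo objects, keyed by block id.
class SketchBlockManager {
    private final Map<Long, SketchBlockInfo> blocksMap = new HashMap<>();

    SketchBlockInfo addBlock(long inodeId, SketchBlock b, short replication) {
        SketchBlockInfo info = new SketchBlockInfo(b, inodeId, replication);
        blocksMap.put(b.blockId, info);
        return info;
    }

    // Mirrors the role of BlockManager#getStoredBlock(); returns null if unknown.
    SketchBlockInfo getStoredBlock(SketchBlock b) {
        return blocksMap.get(b.blockId);
    }
}

// Stand-in for INodeFile: stores Block objects, never BlockInfo objects.
class SketchINodeFile {
    final long inodeId;
    final List<SketchBlock> blocks = new ArrayList<>();

    SketchINodeFile(long inodeId) {
        this.inodeId = inodeId;
    }
}
```

In this sketch the namespace side never holds a reference into the block
manager's data structures, which is the property that a standalone or off-heap
block manager would rely on. The cost is an extra lookup per access and the
need to update both halves of each pair together.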
> Storing block ids instead of BlockInfo object in INodeFile
> ----------------------------------------------------------
>
> Key: HDFS-7437
> URL: https://issues.apache.org/jira/browse/HDFS-7437
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-7437.000.patch, HDFS-7437.001.patch,
> HDFS-7437.002.patch, HDFS-7437.003.patch, HDFS-7437.004.patch,
> HDFS-7437.005.patch, HDFS-7437.006.patch, HDFS-7437.007.patch,
> HDFS-7437.008.patch
>
>
> Currently {{INodeFile}} stores its list of blocks as references to
> {{BlockInfo}} objects instead of block ids. This creates an implicit
> dependency between the namespace and the block manager.
> The dependency blocks several recent efforts, such as separating the block
> manager out as a standalone service, moving block information off heap, and
> optimizing the memory usage of block manager.
> This jira proposes to remove the dependency by storing block ids instead of
> object references in {{INodeFile}} objects.