[ 
https://issues.apache.org/jira/browse/HDFS-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230738#comment-14230738
 ] 

Haohui Mai commented on HDFS-7437:
----------------------------------

In the current implementation, there is an implicit dependency between 
{{INodeFile}} and the block management layer. An {{INodeFile}} instance 
contains a list of {{BlockInfo}} objects that identify the blocks belonging to 
the file. These {{BlockInfo}} objects also contain (1) the locations of the 
blocks on DataNodes, and (2) the pipeline-related state of the block 
(e.g., {{BlockInfoUnderConstruction}}).
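
A heavily simplified Java sketch of the coupling described above, for illustration only. The classes are hypothetical stand-ins that only loosely mirror {{INodeFile}}, {{Block}} and {{BlockInfo}}; the inode id, block id and DataNode name ({{dn-1}}) are made up. It shows that code holding an {{INodeFile}} can reach DataNode locations directly, and that a {{BlockInfo}} reaches its file through a plain object reference, so neither side can easily be moved without the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the current coupling; the names
// only loosely mirror the real HDFS classes.
public class CoupledModel {

    // Block identity: id, length and generation stamp.
    static class Block {
        final long blockId;
        long numBytes;
        long generationStamp;

        Block(long blockId, long numBytes, long generationStamp) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.generationStamp = generationStamp;
        }
    }

    // Block-manager state: owning file plus DataNode locations.
    static class BlockInfo extends Block {
        INodeFile owner; // direct object reference back into the namespace
        final List<String> locations = new ArrayList<>(); // DataNode ids

        BlockInfo(long blockId, long numBytes, long generationStamp) {
            super(blockId, numBytes, generationStamp);
        }
    }

    // Namespace object that holds block-manager objects directly.
    static class INodeFile {
        final long inodeId;
        final short replication;
        final List<BlockInfo> blocks = new ArrayList<>();

        INodeFile(long inodeId, short replication) {
            this.inodeId = inodeId;
            this.replication = replication;
        }

        void addBlock(BlockInfo b) {
            b.owner = this; // the two layers now point at each other
            blocks.add(b);
        }
    }

    public static void main(String[] args) {
        INodeFile file = new INodeFile(16386L, (short) 3);
        BlockInfo b = new BlockInfo(1073741825L, 128L * 1024 * 1024, 1001L);
        b.locations.add("dn-1");
        file.addBlock(b);
        // Namespace code reaches DataNode locations without the block manager,
        // and the block manager reaches the inode without the namespace.
        System.out.println(file.blocks.get(0).locations); // [dn-1]
        System.out.println(b.owner.replication);          // 3
    }
}
```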

The v8 patch is a combined patch that breaks the implicit dependency between 
{{INodeFile}} and the block management layer. This is a prerequisite for 
further work on the block management layer, such as a standalone block manager 
(HDFS-5477) and off-heap data structures for block management (HDFS-7244).

The scope of the changes is as follows:

* A {{BlockInfo}} object contains the inode id of the {{INodeFile}} instead of 
a direct reference to the {{INodeFile}}. The object also stores the 
replication factor, which in the current implementation is obtained through 
{{BlockCollection#getReplication()}}.
* An {{INodeFile}} object stores {{Block}} objects instead of {{BlockInfo}} 
objects. A {{Block}} object contains only the block id, the size and the 
generation stamp of the block.
* When an operation needs information that was previously available from the 
{{BlockInfo}} objects stored in {{INodeFile}}, it has to look up that 
information by calling {{BlockManager#getStoredBlock()}}.
* Information stored in both halves of a corresponding {{Block}} / {{BlockInfo}} 
pair, such as block sizes and generation stamps, is updated consistently.
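
The changes listed above can be sketched roughly as follows. This is a hypothetical, simplified model, not the code in the v8 patch: {{BlockManager}} here is a plain in-memory map keyed by block id, {{getStoredBlock()}} is only modeled on the method of that name, and {{registerBlock()}} and {{updateBlock()}} are invented helpers. Ids and the DataNode name are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the decoupled layout; NOT the code from
// the HDFS-7437 patches.
public class DecoupledModel {

    // What the namespace keeps: block id, size and generation stamp only.
    static class Block {
        final long blockId;
        long numBytes;
        long generationStamp;

        Block(long blockId, long numBytes, long generationStamp) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.generationStamp = generationStamp;
        }
    }

    // Block-manager record: refers to its file by inode id (not by reference)
    // and carries its own copy of the replication factor.
    static class BlockInfo extends Block {
        final long inodeId;
        final short replication;
        final List<String> locations = new ArrayList<>(); // DataNode ids

        BlockInfo(Block b, long inodeId, short replication) {
            super(b.blockId, b.numBytes, b.generationStamp);
            this.inodeId = inodeId;
            this.replication = replication;
        }
    }

    // Namespace object: holds plain Block values, no BlockInfo references.
    static class INodeFile {
        final long inodeId;
        final List<Block> blocks = new ArrayList<>();

        INodeFile(long inodeId) {
            this.inodeId = inodeId;
        }
    }

    static class BlockManager {
        // Hypothetical in-memory block map keyed by block id.
        private final Map<Long, BlockInfo> blocksMap = new HashMap<>();

        // Hypothetical helper: create the block-manager record for a block.
        void registerBlock(Block b, long inodeId, short replication) {
            blocksMap.put(b.blockId, new BlockInfo(b, inodeId, replication));
        }

        // Modeled loosely on BlockManager#getStoredBlock(): lookup by block id.
        BlockInfo getStoredBlock(Block b) {
            return blocksMap.get(b.blockId);
        }

        // Hypothetical helper: apply a new size and generation stamp to both
        // the namespace copy and the block-manager copy so they stay in sync.
        void updateBlock(Block b, long numBytes, long generationStamp) {
            BlockInfo info = getStoredBlock(b);
            if (info == null) {
                throw new IllegalStateException("unknown block " + b.blockId);
            }
            b.numBytes = numBytes;
            b.generationStamp = generationStamp;
            info.numBytes = numBytes;
            info.generationStamp = generationStamp;
        }
    }

    public static void main(String[] args) {
        BlockManager bm = new BlockManager();
        INodeFile file = new INodeFile(16386L);

        Block b = new Block(1073741825L, 0L, 1001L);
        file.blocks.add(b);                           // namespace side
        bm.registerBlock(b, file.inodeId, (short) 3); // block-manager side
        bm.getStoredBlock(b).locations.add("dn-1");

        bm.updateBlock(b, 4096L, 1002L);
        BlockInfo info = bm.getStoredBlock(b);
        System.out.println(info.inodeId + " " + info.replication + " " + info.locations); // 16386 3 [dn-1]
        System.out.println(b.numBytes == info.numBytes
                && b.generationStamp == info.generationStamp); // true
    }
}
```

In this sketch the namespace holds no pointer into block-manager objects, which is what would let the block map live elsewhere (off heap or in a separate service). The cost is an extra lookup on each access, plus the need to keep the duplicated size and generation stamp in sync, which is what {{updateBlock()}} illustrates.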



> Storing block ids instead of BlockInfo object in INodeFile
> ----------------------------------------------------------
>
>                 Key: HDFS-7437
>                 URL: https://issues.apache.org/jira/browse/HDFS-7437
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-7437.000.patch, HDFS-7437.001.patch, 
> HDFS-7437.002.patch, HDFS-7437.003.patch, HDFS-7437.004.patch, 
> HDFS-7437.005.patch, HDFS-7437.006.patch, HDFS-7437.007.patch, 
> HDFS-7437.008.patch
>
>
> Currently {{INodeFile}} stores its list of blocks as references to 
> {{BlockInfo}} objects instead of block ids. This creates an implicit 
> dependency between the namespace and the block manager.
> The dependency blocks several recent efforts, such as separating the block 
> manager out as a standalone service, moving block information off heap, and 
> optimizing the memory usage of the block manager.
> This jira proposes to remove the dependency by storing block ids instead of 
> object references in {{INodeFile}} objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
