[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-10419:
--------------------------------
    Attachment: Evolving NN using new block-container layer.pdf

I have attached a doc that describes how the existing NN can be modified to
plug in the new block-container layer provided by HDFS-7240. Two key
milestones are described. In the first milestone, the Container Map is kept in
the NN; this gets us to almost 2x scalability, since the container map is
1/40th the size of the original block map (assuming an average actual block
size of 50MB). This milestone does NOT require removing the FSN/BM lock. In
the second milestone, the container map and block management are completely
removed from the NN, which gets us to 2x scalability. After the second
milestone, the NN can be evolved in several directions for further
scalability.
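
The 1/40 figure above is consistent with simple arithmetic if each container
holds about 40 average-sized blocks. A minimal sketch, assuming a container
capacity of 2000MB (not stated in this message; it is only the value implied
by 40 x 50MB) and a hypothetical 1 PB of stored data:

```python
MB = 10**6
PB = 10**15

AVG_BLOCK_BYTES = 50 * MB      # average actual block size, from the comment above
CONTAINER_BYTES = 2000 * MB    # ASSUMPTION: implied by the 1/40 ratio, not stated
TOTAL_BYTES = 1 * PB           # ASSUMPTION: illustrative cluster size


def map_entries(total_bytes: int, unit_bytes: int) -> int:
    """Number of map entries needed to track total_bytes in units of unit_bytes."""
    return -(-total_bytes // unit_bytes)  # ceiling division


block_entries = map_entries(TOTAL_BYTES, AVG_BLOCK_BYTES)
container_entries = map_entries(TOTAL_BYTES, CONTAINER_BYTES)

# Prints the two entry counts and their ratio.
print(block_entries, container_entries, block_entries // container_entries)
```

This counts map entries only. It does not model heap bytes per entry, which
may differ between a block map and a container map.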


> Building HDFS on top of Ozone's storage containers
> --------------------------------------------------
>
>                 Key: HDFS-10419
>                 URL: https://issues.apache.org/jira/browse/HDFS-10419
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-DataNode mapping, receiving 
> full/incremental block reports, tracking block states (under/over/miss 
> replicated), and participating in every write pipeline protocol to guarantee 
> data consistency. This work creates a large memory footprint and makes the 
> NameNode suffer from GC. HDFS-5477 already proposes to convert the 
> BlockManager into a service. If we can build HDFS on top of the storage 
> container layer, we not only separate the BlockManager out of the NameNode, 
> but also replace it with a new distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still at an experimental/exploratory stage. We can do 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc explaining more details will be uploaded later.
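
The general idea in the quoted description (blocks as objects keyed by block
ID, a NameNode that records only a list of block IDs per file, and a client
that talks to both sides) can be sketched roughly as follows. This is a toy
in-memory model, not HDFS or Ozone code: every class and method name here
(ContainerLayer, NameNode, Client, add_block, and so on) is hypothetical, and
replication, DataNodes, and failure handling are left out.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ContainerLayer:
    """Hypothetical stand-in for the storage container layer (object store)."""
    objects: dict = field(default_factory=dict)

    def put(self, block_id: int, data: bytes) -> None:
        # The block ID is the object's key.
        self.objects[block_id] = data

    def get(self, block_id: int) -> bytes:
        return self.objects[block_id]


@dataclass
class NameNode:
    """Hypothetical namespace-only NameNode: path -> ordered list of block IDs."""
    files: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, path: str) -> None:
        self.files[path] = []

    def add_block(self, path: str) -> int:
        # Allocate a new block ID and record it for the file; no block
        # locations or replica states are tracked here.
        block_id = next(self._ids)
        self.files[path].append(block_id)
        return block_id

    def block_ids(self, path: str) -> list:
        return list(self.files[path])


class Client:
    """Hypothetical client that talks to both the NameNode and the containers."""

    def __init__(self, namenode: NameNode, containers: ContainerLayer,
                 block_size: int = 4) -> None:
        self.namenode = namenode
        self.containers = containers
        self.block_size = block_size  # tiny on purpose, for illustration

    def write(self, path: str, data: bytes) -> None:
        self.namenode.create(path)
        for offset in range(0, len(data), self.block_size):
            block_id = self.namenode.add_block(path)
            self.containers.put(block_id, data[offset:offset + self.block_size])

    def read(self, path: str) -> bytes:
        # Block IDs come from the NameNode; the data comes from the containers.
        ids = self.namenode.block_ids(path)
        return b"".join(self.containers.get(block_id) for block_id in ids)


namenode = NameNode()
containers = ContainerLayer()
client = Client(namenode, containers)
client.write("/demo.txt", b"hello world")
print(namenode.block_ids("/demo.txt"), client.read("/demo.txt"))
```

In this sketch the NameNode never sees block contents or locations, which is
the separation the description is aiming for.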



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
