[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang XL updated HDFS-10419:
---------------------------
Description:
In HDFS-7240, Ozone defines storage containers to store both the data and the
metadata. The storage container layer provides an object storage interface and
aims to manage data/metadata in a distributed manner. More details about
storage containers can be found in the design doc in HDFS-7240.
HDFS can adopt the storage containers to store and manage blocks. The general
idea is:
# Each block can be treated as an object and the block ID is the object's key.
# Blocks will still be stored in DataNodes but as objects in storage
containers.
# The block management work can be separated out of the NameNode and will be
handled by the storage container layer in a more distributed way. The NameNode
will only manage the namespace (i.e., files and directories).
# For each file, the NameNode only needs to record a list of block IDs, which
are used as keys to fetch the actual data from the storage containers.
# A new DFSClient implementation talks to both the NameNode and the storage
container layer for reads and writes.
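To make the proposed division of responsibility concrete, here is a minimal, in-memory Java sketch. It is only an illustration under loud assumptions: every class and method name in it (`StorageContainerLayer`, `NamespaceOnlyNameNode`, `Client`, `putBlock`, `getBlock`, `allocateBlock`, `getBlockIds`) is hypothetical and is not a real HDFS or Ozone API, and there is no replication, persistence, or RPC. It shows only the split described in the list: the NameNode keeps an ordered list of block IDs per file, each block ID is the key of an object in the container layer, and the client talks to both.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory sketch. None of these classes exist in Hadoop or
// Ozone; there is no replication, persistence, or networking.
public class ContainerBackedHdfsSketch {

    // Stand-in for the storage container layer: each block is an object
    // whose key is its block ID.
    static class StorageContainerLayer {
        private final Map<Long, byte[]> objects = new HashMap<>();

        void putBlock(long blockId, byte[] data) {
            objects.put(blockId, data.clone());
        }

        byte[] getBlock(long blockId) {
            byte[] data = objects.get(blockId);
            if (data == null) {
                throw new IllegalStateException("no object for block ID " + blockId);
            }
            return data.clone();
        }
    }

    // Stand-in for a namespace-only NameNode: per file it records only the
    // ordered list of block IDs, never block contents or locations.
    static class NamespaceOnlyNameNode {
        private final Map<String, List<Long>> files = new HashMap<>();
        private long nextBlockId = 1;

        // Creates (or truncates) a file entry; old blocks are simply orphaned here.
        void create(String path) {
            files.put(path, new ArrayList<>());
        }

        long allocateBlock(String path) {
            List<Long> ids = files.get(path);
            if (ids == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            long id = nextBlockId++;
            ids.add(id);
            return id;
        }

        List<Long> getBlockIds(String path) {
            List<Long> ids = files.get(path);
            if (ids == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            return new ArrayList<>(ids);
        }
    }

    // Stand-in for the new client: namespace operations go to the NameNode,
    // block data goes to the container layer, keyed by block ID.
    static class Client {
        private final NamespaceOnlyNameNode nameNode;
        private final StorageContainerLayer containers;
        private final int blockSize;

        Client(NamespaceOnlyNameNode nameNode, StorageContainerLayer containers, int blockSize) {
            this.nameNode = nameNode;
            this.containers = containers;
            this.blockSize = blockSize;
        }

        void write(String path, byte[] data) {
            nameNode.create(path);
            for (int off = 0; off < data.length; off += blockSize) {
                int end = Math.min(off + blockSize, data.length);
                long blockId = nameNode.allocateBlock(path);
                containers.putBlock(blockId, Arrays.copyOfRange(data, off, end));
            }
        }

        byte[] read(String path) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (long blockId : nameNode.getBlockIds(path)) {
                byte[] block = containers.getBlock(blockId);
                out.write(block, 0, block.length);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) {
        NamespaceOnlyNameNode nameNode = new NamespaceOnlyNameNode();
        StorageContainerLayer containers = new StorageContainerLayer();
        // A tiny 4-byte block size, purely for illustration.
        Client client = new Client(nameNode, containers, 4);

        byte[] payload = "hello, containers".getBytes(StandardCharsets.UTF_8);
        client.write("/user/demo/a.txt", payload);

        System.out.println("block IDs: " + nameNode.getBlockIds("/user/demo/a.txt")); // block IDs: [1, 2, 3, 4, 5]
        System.out.println("read back: " + new String(client.read("/user/demo/a.txt"), StandardCharsets.UTF_8)); // read back: hello, containers
    }
}
```

In this sketch the 17-byte payload is split into five illustrative blocks; the NameNode ends up holding only the IDs `[1, 2, 3, 4, 5]`, while the bytes live in the container layer. Anything resembling block-to-DataNode mapping or replica tracking would belong to the container layer in the proposed design and is deliberately left out here.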
HDFS, especially the NameNode, can achieve much better scalability with this
design. Currently the NameNode's heaviest workload comes from block
management, which includes maintaining the block-to-DataNode mapping, receiving
full/incremental block reports, tracking block states (under-, over-, and
mis-replicated), and participating in every write pipeline protocol to
guarantee data consistency. This work results in a large memory footprint and
makes the NameNode suffer from GC pauses. HDFS-5477 already proposes to convert
the BlockManager into a service. If we can build HDFS on top of the storage
container layer, we not only separate the BlockManager out of the NameNode but
also replace it with a new distributed management scheme.
The storage container work is currently in progress in HDFS-7240, and the work
proposed here is still at an experimental/exploratory stage. We can do this
experiment in a feature branch so that interested people can get involved.
A design doc explaining more details will be uploaded later.
> Building HDFS on top of new storage layer (HDDS)
> ------------------------------------------------
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)