[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403116#comment-16403116 ]
Sanjay Radia commented on HDFS-10419:
-------------------------------------

In the "[VOTE] Merging branch HDFS-7240 to trunk" thread, [~andrew.wang] asked:
{quote}*Sanjay says*:
- NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen acknowledge the benefit of the new block layer). We have two choices here:
** a) Evolve the NN so that it can interact with both the old and the new block layer,
** b) Fork and create a new NN that works only with the new block layer; the old NN will continue to work with the old block layer.
There are trade-offs, but clearly the 2nd option has the least impact on the old HDFS code.

*Andrew asks*: Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?
{quote}
Originally I would have preferred (a), but Owen made a strong case for (b) in my discussions with him last week.
I believe the choice between (a) and (b) will depend strongly on what we want to do. For example, if we do milestone-1, get the 2x scalability, and decide to stop there, then we should clearly go with option (a): it will require little refactoring, and one can run old and new HDFS side by side. If you are planning to follow milestone-1 with, say, caching the working set of the namespace, then forking the NN code (i.e., option (b)) might be better, and the new NN will have to keep pulling over features and bug fixes from the old NN.
Konstantine has proposed other alternatives, and we would evaluate (a) versus (b) for his alternatives too. I am not locked into any particular path or into how we would do it.

> Building HDFS on top of new storage layer (HDSL)
> ------------------------------------------------
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the metadata.
> The storage container layer provides an object storage interface and aims to manage data/metadata in a distributed manner. More details about storage containers can be found in the design doc in HDFS-7240.
>
> HDFS can adopt the storage containers to store and manage blocks. The general idea is:
> # Each block can be treated as an object, and the block ID is the object's key.
> # Blocks will still be stored in DataNodes, but as objects in storage containers.
> # The block management work can be separated out of the NameNode and handled by the storage container layer in a more distributed way. The NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs, which are used as keys to obtain the actual data from the storage containers.
> # A new DFSClient implementation talks to both the NameNode and the storage container layer to read/write.
>
> HDFS, especially the NameNode, can get much better scalability from this design. Currently the NameNode's heaviest workload comes from block management, which includes maintaining the block-to-DataNode mapping, receiving full/incremental block reports, tracking block states (under-/over-/mis-replicated), and participating in every write pipeline to guarantee data consistency. This work brings a high memory footprint and makes the NameNode suffer from GC pauses. HDFS-5477 already proposes turning the BlockManager into a service. If we can build HDFS on top of the storage container layer, we not only separate the BlockManager out of the NameNode but also replace it with a new distributed management scheme.
>
> The storage container work is currently in progress in HDFS-7240, and the work proposed here is still at an experimental/exploratory stage. We can do this experiment in a feature branch so that interested people can get involved.
>
> A design doc will be uploaded later explaining more details.
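To make the proposed split concrete, here is a minimal in-memory sketch of the numbered design in the description. It is not HDFS or Ozone code: the classes `NamespaceManager`, `StorageContainerLayer`, `SketchClient`, and `HdslSketch` are hypothetical, containers are plain in-memory maps, block placement is a toy `blockId % numContainers` rule, and replication, block reports, and failure handling are omitted. It only shows the shape of the idea: the NameNode keeps a list of block IDs per file, and the client uses those IDs as keys into the container layer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NameNode side: it manages only the namespace (path -> block IDs)
// and knows nothing about where blocks are stored.
class NamespaceManager {
    private final Map<String, List<Long>> files = new HashMap<>();

    void addBlock(String path, long blockId) {
        files.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
    }

    List<Long> getBlockIds(String path) {
        List<Long> ids = files.get(path);
        if (ids == null) throw new IllegalArgumentException("no such file: " + path);
        return new ArrayList<>(ids); // defensive copy
    }
}

// Hypothetical storage container layer: each block is an object keyed by its block ID.
// Containers are in-memory maps here; in the proposal they live on DataNodes.
class StorageContainerLayer {
    private final List<Map<Long, byte[]>> containers = new ArrayList<>();
    private long nextBlockId = 1;

    StorageContainerLayer(int numContainers) {
        for (int i = 0; i < numContainers; i++) {
            containers.add(new HashMap<>());
        }
    }

    // Toy placement rule (assumption): pick a container from the block ID.
    private Map<Long, byte[]> containerFor(long blockId) {
        return containers.get((int) (blockId % containers.size()));
    }

    long putBlock(byte[] data) {
        long id = nextBlockId++;
        containerFor(id).put(id, data.clone());
        return id;
    }

    byte[] getBlock(long blockId) {
        byte[] data = containerFor(blockId).get(blockId);
        if (data == null) throw new IllegalStateException("missing block " + blockId);
        return data.clone();
    }
}

// Hypothetical client that talks to both the NameNode and the container layer.
class SketchClient {
    private final NamespaceManager nameNode;
    private final StorageContainerLayer containers;
    private final int blockSize;

    SketchClient(NamespaceManager nameNode, StorageContainerLayer containers, int blockSize) {
        this.nameNode = nameNode;
        this.containers = containers;
        this.blockSize = blockSize;
    }

    void write(String path, byte[] data) {
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            long id = containers.putBlock(Arrays.copyOfRange(data, off, end));
            nameNode.addBlock(path, id); // the NameNode records only the block ID
        }
    }

    byte[] read(String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long id : nameNode.getBlockIds(path)) { // block IDs are the object keys
            byte[] block = containers.getBlock(id);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }
}

public class HdslSketch {
    public static void main(String[] args) {
        NamespaceManager nameNode = new NamespaceManager();
        SketchClient client = new SketchClient(nameNode, new StorageContainerLayer(3), 4);
        client.write("/user/demo/a.txt", "hello, containers".getBytes(StandardCharsets.UTF_8));
        System.out.println(nameNode.getBlockIds("/user/demo/a.txt")); // [1, 2, 3, 4, 5]
        System.out.println(new String(client.read("/user/demo/a.txt"), StandardCharsets.UTF_8));
    }
}
```

Running `HdslSketch` prints `[1, 2, 3, 4, 5]` and then `hello, containers`: the 17-byte payload is cut into five blocks of at most 4 bytes. The point of the design choice is visible in `NamespaceManager`: its state grows only with block IDs per file, while placement (and, in a real system, replication and block reports) stays in the container layer.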
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)