[jira] [Commented] (HDFS-7240) Object store in HDFS

Konstantin Shvachko (JIRA) Sun, 12 Nov 2017 21:21:21 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249112#comment-16249112
 ]


Konstantin Shvachko commented on HDFS-7240:
-------------------------------------------

We had an F2F meeting with the Ozone authors. Anu is publishing his notes. The focus 
was on the following issues:
* Should Ozone be a part of HDFS or a separate project?
* How can Ozone help address RPC scalability?
* Can Ozone be used as a block management layer for HDFS?
* Migration to Ozone from HDFS

h3. Ozone as a block management layer
I think we made pretty good progress in understanding the role of Ozone and the 
future of HDFS.
On large production Hadoop clusters such as LinkedIn's and others, traced via 
multiple publications, we see that
# We read 90% of the data that we write. There is no cold metadata.
# RPC load on the NameNode increases proportionally to the growth of storage, 
which is exponential.

Thus, the idea of a NameNode with a partial namespace in memory does not fully 
solve these growth problems, because a) it is still limited by the performance 
of a single NN, and b) we will still have to provision the NN to keep most of 
the namespace in memory.

We came to the following high-level roadmap for evolving HDFS:
# A NameNode with block management delegated to the Ozone layer. There is a 
prototype of such an NN, which is believed to show a 30-50% performance 
improvement. A POC would be good.
# A single NameNode with the namespace implemented as a KV-collection. The 
KV-collection is partitionable in memory, which allows breaking the single-lock 
restriction of the current NN. Performance gains have not been measured yet.
# Split the KV-namespace into two or more physical NNs.
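
To make step 2 more concrete, here is a minimal sketch of a namespace kept as an in-memory KV-collection that is split into partitions, each guarded by its own lock instead of one global lock. It is an illustration only and is not taken from any Ozone or NameNode prototype. The class name {{PartitionedNamespace}}, the path-to-inode-id mapping, and hashing on the full path are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch: a namespace stored as a KV-collection that is
 * partitioned in memory, with one read-write lock per partition.
 */
class PartitionedNamespace {

    /** One partition owns its own map and its own lock. */
    private static final class Partition {
        final Map<String, Long> entries = new HashMap<>();
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    private final Partition[] partitions;

    PartitionedNamespace(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        partitions = new Partition[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            partitions[i] = new Partition();
        }
    }

    /** Assumption: the partition is chosen by hashing the full path. */
    private Partition partitionFor(String path) {
        return partitions[Math.floorMod(path.hashCode(), partitions.length)];
    }

    /** Stores path -> inodeId; returns the previous id, or null if none. */
    Long put(String path, long inodeId) {
        Partition p = partitionFor(path);
        p.lock.writeLock().lock(); // blocks only writers/readers of this partition
        try {
            return p.entries.put(path, inodeId);
        } finally {
            p.lock.writeLock().unlock();
        }
    }

    /** Returns the inode id for the path, or null if the path is unknown. */
    Long get(String path) {
        Partition p = partitionFor(path);
        p.lock.readLock().lock();
        try {
            return p.entries.get(path);
        } finally {
            p.lock.readLock().unlock();
        }
    }

    /** Sums partition sizes one at a time; not an atomic snapshot. */
    int size() {
        int total = 0;
        for (Partition p : partitions) {
            p.lock.readLock().lock();
            try {
                total += p.entries.size();
            } finally {
                p.lock.readLock().unlock();
            }
        }
        return total;
    }
}
```

Operations on paths that land in different partitions do not contend with each other. The sketch deliberately leaves out the hard parts: operations such as rename that touch more than one partition would need a lock-ordering rule, and a real design might partition by parent directory rather than by full path.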

_Important requirement:_ we should provide a *no-data-copy migration of the 
clusters* along the entire transformation.
It is not feasible to DistCp a cluster of, e.g., 100PB, since that requires 
prolonged downtime and is expensive: it doubles the amount of hardware involved.
Thus, an upgrade should keep the data blocks on the same DataNodes, and may 
need to provide an offline tool to convert metadata (fsimage) to the new format.

There is a lot to design here, but it looks to me like a gradual path from the 
current single NN to a distributed namespace architecture. So if people agree 
with the direction in general, I'll be glad to create a Wiki page describing 
this intention so that folks can comment and discuss.
Could the Ozone authors ([~anu], [~jnp], [~sanjay.radia]) please confirm our 
common understanding of the roadmap?

h3. Merging Ozone to HDFS
There are pros and cons to merging Ozone into Hadoop versus keeping it as a 
separate project. The pros include (please expand):
* Code sharing
* Ozone should improve DataNode pipeline code
* Better testing for Ozone within Hadoop

Some cons:
* As part of HDFS it will need to support standard HDFS features, like 
security, snapshots, erasure codes, etc., whereas as a separate project it can 
implement them later
* As a separate project Ozone can benefit from frequent release cycles
* Bugs in Ozone can affect HDFS and vice versa
* Incompatible changes may be allowed in Ozone in its early stages, but are not 
allowed in Hadoop
* Rolling upgrades are required for HDFS, which may not be possible for Ozone
 
The roadmap above positions Ozone as a step toward a partitioned NameNode, which 
solves both the RPC scalability and cluster growth problems for big Hadoop 
installations. This validates merging Ozone into Hadoop for me. Given the cons, 
though, I'm not sure when the right time is. I think we should at least have a 
design doc for security before merging in order to avoid API changes.

> Object store in HDFS
> --------------------
>
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

