[
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263515#comment-16263515
]
Anu Engineer commented on HDFS-7240:
------------------------------------
h1. Ozone - Second community meeting
Time: Friday, November 17, 2017, at 4:00 pm PST
_Participants: Arpit Agarwal, Robert Boyd, Wei Chiu, Marton Elek, Anu Engineer,
Aaron Fabbri, Manoj Govindassamy, Virajith Jalaparti, Aaron Myers, Jitendra
Pandey, Sanjay Radia, Chao Sun, Bharat Viswanadham, Andrew Wang, Lei (Eddy)
Xu, Wei Yan, Xiaoyu Yao. \[Apologies to anyone that I might have missed who
joined over phone\]_
We started the meeting by discussing the notions behind Ozone's block storage
layer, followed by a deep dive into the code.
We discussed the notions of the block layer, which is similar to the HDFS block
layer, Ozone's container layer, and how replication works via pipelines. Then we
did a code walk-through of the Ozone codebase, starting with KSM, SCM, the
container layer, and the REST handler.
We had some technical questions about containers: is the container the unit of
replication, and can we truncate a block that is already part of a container,
say block three inside a container? Both were answered in the affirmative: the
unit of replication is indeed a container, and you can truncate block three
inside the container without any issues.
Once we finished the technical discussion, we discussed some of the merge
issues; essentially the question was whether we should postpone the merge of
ozone into HDFS.
* Andrew Wang wanted to know how this would benefit enterprise customers.
** It was pointed out that customers can use the storage via a Hadoop
Compatible filesystem (FileSystem or FileContext) and, more importantly, that
apps such as Hive and Spark which use those APIs will work (we are testing Hive
and Spark). In fact, all the data in Ozone is expected to come via Hive, YARN,
Spark, etc. Making Ozone work seamlessly via such Hadoop frameworks is very
important because it enables real customer use.
* ATM objected to the Ozone merge, as he wanted to see the new block layer
integrated with the existing NN. He argued that creating the block layer is
just the first phase, and that the separation of the block layer inside the
Namenode still needs to be done. He further argued that we should merge only
after the Namenode block separation is completely done.
** Sanjay countered that a project of this size can only be implemented in
phases. Fixing HDFS’s scalability in a fundamental way requires fixing both the
namespace layer and the block layer. We provide a simpler namespace (key-value)
as an intermediate step to allow real customer usage via Spark and Hive, and
also as a way of stabilizing the new block layer. This is a good consistency
point from which to start integrating with the hierarchical namespace of the
NN.
* Aaron Fabbri was concerned that the code is new and may not be stable, and
that the support load for HDFS is already quite high; merging it would further
destabilize HDFS.
** Sanjay’s response: the feature is not on by default and the code is in a
separate module. Indeed, new shareable parts like the new Netty protocol engine
in the DN will replace the old thread-based protocol engine only with the HDFS
community’s blessing, after it has been sufficiently tested via the Ozone path.
Further, Ozone can be kept disabled if so desired by a customer.
* ATM’s concern is that connecting the NN to the new block layer will require
separating the NSM/BM lock (a good thing to do), which is very hard to do.
** Sanjay’s response: this issue was raised and explained at yesterday’s
meeting. A very strong coupling was added between the block layer and the
namespace layer when we wrote the new block pipeline as part of the append work
in 2010: the block length of each replica at finalizing time, especially under
failures, has to be consistent. This is done in the central NN today (due to
the lack of a Raft/Paxos-like protocol in the original block layer). The new
block-container layer uses Raft for consistency and no longer needs a central
agent like the NN. Thus the new block-container layer’s built-in consistent
state management eliminates this coupling and hence simplifies the separation
of the lock.
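The coupling described in the last response can be sketched as follows. This is
a simplified illustration in Python, not HDFS or Ozone code: the function name
`reconcile_lengths`, the datanode names, and the rule of settling on the
shortest reported replica length are all assumptions made for the example. It
shows the kind of decision a central agent has to make when replicas disagree
after a failure, which is the job the minutes say Raft takes over in the new
block-container layer.

```python
def reconcile_lengths(replica_lengths: dict) -> dict:
    """Return the agreed length for every replica of one block.

    replica_lengths maps a (hypothetical) datanode name to the number of
    bytes that replica holds when the block is being finalized.
    """
    if not replica_lengths:
        raise ValueError("at least one replica is required")
    # Assumed rule: only bytes present on every replica are safe to keep.
    agreed = min(replica_lengths.values())
    return {node: agreed for node in replica_lengths}


# One replica fell behind during a pipeline failure.
reported = {"dn1": 1024, "dn2": 1024, "dn3": 768}
finalized = reconcile_lengths(reported)
```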
> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch,
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch,
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch,
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf,
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a
> generic storage layer. Using the Block Pool abstraction, new kinds of
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.