[
https://issues.apache.org/jira/browse/HDFS-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209751#comment-15209751
]
Anu Engineer commented on HDFS-10195:
-------------------------------------
[~cnauroth] and [~jingzhao] Thanks for the review comments.
Both of you have excellent questions on why this code is structured
this way. Some of them are related to future patches and hence this would
be an excellent time to explain what I am thinking.
bq. I wonder if the interface/impl split is really necessary for some things
like ContainerLocationManager/ContainerLocationManagerImpl.
You are right, we could very well live without it. However given how passionate
in
HDFS world we are generally about block placement policy, I thought it was a
good idea to
introduce the flexibility from the word go. :D
bq. My current feeling is that the init/shutdown/getLock do not need to be APIs
in ContainerManager. They are more related to the ContainerManager
implementation instead of container operations.
The motivation for this comes from the Ozone version. In order to support
different versions of ozone side-by-side, the idea is to extend the dispatcher
interface {{container.common.ContainerDispatcher}} interface to take a
registration function.
Along with that we will modify the messages on wire to have which version of
ozone it wants to
talk to.This will allow us to lookup the dispatcherMap.get(version) to get the
right interface to dispatch to.
But this leads to the issue that most ozone versions will generally share lots
of code,
so ContainerManager became an interface with one generic implementation of
ContainerManagerImpl.
This should really be called {{genericContainerManagerImpl}}, I expect us to
specialize classes from this class
when we have to support future versions of ozone.
Hence OzoneMain is designed to work against an interface so that it is easy to
switch or add new ContainerImpl.
bq. In ContainerManagerImpl, maybe we do not need to use another two class
attributes for readLock and writeLock. How about letting ContainerManagerImpl
implement the RwLock interface?
Very good suggestion, I think it will make for a more cleaner code.
bq. IIUC the patch creates a leveldb instance for each container's metadata?
Can we use one single leveldb instance for all the container's metadata, to
avoid loading many db instances at the same time and/or the performance penalty
when loading a new db store. We can assign individual prefix for each
container's metadata keys to separate containers in the db
You are absolutely right, we create different DBs for each container. The
suggestion that you have is *exactly*
how we have done the {{OzoneMetadataManager}}. That is used in stand alone
version and MiniDFSCluster.
However the reason why we did *not* choose the same option in this case is two
fold.
* The replication pipeline would need to replicate by containers, and having
one single DB may not be the most
effective way to replicate. It might make sense for us to stream a full DB file
when a new node joins and that is
possible only in case of separate DB.
* Mostly different containers are mapped to different *pipelines*, so ideally
we should be having a DB per pipeline.
However that is an optimization we can deal with in future.
* I do share the concern that you expressed, would this many DB files scale
properly ?
In the KeyManagerImpl patch you will see that we will have a cache, which will
hold
(default 1024) levelDB handles. So in most cases if the read and writes are
coming into
a container -- then we will pay the cost opening a DB only during first access.
I am going to rely on SCM to make sure that datanodes does not have too many
containers.
bq. It will be helpful if we have javadoc explaining the content and format of
each file (data file, metadata file, and metadata db file etc.).
Sure will do.
bq. We also need to handle the case that the metadata file is missing. In that
case readContainerInfo may throw a NPE which cannot be caught by the current
catch section.
Will fix in the next patch, along with all the test code updates that chris
suggested.
Sorry for the length of reply, but all these questions are excellent and I
wanted to share my thoughts to make sure I am on the right path.
> Ozone: Add container persistence
> --------------------------------
>
> Key: HDFS-10195
> URL: https://issues.apache.org/jira/browse/HDFS-10195
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-10195-HDFS-7240.001.patch
>
>
> Adds file based persistence for containers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)