[ https://issues.apache.org/jira/browse/HDFS-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209751#comment-15209751 ]

Anu Engineer commented on HDFS-10195:
-------------------------------------

[~cnauroth] and [~jingzhao] Thanks for the review comments.

Both of you asked excellent questions about why this code is structured
this way. Some of the answers relate to future patches, so this is a good
time to explain what I am thinking.


bq. I wonder if the interface/impl split is really necessary for some things 
like ContainerLocationManager/ContainerLocationManagerImpl. 

You are right, we could very well live without it. However, given how
passionate we in the HDFS world generally are about block placement policy,
I thought it was a good idea to introduce the flexibility from the word go. :D

bq. My current feeling is that the init/shutdown/getLock do not need to be APIs 
in ContainerManager. They are more related to the ContainerManager 
implementation instead of container operations.

The motivation for this comes from Ozone versioning. In order to support
different versions of Ozone side by side, the idea is to extend the
{{container.common.ContainerDispatcher}} interface to take a registration
function. Along with that, we will modify the messages on the wire to carry
the version of Ozone they want to talk to. This will allow us to look up
{{dispatcherMap.get(version)}} to get the right interface to dispatch to.

But most Ozone versions will generally share a lot of code, so
ContainerManager became an interface with one generic implementation,
ContainerManagerImpl. This should really be called
{{GenericContainerManagerImpl}}; I expect us to specialize classes from it
when we have to support future versions of Ozone.

Hence OzoneMain is designed to work against an interface, so that it is easy
to switch to or add a new ContainerManager implementation.
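To make the versioned-dispatch idea above concrete, here is a minimal sketch. The real {{ContainerDispatcher}} lives in {{container.common}}; the registry class, method names, and string-based request type below are illustrative assumptions, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for container.common.ContainerDispatcher;
// the real interface dispatches container protocol messages, not Strings.
interface ContainerDispatcher {
    String dispatch(String request);
}

class VersionedDispatcherRegistry {
    private final Map<Integer, ContainerDispatcher> dispatcherMap = new HashMap<>();

    // Registration function each Ozone version calls at startup.
    void register(int version, ContainerDispatcher dispatcher) {
        dispatcherMap.put(version, dispatcher);
    }

    // Messages on the wire carry the Ozone version they want to talk to;
    // dispatcherMap.get(version) picks the matching dispatcher.
    String dispatch(int version, String request) {
        ContainerDispatcher d = dispatcherMap.get(version);
        if (d == null) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        return d.dispatch(request);
    }
}
```

A new Ozone version then only registers its own dispatcher; shared code paths stay in the generic implementation.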

bq. In ContainerManagerImpl, maybe we do not need to use another two class 
attributes for readLock and writeLock. How about letting ContainerManagerImpl 
implement the RwLock interface?

Very good suggestion; I think it will make for cleaner code.
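A rough sketch of that suggestion, assuming a simplified lock interface modeled loosely on HDFS's {{RwLock}} (the method names and the container-count field here are illustrative, not the real API):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for org.apache.hadoop.hdfs.util.RwLock.
interface RwLock {
    void readLock();
    void readUnlock();
    void writeLock();
    void writeUnlock();
}

// The manager implements the lock interface directly instead of
// exposing separate readLock/writeLock attributes.
class ContainerManagerSketch implements RwLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int containerCount = 0; // placeholder for real container state

    @Override public void readLock()    { lock.readLock().lock(); }
    @Override public void readUnlock()  { lock.readLock().unlock(); }
    @Override public void writeLock()   { lock.writeLock().lock(); }
    @Override public void writeUnlock() { lock.writeLock().unlock(); }

    void createContainer() {
        writeLock();
        try {
            containerCount++;
        } finally {
            writeUnlock();
        }
    }

    int getContainerCount() {
        readLock();
        try {
            return containerCount;
        } finally {
            readUnlock();
        }
    }
}
```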

bq. IIUC the patch creates a leveldb instance for each container's metadata? 
Can we use one single leveldb instance for all the container's metadata, to 
avoid loading many db instances at the same time and/or the performance penalty 
when loading a new db store. We can assign individual prefix for each 
container's metadata keys to separate containers in the db

You are absolutely right, we create a different DB for each container. The
suggestion you make is *exactly* how we have done {{OzoneMetadataManager}},
which is used in the stand-alone version and MiniDFSCluster. However, the
reason why we did *not* choose the same option in this case is twofold.

* The replication pipeline would need to replicate by container, and having
one single DB may not be the most effective way to replicate. It might make
sense for us to stream a full DB file when a new node joins, and that is
possible only with separate DBs.

* Mostly, different containers are mapped to different *pipelines*, so
ideally we should have a DB per pipeline. However, that is an optimization we
can deal with in the future.

* I do share the concern you expressed: will this many DB files scale
properly? In the KeyManagerImpl patch you will see that we have a cache which
holds a number of levelDB handles (1024 by default). So in most cases, if the
reads and writes are coming into a container, we will pay the cost of opening
a DB only on first access. I am going to rely on SCM to make sure that
datanodes do not have too many containers.
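The handle cache above can be sketched with a bounded LRU map. This is an assumption about the shape of the cache, not the KeyManagerImpl code: the "handle" here is a placeholder object, whereas a real cache would hold a LevelDB handle and close it on eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache of per-container DB "handles", bounded at maxHandles
// (the patch's default would be 1024).
class ContainerDbCache extends LinkedHashMap<String, Object> {
    private final int maxHandles;

    ContainerDbCache(int maxHandles) {
        super(16, 0.75f, true); // access-order iteration gives LRU behavior
        this.maxHandles = maxHandles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        // Evict (and, in a real cache, close) the least recently used handle.
        return size() > maxHandles;
    }

    // Pay the open cost only on first access; later accesses hit the cache.
    Object getOrOpen(String containerName) {
        Object handle = get(containerName); // get() refreshes LRU order
        if (handle == null) {
            handle = new Object(); // a real cache would open the LevelDB here
            put(containerName, handle); // put() may evict the eldest handle
        }
        return handle;
    }
}
```

With a working set smaller than the cache bound, each container's DB is opened once and reused for all subsequent reads and writes.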

bq. It will be helpful if we have javadoc explaining the content and format of 
each file (data file, metadata file, and metadata db file etc.).

Sure will do.

bq. We also need to handle the case that the metadata file is missing. In that 
case readContainerInfo may throw a NPE which cannot be caught by the current 
catch section.

Will fix in the next patch, along with all the test code updates that Chris
suggested.

Sorry for the length of the reply, but all these questions are excellent and
I wanted to share my thoughts to make sure I am on the right path.

> Ozone: Add container persistence
> --------------------------------
>
>                 Key: HDFS-10195
>                 URL: https://issues.apache.org/jira/browse/HDFS-10195
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-10195-HDFS-7240.001.patch
>
>
> Adds file based persistence for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
