[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407671#comment-13407671
 ] 

Aaron T. Myers commented on HDFS-3077:
--------------------------------------

I just finished a review of the latest patch. Overall it looks really good. 
Great test coverage, too.

Some comments:

# If the following is supposed to be a list of host:port pairs, I suggest we 
call it something other than "*.edits.dir". Also, if the default is just a 
path, is it really supposed to be a list of host:port pairs? Or is this comment 
supposed to be referring to DFS_JOURNALNODE_RPC_ADDRESS_KEY?
{code}
+  // This is a comma separated host:port list of addresses hosting the journal 
service
+  public static final String  DFS_JOURNALNODE_EDITS_DIR_KEY = 
"dfs.journalnode.edits.dir";
+  public static final String  DFS_JOURNALNODE_EDITS_DIR_DEFAULT = 
"/tmp/hadoop/dfs/journalnode/";
{code}
# Could use a class comment and method comments in AsyncLogger.
# Missing an @param comment for AsyncLoggerSet#createNewUniqueEpoch.
# I think this won't substitute in the correct hostname in a multi-node setup 
with host-based principal names:
{code}
+        SecurityUtil.getServerPrincipal(conf
+            .get(DFSConfigKeys.DFS_JOURNALNODE_USER_NAME_KEY),
+            NameNode.getAddress(conf).getHostName()) };
{code}
# In IPCLoggerChannel, I wonder if you also shouldn't ensure that httpPort is 
not yet set here:
{code}
// Fill in HTTP port. TODO: is there a more elegant place to put this?
 httpPort = ret.getHttpPort();
{code}
# Is there no need for IPCLoggerChannel to have a way of closing its associated 
proxy?
# Could use some comments in JNStorage.
# Seems a little odd that JNStorage relies on a few static functions of 
NNStorage. Is there some better place those functions could live?
# I don't understand why JNStorage#analyzeStorage locks the storage directory 
after formatting it. What, if anything, relies on that behavior? Where is it 
unlocked? Might want to add a comment explaining it.
# Patch needs to be rebased on trunk, e.g. PersistentLong was renamed to 
PersistentLongFile.
# This line kind of creeps me out in the constructor of the Journal class. 
Maybe make a no-args version of Storage#getStorageDir that asserts there's only 
one dir?
{code}
File currentDir = storage.getStorageDir(0).getCurrentDir();
{code}
# In general this patch seems to be mixing in protobufs in a few places where 
non-proto classes seem more appropriate, notably in the Journal and 
JournalNodeRpcServer classes. Perhaps we should create non-proto analogs for 
these protos and add translator methods?
# This seems really goofy. Just make another non-proto class and use a 
translator?
{code}
// Return the partial builder instead of the proto, since
{code}
# I notice that there's a few TODOs left in this patch. It would be useful to 
know which of these you think need to be fixed before we commit this for real, 
versus those you'd like to leave in and do as follow-ups.
# Instead of putting all of these classes in the o.a.h.hdfs.qjournal packages, 
I recommend you try to separate these out into o.a.h.hdfs.qjoural.client, which 
implements the NN side of things, and o.a.h.hdfs.qjournal.server, which 
implements the JN side of things. I think doing so would make it easier to 
navigate the code.
# Could definitely use some method comments in the Journal class.
# Recommend renaming Journal#journal to something like Journal#logEdits or 
Journal#writeEdits.
# In JournalNode#getOrCreateJournal, this log message could be more helpful: 
LOG.info("logDir: " + logDir);
# Seems like all of the timeouts in QuorumJournalManager should be configurable.
# I think you already have the config key to address this TODO in 
QJournalProtocolPB: // TODO: need to add a new principal for loggers
# s/BackupNode/JournalNode/g:
{code}
+ * Protocol used to journal edits to a remote node. Currently,
+ * this is used to publish edits from the NameNode to a BackupNode.
{code}
# Use an HTML comment in journalstatus.jsp, instead of Java comments within a 
code block.
# Could use some more content for the journalstatus.jsp page. :)
# A few spots in the tests you catch expected IOEs, but don't verify that you 
received the IOE you
 actually expect.
# Really solid tests overall, but how about one that actually works with HA? 
You currently have a test for two entirely separate NNs, but not one that uses 
an HA mini cluster.
                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to