[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961593#comment-14961593
 ] 

Jing Zhao commented on HDFS-7964:
---------------------------------

Thanks for rebasing the patch, Daryn. The patch looks good to me. Some minor 
comments:
# The following code uses whether the current thread holds the monitor to 
decide whether the edit should be async/sync. This way may be not direct to 
follow, also make it hard to guarantee the correctness of future code. Can we 
simply make the decision based on the op itself?
{code}
    // only rpc calls not explicitly sync'ed on the log will be async.
    if (rpcCall != null && !Thread.holdsLock(this)) {
      edit = new AsyncEdit(this, op, rpcCall);
    } else {
      edit = new SyncEdit(this, op);
    }
{code}
# If requests keeps coming but the traffic is slow, the sync will happen only 
when the buffer is full, which means the response may be delayed? This may be a 
rare case in practice but maybe we should avoid it here. Can we make each 
iteration of the loop either fill the buffer or drain the pending queue?
{code}
        if (edit != null) {
          // sync if requested by edit log.
          doSync = edit.logEdit();
          syncWaitQ.add(edit);
        } else {
          // sync when editq runs dry, but have edits pending a sync.
          doSync = !syncWaitQ.isEmpty();
        }
{code}
# The class InvalidOp has not been used. We can either remove it or use it in 
{{OP_INVALID}}.
# Maybe we can do some further cleanup for {{RollingUpgradeOp}}. E.g., after 
adding classes like {{RollingUpgradeStartOp}} and {{RollingUpgradeFinalizeOp}}, 
we can put {{getInstance}} methods there and remove {{getStartInstance}} and 
{{getFinalizeInstance}}.
# Is the main reason of having {{OpInstanceCache#get}} to minimize the code 
change?
# It will be helpful to add a comment to explain the calculation logic of 
{{editsBatchedInSync}}.

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is 
> called within the namespace write log, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to