[
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961593#comment-14961593
]
Jing Zhao commented on HDFS-7964:
---------------------------------
Thanks for rebasing the patch, Daryn. The patch looks good to me. Some minor
comments:
# The following code uses whether the current thread holds the monitor to
decide whether the edit should be async/sync. This way may be not direct to
follow, also make it hard to guarantee the correctness of future code. Can we
simply make the decision based on the op itself?
{code}
// only rpc calls not explicitly sync'ed on the log will be async.
if (rpcCall != null && !Thread.holdsLock(this)) {
edit = new AsyncEdit(this, op, rpcCall);
} else {
edit = new SyncEdit(this, op);
}
{code}
# If requests keeps coming but the traffic is slow, the sync will happen only
when the buffer is full, which means the response may be delayed? This may be a
rare case in practice but maybe we should avoid it here. Can we make each
iteration of the loop either fill the buffer or drain the pending queue?
{code}
if (edit != null) {
// sync if requested by edit log.
doSync = edit.logEdit();
syncWaitQ.add(edit);
} else {
// sync when editq runs dry, but have edits pending a sync.
doSync = !syncWaitQ.isEmpty();
}
{code}
# The class InvalidOp has not been used. We can either remove it or use it in
{{OP_INVALID}}.
# Maybe we can do some further cleanup for {{RollingUpgradeOp}}. E.g., after
adding classes like {{RollingUpgradeStartOp}} and {{RollingUpgradeFinalizeOp}},
we can put {{getInstance}} methods there and remove {{getStartInstance}} and
{{getFinalizeInstance}}.
# Is the main reason of having {{OpInstanceCache#get}} to minimize the code
change?
# It will be helpful to add a comment to explain the calculation logic of
{{editsBatchedInSync}}.
> Add support for async edit logging
> ----------------------------------
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.0.2-alpha
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN. LogEdit is
> called within the namespace write log, while logSync is called outside of the
> lock to allow greater concurrency. The handler thread remains busy until
> logSync returns to provide the client with a durability guarantee for the
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.
> Although the write lock is not held, readers are limited/starved and the call
> queue fills. Combining an edit log thread with postponed RPC responses from
> HADOOP-10300 will provide the same durability guarantee but immediately free
> up the handlers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)