[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967625#comment-14967625 ]

Jing Zhao commented on HDFS-7964:
---------------------------------

Thanks for updating the patch, Daryn.

bq. It's ensuring correctness by preventing a deadlock with the background 
thread. IIRC, there is also a call synchronized on the edit log call that must 
know the current txid (rolling?) which isn't possible when async.

Currently I only find startSegment and endSegment holding the lock, so
SyncEdits are created only for them. These two calls also require the sync
semantically. So I'm wondering whether we can replace the condition
{{!Thread.holdsLock(this)}} with a check that the op is not start/endSegment.
We can still add an assertion to make sure the lock is held when creating
SyncEdits and not held when creating AsyncEdits. Did I miss something here?
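A minimal Java sketch of the proposed routing, under stated assumptions: SketchOp, OpRoutingSketch, startLogSegment/endLogSegment and syncEditCount are hypothetical names, not the classes in the patch. The sync/async decision is made by op type, and the lock checks survive only as assertions. The queue is unbounded here so add() never blocks.

```java
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical op codes for illustration only; not the real FSEditLogOpCodes.
enum SketchOp { START_LOG_SEGMENT, END_LOG_SEGMENT, MKDIR, DELETE }

class OpRoutingSketch {
  // Stand-in for the background thread's editPendingQ (unbounded in this sketch).
  final Queue<SketchOp> editPendingQ = new LinkedBlockingQueue<>();
  int syncEditCount = 0;

  static boolean isSegmentOp(SketchOp op) {
    return op == SketchOp.START_LOG_SEGMENT || op == SketchOp.END_LOG_SEGMENT;
  }

  /** Chooses sync vs. async by op type instead of by !Thread.holdsLock(this). */
  void logEdit(SketchOp op) {
    if (isSegmentOp(op)) {
      // Segment markers need the sync semantically; the caller holds the monitor.
      assert Thread.holdsLock(this) : "SyncEdit created without holding the lock";
      syncEditCount++; // write and sync inline (elided in this sketch)
    } else {
      // Async edits must not be created under the monitor (Daryn's deadlock
      // concern with the background thread), so assert the opposite.
      assert !Thread.holdsLock(this) : "AsyncEdit created while holding the lock";
      editPendingQ.add(op);
    }
  }

  synchronized void startLogSegment() { logEdit(SketchOp.START_LOG_SEGMENT); }

  synchronized void endLogSegment() { logEdit(SketchOp.END_LOG_SEGMENT); }
}
```

Run with `-ea` to enable the assertions. Without it the routing still works, but the lock checks are skipped.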

bq. Do you mean drain only as many edits from the pending queue as were present 
at the beginning of the cycle?

My original concern was mainly about the latency of a single request when
there is not a lot of traffic to the NN. {{doSync = edit.logEdit()}} means the
sync only happens when the buffer is full, so if requests keep coming into the
pending queue (i.e., editPendingQ is always non-empty), the first request has
to wait several extra iterations until the buffer is filled. But this extra
latency is actually small, so the current code should be fine.
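The batching behavior described above can be modeled with a simplified, single-threaded Java sketch. It is not the patch's code. EditBatcherSketch, bufferCapacity, and logSync are hypothetical names. The sketch assumes the loop syncs either when logEdit reports a full buffer or when the pending queue has drained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class EditBatcherSketch {
  private final int bufferCapacity; // hypothetical: edits that fit in one buffer
  private final List<String> buffer = new ArrayList<>();
  final List<List<String>> syncedBatches = new ArrayList<>();

  EditBatcherSketch(int bufferCapacity) { this.bufferCapacity = bufferCapacity; }

  /** Buffers one edit; returns true when the buffer is full and should be synced. */
  private boolean logEdit(String edit) {
    buffer.add(edit);
    return buffer.size() >= bufferCapacity;
  }

  /** Records the buffered edits as one synced batch and empties the buffer. */
  private void logSync() {
    syncedBatches.add(new ArrayList<>(buffer));
    buffer.clear();
  }

  /** One pass of the background thread over whatever is pending. */
  void drain(Queue<String> editPendingQ) {
    String edit;
    while ((edit = editPendingQ.poll()) != null) {
      boolean doSync = logEdit(edit);
      // While the queue stays non-empty, earlier edits wait for the buffer to fill.
      if (doSync || editPendingQ.isEmpty()) {
        logSync();
      }
    }
  }
}
```

With a capacity of 2 and five queued edits a through e, edit a is not synced until b fills the buffer. Edit e is synced alone because the queue is empty by then.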

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  LogEdit is
> called within the namespace write lock, while logSync is called outside of the
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.
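The handoff described in the issue can be illustrated with a small Java sketch in which a CompletableFuture stands in for a postponed RPC response. DeferredResponseSketch, logEditAsync, and syncAndRespond are hypothetical names, not the HADOOP-10300 or HDFS-7964 APIs. The write and fsync are elided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class DeferredResponseSketch {
  /** A buffered edit paired with the RPC response that waits for its sync. */
  private static final class Pending {
    final String edit;
    final CompletableFuture<String> response = new CompletableFuture<>();
    Pending(String edit) { this.edit = edit; }
  }

  private final List<Pending> unsynced = new ArrayList<>();

  /** Handler side: buffer the edit and return at once, freeing the handler. */
  synchronized CompletableFuture<String> logEditAsync(String edit) {
    Pending p = new Pending(edit);
    unsynced.add(p);
    return p.response;
  }

  /** Edit log thread side: take the current batch, sync it, then respond. */
  void syncAndRespond() {
    List<Pending> batch;
    synchronized (this) {
      batch = new ArrayList<>(unsynced);
      unsynced.clear();
    }
    // A real implementation would write and fsync the batch here.
    for (Pending p : batch) {
      // The client only learns of success after the edit is durable.
      p.response.complete(p.edit);
    }
  }
}
```

The durability guarantee is unchanged because no response completes before the sync. The handler thread, however, is released as soon as logEditAsync returns.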



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
