[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-7964:
------------------------------
    Attachment: HDFS-7964.patch

The change is simpler than it appears.  We've been running with this patch on 
2.6 at production load since early this year.

{{FSEditLog}} required minor changes to split a few methods to allow overrides 
in subclasses.  There are no functional changes, so there is zero risk.

{{FSEditLogAsync}} manages a queue and a thread for syncing.  For RPC requests, 
logEdit adds to the queue, and logSync is a no-op.  The handler thread may 
immediately service another call.  However, the prior call's response is 
postponed, so the IPC machinery will not send the response when the handler 
thread completes.  The sync thread will trigger the response after syncing.
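A minimal Java sketch of this handoff, assuming a {{CompletableFuture}} as a stand-in for the postponed RPC response from HADOOP-10300 (the real IPC mechanism is different).  {{AsyncEditLogSketch}} and its methods are hypothetical, not the patch's API; the sketch models only the RPC path, syncs one edit at a time, and uses an in-memory list as the durable journal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; not the FSEditLogAsync API.
class AsyncEditLogSketch {
  // An edit paired with the postponed response of the RPC that produced it.
  static final class Edit {
    final String op;
    final CompletableFuture<String> response;

    Edit(String op, CompletableFuture<String> response) {
      this.op = op;
      this.response = response;
    }
  }

  private final BlockingQueue<Edit> queue = new LinkedBlockingQueue<>();
  private final List<String> journal = new ArrayList<>(); // stands in for synced storage

  void start() {
    Thread syncThread = new Thread(this::syncLoop, "edit-log-sync");
    syncThread.setDaemon(true); // do not keep the JVM alive
    syncThread.start();
  }

  // RPC handler path: enqueue and return at once. The future models the
  // postponed response; the handler does not wait for it.
  CompletableFuture<String> logEdit(String op) {
    CompletableFuture<String> response = new CompletableFuture<>();
    queue.add(new Edit(op, response));
    return response;
  }

  // For RPC requests the sync happens on the sync thread, so this is a no-op.
  void logSync() {
  }

  private void syncLoop() {
    try {
      while (true) {
        Edit edit = queue.take();
        synchronized (journal) {
          journal.add(edit.op); // "write and sync" the edit
        }
        // Only once the edit is durable is the client's response released.
        edit.response.complete("ack " + edit.op);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // exit quietly on shutdown
    }
  }

  List<String> journalSnapshot() {
    synchronized (journal) {
      return new ArrayList<>(journal);
    }
  }
}
```

A handler calls logEdit and logSync and goes straight back to the call queue; the client only sees its response after the sync thread has made the edit durable and completed the future.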

The thread-local edit log op cache must be disabled for async behavior.  The 
cache has been altered such that disabling it returns new instances every time.  
This is done by adding the edit op's class to {{FSEditLogOpCodes}} so the 
class can be instantiated.  The enabled cache's enum map is now trivial to 
build.
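A sketch of the cache change, assuming each opcode enum constant carries its op class and ops are created reflectively through a no-arg constructor.  {{EditOp}}, {{MkdirOp}}, {{DeleteOp}}, {{EditOpCode}} and {{EditOpCache}} are hypothetical stand-ins for {{FSEditLogOp}}, its subclasses and {{FSEditLogOpCodes}}, and the thread-local wrapper around the enabled cache is omitted.  As we read it, the cache must be off for async logging because a queued op is still in use by the sync thread when the handler's next call would otherwise reuse and overwrite the same cached instance.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins for FSEditLogOp and two of its subclasses.
abstract class EditOp {
  long txid; // hypothetical mutable state: why sharing one instance across threads is unsafe
}

class MkdirOp extends EditOp {
}

class DeleteOp extends EditOp {
}

// Each opcode carries its op class, so any op can be instantiated on demand.
enum EditOpCode {
  OP_MKDIR(MkdirOp.class),
  OP_DELETE(DeleteOp.class);

  private final Class<? extends EditOp> opClass;

  EditOpCode(Class<? extends EditOp> opClass) {
    this.opClass = opClass;
  }

  EditOp newInstance() {
    try {
      return opClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + opClass, e);
    }
  }
}

class EditOpCache {
  private final Map<EditOpCode, EditOp> cache; // null when disabled

  EditOpCache(boolean enabled) {
    if (enabled) {
      // Building the enum map is now a simple loop over the opcodes.
      cache = new EnumMap<>(EditOpCode.class);
      for (EditOpCode code : EditOpCode.values()) {
        cache.put(code, code.newInstance());
      }
    } else {
      cache = null;
    }
  }

  // Enabled: reuse the cached instance. Disabled: a fresh instance every time.
  EditOp get(EditOpCode code) {
    return cache != null ? cache.get(code) : code.newInstance();
  }
}
```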

The sync thread is designed to maximize the transactions per sync.  It will 
consume queued edits and call logEdit, but not logSync, until the queue runs 
dry or the edit log stream requires a sync.  Maximizing the batches is 
desirable when the rate of edits is very high or IO is slow.
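The batching policy can be sketched as a single pass of the sync thread: block for the first edit, then drain the queue calling logEdit without logSync, syncing early only when the stream says it must.  {{BatchingSyncer}}, {{shouldForceSync}} and the fixed {{maxBufferedEdits}} threshold are hypothetical; a real edit log stream decides when a sync is required from its buffer state, and the postponed responses would be released after each sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the batching policy; not the FSEditLogAsync code.
class BatchingSyncer {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final List<String> buffered = new ArrayList<>(); // logged but not yet synced
  private final List<Integer> batchSizes = new ArrayList<>(); // edits per sync
  private final int maxBufferedEdits; // assumed stand-in for "stream requires a sync"

  BatchingSyncer(int maxBufferedEdits) {
    this.maxBufferedEdits = maxBufferedEdits;
  }

  void enqueue(String op) {
    queue.add(op);
  }

  private void logEdit(String op) {
    buffered.add(op);
  }

  private boolean shouldForceSync() {
    return buffered.size() >= maxBufferedEdits;
  }

  private void logSync() {
    batchSizes.add(buffered.size()); // pretend the batch was flushed and synced
    buffered.clear();
  }

  // One pass of the sync thread loop.
  void runOnce() {
    String op;
    try {
      op = queue.take(); // wait for the first edit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return;
    }
    do {
      logEdit(op); // no logSync here, so the batch keeps growing
      if (shouldForceSync()) {
        logSync(); // the stream is full: sync mid-drain
      }
    } while ((op = queue.poll()) != null);
    if (!buffered.isEmpty()) {
      logSync(); // the queue ran dry: sync whatever is left
    }
  }

  List<Integer> batchSizes() {
    return batchSizes;
  }
}
```

With a threshold of 2 and five queued edits, one pass syncs batches of 2, 2 and 1; with a threshold of 100, the same five edits go out in a single sync.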

Many tests involving edit logs have been parameterized to run with async edit 
logging off and on.  I've run all tests with async on.

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is 
> called within the namespace write lock, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
