[ https://issues.apache.org/jira/browse/HADOOP-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471764 ]

Raghu Angadi commented on HADOOP-1003:
--------------------------------------


> b) Another Server thread that waits for pending commits to be synced and 
> replies to the clients.

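This extra thread is not required. The IPC handler threads can do the job themselves: each handler can wait for its edit to be synced and then send its own reply.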
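A minimal sketch of that idea, assuming a simplified edits log that assigns each edit an increasing transaction id. No separate reply thread is used. After releasing the namesystem lock, the handler that made an edit calls a sync method. The first handler to arrive performs one fsync covering every edit written so far. Handlers whose edits were already covered return without syncing again. The class and method names (`GroupCommitLog`, `logEdit`, `logSync`) are hypothetical, and this is not the actual namenode code.

```java
// Hypothetical sketch: group commit performed by the IPC handler threads themselves.
// GroupCommitLog, logEdit and logSync are illustrative names, not Hadoop APIs.
public class GroupCommitLog {
    private long lastWrittenTxId = 0;      // highest edit appended to the in-memory buffer
    private long lastSyncedTxId = 0;       // highest edit known to be on disk
    private boolean syncInProgress = false;
    private int syncCount = 0;             // number of (simulated) fsync calls

    /** Append an edit (called while holding the namesystem lock); returns its transaction id. */
    public synchronized long logEdit(String op) {
        // A real log would serialize op into its edit buffer here.
        return ++lastWrittenTxId;
    }

    /** Called by the handler after releasing the namesystem lock, before replying to the client. */
    public void logSync(long myTxId) throws InterruptedException {
        long syncUpTo;
        synchronized (this) {
            // While another handler's fsync is running and has not yet covered us, wait for it.
            while (myTxId > lastSyncedTxId && syncInProgress) {
                wait();
            }
            if (myTxId <= lastSyncedTxId) {
                return; // an earlier fsync already made our edit durable
            }
            syncInProgress = true;
            // One fsync covers every edit written so far. A real log would also swap
            // its double buffer here so that later edits go to a fresh buffer.
            syncUpTo = lastWrittenTxId;
        }
        fsync(); // slow I/O runs outside the monitor, so other handlers can keep appending
        synchronized (this) {
            lastSyncedTxId = syncUpTo;
            syncInProgress = false;
            syncCount++;
            notifyAll(); // wake handlers waiting on this batch
        }
    }

    private void fsync() {
        // Placeholder for flushing the edits file and calling FileChannel.force(true).
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public synchronized int getSyncCount() { return syncCount; }

    public synchronized long getLastSyncedTxId() { return lastSyncedTxId; }
}
```

Without a way to postpone the reply (item (a) in the proposal), at most one edit per IPC handler can share a given fsync, so the batch size is bounded by the number of handler threads. As an illustration, assume roughly 5 ms per fsync. One fsync per mutation then caps the log at about 200 mutations per second, and if each fsync covers k edits the cap rises to roughly 200k.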


> Proposal to batch commits to edits log.
> ---------------------------------------
>
>                 Key: HADOOP-1003
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1003
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Sameer Paranjpye
>
> Right now the most expensive namenode operations are those that require commits 
> to the edits log, e.g. creating, deleting, or renaming a file. Most of the time 
> is spent in fsync() of the edits file (multiple fsync() calls in the case of 
> multiple image directories). During this time the whole namesystem is under 
> lock, and even non-mutating operations like open() are blocked.
> On a local filesystem, each fsync could take on the order of milliseconds. My 
> understanding is that the guarantee the namenode provides is that the edits log 
> is synced before replying to the client. Without any changes to the current 
> locking structure, I was thinking of the following for batching multiple edits:
>      a) A facility in the RPC Server to postpone responding to a particular call 
> (perhaps communicated via ThreadLocals). This is not strictly required, but 
> without it the number of operations batched would be limited to the number of 
> IPC threads.
>      b) Another Server thread that waits for pending commits to be synced and 
> replies to the clients.
>      c) An fsync manager that periodically syncs the edits log and informs the 
> waiting RPCs. The sync thread can dynamically decide to wait longer or 
> shorter based on the load, so that we don't increase latency when the namenode 
> is lightly loaded. Even a simple policy of 'sync if there are any mutations' 
> will work, but that might shorten the life of the hard disk.
>  
> All the synchronization between these threads is a bit complicated, but it can 
> be made stable. My main concern is whether the guarantee we are providing is 
> enough for namenode operation. I think it is enough.  
> In terms of throughput, the number of creates a namenode can do should be in 
> the same range as the number of opens it can do.
>  
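Item (c) can be sketched as a dedicated sync thread. It collects the calls whose edits are pending, issues one fsync for the whole batch, and then completes every waiting call. This sketch makes several assumptions. A `CompletableFuture` per call stands in for the "postpone responding" facility of item (a). The load policy is deliberately simple and is not taken from the proposal: if fewer than `busyThreshold` calls are waiting, sync immediately, and otherwise linger for `maxBatchDelayMillis` so that more edits join the batch. All names (`SyncManager`, `awaitSync`, `maxBatchDelayMillis`, `busyThreshold`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of item (c): a dedicated thread that folds all pending
// edits into one fsync, then completes every waiting call's future.
public class SyncManager implements Runnable {
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    private final long maxBatchDelayMillis; // assumed knob: how long to linger under load
    private final int busyThreshold;        // assumed knob: "loaded" once this many calls wait
    private boolean running = true;         // guarded by this
    private int syncCount = 0;              // guarded by this

    public SyncManager(long maxBatchDelayMillis, int busyThreshold) {
        this.maxBatchDelayMillis = maxBatchDelayMillis;
        this.busyThreshold = busyThreshold;
    }

    /** Register an edit that was already appended; reply to the client when the future completes. */
    public synchronized CompletableFuture<Void> awaitSync() {
        CompletableFuture<Void> done = new CompletableFuture<>();
        pending.add(done);
        notifyAll(); // wake the sync thread if it is idle
        return done;
    }

    public synchronized void stop() {
        running = false;
        notifyAll();
    }

    public synchronized int getSyncCount() {
        return syncCount;
    }

    @Override
    public void run() {
        try {
            while (true) {
                boolean busy;
                synchronized (this) {
                    while (pending.isEmpty() && running) {
                        wait();
                    }
                    if (pending.isEmpty()) {
                        return; // stopped and nothing left to sync
                    }
                    busy = pending.size() >= busyThreshold;
                }
                // Under load, linger briefly (without holding the lock) so more edits
                // join this batch; when lightly loaded, sync at once to keep latency low.
                if (busy) {
                    Thread.sleep(maxBatchDelayMillis);
                }
                List<CompletableFuture<Void>> batch;
                synchronized (this) {
                    batch = new ArrayList<>(pending);
                    pending.clear();
                    syncCount++;
                }
                fsync(); // one fsync covers every edit registered before the drain above
                for (CompletableFuture<Void> done : batch) {
                    done.complete(null);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void fsync() throws InterruptedException {
        Thread.sleep(1); // placeholder for flushing the edits file and FileChannel.force(true)
    }
}
```

The caller must append its edit before calling `awaitSync()`. This ordering ensures that an edit registered before a drain is already in the file when that batch's fsync runs. Edits registered after the drain simply go into the next batch.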

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
