[jira] [Updated] (HDFS-5241) Provide alternate queuing audit logger to reduce logging contention

Daryn Sharp (JIRA) Tue, 24 Sep 2013 15:53:20 -0700

     [ https://issues.apache.org/jira/browse/HDFS-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Daryn Sharp updated HDFS-5241:
------------------------------

    Attachment: HDFS-5241.patch

No tests yet; I'm requesting feedback before investing the time.

Provides an option to enable async logging via a single background thread.  The 
performance gains are impressive under an ideal read-heavy load:
* fair lock = 26k ops/sec
* unfair lock = 58k ops/sec
* unfair lock + unbuffered appender = 120k ops/sec

A single thread consuming log messages from a queue populated by the 100 rpc 
handlers is sufficient to improve performance.  Additional threads showed no 
significant improvement.

The problem is 100 threads colliding on log4j's synchronized method.  The 
contention is so high, and each logging call takes long enough, that the 
contended lock's futex has to call into the kernel.  The context switch and 
the wait to be rescheduled ruin performance.  By comparison, the time spent 
waiting to add a log message to the queue is negligible, so the futexes stay 
in userland.

The performance sweet spot is a queue sized to the number of handlers.  As long 
as the background thread can log messages faster than a handler can process the 
next call, the handler is guaranteed a spot in the queue without a context 
switch.
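
For readers without the patch at hand, here is a minimal sketch of the queuing 
idea described above.  It is not the code in HDFS-5241.patch: the class name 
QueueingAuditLogger, the method logAuditEvent, and the Consumer<String> sink 
(standing in for the real log4j call) are all hypothetical, and the 100 ms poll 
interval is an arbitrary choice.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a queuing audit logger; not the class from the patch. */
class QueueingAuditLogger implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;      // stands in for the real, synchronized logger
    private final Thread writer;
    private volatile boolean running = true;

    QueueingAuditLogger(int handlerCount, Consumer<String> sink) {
        // Capacity equals the number of handler threads, so each handler
        // can normally enqueue one message without waiting.
        this.queue = new ArrayBlockingQueue<>(handlerCount);
        this.sink = sink;
        this.writer = new Thread(this::drain, "audit-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Called by handler threads; blocks only while the queue is full. */
    void logAuditEvent(String message) {
        try {
            queue.put(message);
        } catch (InterruptedException e) {
            // In this sketch an interrupted handler drops its message.
            Thread.currentThread().interrupt();
        }
    }

    /** The single background thread is the only caller of the slow sink. */
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    sink.accept(msg);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting work and waits for already queued messages to be written. */
    @Override
    public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would construct it as new QueueingAuditLogger(100, realLogger::info) 
for some hypothetical realLogger and call logAuditEvent from each handler.  
Because the writer is a daemon thread, messages still in the queue are lost if 
the process exits without close(), and any timestamp applied by the sink 
reflects dequeue time rather than the time of the call.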

It's a configurable but undocumented option for now, since it makes the audit 
log prone to data loss and to slightly offset timestamps.

The call queue tends to run relatively dry so I expect my other connection 
handling patches like HADOOP-9956 will have a larger impact.


                
> Provide alternate queuing audit logger to reduce logging contention
> -------------------------------------------------------------------
>
>                 Key: HDFS-5241
>                 URL: https://issues.apache.org/jira/browse/HDFS-5241
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-5241.patch
>
>
> The default audit logger has extremely poor performance.  The internal 
> synchronization of log4j causes massive contention between the call handlers 
> (100 by default) which drastically limits the throughput of the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
