[jira] [Commented] (LOGCXX-500) Logging in Timing-Critical Applications

Denys Smolianiuk (JIRA) Thu, 23 Aug 2018 07:47:18 -0700


    [ 
https://issues.apache.org/jira/browse/LOGCXX-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590335#comment-16590335
 ]


Denys Smolianiuk commented on LOGCXX-500:
-----------------------------------------

Submitted new PR with such change. Existing Windows projects should build as if 
this option is disabled.

Thanks,

Denys Smolianiuk

 

> Logging in Timing-Critical Applications
> ---------------------------------------
>
>                 Key: LOGCXX-500
>                 URL: https://issues.apache.org/jira/browse/LOGCXX-500
>             Project: Log4cxx
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.10.0
>            Reporter: Thorsten Schöning
>            Priority: Minor
>         Attachments: config.xml, main.cpp, non_blocking.diff, 
> non_blocking_wo_debian_control.diff
>
>
> The following has been arrived on the mailing list, providing it here as well 
> mainly to additionally collect the given patches etc.:
> {quote}Hello All,
> I'd like to share some experience as well as some patches with regard 
> to using log4cxx in timing-critical application. First few words about 
> our requirements: it's a service which must generate some network 
> packets with up to hundred of microseconds precision. Thus, it's very 
> important to have predictable code timing. One can argue that log4cxx 
> is not very well suited for such applications, but surprisingly it 
> works pretty well after some light tuning.
> So, what were the issues?
> Basically from library user's point of view they looked the same: one 
> of a sudden logging done with LOG4CXX_DEBUG() macro could take 
> unexpectedly long time to complete. For example the same trace which 
> takes several μs for 99% of the time would take hundreds microseconds 
> and even few milliseconds sometimes. After further investigation this 
> has been traced down to few of the root-causes:
> 1. Asyns logger (which we have been using of course) has internal queue 
> to pass log entries to background disk-writer thread. This queue is 
> mutex-protected which might seem fine unless you think a little bit 
> more about it. First of all, someone calling LOG4CXX_DEBUG() to simply 
> put something into the log might not expect to be blocked inside 
> waiting for a mutex at all. Second point is that, although there were 
> measures taken to minimize time disk-thread holds that lock, 
> OS-schedulers often work in a way that thread which is blocked on a 
> mutex gets de-scheduled. With normal OS-scheduler quantum that means 
> that the logging thread can be preempted for milliseconds.
> 2. There are some mutexes protecting internal states of both loggers 
> and appenders. This means that two separate threads calling 
> LOG4CXX_DEBUG() can block each other. Even if they are using different 
> loggers they would block on appender! This has the same consequences 
> for execution timing and the performance as described above.
> 3. std::stringstream class constructor has some internal locks on it's 
> own. Unfortunately each MessageBuffer has it's own instance of this 
> class. And also unfortunately MessageBuffer is created inside 
> LOG4CXX_DEBUG() macro. There is optimization to not create stringstream 
> for logging simple strings, but as soon as your log statement has 
> single '<<' operator it's created.
> 4. Dynamic memory allocations. Unfortunately there are still quite few 
> of them even though memory pool is used in some other places. Thus, 
> hidden calls to new and malloc induce unpredictable delays.
> So, what we did to mitigate these problems?
> 1. Natural solution for this issue was to use atomic queue. There are 
> few of them available, but we made use of boost::lockfree::queue as it 
> can serve as a drop-in replacement allowing us to keep all present 
> functionality.
> 2. After looking more into the code it has appeared that two concurrent 
> calls to LOG4CXX_DEBUG() from within different threads are not harmful 
> because internal structures of logger and appender are not being 
> changed there. What only really requires protection is concurrency 
> between logging and configuring. Thus, we came to a solution - 
> read-write locks where logging calls act as readers and 
> configuration/exiting calls are writers. With such approach multiple 
> threads calling LOG4CXX_DEBUG() became free of any contention.
> 3. This problem also has one simple solution - make one static 
> std::stringstream object per thread using std::thread_local. 
> Unfortunately we found one drawback - thread_local memory is not 
> released if thread is not detached or joined. As there is some code 
> which does neither of this we made static stringstream a xml file 
> configuration option. Also, there could be an issue with using multiple 
> MessageBuffer instances from within single thread, but LOG4CXX_DEBUG() 
> is not doing that.
> 4. At this time we didn't do anything to address dynamic memory 
> allocation issue.
> So, if you want to give our patch a try it is attached. It's based on 
> log4cxx 0.10.0-12 as found in Debian. There are new SHARED_MUTEX and 
> LOCK_R/LOCK_W macros defined in mutex.h and synchronized.h for easy 
> switching between RW and convenient mutexes for benchmarking. Also, 
> there is an test application which spawns two threads doing logging 
> with some small sleep between iterations. It prints to stdout if trace 
> statement took more than 500 microseconds. It might look familiar too 
> you because it's based on one of the examples from StackOverflow.
> In our testing modified log4cxx has about 20% better performance 
> overall, but more importantly for us it has much less cases when log 
> statement takes excessive time to complete. The second part is only 
> true for CPUs with >2 cores where all threads can physically run in 
> parallel. Unfortunately we still see rare cases with prolonged logging 
> calls, we have traced that down to dynamic memory allocation issue.
> Any thoughts?
> Denys Smolianiuk
> Senior SW Engineer
> Harmonic Inc.{quote}
> http://mail-archives.apache.org/mod_mbox/logging-dev/201808.mbox/<1534433474.4460.1%40smtp-mail.outlook.com>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (LOGCXX-500) Logging in Timing-Critical Applications

Reply via email to