[
https://issues.apache.org/jira/browse/LOGCXX-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594967#comment-16594967
]
Thorsten Schöning commented on LOGCXX-500:
------------------------------------------
If you don't want to fix them yourself, just leave the error messages here so
someone can have a look later.
> Logging in Timing-Critical Applications
> ---------------------------------------
>
> Key: LOGCXX-500
> URL: https://issues.apache.org/jira/browse/LOGCXX-500
> Project: Log4cxx
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.10.0
> Reporter: Thorsten Schöning
> Priority: Minor
> Attachments: config.xml, main.cpp, non_blocking.diff,
> non_blocking_wo_debian_control.diff
>
>
> The following arrived on the mailing list; it is provided here as well,
> mainly to collect the given patches etc.:
> {quote}Hello All,
> I'd like to share some experience, as well as some patches, with regard
> to using log4cxx in timing-critical applications. First, a few words about
> our requirements: it's a service which must generate network packets
> with up to hundreds-of-microseconds precision. Thus, it's very important
> to have predictable code timing. One can argue that log4cxx is not very
> well suited for such applications, but surprisingly it works pretty well
> after some light tuning.
> So, what were the issues?
> Basically, from the library user's point of view they all looked the same:
> all of a sudden, logging done with the LOG4CXX_DEBUG() macro could take an
> unexpectedly long time to complete. For example, the same trace statement
> which takes several μs 99% of the time would sometimes take hundreds of
> microseconds or even a few milliseconds. Further investigation traced this
> down to a few root causes:
> 1. The async logger (which we have been using, of course) has an internal
> queue to pass log entries to the background disk-writer thread. This queue
> is mutex-protected, which might seem fine until you think about it a
> little more. First of all, someone calling LOG4CXX_DEBUG() to simply put
> something into the log might not expect to be blocked waiting for a mutex
> at all. Second, although measures were taken to minimize the time the
> disk-writer thread holds that lock, OS schedulers often work in such a way
> that a thread which blocks on a mutex gets de-scheduled. With a normal OS
> scheduler quantum, that means the logging thread can be preempted for
> milliseconds. (See the sketch after this list.)
> 2. There are mutexes protecting the internal state of both loggers and
> appenders. This means that two separate threads calling LOG4CXX_DEBUG()
> can block each other. Even if they are using different loggers, they
> will still block on the appender! This has the same consequences for
> execution timing and performance as described above.
> 3. The std::stringstream constructor takes some internal locks of its
> own. Unfortunately, each MessageBuffer has its own instance of this
> class, and MessageBuffer is created inside the LOG4CXX_DEBUG() macro.
> There is an optimization which avoids creating the stringstream when
> logging simple strings, but as soon as your log statement contains a
> single '<<' operator, one is created.
> 4. Dynamic memory allocations. Unfortunately there are still quite a few
> of them, even though a memory pool is used in some other places. These
> hidden calls to new and malloc induce unpredictable delays.
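> To make the first point concrete, here is a minimal illustrative sketch of
> a mutex-protected hand-off queue of the kind described above; this is not
> actual log4cxx source, just the general pattern:
> {code}
> #include <condition_variable>
> #include <mutex>
> #include <queue>
> #include <string>
>
> // Every producer (logging thread) and the single consumer (disk-writer
> // thread) contend on one mutex.
> class MutexProtectedQueue {
>     std::queue<std::string> queue_;
>     std::mutex mutex_;
>     std::condition_variable cv_;
> public:
>     // Called from the LOG4CXX_DEBUG() code path. If the disk-writer
>     // thread holds mutex_, this thread blocks and the OS may de-schedule
>     // it for a full scheduler quantum, i.e. milliseconds.
>     void push(std::string msg) {
>         {
>             std::lock_guard<std::mutex> lock(mutex_);
>             queue_.push(std::move(msg));
>         }
>         cv_.notify_one();
>     }
>
>     // Called from the background disk-writer thread.
>     std::string pop() {
>         std::unique_lock<std::mutex> lock(mutex_);
>         cv_.wait(lock, [this] { return !queue_.empty(); });
>         std::string msg = std::move(queue_.front());
>         queue_.pop();
>         return msg;
>     }
> };
> {code}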
> So, what did we do to mitigate these problems?
> 1. The natural solution for this issue was an atomic queue. A few
> implementations are available; we made use of boost::lockfree::queue, as
> it can serve as a drop-in replacement, allowing us to keep all present
> functionality (see the first sketch after this list).
> 2. After looking further into the code, it turned out that two concurrent
> calls to LOG4CXX_DEBUG() from within different threads are not harmful,
> because the internal structures of the logger and appender are not being
> changed there. The only thing that really requires protection is the
> concurrency between logging and configuring. Thus, we came to a solution:
> read-write locks, where logging calls act as readers and
> configuration/exiting calls act as writers. With such an approach,
> multiple threads calling LOG4CXX_DEBUG() became free of any contention
> (see the second sketch after this list).
> 3. This problem also has one simple solution: make one static
> std::stringstream object per thread by declaring it thread_local.
> Unfortunately we found one drawback: thread_local memory is not released
> if a thread is neither detached nor joined. As there is some code which
> does neither of these, we made the static stringstream an XML-file
> configuration option. There could also be an issue with using multiple
> MessageBuffer instances from within a single thread, but LOG4CXX_DEBUG()
> does not do that (see the third sketch after this list).
> 4. At this time we didn't do anything to address the dynamic memory
> allocation issue.
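> For illustration, here are hedged sketches of mitigations 1-3 using
> standard facilities; all names below are illustrative assumptions, and the
> actual patch (non_blocking.diff) defines its own SHARED_MUTEX and
> LOCK_R/LOCK_W macros rather than the types shown here. First, the
> lock-free queue: boost::lockfree::queue requires element types with a
> trivial copy and destructor, so a realistic drop-in passes event pointers:
> {code}
> #include <boost/lockfree/queue.hpp>
>
> struct LoggingEvent;  // stand-in for the real log4cxx event type
>
> boost::lockfree::queue<LoggingEvent*> queue(1024);  // pre-allocated nodes
>
> // Logging thread: no mutex, so no risk of being de-scheduled while
> // blocked on a lock.
> void enqueue(LoggingEvent* ev) {
>     while (!queue.push(ev)) {
>         // Queue full: spin, drop, or grow - a policy decision.
>     }
> }
>
> // Disk-writer thread.
> void drain() {
>     LoggingEvent* ev = nullptr;
>     while (queue.pop(ev)) {
>         // format the event, write it to disk, then release it
>     }
> }
> {code}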
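> Second, the read-write lock, with logging calls as readers and
> configuration calls as writers (shown with C++17 std::shared_mutex for
> brevity):
> {code}
> #include <shared_mutex>
>
> struct LoggingEvent;  // stand-in for the real log4cxx event type
>
> std::shared_mutex state_mutex;
>
> // Logging path: any number of threads may hold the shared lock at once,
> // so concurrent LOG4CXX_DEBUG() calls no longer serialize on the appender.
> void append(const LoggingEvent& ev) {
>     std::shared_lock<std::shared_mutex> read_lock(state_mutex);  // LOCK_R
>     // read-only use of logger/appender state
> }
>
> // Configuration or shutdown path: requires exclusive access.
> void reconfigure() {
>     std::unique_lock<std::shared_mutex> write_lock(state_mutex);  // LOCK_W
>     // mutate logger/appender state
> }
> {code}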
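> Third, the per-thread stream. It must be reset between statements and, as
> noted above, its memory is only reclaimed when the thread is joined or
> detached:
> {code}
> #include <sstream>
> #include <string>
>
> std::string format_message() {
>     // One stream per thread, constructed once and reused, so the
>     // stringstream constructor's internal locks are paid only once
>     // per thread rather than once per log statement.
>     thread_local std::ostringstream buf;
>     buf.str("");   // drop the previous statement's contents
>     buf.clear();   // reset any error flags
>     buf << "packet sent, seq=" << 42;
>     return buf.str();
> }
> {code}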
> So, if you want to give our patch a try, it is attached. It's based on
> log4cxx 0.10.0-12 as found in Debian. There are new SHARED_MUTEX and
> LOCK_R/LOCK_W macros defined in mutex.h and synchronized.h for easy
> switching between read-write and conventional mutexes for benchmarking.
> There is also a test application which spawns two threads doing logging
> with a small sleep between iterations; it prints to stdout whenever a
> trace statement takes more than 500 microseconds (a sketch of such a
> probe follows). It might look familiar to you, because it's based on one
> of the examples from StackOverflow.
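> A minimal sketch of such a probe is shown below; the attached main.cpp may
> differ in its details, and the logger name and configuration file used
> here are assumptions:
> {code}
> #include <chrono>
> #include <iostream>
> #include <thread>
>
> #include <log4cxx/logger.h>
> #include <log4cxx/xml/domconfigurator.h>
>
> // Each worker logs in a loop, sleeps briefly between iterations, and
> // reports any statement that took longer than 500 microseconds.
> static void worker(log4cxx::LoggerPtr logger) {
>     using clock = std::chrono::steady_clock;
>     for (int i = 0; i < 100000; ++i) {
>         const auto start = clock::now();
>         LOG4CXX_DEBUG(logger, "iteration " << i);
>         const auto us = std::chrono::duration_cast<
>             std::chrono::microseconds>(clock::now() - start).count();
>         if (us > 500)
>             std::cout << "slow log statement: " << us << " us" << std::endl;
>         std::this_thread::sleep_for(std::chrono::microseconds(50));
>     }
> }
>
> int main() {
>     log4cxx::xml::DOMConfigurator::configure("config.xml");  // attachment
>     log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("probe");
>     std::thread t1(worker, logger);
>     std::thread t2(worker, logger);
>     t1.join();
>     t2.join();
>     return 0;
> }
> {code}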
> In our testing, the modified log4cxx has about 20% better performance
> overall but, more importantly for us, far fewer cases where a log
> statement takes excessive time to complete. The second part only holds
> on CPUs with more than 2 cores, where all threads can physically run in
> parallel. Unfortunately we still see rare cases of prolonged logging
> calls; we have traced those down to the dynamic memory allocation issue.
> Any thoughts?
> Denys Smolianiuk
> Senior SW Engineer
> Harmonic Inc.{quote}
> http://mail-archives.apache.org/mod_mbox/logging-dev/201808.mbox/<1534433474.4460.1%40smtp-mail.outlook.com>