xendo opened a new pull request, #1410:
URL: https://github.com/apache/logging-log4j2/pull/1410
A service that we maintain occasionally hangs when the rate of incoming logs
is higher than the throughput of writing logs to a disk. We have a discard
policy at ERROR level and use default `SynchronizeEnqueueWhenQueueFull`. We
believe that this is due to corrupted LMAX disruptor. We did some initial
attempts to produce minimal reproducer (using jcstress) but were not able to
succeed so far.
While reviewing the log4j code we noticed that there are code paths that
avoid SynchronizeEnqueueWhenQueueFull and may be responsible for RingBuffer
corruption, I prepared a small change that should prevent any unsynchronized
access and wanted to get initial feedback from log4j maintainers. I also wanted
to use this pull request to pick you brain on what else we can do to
troubleshoot the issue futher.
Writer threads hang on following stack trace:
```
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:326)
at
org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:236)
- locked <0x000000052818a2c0> (a java.lang.Object)
at
org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
at
org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
at
org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
at
org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
at
org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
at
org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
at
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at
org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at
org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
at
org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
at org.apache.logging.slf4j.Log4jLogger.error(Log4jLogger.java:299)
```
While the event processor is runnable, but is not making any progress, i.e.
no logs are produced
```
java.lang.Thread.State: RUNNABLE
at
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.lang.Thread.run(Thread.java:750)
```
Benchmark results
---
I was a bit surprised, but comparing to previous results of those
benchmarks, I didn't see a significant drop in performance for the
`ENQUEUE_UNSYNCHRONIZED` . Benchmarks were run on m5.4xl instance (16 vCPU)
```
Benchmark
(asyncLoggerType) (queueFullPolicy) Mode Cnt
Score Error Units
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONTEXT ENQUEUE thrpt 3
1473717.926 ± 910941.806 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONTEXT ENQUEUE_UNSYNCHRONIZED thrpt 3
1265437.867 ± 184317.376 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONTEXT SYNCHRONOUS thrpt 3
1726704.651 ± 579862.302 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONTEXT ENQUEUE_WITH_DISCARD_BUFFER thrpt 3
1483961.304 ± 536377.123 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONTEXT ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER thrpt 3
1270870.061 ± 749036.480 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONFIG ENQUEUE thrpt 3
1327449.173 ± 295188.828 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONFIG ENQUEUE_UNSYNCHRONIZED thrpt 3
1113508.048 ± 127087.930 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONFIG SYNCHRONOUS thrpt 3
1762200.783 ± 491492.622 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONFIG ENQUEUE_WITH_DISCARD_BUFFER thrpt 3
1332089.400 ± 1275770.505 ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads
ASYNC_CONFIG ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER thrpt 3
1126223.983 ± 464606.799 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONTEXT ENQUEUE thrpt 3
1466317.665 ± 522556.954 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONTEXT ENQUEUE_UNSYNCHRONIZED thrpt 3
1477131.094 ± 371409.396 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONTEXT SYNCHRONOUS thrpt 3
1429398.368 ± 426180.449 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONTEXT ENQUEUE_WITH_DISCARD_BUFFER thrpt 3
1459565.150 ± 1185274.825 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONTEXT ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER thrpt 3
1502960.667 ± 380188.217 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONFIG ENQUEUE thrpt 3
1579091.646 ± 623213.747 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONFIG ENQUEUE_UNSYNCHRONIZED thrpt 3
1480321.360 ± 524851.514 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONFIG SYNCHRONOUS thrpt 3
1656552.128 ± 543691.433 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONFIG ENQUEUE_WITH_DISCARD_BUFFER thrpt 3
1580294.283 ± 436382.888 ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread
ASYNC_CONFIG ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER thrpt 3
1550912.637 ± 148770.858 ops/s
```
Checklist
---
* Base your changes on `2.x` branch if you are targeting Log4j 2; use `main`
otherwise
DONE
* `./mvnw verify` succeeds (if it fails due to code formatting issues
reported by Spotless, simply run `spotless:apply` and retry)
DONE
* Changes contain an entry file in the `src/changelog/.2.x.x` directory
TBD
* Tests for the changes are provided
DONE
* [Commits are
signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits)
(optional, but highly recommended)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]