xendo opened a new pull request, #1410:
URL: https://github.com/apache/logging-log4j2/pull/1410

   A service that we maintain occasionally hangs when the rate of incoming logs 
is higher than the throughput of  writing logs to a disk. We have a discard 
policy at ERROR level and use default `SynchronizeEnqueueWhenQueueFull`.  We 
believe that this is due to corrupted LMAX disruptor. We did some initial 
attempts to produce minimal reproducer (using jcstress) but were not able to 
succeed so far.
   
   While reviewing the log4j code we noticed that there are code paths that 
avoid SynchronizeEnqueueWhenQueueFull and may be responsible for RingBuffer 
corruption, I prepared a small change that should prevent any unsynchronized 
access and wanted to get initial feedback from log4j maintainers. I also wanted 
to use this pull request to pick you brain on what else we can do to 
troubleshoot the issue futher.
   
   
   Writer threads hang on following stack trace: 
   
   ```
      java.lang.Thread.State: TIMED_WAITING (parking)                           
                                                                                
                                                                                
                             
           at sun.misc.Unsafe.park(Native Method)                               
                                                                                
                                                                                
                             
           at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)          
                                                                                
                                                                                
               
           at 
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136) 
                                                                                
                                                                                
               
           at 
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105) 
                                                                                
                                                                                
               
           at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)   
                                                                                
                                                                                
                             
           at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:326) 
                                                                                
                                                                                
                             
           at 
org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:236)
                                                                                
                                                         
           - locked <0x000000052818a2c0> (a java.lang.Object)                   
                                                                                
                                                                                
                             
           at 
org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
                                                                                
                                                                                
     
           at 
org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)   
                                                                                
                                                                                
               
           at 
org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
                                                                                
                                                                             
           at 
org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67) 
                                                                                
                                                                                
               
           at 
org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)     
                                                                                
                                                                                
               
           at 
org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)       
                                                                                
                                                                                
               
           at 
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
                                                                                
                                                                                
            
           at 
org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
                                                                                
                                                                                
 
           at 
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
                                                                                
                                                                                
         
           at 
org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
                                                                                
                                                                                
               
           at 
org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
                                                                                
                                                                                
             
           at org.apache.logging.slf4j.Log4jLogger.error(Log4jLogger.java:299) 
   ```
   
   While the event processor is runnable, but is not making any progress, i.e. 
no logs are produced
   
   ```
      java.lang.Thread.State: RUNNABLE                                          
                                                                                
                                                                                
                             
           at 
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
                                                                                
                                                                                
             
           at 
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)        
                                                                                
                                                                                
               
           at java.lang.Thread.run(Thread.java:750)
   ```
   
   Benchmark results
   ---
   
   I was a bit surprised, but comparing to previous results of those 
benchmarks, I didn't see a significant drop in performance for the 
`ENQUEUE_UNSYNCHRONIZED` . Benchmarks were run on m5.4xl instance (16 vCPU)
   
   
   ```
   Benchmark                                                      
(asyncLoggerType)                           (queueFullPolicy)   Mode  Cnt       
 Score         Error  Units
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads      
ASYNC_CONTEXT                                     ENQUEUE  thrpt    3  
1473717.926 ±  910941.806  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads      
ASYNC_CONTEXT                      ENQUEUE_UNSYNCHRONIZED  thrpt    3  
1265437.867 ±  184317.376  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads      
ASYNC_CONTEXT                                 SYNCHRONOUS  thrpt    3  
1726704.651 ±  579862.302  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads      
ASYNC_CONTEXT                 ENQUEUE_WITH_DISCARD_BUFFER  thrpt    3  
1483961.304 ±  536377.123  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads      
ASYNC_CONTEXT  ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER  thrpt    3  
1270870.061 ±  749036.480  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads       
ASYNC_CONFIG                                     ENQUEUE  thrpt    3  
1327449.173 ±  295188.828  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads       
ASYNC_CONFIG                      ENQUEUE_UNSYNCHRONIZED  thrpt    3  
1113508.048 ±  127087.930  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads       
ASYNC_CONFIG                                 SYNCHRONOUS  thrpt    3  
1762200.783 ±  491492.622  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads       
ASYNC_CONFIG                 ENQUEUE_WITH_DISCARD_BUFFER  thrpt    3  
1332089.400 ± 1275770.505  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads       
ASYNC_CONFIG  ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER  thrpt    3  
1126223.983 ±  464606.799  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread           
ASYNC_CONTEXT                                     ENQUEUE  thrpt    3  
1466317.665 ±  522556.954  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread           
ASYNC_CONTEXT                      ENQUEUE_UNSYNCHRONIZED  thrpt    3  
1477131.094 ±  371409.396  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread           
ASYNC_CONTEXT                                 SYNCHRONOUS  thrpt    3  
1429398.368 ±  426180.449  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread           
ASYNC_CONTEXT                 ENQUEUE_WITH_DISCARD_BUFFER  thrpt    3  
1459565.150 ± 1185274.825  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread           
ASYNC_CONTEXT  ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER  thrpt    3  
1502960.667 ±  380188.217  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread            
ASYNC_CONFIG                                     ENQUEUE  thrpt    3  
1579091.646 ±  623213.747  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread            
ASYNC_CONFIG                      ENQUEUE_UNSYNCHRONIZED  thrpt    3  
1480321.360 ±  524851.514  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread            
ASYNC_CONFIG                                 SYNCHRONOUS  thrpt    3  
1656552.128 ±  543691.433  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread            
ASYNC_CONFIG                 ENQUEUE_WITH_DISCARD_BUFFER  thrpt    3  
1580294.283 ±  436382.888  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread            
ASYNC_CONFIG  ENQUEUE_UNSYNCHRONIZED_WITH_DISCARD_BUFFER  thrpt    3  
1550912.637 ±  148770.858  ops/s
   
   ```
   
   
   Checklist
   ---
   
   * Base your changes on `2.x` branch if you are targeting Log4j 2; use `main` 
otherwise
   DONE
   * `./mvnw verify` succeeds (if it fails due to code formatting issues 
reported by Spotless, simply run `spotless:apply` and retry)
   DONE
   * Changes contain an entry file in the `src/changelog/.2.x.x` directory
   TBD
   * Tests for the changes are provided
   DONE
   * [Commits are 
signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits)
 (optional, but highly recommended)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to