[
https://issues.apache.org/jira/browse/LOG4J2-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075093#comment-16075093
]
ASF GitHub Bot commented on LOG4J2-928:
---------------------------------------
Github user garydgregory commented on the issue:
https://github.com/apache/logging-log4j2/pull/87
Thank you for running the benchmarks. Is this worth doing over on the _current_ Java 8 runtime (_131)?
Gary

On Wed, Jul 5, 2017 at 6:59 AM, Remko Popma <[email protected]> wrote:
> As promised, here are the throughput results on enterprise hardware.
>
> Hardware: total 64 cores (4x 16-core) Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
> Java HotSpot 64-bit server VM 1.8.0_112
>
> Results look good: as soon as we introduce concurrency, the lock-free MMap appender starts outperforming the other file appenders.
>
> *1 thread*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1344494.961 ± 221110.545  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1563396.748 ±  12497.136  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1223215.440 ±  14661.487  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10  1008537.505 ±   9213.181  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10   998564.075 ±  12207.661  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10   785114.881 ±   8827.768  ops/s
> {code}
>
> *4 threads*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 4
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1678485.793 ± 260632.925  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1412574.559 ± 447410.040  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1386414.915 ± 333117.992  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10   959662.994 ± 333541.548  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10   929405.390 ±  66673.436  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10  1013681.438 ± 155916.949  ops/s
> {code}
>
> *8 threads*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 8
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1164538.492 ±  62482.965  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1194865.200 ±  72251.118  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1338960.644 ±  71104.991  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10   944670.900 ±  69190.255  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10   930248.735 ±  28523.086  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10   998676.286 ±  36327.805  ops/s
> {code}
>
> *16 threads*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 16
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1214786.965 ±   7217.940  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1192036.036 ±  13696.830  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1548961.297 ±  27141.206  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10  1003235.059 ±  10365.921  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10   994834.287 ±   9057.470  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10  1323043.186 ±  19960.127  ops/s
> {code}
>
> *32 threads*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 32
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1372729.015 ±  40762.524  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1185351.707 ±  16992.542  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1784377.615 ± 179359.793  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10   918029.971 ±  38434.750  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10  1122310.566 ± 152840.084  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10  1898762.678 ±  81312.274  ops/s
> {code}
>
> *64 threads*
>
> {code}
> -sh-4.2$ java -jar benchmarks.jar ".*Log4j2AppenderThroughput.*" -wi 5 -f 1 -i 10 -t 64
>
> Benchmark                                               (loggerType)  (messageSizeBytes)   Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          File                 150  thrpt       10  1193371.846 ±  32893.485  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender           RAF                 150  thrpt       10  1231136.529 ±  26895.411  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.appender          MMap                 150  thrpt       10  1409277.639 ±  91900.204  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           File                 150  thrpt       10   881261.726 ±  35023.552  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end            RAF                 150  thrpt       10   929691.738 ±  22616.873  ops/s
> o.a.l.l.p.j.Log4j2AppenderThroughputBenchmark.end2end           MMap                 150  thrpt       10  1216108.561 ±  41321.530  ops/s
> {code}
>
> Lock-free synchronous sub-microsecond appender
> ----------------------------------------------
>
> Key: LOG4J2-928
> URL: https://issues.apache.org/jira/browse/LOG4J2-928
> Project: Log4j 2
> Issue Type: New Feature
> Components: Appenders
> Reporter: Remko Popma
>
> _(This is a work in progress.)_
> *Goal*
> It should be possible to create synchronous file appenders with (nearly) the
> same performance as async loggers.
> *Background*
> The key to async loggers' performance is the lock-free queue provided by
> the LMAX Disruptor. Multiple threads can add events to this queue without
> contending on a lock. This means throughput scales linearly with the number
> of threads: more threads = more throughput.
> With a lock-based design, on the other hand, performance maxes out with a
> single thread logging. Adding more threads does not help. In fact, total
> logging throughput goes down slightly when multiple threads are logging (see
> the Async Appenders in the [async performance
> comparison|http://logging.apache.org/log4j/2.x/manual/async.html#Asynchronous_Throughput_Comparison_with_Other_Logging_Packages]).
> Lock contention means that multiple threads together end up logging slower
> than a single thread.
> *Currently only async loggers are lock-free*
> Log4j2 provides good performance with async loggers, but this approach has
> several drawbacks:
> * dependency on the external LMAX Disruptor library
> * possibility of data loss: log events that have been put on the queue but
> not flushed to disk yet may be lost in the event of an application crash
> This ticket proposes a new feature to address these issues.
> *Proposal: a lock-free synchronous appender*
> For a single-threaded application the current MemoryMappedFileAppender has
> performance comparable to Async Loggers (TODO: perf test).
> However, the current implementation uses locks to control concurrency, and
> suffers from lock contention in multi-threaded scenarios.
> For inspiration for a lock-free solution, we can look at
> [Aeron|https://github.com/real-logic/Aeron], specifically Aeron's design for
> Log Buffers. Martin Thompson's September 2014 Strangeloop
> [presentation|https://www.youtube.com/watch?v=tM4YskS94b0] gives details on
> the design (the section from 16:45 to 23:30 is especially relevant).
> The way this works is that instead of using locks, concurrency is handled
> with a protocol where threads "reserve" blocks of memory atomically. Each
> thread (having serialized the log event) knows how many bytes it wants to
> write. It then atomically moves the buffer tail pointer forward by that many
> bytes using a CAS operation. Once the tail has moved, the thread is free to
> write the message payload bytes into the region of the buffer it just
> reserved, without needing to worry about other threads. Between threads, the
> only point of contention is the tail pointer, similar to the Disruptor. We
> can reasonably expect performance to scale linearly with the number of
> threads, like async loggers.
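>
> To make the protocol concrete, here is a minimal sketch, assuming an
> {{AtomicLong}} tail pointer over a direct {{ByteBuffer}}; the class name
> {{LogBuffer}} and its API are hypothetical, not the actual appender code:
>
> {code}
> import java.nio.ByteBuffer;
> import java.util.concurrent.atomic.AtomicLong;
>
> /** Sketch of the Aeron-style "reserve, then write" protocol. */
> public class LogBuffer {
>     private final ByteBuffer buffer;                  // in practice a MappedByteBuffer
>     private final AtomicLong tail = new AtomicLong(); // the only point of contention
>
>     public LogBuffer(int capacity) {
>         this.buffer = ByteBuffer.allocateDirect(capacity);
>     }
>
>     /**
>      * Atomically reserves {@code event.length} bytes, then copies the
>      * serialized event into the reserved region. Returns false when the
>      * buffer is full (a real appender would swap in a fresh buffer here).
>      */
>     public boolean write(byte[] event) {
>         long offset;
>         do {
>             offset = tail.get();
>             if (offset + event.length > buffer.capacity()) {
>                 return false; // full: trigger flush / buffer swap
>             }
>         } while (!tail.compareAndSet(offset, offset + event.length)); // CAS moves the tail
>
>         // The region [offset, offset + event.length) now belongs exclusively
>         // to this thread, so the copy itself needs no lock. The per-write
>         // duplicate() is a workaround for ByteBuffer's lack of absolute bulk
>         // puts (it gives this thread an independent cursor); Unsafe would
>         // replace this step (see snag 1 below).
>         ByteBuffer view = buffer.duplicate();
>         view.position((int) offset);
>         view.put(event);
>         return true;
>     }
> }
> {code}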
> *Still needs work*
> This looks promising, but there are a few snags.
> # Needs {{sun.misc.Unsafe}}. {{java.nio.ByteBuffer}} only provides relative,
> not absolute, bulk put operations. That is, it only allows appending byte
> arrays at the current cursor position, not writing them at a user-specified
> absolute location. The above design requires random access to be thread-safe.
> Aeron works around this by using {{sun.misc.Unsafe}} (a sketch follows this
> list). Users should be aware of this so they can decide whether the
> performance gain is worth the risk. Also, this may make the OSGi folks
> unhappy (see the LOG4J2-238 discussion)... Not sure how serious we are about
> making Log4j2 work on OSGi, but perhaps it is possible to mark the package
> for this feature as optional in the OSGi manifest. An alternative may be to
> put this appender in a separate module.
> # TBD: How many files/buffers to use? In his presentation Martin mentions
> that using a single large memory-mapped file will cause a lot of page faults,
> page cache churn, and unspecified VM issues. He recommends cycling between
> three smaller buffers: one active (currently written to), one dirty (full,
> now being processed by a background thread), and one clean (to swap in when
> the active buffer becomes full). I am not sure the page fault problem will
> occur for our use case: a Log4j appender is append-only, and there is no
> separate thread or process reading this data at the same time. If it does,
> and we decide on a similar design with three smaller buffers, we still need
> to work out whether these can be three different mapped regions in the same
> log file, or whether it is better to use a separate temporary file and copy
> from the temporary file to the target log file in a background thread. I
> would prefer to have a single file. Note that even with a single file we may
> want a background thread for mapping a new region at every swap and
> occasionally extending the file when EOF is reached.
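>
> To make snag 1 concrete, this is roughly what an Unsafe-based absolute bulk
> put looks like (Java 8 era internal APIs; the class and method names here are
> made up for illustration, and the reflective access is exactly the risk
> discussed above):
>
> {code}
> import java.lang.reflect.Field;
> import java.nio.Buffer;
> import java.nio.ByteBuffer;
> import sun.misc.Unsafe;
>
> public final class UnsafeAbsolutePut {
>     private static final Unsafe UNSAFE = loadUnsafe();
>
>     private static Unsafe loadUnsafe() {
>         try {
>             Field f = Unsafe.class.getDeclaredField("theUnsafe");
>             f.setAccessible(true);
>             return (Unsafe) f.get(null);
>         } catch (ReflectiveOperationException e) {
>             throw new ExceptionInInitializerError(e);
>         }
>     }
>
>     /** Native base address of a direct (or memory-mapped) buffer. */
>     static long addressOf(ByteBuffer directBuffer) throws ReflectiveOperationException {
>         Field address = Buffer.class.getDeclaredField("address");
>         address.setAccessible(true);
>         return address.getLong(directBuffer);
>     }
>
>     /** Bulk put at an absolute offset, independent of the buffer's cursor. */
>     static void putBytes(long baseAddress, long offset, byte[] src) {
>         UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>                           null, baseAddress + offset, src.length);
>     }
> }
> {code}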
> Feedback welcome. I intend to update this ticket as I learn more.