Sorry I missed replying to your question, Ralph!

> Did you add the no-op appender to Ceki’s project? Or are you using our
> NullAppender? I have my doubts about using that NullAppender as it does
> nothing - not even render the Layout, so it may get completely optimized
> away.
I added a no-op appender locally and ran the benchmark, and used the logback
null appender, which also does not read event values. I agree that this test
would be completely invalid using synchronous logging; however, using the
asynchronous approach I don't believe the JVM can optimize out the
application-thread pieces. There's definitely some ambiguity in the data -- I
also tested using a wrapper around a file appender with a 1/1000 probability
of writing a given event to the file appender and saw similar results, so I
believe the test was at least somewhat valid. (I've put a couple of rough
sketches at the bottom of this mail, below the quoted thread.)

-ck

On Thu, Aug 26, 2021, at 10:29, Carter Kozak wrote:
> I also tried that with similar results, which leads me to believe we're
> spending too much time copying between buffers.
>
> We've proven that log4j can get LogEvents to the appender very quickly and
> efficiently.
>
> Once we hit PatternLayout we write data to a StringBuilder.
> AbstractStringBuilder used to store a char array in Java 8; with JEP 254
> compact strings it was updated (similarly to String) to use a byte array
> plus a byte representing the encoding.
> We encode the StringBuilder value into a ByteBufferDestination
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> 1. Copy the StringBuilder value into a CharBuffer (this would have been a
>    direct copy from char[] to char[] in Java 8, but requires conversion
>    from byte[]+coder now).
> 2. Encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer.
> 3. Copy the contents of the StringBuilderEncoder ByteBuffer to the
>    ByteBufferDestination ByteBuffer (which is also always a heap buffer as
>    far as I can tell).
> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile
>    by extracting the heap byte array.
>
> We're copying bytes around a lot, and I think older JVMs would have
> optimized out a bit of the copying because the types exactly matched.
> It's unclear why we need to copy through two separate ByteBuffers from
> StringBuilderEncoder into ByteBufferDestination.
> I've generally had better I/O performance using direct byte buffers than
> heap buffers, although that experience largely comes from work on
> webservers; I haven't tested whether it holds true for file I/O. If we
> operate directly upon a FileChannel, we can write direct buffers directly
> to files. The DirectByteBuffer memory is guaranteed to be contiguous, so
> it should have better performance around copying and manipulation as well.
>
> -ck
>
> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
> > Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to
> > avoid the hardware's influence in the test results?
> >
> > On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <[email protected]>
> > wrote:
> >
> > > Did you add the no-op appender to Ceki’s project? Or are you using our
> > > NullAppender? I have my doubts about using that NullAppender as it
> > > does nothing - not even render the Layout, so it may get completely
> > > optimized away.
> > >
> > > I’d still like to know what kind of disks Remko did his original
> > > testing on. The page says he got 18.495 million messages per second.
> > > Each message would be about 130 characters long from what I can tell.
> > > That means the destination has to be able to support at least about
> > > 2.4 GB per second.
> > >
> > > In Ceki’s test the message is only 23 characters long (vs 100 in
> > > Remko’s). That means the message size is only going to be about 60
> > > characters long.
> > > For me, the async test is writing at about 1700 ops/ms, which equates
> > > to 102 MB/s, yet I am still seeing the queue backed up. So how we are
> > > writing must be terribly inefficient. Logback is getting about
> > > 2000 ops/ms, which is still only 120 MB/s.
> > >
> > > Your test below would be generating about 197 MB/s.
> > >
> > > One thing I did notice that made a big difference is that Ceki has
> > > Logback configured to use a buffer size of 256K while Log4j2 uses the
> > > default of 8K. Setting Log4j2 to 256K considerably improved the
> > > performance.
> > >
> > > Ralph
> > >
> > > > On Aug 25, 2021, at 8:42 AM, Carter Kozak <[email protected]> wrote:
> > > >
> > > >> If we would move the appender performance aside, am I right to
> > > >> conclude that the entire complex async. framework built atop
> > > >> Disruptor with all its whistles and bells is yielding a diminishing
> > > >> performance gain compared to a simple off-the-shelf blocking queue?
> > > >
> > > > I haven't seen any data that would give me such an impression -- the
> > > > file appender itself is slower than the thread producing events,
> > > > which appears to limit our throughput.
> > > > Even then, log4j2 does much better than logback in my testing on a
> > > > Linux workstation with 20+ producer threads.
> > > > Any time the queue is constantly full, performance will suffer (this
> > > > applies to standard blocking queues as well as the disruptor;
> > > > however, the disruptor will fare worse when the queue is full).
> > > > The results using a single thread are roughly reproducible on the
> > > > non-async test, so I think the problem is somewhere between the
> > > > layout and the appender/manager.
> > > >
> > > > Results running locally with no-op appender implementations:
> > > >
> > > > 1 thread:
> > > >
> > > > Benchmark                                        Mode  Cnt     Score     Error   Units
> > > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
> > > > AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
> > > >
> > > > 20 threads:
> > > >
> > > > Benchmark                                        Mode  Cnt     Score     Error   Units
> > > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
> > > > AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
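
P.S. To make the FileChannel idea from my quoted message above a bit more
concrete, here is a rough, untested sketch of encoding a StringBuilder
straight into a direct ByteBuffer and writing it through a FileChannel. The
class and method names are placeholders I made up for this mail -- this is
not the existing StringBuilderEncoder/ByteBufferDestination API, just the
shape I have in mind (handling of malformed input is omitted):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only -- not the log4j2 Encoder/ByteBufferDestination API. */
public final class DirectChannelDestination implements AutoCloseable {

    private final FileChannel channel;
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    // A direct buffer's memory is contiguous and off-heap, so FileChannel.write
    // can hand it to the OS without copying through another temporary buffer.
    // Sized like the 256K buffer discussed below.
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);

    DirectChannelDestination(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Encodes the formatted event text directly into the direct buffer. */
    synchronized void append(StringBuilder text) throws IOException {
        // Read-only view over the StringBuilder; the encoder pulls chars from
        // it on the fly, so there is no up-front char[] copy.
        CharBuffer in = CharBuffer.wrap(text);
        encoder.reset();
        CoderResult result;
        do {
            result = encoder.encode(in, buffer, true);
            if (result.isOverflow()) {
                flush(); // drain the direct buffer and keep encoding
            }
        } while (result.isOverflow());
        encoder.flush(buffer); // no-op for UTF-8, included for completeness
    }

    synchronized void flush() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer); // direct buffer -> file, no heap-array copy
        }
        buffer.clear();
    }

    @Override
    public synchronized void close() throws IOException {
        flush();
        channel.close();
    }
}

Steps 2-4 of the copy chain collapse into a single encode into the direct
buffer plus a channel write, and step 1 becomes a read-only CharBuffer view
rather than a copy. Whether that actually beats the current heap-buffer path
for file I/O is exactly the part I haven't measured.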

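P.P.S. For anyone who wants to try Ralph's buffer-size observation outside
Ceki's harness, this is roughly how I'd raise the file appender buffer from
the 8K default to 256K programmatically. Treat it as a sketch: the appender
name, file name, and pattern are arbitrary placeholders, and the attribute
names are the ones the XML configuration uses (bufferedIO / bufferSize), so
double-check them against the ConfigurationBuilder javadoc for the version
you're on.

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.config.Configurator;
import org.apache.logging.log4j.core.config.builder.api.AppenderComponentBuilder;
import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilder;
import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFactory;
import org.apache.logging.log4j.core.config.builder.impl.BuiltConfiguration;

public final class BufferSizeSketch {

    public static void main(String[] args) {
        ConfigurationBuilder<BuiltConfiguration> builder =
                ConfigurationBuilderFactory.newConfigurationBuilder();

        // File appender with buffered I/O and a 256K buffer instead of the
        // 8K default.
        AppenderComponentBuilder file = builder.newAppender("file", "File")
                .addAttribute("fileName", "target/bench.log")
                .addAttribute("bufferedIO", true)
                .addAttribute("bufferSize", 256 * 1024)
                .add(builder.newLayout("PatternLayout")
                        .addAttribute("pattern", "%d %p %c{1.} [%t] %m%n"));
        builder.add(file);

        // Route everything through the file appender.
        builder.add(builder.newRootLogger(Level.INFO)
                .add(builder.newAppenderRef("file")));

        LoggerContext context = Configurator.initialize(builder.build());
        context.getLogger("bench").info("buffer size is now 256K");
    }
}

The XML equivalent is just bufferedIO="true" bufferSize="262144" on the File
appender element; the RandomAccessFile appender already defaults to a 256K
buffer as far as I can tell.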