[
https://issues.apache.org/jira/browse/LOG4J2-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261948#comment-14261948
]
Remko Popma commented on LOG4J2-930:
------------------------------------
I've started to do some work on this. I've found that the current design makes
it a bit difficult to improve performance.
Specifically, {{PatternFormatter}} delegates part of the formatting work to
{{LogEventPatternConverter}} (selecting the attribute to format, and some
initial formatting) and some other part to {{FormattingInfo}} (alignment, and
enforcing field width). This makes it difficult to implement caching anywhere,
since none of the three components knows both the key and the final output.
Also, I think performance gains can be made by avoiding unnecessary char to
byte conversions. Ideally we want to cache the final result {{byte[]s}} for
each converter where possible and add these to the output buffer directly.
Currently I am experimenting with the following changes:
* Refactor pattern converters to have access to their associated formatting
info. This allows them to know the final formatted output so they can implement
caching where appropriate.
* Allow pattern converters to produce {{byte[]s}} directly, instead of
{{char[]s}}. This means changing the {{format}} method to take a custom
{{Buffer}} instead of a {{StringBuilder}}. There will be a binary and a
text-based Buffer implementation.
* Some converters can only be text based. For example, RegexReplacement and
EncodingPattern take strings as input and produce strings as output. If a
layout pattern contains any converter that requires text input then we switch
off binary output and caching, and fall back to the current behaviour and
performance. To support this, I am experimenting with adding a method to the
{{PatternConverter}} interface that indicates whether it requires text input.
* Caching: implementing a simple Least Recently Used
([LRU|https://sites.google.com/site/codingbughunter/algorithm-code-questions/leetcode-questions/leetcode-lru-cache-java])
cache seems straightforward and effective. Maximum cache size would be
controllable with a system property. The most complex aspect may be making it
thread-safe without locking. Doug Lea's
[ConcurrentDoublyLinkedList|http://www.java2s.com/Code/Java/Collections-Data-Structure/ConcurrentDoublyLinkedList.htm]
may be helpful.
I will try if the above changes have a significant enough impact on performance
to justify making them. I'll report back when I have some results. Any feedback
on the above ideas is welcome as usual.
> Improve PatternLayout performance
> ---------------------------------
>
> Key: LOG4J2-930
> URL: https://issues.apache.org/jira/browse/LOG4J2-930
> Project: Log4j 2
> Issue Type: Improvement
> Components: Appenders, Layouts
> Affects Versions: 2.1
> Reporter: Remko Popma
> Attachments: Perf-sync-logging2014-12-25.xlsx,
> perf-sync-formatting.png
>
>
> While doing performance analysis for LOG4J2-928 (MemoryMappedFileAppender), I
> got some surprising results. From experience with a work project I knew the
> latency of writing to a memory mapped file was quite similar to using the
> Disruptor, but I was getting no more than 200,000 msg/sec, while on the same
> machine async loggers can achieve a throughput of several million msg/sec.
> I started experimenting with binary messages and various layout options and
> it turns out that formatting can make a huge difference. Without formatting,
> the memory mapped file appender reached 3.5 million msg/sec.
> I'd like to share the results here. _(Results are for tests with
> RandomAccessFile (RAF), which had more stable results. For some reason memory
> mapped file test results were very noisy. Need to do more work there.)_
> !perf-sync-formatting.png!
> ||Configuration||Avg. msg/sec||
> |async loggers (for reference - I/O and formatting done in background
> thread)|4,719,766|
> |sync RAF, noflush, no layout, fixed byte[] BinaryMsg (run 1)|3,977,222|
> |sync RAF, noflush, no layout, fixed byte[] BinaryMsg (run 2)|3,979,520|
> |sync RAF, noflush, no layout, fixed string.getBytes() BinaryMsg|3,213,756|
> |sync RAF, noflush, PatternLayout {{%m%n}} |1,900,037|
> |sync RAF, noflush, PatternLayout {{%m %ex%n}} |1,755,573|
> |sync RAF, noflush, PatternLayout {{%d\{UNIX_MILLIS} %m%n}} |1,650,131|
> |sync RAF, noflush, PatternLayout %d\{UNIX_MILLIS} {{%m %ex %n}} |1,331,707|
> |sync RAF, noflush, PatternLayout {{%d %m %ex%n}} |1,194,184|
> |sync RAF, noflush, PatternLayout {{%d \[%t] %m %ex%n}} |1,027,927|
> |sync RAF, noflush, PatternLayout {{%d %p \[%t] %m %ex%n}} |939,122|
> |sync RAF, noflush, PatternLayout {{%d %p %c\{1.} \[%t] %m %ex%n}} |595,663|
> |sync RAF, noflush, PatternLayout {{%d %p %c\{1.} \[%t] %X\{aKey} %m %ex%n}}
> |573,651|
> |sync File (buffered), noflush, fixed byte\[] BinaryMsg, PatternLayout {{%d
> %p %c\{1.} \[%t] %X\{aKey} %m %ex%n}} |551,919|
> |sync RAF, flush, no layout, fixed byte\[] BinaryMsg |432,558|
> |sync File (non-buffered), noflush, fixed byte\[] BinaryMsg, PatternLayout
> {{%d %p %c\{1.} \[%t] %X\{aKey} %m %ex%n}} |244,701|
> |sync File (non-buffered), flush, fixed byte\[] BinaryMsg, PatternLayout {{%d
> %p %c\{1.} \[%t] %X\{aKey} %m %ex%n}} |238,975|
> |sync File (buffered), flush, fixed byte\[] BinaryMsg, PatternLayout {{%d %p
> %c\{1.} \[%t] %X\{aKey} %m %ex%n}} |232,540|
> |sync RAF, flush, PatternLayout {{%d %p %c\{1.} \[%t] %X\{aKey} %m %ex%n}}
> |205,833|
> Some things to note:
> * First note the big difference in performance (3.2 M to 1.9 M msg/s) when
> going from no layout to PatternLayout "%m%n".
> * Second, every added PatternLayout option makes throughput noticeably worse.
> * Buffered, non-flushing appenders are fast because they avoid disk I/O, but
> innocuous-looking formatting options can undo most of the gains.
> The formatting options with the most impact seem to be:
> * {{%d}} ( {{%m %ex%n}}: 1,755,573 vs {{%d %m %ex%n}} 1,194,184)
> * {{%t}} ({{%d %m %ex%n}} 1,194,184 vs {{%d \[%t] %m %ex%n}} 1,027,927)
> * {{%c}} ({{%d %p \[%t] %m %ex%n}} 939,122 vs {{%d %p %c\{1.} \[%t] %m
> %ex%n}} 595,663)
> Some of these are surprising. I expected time formatting to have some impact,
> but the logger ({{%c}})? (It could be the precision specifier that makes it
> slow, need to do more analysis here...)
> It should be possible to improve this a lot. One idea is to cache constant
> strings like the logger names and thread names. Another idea is to avoid
> copying data multiple times into various StringBuilders, and instead have a
> custom Buffer that produces a byte array result directly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]