First and foremost, fantastic job Carter!

For #573, I see that Gary and Ralph have already shared some remarks. I
would appreciate it if we can get this merged and cut the ribbon for 2.15.0
release.

Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`... That
is a tough one. In your hack branch there, I am not sure if using `Unsafe`
is a future-proof path forward. I was trying to wrap my mind around Daniel
Sun's fast-reflection <https://github.com/danielsun1106/fast-reflection>
(for the records, I couldn't) and was triggered by his ASM usage there. I
was curious if we could do something similar via ASM to hack
`CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
triggers a practical idea for you.)

On Mon, Sep 20, 2021 at 11:09 PM Carter Kozak <cko...@ckozak.net> wrote:

> https://github.com/apache/logging-log4j2/pull/573 improves performance by
> double digits in my testing, unfortunately it's difficult and
> time-consuming to individually compare performance implications of each
> part of the change. I'd appreciate review/feedback on the change. When I
> apply this PR and benchmark with "-Dlog4j2.enable.direct.encoders=false"
> (results in StringBuilder.toString().getBytes(charset) rather than using
> CharsetEncoders) the performance in blocking mode matches logback, and
> async performance is much better.
>
> StringBuilder.toString.getBytes vs CharsetEncoder:
> I have some benchmarks here which illustrate the problem:
> https://github.com/carterkozak/stringbuilder-encoding-performance
> The CharsetEncoder itself doesn't seem to make a great deal of sense these
> days because Strings are no longer backed by char arrays, the conversion
> from bytes (in String/StringBuilder) into chars for the CharBuffer
> over-complicates the process. It could be feasible to add an API method to
> CharsetEncoder which takes a String/StringBuilder (charsequence +
> instanceof?), extracts the bytes+coder, and writes to an existing buffer
> without the CharBuffer conversion, but I haven't had a chance to sketch
> this out.
> I put together a quick and dirty hack showing a zero-gc fast-path for
> StringBuilder-to-bytes conversion, however it suffers from lack of
> intrinsics used in the String.getBytes(UTF_8) implementation:
> https://github.com/carterkozak/stringbuilder-encoding-performance/pull/3
>
> On Mon, Sep 20, 2021, at 15:34, Volkan Yazıcı wrote:
> > Carter, I am really curious about your ongoing crusade. Mind updating us
> on
> > it a bit, please? Where are we? What is your plan? How can we help you?
> >
> > On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak <cko...@ckozak.net> wrote:
> >
> > > Thanks, Volkan!
> > >
> > > I had not seen InstantFormatter, it does look helpful, however I think
> it
> > > may
> > > have a bug. It appears to only compare milliseconds of Instant values
> > > while FixedDateFormat has some patterns which support microsecond
> > > and nanosecond precision. Currently I think this will batch together
> all
> > > events within a millisecond and assign them the microsecond value
> > > of the first cached event.
> > > I think this is what we want:
> > > https://github.com/apache/logging-log4j2/pull/576
> > >
> > > Good idea reaching out to Claes Redestad, it would be helpful to have
> > > someone more familiar with modern jvm string internals than us take
> > > a look if he's interested!
> > >
> > > -ck
> > >
> > > On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> > > > Great work Carter!
> > > >
> > > > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> > > > particularly its `Formatter#isInstantMatching` methods used for
> > > > invalidating the cache? I was thinking of making it even smarter,
> e.g.,
> > > if
> > > > the pattern only has seconds, compare `Instant`s by their seconds. I
> > > aspire
> > > > to pull it to the core, replace access to all individual formatters
> with
> > > > this one, and mark the rest deprecated. Another idea I was thinking
> about
> > > > is enhancing these individual formatters to contain the precision
> they
> > > > require and use that in `isInstantMatching` methods.
> > > >
> > > > Regarding your unicode character problems, shall we try pinging Claes
> > > > Redestad (@cl4es), who has recently enhanced String.format() in JDK
> 17
> > > > <https://twitter.com/cl4es/status/1432361530268528642>, via Twitter?
> > > >
> > > > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak <cko...@ckozak.net>
> wrote:
> > > >
> > > > > I've merged the fix for our FixedDateFormat caching bug which
> caused
> > > us to
> > > > > recompute the same millisecond-precision formatted timestamp
> > > unnecessarily
> > > > > each time our microsecond-precision clock incremented.
> > > > > https://issues.apache.org/jira/browse/LOG4J2-3153
> > > > >
> > > > > I've also been unwrapping a few layers of complexity, wrapping
> several
> > > > > layers of components with conditional logic makes it harder for the
> > > jit to
> > > > > optimize code, so we can improve things by using separate types
> based
> > > on
> > > > > the requested features:
> > > > > https://github.com/apache/logging-log4j2/pull/573
> > > > > TODO: I'm not happy with the way I unwrapped PatternFormatter
> objects
> > > in
> > > > > this PR, I think it could work better as an optional wrapper
> around the
> > > > > delegate LogEventPatternConverter (where the default FormattingInfo
> > > returns
> > > > > the delegate instance directly)
> > > > > TODO: simplify MessagePatternConverter  a bit, the body is giant
> for
> > > > > something that's generally relatively simple. The method is too
> large
> > > for
> > > > > me to read in a glance, so I imagine the jit will have a hard time
> > > making
> > > > > it fast as well. I don't really like the message-format feature
> which
> > > > > allows lookups in the formatted message text because it leaks
> details
> > > of
> > > > > the framework implementation/configuration into consumers of
> logging
> > > APIs
> > > > > (which may not even be log4j-core), however I'm not sure how
> > > reasonable it
> > > > > would be to change the default to disallow lookups given I'm sure
> folks
> > > > > depend on that behavior.
> > > > >
> > > > > I'm not sure what to do about the CharsetEncoder vs
> > > > > string.getBytes(charset) issue. The CharsetEncoder does not require
> > > > > allocation and outperforms getBytes when I add a unicode character
> to
> > > the
> > > > > message. When the message contains only ascii characters, getBytes
> > > performs
> > > > > better. Using CharBuffer.wrap(StringBuilder) produces substantially
> > > worse
> > > > > performance -- it shouldn't be copying the buffer in memory, but I
> > > suppose
> > > > > the heap buffer is more efficient to deal with. I need to do more
> > > research
> > > > > in this area.
> > > > >
> > > > > Thoughts/ideas/concerns?
> > > > > -ck
> > > > >
> > > >
> > >
> >
>
> -ck
>

Reply via email to