That's great news! Then I will be looking forward to your signal for
putting a ribbon onto 2.15.0.

I have followed the conversation with Claes Redestad from Oracle on Twitter
<https://twitter.com/carter_kozak/status/1433798391604162561>. My
conclusion was also that there is apparently no way to make CharsetEncoder
beat .toString().getBytes() on Java 9+ until CharsetEncoder gets a
specialization similar to what has been done for .toString().getBytes().
Nevertheless, I was also (naively?) thinking about a branching strategy
similar to the one you proposed as a temporary solution: preJava9 ?
useCE() : useStringGetBytes(). Would you be able to get this done?
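
To make the branching idea concrete, here is a rough sketch of what I have
in mind. The class and method names (BranchingEncoder, useCharsetEncoder,
useStringGetBytes) are hypothetical and not existing Log4j code, and the
version check via the java.specification.version system property is just
one assumed way of detecting pre-Java 9 runtimes:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration only; not part of Log4j.
final class BranchingEncoder {

    // "java.specification.version" is "1.8" on Java 8 and "9", "10", ... on later releases.
    static final boolean PRE_JAVA_9 =
            System.getProperty("java.specification.version", "1.8").startsWith("1.");

    // Picks the encoding path once per call based on the runtime version.
    static byte[] encode(StringBuilder text, Charset charset) {
        return PRE_JAVA_9
                ? useCharsetEncoder(text, charset)
                : useStringGetBytes(text, charset);
    }

    // CharsetEncoder path. A real implementation would reuse the encoder and
    // its buffers instead of allocating them on every call.
    static byte[] useCharsetEncoder(CharSequence text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the method declares it.
            throw new IllegalStateException(e);
        }
    }

    // String path: allocates an intermediate String and the resulting byte[].
    static byte[] useStringGetBytes(CharSequence text, Charset charset) {
        return text.toString().getBytes(charset);
    }
}
```

Both paths should produce identical bytes for well-formed input, so the
switch would only affect performance, not output.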

Another idea I thought of yesterday evening was to introduce our own
hand-written StringBuilder- or char[]-to-byte[] encoders for common cases,
i.e., ASCII and UTF-8. What do you think?
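
As a minimal sketch of such hand-written encoders (the HandWrittenEncoders
class and its methods are hypothetical names, and I have not benchmarked
this), something along these lines could write straight into a
caller-supplied byte[] without intermediate String or ByteBuffer
allocations. It assumes replacing unencodable input with '?' is acceptable
and, for simplicity, demands worst-case space up front:

```java
// Hypothetical sketch for illustration only; not part of Log4j.
final class HandWrittenEncoders {

    /** Encodes src as ASCII into dst; non-ASCII chars become '?'.
     *  Returns the number of bytes written, or -1 if dst is too small. */
    static int encodeAscii(CharSequence src, byte[] dst, int dstOffset) {
        int length = src.length();
        if (dst.length - dstOffset < length) {
            return -1;
        }
        for (int i = 0; i < length; i++) {
            char c = src.charAt(i);
            dst[dstOffset + i] = (byte) (c < 0x80 ? c : '?');
        }
        return length;
    }

    /** Encodes src as UTF-8 into dst; unpaired surrogates become '?'.
     *  Returns the number of bytes written, or -1 if dst cannot hold the
     *  worst case of 3 bytes per char (a surrogate pair is 2 chars, 4 bytes). */
    static int encodeUtf8(CharSequence src, byte[] dst, int dstOffset) {
        int length = src.length();
        if (dst.length - dstOffset < length * 3) {
            return -1;
        }
        int p = dstOffset;
        for (int i = 0; i < length; i++) {
            char c = src.charAt(i);
            if (c < 0x80) {
                // 1 byte: plain ASCII
                dst[p++] = (byte) c;
            } else if (c < 0x800) {
                // 2 bytes
                dst[p++] = (byte) (0xC0 | (c >> 6));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < length
                    && Character.isLowSurrogate(src.charAt(i + 1))) {
                // 4 bytes: valid surrogate pair, consume both chars
                int cp = Character.toCodePoint(c, src.charAt(++i));
                dst[p++] = (byte) (0xF0 | (cp >> 18));
                dst[p++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                dst[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dst[p++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                // Unpaired surrogate: replace it
                dst[p++] = (byte) '?';
            } else {
                // 3 bytes: rest of the Basic Multilingual Plane
                dst[p++] = (byte) (0xE0 | (c >> 12));
                dst[p++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return p - dstOffset;
    }
}
```

Since both methods take a CharSequence, a StringBuilder can be passed
directly without calling toString() first.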


On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak <cko...@ckozak.net> wrote:

> Thanks, Volkan!
>
> Rerunning the benchmarks on my branch (specifically the PatternLayout
> benchmarks in log4j-perf) produced much better improvements than I had
> anticipated. Beyond that, the date format cache invalidation problem
> resulted in a substantial speedup. I agree that it would be helpful to get
> a release out the door once this is merged.
>
> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put
> together in my benchmark project; that was just for experimentation :-)
> Future Java releases (or changes in minor patch releases) could cause it to
> fail in frightening ways. We may be better off recommending the getBytes
> approach for now on some Java versions (possibly by changing our default on
> Java 9+).
> Claes has a potential change[1] which appears to buy us a great deal of
> performance in future Javas (assuming it is merged), and we may be able to
> push for additional encoding APIs. For example, something like this could
> avoid allocations and additional buffers:
>
> /** Returns the number of characters encoded to destination, or -1 if more
>  * space is needed (equivalent to CoderResult.OVERFLOW) */
> int CharsetEncoder.encode(CharSequence input, int inputStart, int inputLimit,
>         byte[] destination, int destOffset, int destLimit)
> (I haven't put a great deal of thought into this API and it's getting
> late, so pardon any terrible ideas!)
>
> 1. https://github.com/openjdk/jdk/pull/5621
>
> -ck
>
> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> > First and foremost, fantastic job Carter!
> >
> > For #573, I see that Gary and Ralph have already shared some remarks. I
> > would appreciate it if we can get this merged and cut the ribbon for the
> > 2.15.0 release.
> >
> > Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`...
> > That is a tough one. In your hack branch there, I am not sure if using
> > `Unsafe` is a future-proof path forward. I was trying to wrap my mind
> > around Daniel Sun's fast-reflection
> > <https://github.com/danielsun1106/fast-reflection> (for the record, I
> > couldn't) and was triggered by his ASM usage there. I was curious if we
> > could do something similar via ASM to hack `CharsetEncoder`? (I am
> > probably talking nonsense; I hope it triggers a practical idea for you
> > anyway.)
>
