That's great news! Then I will be looking forward to your signal for putting a ribbon onto 2.15.0.
I have followed the conversation with Claes Redestad from Oracle on Twitter
<https://twitter.com/carter_kozak/status/1433798391604162561>. My conclusion
was also that there is apparently no way to make CharsetEncoder beat
.toString().getBytes() in Java 9+ until a CharsetEncoder (CE) specialization
gets introduced, similar to what has been done for .toString().getBytes().
Nevertheless, I was also (naively?) thinking about a branching strategy,
similar to the one you proposed, as a temporary solution:
preJava9 ? useCE() : useStringGetBytes(). Would you be able to get this done?

Another idea I thought of yesterday evening was to introduce our own
hand-written StringBuilder- or char[]-to-byte[] encoders for common cases,
i.e., ASCII and UTF-8. What do you think?

On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak <cko...@ckozak.net> wrote:

> Thanks, Volkan!
>
> Rerunning the benchmarks on my branch (specifically the PatternLayout
> benchmarks in log4j-perf) produced much better improvements than I had
> anticipated. Beyond that, the date format cache invalidation problem
> resulted in a substantial speedup. I agree that it would be helpful to get
> a release out the door once this is merged.
>
> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put
> together in my benchmark project; that was just for experimentation :-)
> Future Java releases (or changes in minor patch releases) could cause it to
> fail in frightening ways. We may be better off recommending the getBytes
> approach for now on some Java versions (possibly by changing our default on
> Java 9+).
>
> Claes has a potential change[1] which appears to buy us a great deal of
> performance in future Javas (assuming it is merged), and we may be able to
> engage for additional encoding APIs. For example, something like this could
> avoid allocations and additional buffers:
>
> /** Returns the number of characters encoded to destination, or -1 if more
>     space is needed (equivalent to CoderResult.OVERFLOW) */
> int CharsetEncoder.encode(CharSequence charSequence, int inputStart,
>     int inputLimit, byte[] destination, int destOffset, int destLimit)
>
> (I haven't put a great deal of thought into this API and it's getting
> late, so pardon any terrible ideas!)
>
> 1. https://github.com/openjdk/jdk/pull/5621
>
> -ck
>
> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> > First and foremost, fantastic job Carter!
> >
> > For #573, I see that Gary and Ralph have already shared some remarks. I
> > would appreciate it if we can get this merged and cut the ribbon for the
> > 2.15.0 release.
> >
> > Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`...
> > That is a tough one. In your hack branch there, I am not sure if using
> > `Unsafe` is a future-proof path forward. I was trying to wrap my mind
> > around Daniel Sun's fast-reflection
> > <https://github.com/danielsun1106/fast-reflection> (for the record, I
> > couldn't) and was triggered by his ASM usage there. I was curious if we
> > could do something similar via ASM to hack `CharsetEncoder`? (I am
> > probably talking nonsense; I hope it triggers a practical idea for you
> > anyway.)
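
For reference, here is a minimal sketch of the two ideas floated at the top of this thread: the preJava9 ? useCE() : useStringGetBytes() branch and a hand-written UTF-8 encoder that writes straight into a caller-supplied byte[]. Everything named here (EncodingSketch, javaMajorVersion, toBytes, encodeUtf8) is hypothetical and not taken from Log4j or the JDK. Note also that encodeUtf8 returns the number of bytes written rather than the number of characters encoded, which is an assumption that differs from the API suggested in the quoted message. It is meant as an illustration under those assumptions, not a benchmarked implementation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {

    /** Hypothetical helper: "1.8" -> 8, "9" -> 9, "17" -> 17. */
    static int javaMajorVersion() {
        String v = System.getProperty("java.specification.version", "1.8");
        if (v.startsWith("1.")) {
            v = v.substring(2);
        }
        return Integer.parseInt(v);
    }

    /** The branching idea: CharsetEncoder path before Java 9, String.getBytes on 9+. */
    static byte[] toBytes(StringBuilder text) {
        if (javaMajorVersion() >= 9) {
            return text.toString().getBytes(StandardCharsets.UTF_8);
        }
        // Charset.encode uses a CharsetEncoder internally and replaces malformed input.
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(text));
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    /**
     * Hand-written UTF-8 encoder (assumption: not an existing JDK or Log4j API).
     * Encodes text[start, limit) into dest[destOffset, destLimit).
     * Returns the number of bytes written, or -1 if dest is too small
     * (dest may then contain a partially written prefix).
     */
    static int encodeUtf8(CharSequence text, int start, int limit,
                          byte[] dest, int destOffset, int destLimit) {
        int out = destOffset;
        for (int i = start; i < limit; i++) {
            char c = text.charAt(i);
            if (c < 0x80) {                       // ASCII: 1 byte
                if (out + 1 > destLimit) return -1;
                dest[out++] = (byte) c;
            } else if (c < 0x800) {               // 2 bytes
                if (out + 2 > destLimit) return -1;
                dest[out++] = (byte) (0xC0 | (c >> 6));
                dest[out++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < limit
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Valid surrogate pair: one supplementary code point, 4 bytes
                if (out + 4 > destLimit) return -1;
                int cp = Character.toCodePoint(c, text.charAt(++i));
                dest[out++] = (byte) (0xF0 | (cp >> 18));
                dest[out++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                dest[out++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dest[out++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                // Unpaired surrogate: this sketch substitutes '?'
                if (out + 1 > destLimit) return -1;
                dest[out++] = (byte) '?';
            } else {                              // rest of the BMP: 3 bytes
                if (out + 3 > destLimit) return -1;
                dest[out++] = (byte) (0xE0 | (c >> 12));
                dest[out++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dest[out++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return out - destOffset;
    }
}
```

A caller would typically keep a reusable byte[] around, call encodeUtf8 on the StringBuilder directly, and flush or grow the buffer when -1 comes back. Whether this beats either JDK path on a given Java version would need to be measured, e.g., in log4j-perf.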