I don't think we should get rid of the Java 8 specific performance
options in the 2.x branch. In 3.x, we can simplify things to default
to appropriate options for Java 11+ and also removing the less useful
variants.

On Thu, Sep 23, 2021 at 11:29 AM Carter Kozak <cko...@ckozak.net> wrote:
>
> Thanks, Ralph!
>
> You're correct that the direct encoderes are required for GC-free logging, 
> some of our consumers rely on that
> for niche (low latency) workloads where it's important to understand 
> precisely where a pause may occur.
> GC performance has become much better over the last several years, and the 
> throughput of the allocation-
> heavy workflow has increased significantly, however there are a few things we 
> should keep in mind:
>
> 1. Java 8 performs better with direct encoders (although afaict the 
> performance isn't any better than java 9+,
>     string.getBytes just has several new optimizations
> 2. If this[1] change is merged into openjdk, the direct encoders may once 
> again be the best-performing option,
>     I'm planning to put together a local build for comparison later.
> 3. My testing has primarily targetted UTF-8 as it's the only charset we 
> target at work, and the default charset
>     on most modern servers. The jdk has a great deal of optimization which 
> special cases UTF-8 for this reason,
>     especially from string.getBytes. Other charsets are likely to perform 
> differently.
>
> I'm in favor of changing the default value of "direct.encoders" based on the 
> java version, for example
> Java 8 should continue to use them, while versions after compact-strings was 
> introduced are better
> off without direct encoders until the jdk can be improved using something 
> like the linked PR.
>
> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
>
> -ck
>
> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> > I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s 
> > branch. I ran them with 16 threads on my
> > MacBook Pro and ran each test twice since the tests show some variability 
> > from time to time.
> >
> > I can draw 2 conclusions from this.
> > 1. The improvements Carter has made to 2.15.0 have already made a 
> > noticeable improvement.
> > 2. His new changes show a definite improvement on top of the 2.15.0 
> > improvements when enable direct encoders is false.
> >
> > These results definitely make me wonder if we should just remove the direct 
> > encoders logic at least in master. From what
> > I can tell it makes things complicated and makes no noticeable impact on 
> > performance in 2.15.0 even before Carter’s
> > changes. Are they required for gc-free? Can we make direct.encoders = false 
> > the default if it isn’t now?
> >
> > I apologize if the performance results don’t look good in your email. They 
> > are formatted with a fixed-width font but it is
> > likely they will be messed up once they reach your mail client.
> >
> > Ralph
> >
> > 2.14.1 - enable direct encoders = false
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1221.995 ± 
> > 150.876  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±  
> > 41.715  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1529.807 ±  
> > 55.876  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   681.690 ±  
> > 17.631  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1134.543 ±  
> > 35.065  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1546.811 ±  
> > 95.721  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1103.526 ±  
> > 78.201  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ± 
> > 144.374  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1475.666 ±  
> > 87.885  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   582.841 ±  
> > 57.156  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10   860.587 ± 
> > 185.170  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1560.206 ± 
> > 323.229  ops/ms
> >
> > 2.14.1 - enable direct encoders = true
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1250.808 ± 
> > 116.848  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.926 ± 
> > 146.195  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1703.731 ± 
> > 183.304  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10  736.189 ±   
> > 40.798  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1070.958 ±  
> > 97.173  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1350.507 ± 
> > 225.104  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1218.639 ± 
> > 141.343  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1525.081 ± 
> > 112.058  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1724.925 ± 
> > 169.680  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   680.453 ±  
> > 32.330  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1054.517 ±  
> > 52.147  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1353.180 ± 
> > 224.415  ops/ms
> >
> > 2.15.0-SNAPSHOT - enable direct encoders = false
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1121.838 ± 
> > 112.026  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1606.602 ± 
> > 217.776  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1607.875 ± 
> > 226.656  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10  694.809 ±  
> > 33.380   ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1500.557 ± 
> > 402.779  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1387.306 ± 
> > 213.081  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1199.591 ± 
> > 149.157 ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1744.558 ± 
> > 190.732 ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1718.862 ± 
> > 142.328 ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   604.785 ± 
> > 78.884  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1604.844 ± 
> > 712.807 ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1578.756 ± 
> > 109.343 ops/ms
> >
> > 2.15.0-SNAPSHOT - enable direct encoders = true
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1244.029 ±  
> > 61.842  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.244 ± 
> > 263.322  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1648.783 ± 
> > 217.936  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10  717.024 ±  
> > 43.631   ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1513.402 ± 
> > 438.311  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1425.745 ± 
> > 231.201  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1207.924 ± 
> > 136.763  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1514.119 ±  
> > 98.725  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1620.834 ± 
> > 238.498  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   716.294 ±  
> > 65.527  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1523.430 ± 
> > 387.178  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1402.748 ± 
> > 312.713  ops/ms
> >
> > Carter 2.15.0-SNAPSHOT - enable direct encoders = false
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1351.481 ±  
> > 68.356  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  2048.458 ± 
> > 118.329  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1762.185 ± 
> > 155.733  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   693.222 ±  
> > 94.051  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1707.576 ± 
> > 299.891  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1439.624 ±  
> > 60.957  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1230.511 ± 
> > 111.351  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1950.005 ± 
> > 166.843  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1783.227 ±  
> > 53.359  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   702.814 ±  
> > 38.263  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1738.084 ± 
> > 344.135  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1463.187 ± 
> > 158.787  ops/ms
> >
> > Carter 2.15.0-SNAPSHOT - enable direct encoders = true
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1212.045 ± 
> > 165.745  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1671.374 ± 
> > 190.449  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1637.298 ± 
> > 169.134  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   708.106 ±  
> > 30.035  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1523.321 ± 
> > 429.586  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1418.236 ± 
> > 233.527  ops/ms
> >
> > Benchmark                                        Mode  Cnt    Score    
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File       thrpt  10  1189.572 ± 
> > 117.298  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1499.633 ± 
> > 186.132  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile      thrpt  10  1631.038 ± 
> > 214.951  ops/ms
> > FileAppenderBenchmark.log4j1File                thrpt  10   697.329 ±  
> > 48.757  ops/ms
> > FileAppenderBenchmark.log4j2File                thrpt  10  1542.492 ± 
> > 395.759  ops/ms
> > FileAppenderBenchmark.logbackFile               thrpt  10  1321.256 ± 
> > 148.364  ops/ms
> >
> >
> > > On Sep 22, 2021, at 12:30 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
> > >
> > > That's great news! Then I will be looking forward to your signal for
> > > putting a ribbon onto 2.15.0.
> > >
> > > I have followed the conversation with Claes Redestad from Oracle on 
> > > Twitter
> > > <https://twitter.com/carter_kozak/status/1433798391604162561>. My
> > > conclusion was also that there apparently is no way to make CharsetEncoder
> > > beat .toString().getBytes() in Java 9+, until a CE specialization gets
> > > introduced similar to what has been done in .toString().getBytes().
> > > Nevertheless, I was also (naively?) thinking about a similar branching
> > > strategy like the one you proposed as a temporary solution: preJava9 ?
> > > useCE() : useStringGetBytes(). Would you be able to get this done?
> > >
> > > Another idea I thought of yesterday evening was to introduce our own
> > > hand-written StringBuilder- or char[]-to-byte[] encoders for common cases,
> > > i.e., ASCII and UTF-8. What do you think?
> > >
> > >
> > > On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak <cko...@ckozak.net> wrote:
> > >
> > >> Thanks, Volkan!
> > >>
> > >> Rerunning the benchmarks on my branch (specifically the PatternLayout
> > >> benchmarks in log4j-perf) produced much better improvements than I had
> > >> anticipated. Beyond that, the date format cache invalidation problem
> > >> resulted in a substantial speedup. I agree that it would be helpful to 
> > >> get
> > >> a release out the door once this is merged.
> > >>
> > >> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put
> > >> together in my benchmark project, that was just for experimentation :-)
> > >> Future java releases (or changes in minor patch releases) could cause it 
> > >> to
> > >> fail in frightening ways. We may be better off recommending the getBytes
> > >> approach for now on some java versions (possibly by changing our default 
> > >> on
> > >> java 9+).
> > >> Claes has a potential change[1] which appears to buy us a great deal of
> > >> performance in future Javas (assuming it is merged) and we may be able to
> > >> engage for additional encoding APIs, for example something like this 
> > >> could
> > >> avoid allocations and additional buffers:
> > >>
> > >> /** Returns the number of characters encoded to destination, or -1 if 
> > >> more
> > >> space is needed (equivalent to CoderResult.OVERFLOW) */
> > >> int CharSetEncoder.encode(charsequence, inputStart, inputLimit, byte[]
> > >> destination, int destOffset, int destLimit)
> > >> (I haven't put a great deal of thought into this API and it's getting
> > >> late, so pardon any terrible ideas!)
> > >>
> > >> 1. https://github.com/openjdk/jdk/pull/5621
> > >>
> > >> -ck
> > >>
> > >> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> > >>> First and foremost, fantastic job Carter!
> > >>>
> > >>> For #573, I see that Gary and Ralph have already shared some remarks. I
> > >>> would appreciate it if we can get this merged and cut the ribbon for
> > >> 2.15.0
> > >>> release.
> > >>>
> > >>> Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`...
> > >> That
> > >>> is a tough one. In your hack branch there, I am not sure if using
> > >> `Unsafe`
> > >>> is a future-proof path forward. I was trying to wrap my mind around
> > >> Daniel
> > >>> Sun's fast-reflection <https://github.com/danielsun1106/fast-reflection>
> > >>> (for the records, I couldn't) and was triggered by his ASM usage there. 
> > >>> I
> > >>> was curious if we could do something similar via ASM to hack
> > >>> `CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
> > >>> triggers a practical idea for you.)
> > >>
> >
> >

Reply via email to