I don't think we should get rid of the Java 8 specific performance options in the 2.x branch. In 3.x, we can simplify things to default to appropriate options for Java 11+ and also removing the less useful variants.
On Thu, Sep 23, 2021 at 11:29 AM Carter Kozak <cko...@ckozak.net> wrote: > > Thanks, Ralph! > > You're correct that the direct encoderes are required for GC-free logging, > some of our consumers rely on that > for niche (low latency) workloads where it's important to understand > precisely where a pause may occur. > GC performance has become much better over the last several years, and the > throughput of the allocation- > heavy workflow has increased significantly, however there are a few things we > should keep in mind: > > 1. Java 8 performs better with direct encoders (although afaict the > performance isn't any better than java 9+, > string.getBytes just has several new optimizations > 2. If this[1] change is merged into openjdk, the direct encoders may once > again be the best-performing option, > I'm planning to put together a local build for comparison later. > 3. My testing has primarily targetted UTF-8 as it's the only charset we > target at work, and the default charset > on most modern servers. The jdk has a great deal of optimization which > special cases UTF-8 for this reason, > especially from string.getBytes. Other charsets are likely to perform > differently. > > I'm in favor of changing the default value of "direct.encoders" based on the > java version, for example > Java 8 should continue to use them, while versions after compact-strings was > introduced are better > off without direct encoders until the jdk can be improved using something > like the linked PR. > > 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767 > > -ck > > On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote: > > I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s > > branch. I ran them with 16 threads on my > > MacBook Pro and ran each test twice since the tests show some variability > > from time to time. > > > > I can draw 2 conclusions from this. > > 1. The improvements Carter has made to 2.15.0 have already made a > > noticeable improvement. > > 2. His new changes show a definite improvement on top of the 2.15.0 > > improvements when enable direct encoders is false. > > > > These results definitely make me wonder if we should just remove the direct > > encoders logic at least in master. From what > > I can tell it makes things complicated and makes no noticeable impact on > > performance in 2.15.0 even before Carter’s > > changes. Are they required for gc-free? Can we make direct.encoders = false > > the default if it isn’t now? > > > > I apologize if the performance results don’t look good in your email. They > > are formatted with a fixed-width font but it is > > likely they will be messed up once they reach your mail client. > > > > Ralph > > > > 2.14.1 - enable direct encoders = false > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1221.995 ± > > 150.876 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1495.072 ± > > 41.715 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1529.807 ± > > 55.876 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 681.690 ± > > 17.631 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1134.543 ± > > 35.065 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1546.811 ± > > 95.721 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1103.526 ± > > 78.201 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1402.669 ± > > 144.374 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1475.666 ± > > 87.885 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 582.841 ± > > 57.156 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 860.587 ± > > 185.170 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1560.206 ± > > 323.229 ops/ms > > > > 2.14.1 - enable direct encoders = true > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1250.808 ± > > 116.848 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1564.926 ± > > 146.195 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1703.731 ± > > 183.304 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 736.189 ± > > 40.798 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1070.958 ± > > 97.173 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1350.507 ± > > 225.104 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1218.639 ± > > 141.343 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1525.081 ± > > 112.058 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1724.925 ± > > 169.680 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 680.453 ± > > 32.330 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1054.517 ± > > 52.147 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1353.180 ± > > 224.415 ops/ms > > > > 2.15.0-SNAPSHOT - enable direct encoders = false > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1121.838 ± > > 112.026 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1606.602 ± > > 217.776 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1607.875 ± > > 226.656 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 694.809 ± > > 33.380 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1500.557 ± > > 402.779 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1387.306 ± > > 213.081 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1199.591 ± > > 149.157 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1744.558 ± > > 190.732 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1718.862 ± > > 142.328 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 604.785 ± > > 78.884 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1604.844 ± > > 712.807 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1578.756 ± > > 109.343 ops/ms > > > > 2.15.0-SNAPSHOT - enable direct encoders = true > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1244.029 ± > > 61.842 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1564.244 ± > > 263.322 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1648.783 ± > > 217.936 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 717.024 ± > > 43.631 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1513.402 ± > > 438.311 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1425.745 ± > > 231.201 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1207.924 ± > > 136.763 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1514.119 ± > > 98.725 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1620.834 ± > > 238.498 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 716.294 ± > > 65.527 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1523.430 ± > > 387.178 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1402.748 ± > > 312.713 ops/ms > > > > Carter 2.15.0-SNAPSHOT - enable direct encoders = false > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1351.481 ± > > 68.356 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 2048.458 ± > > 118.329 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1762.185 ± > > 155.733 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 693.222 ± > > 94.051 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1707.576 ± > > 299.891 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1439.624 ± > > 60.957 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1230.511 ± > > 111.351 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1950.005 ± > > 166.843 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1783.227 ± > > 53.359 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 702.814 ± > > 38.263 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1738.084 ± > > 344.135 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1463.187 ± > > 158.787 ops/ms > > > > Carter 2.15.0-SNAPSHOT - enable direct encoders = true > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1212.045 ± > > 165.745 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1671.374 ± > > 190.449 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1637.298 ± > > 169.134 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 708.106 ± > > 30.035 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1523.321 ± > > 429.586 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1418.236 ± > > 233.527 ops/ms > > > > Benchmark Mode Cnt Score > > Error Units > > AsyncWithFileAppenderBenchmark.log4j1File thrpt 10 1189.572 ± > > 117.298 ops/ms > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile thrpt 10 1499.633 ± > > 186.132 ops/ms > > AsyncWithFileAppenderBenchmark.logbackFile thrpt 10 1631.038 ± > > 214.951 ops/ms > > FileAppenderBenchmark.log4j1File thrpt 10 697.329 ± > > 48.757 ops/ms > > FileAppenderBenchmark.log4j2File thrpt 10 1542.492 ± > > 395.759 ops/ms > > FileAppenderBenchmark.logbackFile thrpt 10 1321.256 ± > > 148.364 ops/ms > > > > > > > On Sep 22, 2021, at 12:30 AM, Volkan Yazıcı <vol...@yazi.ci> wrote: > > > > > > That's great news! Then I will be looking forward to your signal for > > > putting a ribbon onto 2.15.0. > > > > > > I have followed the conversation with Claes Redestad from Oracle on > > > Twitter > > > <https://twitter.com/carter_kozak/status/1433798391604162561>. My > > > conclusion was also that there apparently is no way to make CharsetEncoder > > > beat .toString().getBytes() in Java 9+, until a CE specialization gets > > > introduced similar to what has been done in .toString().getBytes(). > > > Nevertheless, I was also (naively?) thinking about a similar branching > > > strategy like the one you proposed as a temporary solution: preJava9 ? > > > useCE() : useStringGetBytes(). Would you be able to get this done? > > > > > > Another idea I thought of yesterday evening was to introduce our own > > > hand-written StringBuilder- or char[]-to-byte[] encoders for common cases, > > > i.e., ASCII and UTF-8. What do you think? > > > > > > > > > On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak <cko...@ckozak.net> wrote: > > > > > >> Thanks, Volkan! > > >> > > >> Rerunning the benchmarks on my branch (specifically the PatternLayout > > >> benchmarks in log4j-perf) produced much better improvements than I had > > >> anticipated. Beyond that, the date format cache invalidation problem > > >> resulted in a substantial speedup. I agree that it would be helpful to > > >> get > > >> a release out the door once this is merged. > > >> > > >> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put > > >> together in my benchmark project, that was just for experimentation :-) > > >> Future java releases (or changes in minor patch releases) could cause it > > >> to > > >> fail in frightening ways. We may be better off recommending the getBytes > > >> approach for now on some java versions (possibly by changing our default > > >> on > > >> java 9+). > > >> Claes has a potential change[1] which appears to buy us a great deal of > > >> performance in future Javas (assuming it is merged) and we may be able to > > >> engage for additional encoding APIs, for example something like this > > >> could > > >> avoid allocations and additional buffers: > > >> > > >> /** Returns the number of characters encoded to destination, or -1 if > > >> more > > >> space is needed (equivalent to CoderResult.OVERFLOW) */ > > >> int CharSetEncoder.encode(charsequence, inputStart, inputLimit, byte[] > > >> destination, int destOffset, int destLimit) > > >> (I haven't put a great deal of thought into this API and it's getting > > >> late, so pardon any terrible ideas!) > > >> > > >> 1. https://github.com/openjdk/jdk/pull/5621 > > >> > > >> -ck > > >> > > >> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote: > > >>> First and foremost, fantastic job Carter! > > >>> > > >>> For #573, I see that Gary and Ralph have already shared some remarks. I > > >>> would appreciate it if we can get this merged and cut the ribbon for > > >> 2.15.0 > > >>> release. > > >>> > > >>> Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`... > > >> That > > >>> is a tough one. In your hack branch there, I am not sure if using > > >> `Unsafe` > > >>> is a future-proof path forward. I was trying to wrap my mind around > > >> Daniel > > >>> Sun's fast-reflection <https://github.com/danielsun1106/fast-reflection> > > >>> (for the records, I couldn't) and was triggered by his ASM usage there. > > >>> I > > >>> was curious if we could do something similar via ASM to hack > > >>> `CharsetEncoder`? (I am probably talking nonsense, I hope it anyway > > >>> triggers a practical idea for you.) > > >> > > > >