When we run the format and parse methods of java.time.DateTimeFormatter with 
`-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`, we see the following 
output:
```
@ 40 j.t.f.DTFB$CompositePrinterParser::format (116 bytes) inline (hot)
 @ 1 j.l.StringBuilder::length (5 bytes) inline (hot)
 @ 1 j.l.AbstractStringBuilder::length (5 bytes) accessor
 @ 48 j.t.f.DTFB$DateTimePrinterParser::format (0 bytes) failed to inline: virtual call
```
```
@ 37 j.t.f.DTFB$CompositePrinterParser::parse (135 bytes) inline (hot)
 @ 114 j.t.f.DTFB$DateTimePrinterParser::parse (0 bytes) failed to inline: virtual call
```
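A log of this kind can be produced with a small driver along the following lines. This is only a sketch: the class name `InlineLogDriver`, the pattern `uuuu-MM-dd`, and the iteration count are assumptions chosen for illustration, not the workload used for the logs above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical driver; run it, for example, with
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineLogDriver.java
final class InlineLogDriver {
    // Formats and re-parses consecutive dates so that both
    // CompositePrinterParser::format and ::parse become hot.
    static int roundTrip(DateTimeFormatter formatter, LocalDate start, int iterations) {
        int checksum = 0;
        LocalDate date = start;
        for (int i = 0; i < iterations; i++) {
            String text = formatter.format(date);
            LocalDate parsed = LocalDate.parse(text, formatter);
            // Use the result so the JIT cannot discard the work.
            checksum += parsed.getDayOfMonth();
            date = date.plusDays(1);
        }
        return checksum;
    }

    public static void main(String[] args) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd");
        System.out.println(roundTrip(formatter, LocalDate.of(2024, 1, 1), 1_000_000));
    }
}
```

The PrintInlining output is large, so it helps to search it for `CompositePrinterParser`.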
As the logs above show, the calls to DateTimePrinterParser::format and 
DateTimePrinterParser::parse made inside 
DateTimeFormatterBuilder$CompositePrinterParser::format and 
DateTimeFormatterBuilder$CompositePrinterParser::parse are reported as `failed to 
inline: virtual call`. We can eliminate this inlining failure by manually 
unrolling the loop.
Once the loop is unrolled, each element is invoked from its own call site, so 
inlining can work and profile-based optimizations such as TypeProfile can take 
effect, which improves performance.
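The difference between the loop and the unrolled form can be sketched as follows. This is a simplified illustration, not the code in the JDK or in any patch: `Printer`, `UnrollSketch`, `looped`, `unrolled3`, and `literal` are hypothetical names, and the real DateTimePrinterParser takes a DateTimePrintContext rather than a plain `Object`.

```java
// Hypothetical stand-in for DateTimeFormatterBuilder.DateTimePrinterParser.
// The real interface takes a DateTimePrintContext; Object is used here only
// to keep the sketch self-contained.
interface Printer {
    boolean format(Object context, StringBuilder buf);
}

final class UnrollSketch {
    // Loop form: a single bytecode call site dispatches to every element, so
    // its receiver type profile mixes all element classes.
    static Printer looped(Printer... printers) {
        return (ctx, buf) -> {
            for (Printer p : printers) {
                if (!p.format(ctx, buf)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Unrolled form for exactly three elements: every element is invoked from
    // its own call site, which therefore collects its own type profile.
    static Printer unrolled3(Printer p0, Printer p1, Printer p2) {
        return (ctx, buf) -> p0.format(ctx, buf)
                && p1.format(ctx, buf)
                && p2.format(ctx, buf);
    }

    // Illustrative element that appends a fixed string and always succeeds.
    static Printer literal(String text) {
        return (ctx, buf) -> {
            buf.append(text);
            return true;
        };
    }

    public static void main(String[] args) {
        Printer year = literal("2024");
        Printer dash = literal("-");
        Printer month = literal("01");

        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        looped(year, dash, month).format(null, a);
        unrolled3(year, dash, month).format(null, b);
        System.out.println(a + " " + b); // prints "2024-01 2024-01"
    }
}
```

Both forms produce the same text. They differ only in how many call sites the JIT can profile: a per-position call site stays monomorphic as long as the same element type appears at that position.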
Below is the log output after manually unrolling the loop:
```
@ 41 j.t.f.DTFB$CompositePrinterParser::format (40 bytes) inline (hot)
 @ 1 j.l.StringBuilder::length (5 bytes) inline (hot)
 @ 1 j.l.AbstractStringBuilder::length (5 bytes) accessor
 @ 22 j.t.f.DateTimePrinterParserFactory$$Lambda/0x00000ff801009df8::format (11 bytes) inline (hot)
 callee changed to j.t.f.DTFB$CompositePrinterParser::format (40 bytes)
 \-> TypeProfile (6212/6212 counts) = j/t/f/DateTimePrinterParserFactory$$Lambda+0x00000ff801009df8
 @ 7 j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes) inline (hot)
 @ 6 j.t.f.DTFB$NumberPrinterParser::format (399 bytes) failed to inline: hot method too big
 callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes)
 \-> TypeProfile (7170/7170 counts) = j/t/f/DTFB$NumberPrinterParser
 @ 20 j.t.f.DTFB$CharLiteralPrinterParser::format (11 bytes) inline (hot)
 callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes)
 \-> TypeProfile (7170/7170 counts) = j/t/f/DateTimeFormatterBuilder$CharLiteralPrinterParser
```
```
 @ 37 j.t.f.DTFB$CompositePrinterParser::parse (13 bytes) inline (hot)
 @ 7 j.t.f.DateTimePrinterParserFactory$$Lambda/0x000000800100a950::parse (11 bytes) inline (hot)
 callee changed to j.t.f.DTFB$CompositePrinterParser::parse (13 bytes)
 \-> TypeProfile (130649/130649 counts) = j/t/f/DateTimePrinterParserFactory$$Lambda+0x000000800100a950
 @ 7 j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes) inline (hot)
 @ 6 j.t.f.DTFB$NumberPrinterParser::parse (609 bytes) failed to inline: hot method too big
 callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes)
 \-> TypeProfile (130884/130884 counts) = j/t/f/DTFB$NumberPrinterParser
 @ 26 j.t.f.DTFB$CharLiteralPrinterParser::parse (91 bytes) inline (hot)
 callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes)
 \-> TypeProfile (130884/130884 counts) = j/t/f/DTFB$CharLiteralPrinterParser
```
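We see that the format and parse calls to both NumberPrinterParser and 
CharLiteralPrinterParser now get a TypeProfile. The CharLiteralPrinterParser 
calls are inlined; the NumberPrinterParser calls are still not inlined, but 
the reason is now `hot method too big` rather than `virtual call`.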
We can generate the code for the unrolled loop with MethodHandle, the 
ClassFile API, or Gensrc.gmk. Using MethodHandle or the ClassFile API would 
make the code obscure and difficult to understand, so I recommend Gensrc.gmk. 
Another advantage of Gensrc.gmk is that its initial performance is better 
than that of the other implementations.
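To illustrate what generating the unrolled code as source could look like, here is a minimal sketch of a generator that emits one unrolled factory method per element count. It is an assumption-laden illustration and not the generator in the draft PR: `UnrolledSourceGen`, `generate`, the emitted method name `unrolledN`, and the `Printer` interface with its `format(ctx, buf)` method are all hypothetical.

```java
// Hypothetical build-time generator; the emitted text refers to an assumed
// Printer interface with a method boolean format(ctx, buf).
final class UnrolledSourceGen {
    // Returns the source of a factory method that chains n format calls,
    // one call site per element, instead of looping over an array.
    static String generate(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("static Printer unrolled").append(n).append('(');
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("Printer p").append(i);
        }
        sb.append(") {\n    return (ctx, buf) -> ");
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append("\n            && ");
            }
            sb.append('p').append(i).append(".format(ctx, buf)");
        }
        sb.append(";\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the generated source for a three-element composite.
        System.out.print(generate(3));
    }
}
```

Because the output is ordinary Java source, it is compiled together with the rest of the library and can be read and debugged like handwritten code.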
To better express my ideas, I have submitted a draft PR: 
https://github.com/openjdk/jdk/pull/28465 
I would appreciate your feedback.
-
Shaojin Wen
