When we run format and parse of java.time.format.DateTimeFormatter with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`, we get the following output (`j.t.f` = `java.time.format`, `j.l` = `java.lang`, `DTFB` = `DateTimeFormatterBuilder`):

```
@ 40 j.t.f.DTFB$CompositePrinterParser::format (116 bytes) inline (hot)
  @ 1 j.l.StringBuilder::length (5 bytes) inline (hot)
    @ 1 j.l.AbstractStringBuilder::length (5 bytes) accessor
  @ 48 j.t.f.DTFB$DateTimePrinterParser::format (0 bytes) failed to inline: virtual call
```

```
@ 37 j.t.f.DTFB$CompositePrinterParser::parse (135 bytes) inline (hot)
  @ 114 j.t.f.DTFB$DateTimePrinterParser::parse (0 bytes) failed to inline: virtual call
```

As this log shows, the calls to DateTimePrinterParser::format and DateTimePrinterParser::parse inside DateTimeFormatterBuilder$CompositePrinterParser::format and DateTimeFormatterBuilder$CompositePrinterParser::parse fail with `failed to inline: virtual call`. CompositePrinterParser iterates over its printer-parsers in a loop, so a single call site has to serve every element type. We can eliminate this inlining failure by manually unrolling the loop. Once the loop is unrolled, each element gets its own call site, so the JIT can use the type profile (the `TypeProfile` entries below) to inline the callees and improve performance. Below is the log output after manually unrolling the loop:

```
@ 41 j.t.f.DTFB$CompositePrinterParser::format (40 bytes) inline (hot)
  @ 1 j.l.StringBuilder::length (5 bytes) inline (hot)
    @ 1 j.l.AbstractStringBuilder::length (5 bytes) accessor
  @ 22 j.t.f.DateTimePrinterParserFactory$$Lambda/0x00000ff801009df8::format (11 bytes) inline (hot)
   callee changed to j.t.f.DTFB$CompositePrinterParser::format (40 bytes)
   \-> TypeProfile (6212/6212 counts) = j/t/f/DateTimePrinterParserFactory$$Lambda+0x00000ff801009df8
    @ 7 j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes) inline (hot)
      @ 6 j.t.f.DTFB$NumberPrinterParser::format (399 bytes) failed to inline: hot method too big
       callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes)
       \-> TypeProfile (7170/7170 counts) = j/t/f/DTFB$NumberPrinterParser
      @ 20 j.t.f.DTFB$CharLiteralPrinterParser::format (11 bytes) inline (hot)
       callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createFormatter$11 (195 bytes)
       \-> TypeProfile (7170/7170 counts) = j/t/f/DateTimeFormatterBuilder$CharLiteralPrinterParser
```

```
@ 37 j.t.f.DTFB$CompositePrinterParser::parse (13 bytes) inline (hot)
  @ 7 j.t.f.DateTimePrinterParserFactory$$Lambda/0x000000800100a950::parse (11 bytes) inline (hot)
   callee changed to j.t.f.DTFB$CompositePrinterParser::parse (13 bytes)
   \-> TypeProfile (130649/130649 counts) = j/t/f/DateTimePrinterParserFactory$$Lambda+0x000000800100a950
    @ 7 j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes) inline (hot)
      @ 6 j.t.f.DTFB$NumberPrinterParser::parse (609 bytes) failed to inline: hot method too big
       callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes)
       \-> TypeProfile (130884/130884 counts) = j/t/f/DTFB$NumberPrinterParser
      @ 26 j.t.f.DTFB$CharLiteralPrinterParser::parse (91 bytes) inline (hot)
       callee changed to j.t.f.DateTimePrinterParserFactory::lambda$createParser$9 (217 bytes)
       \-> TypeProfile (130884/130884 counts) = j/t/f/DTFB$CharLiteralPrinterParser
```

We can see that the format and parse calls to both NumberPrinterParser and CharLiteralPrinterParser are now resolved through TypeProfile. CharLiteralPrinterParser is inlined, while NumberPrinterParser is still not inlined because it is too large (`hot method too big`).

The code for the unrolled loop could be generated with MethodHandle, the ClassFile API, or Gensrc.gmk. Using MethodHandle or the ClassFile API would make the code obscure and difficult to understand, so I recommend Gensrc.gmk. Another advantage of Gensrc.gmk is that its initial performance is better than that of the other implementations.

To better express my ideas, I have submitted a draft PR: https://github.com/openjdk/jdk/pull/28465, and I would appreciate your feedback.

- Shaojin Wen
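To illustrate the unrolling idea described above, here is a minimal sketch under a heavily simplified model. `UnrollDemo`, `PrinterParser`, `NumberPrinter`, `CharLiteralPrinter`, `loop` and `unrolled` are hypothetical names; this is neither the JDK's DateTimeFormatterBuilder code nor the code in the draft PR. The loop form sends every element through one call site. The unrolled form, which a build-time generator such as Gensrc.gmk could emit for each arity, gives each position its own call site and therefore its own type profile.

```java
// Hypothetical, simplified model of printer/parser composition (not JDK API).
final class UnrollDemo {

    /** Stand-in for DateTimePrinterParser: appends a rendering of fields to buf. */
    interface PrinterParser {
        boolean format(int[] fields, StringBuilder buf);
    }

    /** Zero-padded number of fixed width; assumes non-negative values. */
    record NumberPrinter(int field, int width) implements PrinterParser {
        public boolean format(int[] fields, StringBuilder buf) {
            String s = Integer.toString(fields[field]);
            if (s.length() > width) {
                return false; // value does not fit the field width
            }
            for (int i = s.length(); i < width; i++) {
                buf.append('0');
            }
            buf.append(s);
            return true;
        }
    }

    record CharLiteralPrinter(char c) implements PrinterParser {
        public boolean format(int[] fields, StringBuilder buf) {
            buf.append(c);
            return true;
        }
    }

    /** Loop form: every element goes through the same pp.format call site. */
    static PrinterParser loop(PrinterParser... parts) {
        PrinterParser[] copy = parts.clone();
        return (fields, buf) -> {
            int len = buf.length();
            for (PrinterParser pp : copy) {
                // One shared call site; across many formatters it can see many
                // receiver classes, so C2 may fall back to a virtual call.
                if (!pp.format(fields, buf)) {
                    buf.setLength(len); // roll back partial output
                    return false;
                }
            }
            return true;
        };
    }

    /** Unrolled form for small arities; other sizes fall back to the loop. */
    static PrinterParser unrolled(PrinterParser... parts) {
        return switch (parts.length) {
            case 2 -> unrolled2(parts[0], parts[1]);
            case 3 -> unrolled3(parts[0], parts[1], parts[2]);
            default -> loop(parts); // a generator would emit more arities
        };
    }

    private static PrinterParser unrolled2(PrinterParser p0, PrinterParser p1) {
        return (fields, buf) -> {
            int len = buf.length();
            if (p0.format(fields, buf) && p1.format(fields, buf)) {
                return true;
            }
            buf.setLength(len);
            return false;
        };
    }

    private static PrinterParser unrolled3(PrinterParser p0, PrinterParser p1, PrinterParser p2) {
        return (fields, buf) -> {
            int len = buf.length();
            // p0, p1 and p2 are distinct call sites, each with its own profile.
            if (p0.format(fields, buf) && p1.format(fields, buf) && p2.format(fields, buf)) {
                return true;
            }
            buf.setLength(len);
            return false;
        };
    }

    public static void main(String[] args) {
        PrinterParser[] hhmm = {
            new NumberPrinter(0, 2), new CharLiteralPrinter(':'), new NumberPrinter(1, 2)
        };
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        loop(hhmm).format(new int[] {9, 5}, a);
        unrolled(hhmm).format(new int[] {9, 5}, b);
        System.out.println(a + " " + b); // 09:05 09:05
    }
}
```

Type profiles are kept per bytecode call site, so in this sketch all three-element formatters share the call sites inside `unrolled3`; a position stays monomorphic only while those formatters put the same printer type there.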
