On Tue, 12 Aug 2025 11:32:42 GMT, Francesco Andreuzzi <d...@openjdk.org> wrote:
>> In this PR I propose to refresh the included headers in hotspot
>> `precompiled.hpp`. The current set of precompiled headers was refreshed in
>> 2018, 7 years ago. I repeated the same operations and measurements after
>> refreshing the set of precompiled headers according to the current usage
>> frequency.
>>
>> These are the results I observed. Depending on the platform, the improvement
>> is between 10 and 20% in terms of total work (user+sys). The results are in
>> seconds.
>>
>>
>> linux-x64 GCC
>> master        real 81.39     user 3352.15   sys 287.49
>> JDK-8365053   real 81.94     user 3030.24   sys 295.82
>>
>> linux-x64 Clang
>> master        real 43.44     user 2082.93   sys 130.70
>> JDK-8365053   real 38.44     user 1723.80   sys 117.68
>>
>> linux-aarch64 GCC
>> master        real 1188.08   user 2015.22   sys 175.53
>> JDK-8365053   real 1019.85   user 1667.45   sys 171.86
>>
>> linux-aarch64 clang
>> master        real 981.77    user 1645.05   sys 118.60
>> JDK-8365053   real 791.96    user 1262.92   sys 101.50
>
> Francesco Andreuzzi has updated the pull request incrementally with two
> additional commits since the last revision:
>
>  - conditional includes
>  - variants

I think I found a more sensible approach to tackle this problem.

Using clang [`-ftime-trace`](https://clang.llvm.org/docs/analyzer/developer-docs/PerformanceInvestigation.html#performance-analysis-using-ftime-trace) we can get a report in [Trace Event format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0) for each compiled file, including the time spent on each header. Example of one such file here: [shenandoahOldGC.json](https://github.com/user-attachments/files/21915502/shenandoahOldGC.json).

These files can be processed (e.g. with [ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer/tree/main)) to dig into where time was spent during the build.
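To illustrate the kind of processing involved, here is a minimal Python sketch (not the script used for this PR, and not how `ClangBuildAnalyzer` is implemented). It assumes that each trace file is a JSON object with a `traceEvents` array, and that header parsing shows up as complete events (`"ph": "X"`) named `Source` with the file path in `args.detail`; this may differ between clang versions. The names `header_costs` and `most_expensive` are made up for the example. Durations are inclusive, so time spent in nested includes is also counted in the including header.

```python
import json
from collections import defaultdict


def header_costs(trace_paths):
    """Sum the time spent in each included file across several trace files.

    Returns {header_path: [total_microseconds, inclusion_count]}.
    """
    costs = defaultdict(lambda: [0, 0])
    for path in trace_paths:
        with open(path, encoding="utf-8") as f:
            trace = json.load(f)
        # Skip JSON files that are not in the assumed object form.
        if not isinstance(trace, dict):
            continue
        for event in trace.get("traceEvents", []):
            # Assumption: one complete "Source" event per parsed file,
            # with the path stored in args.detail and "dur" in microseconds.
            if event.get("ph") != "X" or event.get("name") != "Source":
                continue
            header = event.get("args", {}).get("detail")
            if header is None:
                continue
            costs[header][0] += event.get("dur", 0)
            costs[header][1] += 1
    return dict(costs)


def most_expensive(costs, top=10):
    """Return (header, total_us, count) tuples, most expensive first."""
    ranked = sorted(costs.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(header, total, count) for header, (total, count) in ranked[:top]]
```

Calling `most_expensive(header_costs(paths))` on the list of per-object trace files gives a ranking by total time rather than by inclusion count alone.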
Among the information `ClangBuildAnalyzer` reports, this is the most interesting part:

    **** Expensive headers:
    597169 ms: /jdk/src/hotspot/share/oops/access.inline.hpp (included 650 times, avg 918 ms), included via:
      80x: oop.inline.hpp iterator.inline.hpp
      70x: javaClasses.inline.hpp
      40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp
      39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp
      32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp iterator.inline.hpp
      30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp
      ...

    425714 ms: /jdk/src/hotspot/share/memory/iterator.inline.hpp (included 646 times, avg 659 ms), included via:
      80x: oop.inline.hpp
      70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp
      40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp
      39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp
      32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp
      30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp
      ...

    400304 ms: /jdk/src/hotspot/share/oops/oop.inline.hpp (included 1165 times, avg 343 ms), included via:
      80x: <direct include>
      70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp
      66x: oop.inline.hpp iterator.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp
      60x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp iterator.inline.hpp instanceKlass.inline.hpp klass.inline.hpp classLoaderData.inline.hpp
      40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp
      39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp
      ...

    [...]

This should give us a clear understanding of which headers should go into `precompiled.hpp`. It uses all the information available from the compiler itself, as opposed to just counting the number of inclusions.

At the moment, the improvements in build time are comparable with those of the initial approach I tried in this PR, but I think this approach will prove more accurate in the long term.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26681#issuecomment-3209879924