On Tue, 12 Aug 2025 11:32:42 GMT, Francesco Andreuzzi <d...@openjdk.org> wrote:

>> In this PR I propose to refresh the included headers in hotspot 
>> `precompiled.hpp`. The current set of precompiled headers was refreshed in 
>> 2018, 7 years ago. I repeated the same operations and measurements after 
>> refreshing the set of precompiled headers according to the current usage 
>> frequency.
>> 
>> These are the results I observed. Depending on the platform, the improvement 
>> is between 10 and 20% in terms of total work (user+sys). The results are in 
>> seconds.
>> 
>> 
>> linux-x64 GCC
>> master      real 81.39 user 3352.15 sys 287.49
>> JDK-8365053 real 81.94 user 3030.24 sys 295.82
>> 
>> linux-x64 Clang
>> master      real 43.44 user 2082.93 sys 130.70
>> JDK-8365053 real 38.44 user 1723.80 sys 117.68
>> 
>> linux-aarch64 GCC
>> master      real 1188.08 user 2015.22 sys 175.53
>> JDK-8365053 real 1019.85 user 1667.45 sys 171.86
>> 
>> linux-aarch64 clang
>> master      real 981.77 user 1645.05 sys 118.60
>> JDK-8365053 real 791.96 user 1262.92 sys 101.50
>
> Francesco Andreuzzi has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - conditional includes
>  - variants

I think I found a more sensible approach to this problem. With clang's 
[`-ftime-trace`](https://clang.llvm.org/docs/analyzer/developer-docs/PerformanceInvestigation.html#performance-analysis-using-ftime-trace) 
we can get a report in [Trace Event 
format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0) 
for each compilation unit, including the time spent on each header it 
includes. Here is an example of one such file: 
[shenandoahOldGC.json](https://github.com/user-attachments/files/21915502/shenandoahOldGC.json).
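
As a rough illustration of what such a report contains, here is a minimal Python sketch that sums the time attributed to each included file in a single trace. It assumes includes show up as complete events (`"ph": "X"`) named `Source`, with the file path in `args.detail` and `dur` in microseconds; the exact event shape may differ between clang versions. The function name `header_times_ms` and the sample data are made up for the example and are not real measurements.

```python
import json
from collections import defaultdict


def header_times_ms(trace):
    """Sum the duration of "Source" events per included file, in milliseconds."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Assumption: an include is a complete event (ph == "X") named "Source",
        # with the file path in args.detail and the duration in microseconds.
        if event.get("ph") == "X" and event.get("name") == "Source":
            path = event.get("args", {}).get("detail", "<unknown>")
            totals[path] += event.get("dur", 0) / 1000.0
    return dict(totals)


# Tiny hand-written trace in the assumed shape (hypothetical, not measured).
sample = json.loads("""
{
  "traceEvents": [
    {"ph": "X", "name": "Source", "dur": 2000, "args": {"detail": "a.hpp"}},
    {"ph": "X", "name": "Source", "dur": 500,  "args": {"detail": "b.hpp"}},
    {"ph": "X", "name": "Source", "dur": 1000, "args": {"detail": "a.hpp"}},
    {"ph": "X", "name": "Frontend", "dur": 9000}
  ]
}
""")

# Print headers from most to least expensive.
for path, ms in sorted(header_times_ms(sample).items(), key=lambda kv: -kv[1]):
    print(f"{ms:8.1f} ms  {path}")
```

Note that these durations are inclusive: time spent in a nested header is also counted in every header that includes it, so the per-file totals overlap.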

These files can be processed (e.g. with 
[ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer/tree/main)) 
to dig into where time was spent during the build. Among the information 
reported by `ClangBuildAnalyzer`, the most interesting part is the list of 
expensive headers:

**** Expensive headers:
597169 ms: /jdk/src/hotspot/share/oops/access.inline.hpp (included 650 times, 
avg 918 ms), included via:
  80x: oop.inline.hpp iterator.inline.hpp 
  70x: javaClasses.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp 
jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp 
jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp 
  32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp 
g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp 
iterator.inline.hpp 
  30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp 
oopHandle.inline.hpp 
  ...

425714 ms: /jdk/src/hotspot/share/memory/iterator.inline.hpp (included 646 
times, avg 659 ms), included via:
  80x: oop.inline.hpp 
  70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp 
objArrayOop.inline.hpp oop.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp 
jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp 
jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp 
access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp 
barrierSet.inline.hpp objArrayOop.inline.hpp oop.inline.hpp 
  32x: g1CollectedHeap.inline.hpp g1ConcurrentMark.inline.hpp 
g1ConcurrentMarkBitMap.inline.hpp markBitMap.inline.hpp oop.inline.hpp 
  30x: ciUtilities.inline.hpp interfaceSupport.inline.hpp javaThread.inline.hpp 
oopHandle.inline.hpp access.inline.hpp barrierSet.inline.hpp 
objArrayOop.inline.hpp oop.inline.hpp 
  ...

400304 ms: /jdk/src/hotspot/share/oops/oop.inline.hpp (included 1165 times, avg 
343 ms), included via:
  80x: <direct include>
  70x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp 
objArrayOop.inline.hpp 
  66x: oop.inline.hpp iterator.inline.hpp access.inline.hpp 
barrierSet.inline.hpp objArrayOop.inline.hpp 
  60x: javaClasses.inline.hpp access.inline.hpp barrierSet.inline.hpp 
objArrayOop.inline.hpp oop.inline.hpp iterator.inline.hpp 
instanceKlass.inline.hpp klass.inline.hpp classLoaderData.inline.hpp 
  40x: jfrEvents.hpp jfrEventClasses.hpp jfrEvent.hpp jfrNativeEventWriter.hpp 
jfrEventWriterHost.inline.hpp jfrEventWriterHost.hpp jfrWriterHost.inline.hpp 
jfrTraceId.inline.hpp javaThread.inline.hpp oopHandle.inline.hpp 
access.inline.hpp barrierSet.inline.hpp objArrayOop.inline.hpp 
  39x: shenandoahHeap.inline.hpp javaClasses.inline.hpp access.inline.hpp 
barrierSet.inline.hpp objArrayOop.inline.hpp 
  ...

[...]


This should give us a clear understanding of which headers should go into 
`precompiled.hpp`, since it uses the timing information available from the 
compiler itself rather than just counting the number of inclusions. So far, 
the improvements in build time are comparable with the initial approach I 
tried in this PR, but I think this approach will prove more accurate in the 
long term.
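
To make the difference between the two criteria concrete, here is a small Python sketch that ranks the three headers from the report above, first by inclusion count and then by total time. The numbers are copied from the `ClangBuildAnalyzer` output; the variable names are only for illustration.

```python
# (header, inclusion count, total time in ms), copied from the report above.
headers = [
    ("oops/access.inline.hpp", 650, 597169),
    ("memory/iterator.inline.hpp", 646, 425714),
    ("oops/oop.inline.hpp", 1165, 400304),
]

# Ranking used by the inclusion-count approach.
by_count = sorted(headers, key=lambda h: h[1], reverse=True)
# Ranking based on the time the compiler reports for each header.
by_time = sorted(headers, key=lambda h: h[2], reverse=True)

print("by inclusion count:", [name for name, _, _ in by_count])
print("by total time:     ", [name for name, _, _ in by_time])
```

With these numbers `oop.inline.hpp` ranks first by inclusion count but last by total time, while `access.inline.hpp` is the most expensive of the three despite being included roughly half as often.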

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26681#issuecomment-3209879924
