Hi all,

Recently I've resumed work on making LTO viable for HotSpot. The aim is to have LTO available as a working option so that running Java code can benefit from the enhanced optimization; making LTO the default is not a goal, at least not yet. The work has been going reasonably well so far, but there are 2 longstanding problems I haven't been able to solve.

The first is related to https://github.com/openjdk/jdk/pull/22864 switching os::current_stack_pointer() to using runtime assembly, but we can visit that later.

The bigger issue is related to the flatten attribute in gcc. In short, some G1 code is marked as flatten, which causes gcc to inline all calls inside those methods. More specifically:

  void G1ParScanThreadState::trim_queue_to_threshold(uint threshold)
  void G1ParScanThreadState::steal_and_trim_queue(G1ScannerTasksQueueSet* task_queues)
  oop G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr region_attr, oop old, markWord old_mark)

This would normally be fine, since the compilation unit boundary prevents inlining across source files, but when LTO is active, the method bodies from other compilation units become available, and gcc then goes on a rampage, mass inlining everything it can find until there is nothing left to inline. On top of causing the JVM to inflate to at least 60MB in the best case, it also causes build problems, notably JDK-8343698 and (suspected) JDK-8334616, and LTO in general is extremely slow, likely due to this problem. It would seem that we'd need to create NOINLINE wrappers for methods which are called by the flattened code but are not defined in the same source file (g1ParScanThreadState.cpp).
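To make the wrapper idea concrete, here is a minimal sketch of the pattern I have in mind. It is not actual HotSpot code: every function name is hypothetical, the SKETCH_* macros are stand-ins for the attribute macros HotSpot would use, and it relies on the documented gcc behaviour that flatten does not inline functions declared noinline.

```cpp
// Sketch only: all names below are hypothetical, not taken from HotSpot.
#include <cassert>

#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#define SKETCH_FLATTEN  __attribute__((flatten))
#else
// Assumption: other compilers simply ignore the attributes in this sketch.
#define SKETCH_NOINLINE
#define SKETCH_FLATTEN
#endif

// Stand-in for a method defined in a different source file. Without LTO the
// flattened caller cannot see this body; with LTO it can, and would inline it.
int external_helper(int x) {
  assert(x >= 0);
  return x * 2 + 1;
}

// Wrapper that lives next to the flattened code. Because it is noinline,
// flatten stops here instead of pulling in external_helper and its callees.
SKETCH_NOINLINE int external_helper_noinline(int x) {
  return external_helper(x);
}

// Helper from the same source file: still a fine candidate for flattening.
static inline int local_helper(int x) {
  return x + 1;
}

// Stand-in for one of the flattened G1 methods. Calls to local code are
// inlined as before; the cross-file call goes through the wrapper.
SKETCH_FLATTEN int flattened_caller(int x) {
  return local_helper(x) + external_helper_noinline(x);
}
```

Calling flattened_caller(3) from any test main() exercises both paths. The hard part is not writing such wrappers but knowing which callees need one.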

Problematically, however, the call hierarchy for these 3 methods is downright *massive*, since these methods are absolute monsters. This makes finding out which methods these 3 call very tedious and error prone. After months and many different approaches, all of which have failed, I'm still no closer to finding out which code needs NOINLINE wrappers to prevent cross compilation unit inlining. Might there be a better way to figure out which methods called by the flattened code are defined outside the g1ParScanThreadState.cpp source file than my current approach of manually looking through an IDE (all automated tools have failed in one way or another)?

This is a very big blocker for working LTO with HotSpot. Once I manage to get this and the stack pointer issue solved, I believe I'll (hopefully) be able to get working LTO into the JDK soon.

best regards,
Julian
