On Wed, 25 Oct 2023 03:11:12 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> This is a feature requested by @RogerRiggs and @cl4es.
>>
>> **Idea**
>>
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger
>> stores (e.g. one long store) can lead to speedup.
>> Recently, @cl4es and @RogerRiggs had to review a few PRs where people would
>> try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or
>> ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`).
>> They have asked if we can do such an optimization in C2, rather than in the
>> Java library code, or even user code.
>>
>> This patch here supports a few simple use-cases, like these:
>>
>> Merge consecutive array stores, with constants. We can combine the separate
>> constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>>
>> Merge consecutive array stores, with a variable that was split (using
>> shifts). We can essentially undo the splitting (i.e. shifting and
>> truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>>
>> The idea is that this would allow the introduction of a very simple API,
>> without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>>
>> **Details**
>>
>> This draft currently implements the optimization in an additional special
>> IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>>
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see
>> `Compile::gather_nodes_for_merge_stores`).
>> During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all
>> other optimizations) of `StoreNode::Ideal`.
>> We essentially try to establish a chain of mergeable stores:
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>>
>> Mergeable stores must have the same Opcode (implies they have the same
>> element type and hence size).
>> Further, mergeable stores must have the same control (or be separated by only
>> a RangeCheck).
>> Further,...
>
> I imagine it would be beneficial if we could merge stores to fields and
> stores from loads, which are common in object constructions.
>
> Thanks.

@merykitty do you have examples for both? Maybe stores to fields already work.

Merging loads and stores may be out of scope. That sounds a bit too much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions.
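For readers without the linked test file at hand, here is a minimal sketch of the two store patterns the quoted description talks about: consecutive byte stores of constants, and consecutive byte stores of a `long` that was split with shifts. The class and method names (`MergeStoresSketch`, `storeConstants`, `storeLongLE`, `readLongLE`) are hypothetical and are not taken from `TestMergeStores.java`. Whether C2 actually merges either pattern into a single 8-byte store depends on the patch and the platform, so this only shows the shape of the Java code, not a guaranteed result.

```java
import java.util.Arrays;

public class MergeStoresSketch {
    // Pattern 1: eight consecutive byte stores of constants.
    // In principle the constants could be combined into one long constant.
    static void storeConstants(byte[] a, int offset) {
        a[offset + 0] = (byte) 0xEF;
        a[offset + 1] = (byte) 0xCD;
        a[offset + 2] = (byte) 0xAB;
        a[offset + 3] = (byte) 0x89;
        a[offset + 4] = (byte) 0x67;
        a[offset + 5] = (byte) 0x45;
        a[offset + 6] = (byte) 0x23;
        a[offset + 7] = (byte) 0x01;
    }

    // Pattern 2: a long split into bytes with shifts and truncating casts,
    // stored in little-endian order. Undoing the split would mean storing
    // v directly with one 8-byte store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Helper for checking the result: reassemble the little-endian bytes.
    static long readLongLE(byte[] a, int offset) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (a[offset + i] & 0xFFL);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] fromConstants = new byte[16];
        byte[] fromVariable = new byte[16];
        storeConstants(fromConstants, 4);
        storeLongLE(fromVariable, 4, 0x0123456789ABCDEFL);
        // Both patterns write the same eight bytes at the same offset.
        System.out.println(Arrays.equals(fromConstants, fromVariable));
        System.out.println(readLongLE(fromVariable, 4) == 0x0123456789ABCDEFL);
    }
}
```

Both methods use only plain array stores, which is the point of the proposal: no `Unsafe` or `ByteArrayLittleEndian` dependency in the Java code.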
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1778600064