On Tue, 15 Oct 2024 10:47:55 GMT, Roman Kennke <[email protected]> wrote:
>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes)
>> #20603 and #20605, plus the Tiny Class-Pointers parts that have been
>> previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All
>> changes in this PR are protected by this flag. The purpose of the flag is to
>> provide a fallback, in case that users unexpectedly observe problems with
>> the new implementation. The intention is that this flag will remain
>> experimental and opt-in for at least one release, then make it on-by-default
>> and diagnostic (?), and eventually deprecate and obsolete it. However, there
>> are a few unknowns in that plan, specifically, we may want to further
>> improve compact headers to 4 bytes, we are planning to enhance the Klass*
>> encoding to support a virtually unlimited number of Klasses, at which point we
>> could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In
>> order to be able to do this, we add some changes to GC forwarding (see
>> below) to protect the relevant (upper 22) bits of the mark-word. Significant
>> parts of this PR deal with loading the compressed Klass* from the mark-word.
>> This PR also changes some code paths (mostly in GCs) to be more careful when
>> accessing Klass* (or mark-word or size) to be able to fetch it from the
>> forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now
>> uses a bit to indicate 'self-forwarding'. This is needed to preserve the
>> crucial Klass* bits in the header. This also allows us to get rid of
>> preserved-header machinery in SerialGC and G1 (Parallel GC abuses
>> preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We
>> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB,
>> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the
>> GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field
>> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However,
>> it is not possible to read an archive that has been written with an opposite
>> setting of UseCompactObjectHeaders. Some build machinery is added so that
>> _co...
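The mark-word bullets in the quoted description can be made concrete with a minimal C++ sketch of the bit manipulation involved. This is not HotSpot's `markWord` code, and every name in it is hypothetical. The description itself states only that the narrow Klass* lives in the upper 22 bits and that full-GC forwarding uses 40 bits. The self-forward bit position (bit 2), the lock bits `0b11` as the "forwarded" state, and the placement of the 40-bit field directly above the lock bits are assumptions.

```cpp
#include <cassert>
#include <cstdint>

namespace mark_sketch {  // hypothetical names; not HotSpot's markWord API

constexpr int           kLockBits      = 2;
constexpr std::uint64_t kLockMask      = (std::uint64_t{1} << kLockBits) - 1;
constexpr std::uint64_t kMarkedValue   = 0b11;  // assumed "forwarded" lock state
constexpr int           kSelfFwdShift  = 2;     // assumed position of the self-forward bit
constexpr int           kKlassBits     = 22;    // per the PR description
constexpr int           kKlassShift    = 64 - kKlassBits;          // 42
constexpr std::uint64_t kKlassMask     = ~std::uint64_t{0} << kKlassShift;
constexpr int           kFwdBits       = kKlassShift - kLockBits;  // 40
constexpr std::uint64_t kFwdMask       = ((std::uint64_t{1} << kFwdBits) - 1) << kLockBits;
constexpr std::uint64_t kHeapWordBytes = 8;

// The narrow Klass* occupies the upper 22 bits of the mark word.
inline std::uint32_t narrow_klass(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Self-forwarding (promotion failure): flag the object with one bit instead of
// storing its own address, so the Klass* bits survive.
inline std::uint64_t self_forward(std::uint64_t mark) {
  return mark | (std::uint64_t{1} << kSelfFwdShift) | kMarkedValue;
}

// Full-GC forwarding (a different phase): store the forwardee's offset from the
// heap base, in heap words, in the 40 bits between the lock bits and Klass* bits.
inline std::uint64_t encode_forwardee(std::uint64_t mark, std::uint64_t heap_base,
                                      std::uint64_t forwardee) {
  const std::uint64_t offset_words = (forwardee - heap_base) / kHeapWordBytes;
  assert(offset_words < (std::uint64_t{1} << kFwdBits) && "heap larger than 8 TiB");
  return (mark & kKlassMask) | (offset_words << kLockBits) | kMarkedValue;
}

inline std::uint64_t decode_forwardee(std::uint64_t mark, std::uint64_t heap_base) {
  return heap_base + ((mark & kFwdMask) >> kLockBits) * kHeapWordBytes;
}

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TiB, matching the stated limit.
static_assert((std::uint64_t{1} << kFwdBits) * kHeapWordBytes == (std::uint64_t{8} << 40),
              "40-bit word offsets cover 8 TiB");

}  // namespace mark_sketch
```

In this sketch, both self-forwarding and the 40-bit full-GC encoding leave bits 42..63 untouched. Ordinary forwarding, which stores a full pointer, does not, which fits the description's note about fetching Klass* from the forwardee.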
>
> Roman Kennke has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Fix aarch64.ad
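The layout bullets in the PR description quoted above (fields starting at offset 8, array length at offset 8) reduce to simple offset arithmetic. The sketch below assumes an 8-byte mark word, a 4-byte array length placed right after the header, and array elements aligned to their own size. `header_bytes` and `array_base_offset` are hypothetical helpers, not `arrayOopDesc`. The legacy offsets (12 and 16) are included for comparison.

```cpp
#include <cassert>
#include <cstddef>

namespace layout_sketch {  // hypothetical helpers, not HotSpot's oopDesc/arrayOopDesc

// Object header size in bytes; also where the field layouter can start
// placing instance fields.
constexpr std::size_t header_bytes(bool compact_headers, bool compressed_class_ptrs) {
  return compact_headers       ? 8    // mark word only; narrow Klass* lives inside it
       : compressed_class_ptrs ? 12   // mark word + 4-byte narrow Klass*
                               : 16;  // mark word + 8-byte Klass*
}

constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// The 4-byte array length directly follows the header.
constexpr std::size_t array_length_offset(bool compact, bool ccp) {
  return header_bytes(compact, ccp);
}

// Elements start after the length, aligned to the element size (1, 2, 4 or 8 assumed).
constexpr std::size_t array_base_offset(bool compact, bool ccp, std::size_t elem_size) {
  return align_up(array_length_offset(compact, ccp) + 4, elem_size);
}

static_assert(array_length_offset(true, true) == 8, "compact: length at offset 8");
static_assert(array_base_offset(true, true, 1) == 12, "compact byte[] data at 12");
static_assert(array_base_offset(true, true, 8) == 16, "compact long[] data at 16");

}  // namespace layout_sketch
```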
Finished reviewing `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp`,
going line by line and comparing the old snippets that were merged into the new
function: looks good to me, every (new) case is handled.
I only have some minor comments about comments.
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414:
> 412: // to the valid haystack bytes on the stack.
> 413: {
> 414: const Register haystack = rbx;
Keep `rax` as `index` for clarity? Although it is really only used as a temp:
const Register index = rax;
const Register haystack = rbx;
copy_to_stack(haystack, haystack_len, false, index, XMM_TMP1, _masm);
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1568:
> 1566: assert((COPIED_HAYSTACK_STACK_SIZE == 64), "Must be 64!");
> 1567:
> 1568: // Copy incoming haystack onto stack
The old comment was slightly more precise; consider moving it here, i.e.:
`// Copy incoming haystack onto stack (haystack <= 32 bytes)`
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634:
> 1632:
> 1633:
> 1634: // Copy the small (< 32 byte) haystack to the stack. Allows for vector reads without page fault
Just to be pedantic, it's `(<= 32)`: this function also handles the 32-byte case.
- line 401:
  `__ cmpq(haystack_len, 0x20);`
  `__ ja(L_bigSwitchTop);`
- though line 293 (`highly_optimized_short_cases`) only seems to route 16-byte
  cases here:
  `__ cmpq(haystack_len_p, isU ? 8 : 16);`
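Some background on the copy discussed here: a 32-byte vector load that starts inside a short haystack can run past the end of the array and into an unmapped page. Copying a haystack of at most 32 bytes into a 64-byte stack area makes every such load safe. The sketch below models this in portable C++. `StackHaystack`, `copy_small_haystack` and `load32` are hypothetical names, and the vector load is simulated with `memcpy`. It is not the stub's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model; the sizes mirror the stub (64-byte stack area, haystack <= 32 bytes).
constexpr std::size_t kStackBufBytes = 64;  // cf. COPIED_HAYSTACK_STACK_SIZE == 64
constexpr std::size_t kVectorBytes   = 32;  // one 256-bit (AVX2) load

struct StackHaystack {
  alignas(32) std::array<std::uint8_t, kStackBufBytes> bytes{};  // zero padding
  std::size_t len = 0;
};

// Copy a haystack of at most 32 bytes. Afterwards, a 32-byte load from any
// offset in [0, len] stays inside the 64-byte buffer and cannot fault.
inline StackHaystack copy_small_haystack(const std::uint8_t* haystack, std::size_t len) {
  assert(len <= kVectorBytes && "only haystacks <= 32 bytes are routed here");
  StackHaystack out;
  std::memcpy(out.bytes.data(), haystack, len);
  out.len = len;
  return out;
}

// Stand-in for an unaligned vector load: always reads exactly 32 bytes.
inline std::array<std::uint8_t, kVectorBytes> load32(const StackHaystack& h, std::size_t offset) {
  assert(offset <= h.len && offset + kVectorBytes <= kStackBufBytes);
  std::array<std::uint8_t, kVectorBytes> v{};
  std::memcpy(v.data(), h.bytes.data() + offset, kVectorBytes);
  return v;
}
```

The 32-byte case is exactly why the buffer is 64 bytes: a load starting at offset 32 still ends at byte 63.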
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659:
> 1657: Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack;
> 1658:
> 1659: assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8,
If we also had to optimize for a header size of 16, it might be possible to remove
one jump here. Looks correct for either size.
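One plausible reason for the `base_offset_in_bytes(...) >= 8` assert is the following, though this is an assumption and was not verified against the stub. For a haystack of 1 to 8 bytes, the code can load the 8 bytes that end at the last element. That load may start up to 7 bytes before the first element, i.e. inside the object header, which is mapped memory as long as the base offset is at least 8. The sketch below assumes a little-endian target such as x86-64. `load_tail8` and `small_haystack_bits` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// For 1 <= len <= 8: read the 8 bytes that END at data + len. The read starts at
// data + len - 8, at most 7 bytes before the first element, i.e. inside the
// object header. That memory is mapped whenever the array base offset is >= 8.
inline std::uint64_t load_tail8(const std::uint8_t* data, std::size_t len) {
  assert(len >= 1 && len <= 8);
  std::uint64_t word;
  std::memcpy(&word, data + len - 8, sizeof word);
  return word;
}

// On a little-endian target the haystack bytes are the top `len` bytes of that
// word; the shift discards the header bytes and moves the haystack into the
// low bytes.
inline std::uint64_t small_haystack_bits(const std::uint8_t* data, std::size_t len) {
  return load_tail8(data, len) >> (8 * (8 - len));
}
```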
-------------
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2370735887
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802041876
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802044880
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802088545
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802073195