On Thu, 22 May 2025 08:11:09 GMT, Per Minborg <pminb...@openjdk.org> wrote:

>> This PR builds on a concept John Rose told me about some time ago. Instead
>> of combining memory operations of various sizes, a single large and skewed
>> memory operation can be made to clean up the tail of remaining bytes.
>>
>> This has the effect of simplifying and shortening the code. The number of
>> branches to evaluate is reduced.
>
> Per Minborg has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Correct typo in comment

src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java line 110:

> 108:             SCOPED_MEMORY_ACCESS.setMemory(dst.sessionImpl(), dst.unsafeGetBase(), dst.unsafeGetOffset(), len, value);
> 109:         }
> 110:     }

Suggestion:

    final var sessionImpl = dst.sessionImpl();
    final var unsafeGetBase = dst.unsafeGetBase();
    final var unsafeGetOffset = dst.unsafeGetOffset();
    final var bigEndian = !Architecture.isLittleEndian();
    // Switch on the bit length of len, i.e. 64 - Long.numberOfLeadingZeros(len),
    // which is floor(log2(len)) + 1 for len > 0 and 0 for len == 0
    switch (64 - Long.numberOfLeadingZeros(len)) {
        // len == 0
        case 0 -> sessionImpl.checkValidState(); // Implicit state check
        // len == 1
        case 1 -> SCOPED_MEMORY_ACCESS.putByte(sessionImpl, unsafeGetBase, unsafeGetOffset, value);
        // 2 <= len <= 3: two possibly overlapping short stores
        case 2 -> {
            SCOPED_MEMORY_ACCESS.putShortUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset, (short) longValue, bigEndian);
            SCOPED_MEMORY_ACCESS.putShortUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Short.BYTES, (short) longValue, bigEndian);
        }
        // 4 <= len <= 7: two possibly overlapping int stores
        case 3 -> {
            SCOPED_MEMORY_ACCESS.putIntUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset, (int) longValue, bigEndian);
            SCOPED_MEMORY_ACCESS.putIntUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Integer.BYTES, (int) longValue, bigEndian);
        }
        // len >= 8
        default -> {
            if (len < NATIVE_THRESHOLD_FILL) {
                final int limit = (int) (len & (NATIVE_THRESHOLD_FILL - 8));
                for (int offset = 0; offset < limit; offset += Long.BYTES) {
                    SCOPED_MEMORY_ACCESS.putLongUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + offset, longValue, bigEndian);
                }
                // One skewed long store covers the remaining tail bytes
                SCOPED_MEMORY_ACCESS.putLongUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Long.BYTES, longValue, bigEndian);
            } else {
                // Handle larger segments via native calls
                SCOPED_MEMORY_ACCESS.setMemory(sessionImpl, unsafeGetBase, unsafeGetOffset, len, value);
            }
        }
    }

The current CodeSize of the method is 370, which is greater than 325, so C2
cannot inline it. If we extract the method calls used in each branch into
local variables, as in the suggestion above, the CodeSize drops to 298.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25383#discussion_r2101921745
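
For readers without access to the JDK internals, the overlapping-store idea
can be tried out on a plain byte[]. The following is only a minimal sketch
under stated assumptions: the class OverlappingFill and its fill method are
hypothetical names, ByteBuffer stands in for SCOPED_MEMORY_ACCESS, and there
is no NATIVE_THRESHOLD_FILL fallback, so the loop bound is simply len rounded
down to a multiple of 8. It is not the code from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical stand-alone illustration; not part of the JDK or of the PR.
final class OverlappingFill {

    // Fills the whole array with 'value', dispatching on the bit length of len.
    static void fill(byte[] dst, byte value) {
        final int len = dst.length;
        final ByteBuffer buf = ByteBuffer.wrap(dst).order(ByteOrder.nativeOrder());
        // Replicate the byte into all eight lanes of a long.
        final long longValue = (value & 0xFFL) * 0x0101010101010101L;

        switch (64 - Long.numberOfLeadingZeros(len)) {
            case 0 -> { }                               // len == 0: nothing to write
            case 1 -> buf.put(0, value);                // len == 1
            case 2 -> {                                 // 2 <= len <= 3
                buf.putShort(0, (short) longValue);
                buf.putShort(len - Short.BYTES, (short) longValue);
            }
            case 3 -> {                                 // 4 <= len <= 7
                buf.putInt(0, (int) longValue);
                buf.putInt(len - Integer.BYTES, (int) longValue);
            }
            default -> {                                // len >= 8
                final int limit = len & ~7;             // len rounded down to a multiple of 8
                for (int offset = 0; offset < limit; offset += Long.BYTES) {
                    buf.putLong(offset, longValue);
                }
                // One skewed store covers the tail; it may overlap bytes already written.
                buf.putLong(len - Long.BYTES, longValue);
            }
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[5];
        fill(a, (byte) 7);
        System.out.println(Arrays.toString(a)); // prints [7, 7, 7, 7, 7]
    }
}
```

Because every store writes the same repeated byte, the overlap between the
first and the last store in each branch is harmless, and no per-size tail
handling is needed.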