On Mon, 10 Jun 2024 13:04:08 GMT, Shaojin Wen <d...@openjdk.org> wrote:
>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > # 1. Compare with unsafe branch > > 1. current (`5e815`) > https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406 > 2. unsafe (`adc220`) > https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406_unsafe > > I think the performance of the Unsafe branch may be the best data for the C2 > optimizer. @eme64 can help me see if C2 can do it? > > # 2. Benchmark Commands > > make test TEST="micro:java.lang.StringBuilders.toStringCharWithBool8" > make test TEST="micro:java.lang.StringBuilders.toStringCharWithNull8" > > > # 3. Implementation of Unsafe Branch > > class AbstractStringBuilder { > static final Unsafe UNSAFE = Unsafe.getUnsafe(); > > static final int NULL_LATIN1; > static final int TRUE_LATIN1; > static final int FALS_LATIN1; > > static final long NULL_UTF16; > static final long TRUE_UTF16; > static final long FALS_UTF16; > > static { > byte[] bytes4 = new byte[4]; > byte[] bytes8 = new byte[8]; > > bytes4[0] = 'n'; > bytes4[1] = 'u'; > bytes4[2] = 'l'; > bytes4[3] = 'l'; > NULL_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET); > StringUTF16.inflate(bytes4, 0, bytes8, 0, 4); > NULL_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET); > > bytes4[0] = 't'; > bytes4[1] = 'r'; > bytes4[2] = 'u'; > bytes4[3] = 'e'; > TRUE_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET); > StringUTF16.inflate(bytes4, 0, bytes8, 0, 4); > TRUE_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET); > > bytes4[0] = 'f'; > bytes4[1] = 'a'; > bytes4[2] = 'l'; > bytes4[3] = 's'; > FALS_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET); > StringUTF16.inflate(bytes4, 0, bytes8, 0, 4); > FALS_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET); > } > > private AbstractStringBuilder appendNull() { > ensureCapacityInternal(count + 4); > int count = this.count; > byte[] val = this.value; > if (isLatin1()) { > UNSAFE.putInt( > val, > Unsafe.ARRAY_BYTE_BASE_OFFSET + count, > NULL_LATIN1); > } else { > UNSAFE.putLong( > val, > Unsafe.ARRAY_BYTE_BASE_OFFSET + (count << 1), > NULL_UTF16); > } > this.count = count + 4; > return this; > } > > public AbstractStringBuilder append(boolean b) { > int count = th... @wenshao > I think the performance of the Unsafe branch may be the best data for the C2 > optimizer. @eme64 can help me see if C2 can do it? Have you tried to see if the optimization actually was done/taken? You can use the `TraceMergeStores,` flag. Can you present the generated assembly code of the benchmarks, and explain the difference based on the generated assembly code? You can run JMH penchmarks with `perf`. These two blogs may help you: http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_jmh_prof_perfasm @liach I don't think it makes a difference if it is `int` or `byte` constants. Or what exactly is the code change you are proposing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2158533469