On Fri, 23 Apr 2021 19:48:47 GMT, Kim Barrett <kbarr...@openjdk.org> wrote:
> Please review this change to the String Deduplication facility. It is being > changed to use OopStorage to hold weak references to relevant objects, > rather than bespoke data structures requiring dedicated processing phases by > supporting GCs. > > (The Shenandoah update was contributed by Zhengyu Gu.) > > This change significantly simplifies the interface between a given GC and > the String Deduplication facility, which should make it much easier for > other GCs to opt in. However, this change does not alter the set of GCs > providing support; currently only G1 and Shenandoah support string > deduplication. Adding support by other GCs will be in followup RFEs. > > Reviewing via the diffs might not be all that useful for some parts, as > several files have been essentially completely replaced, and a number of > files have been added or eliminated. The full webrev might be a better > place to look. > > The major changes are in gc/shared/stringdedup/* and in the supporting > collectors, but there are also some smaller changes in other places, most > notably classfile/{stringTable,javaClasses}. > > This change is additionally labeled for review by core-libs, although there > are no source changes there. This change injects a byte field of bits into > java.lang.String, using one of the previously unused padding bytes in that > class. (There were two unused bytes, now only one.) > > Testing: > mach5 tier1-2 with and without -XX:+UseStringDeduplication > > Locally (linux-x64) ran all of the existing tests that use string > deduplication with both G1 and Shenandoah. Note that > TestStringDeduplicationInterned.java is disabled for shenandoah, as it > currently fails, for reasons I haven't figured out but suspect are test > related. > > - gc/stringdedup/ -- these used to be in gc/g1 > - runtime/cds/SharedStringsDedup.java > - runtime/cds/appcds/cacheObject/DifferentHeapSizes.java > - runtime/cds/appcds/sharedStrings/* > > shenandoah-only: > - gc/shenandoah/jvmti/TestHeapDump.java > - gc/shenandoah/TestStringDedup.java > - gc/shenandoah/TestStringDedupStress.java > > Performance tested baseline, baseline + stringdedup, new with stringdedup, > finding no significant differences. Changes requested by iklam (Reviewer). src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp line 557: > 555: // non-latin1, and deduplicating if we find a match. For > deduplication we > 556: // only care if the arrays consist of the same sequence of bytes. > 557: const jchar* chars = static_cast<jchar*>(value->base(T_CHAR)); The encoding of the shared strings always match CompactStrings. Otherwise the CDS archive will fail to map. E.g., $ java -XX:-CompactStrings -Xshare:dump $ java -XX:+CompactStrings -Xlog:cds -version ... [0.032s][info][cds] UseSharedSpaces: The shared archive file's CompactStrings setting (disabled) does not equal the current CompactStrings setting (enabled). [0.032s][info][cds] UseSharedSpaces: Unable to map shared spaces ... So you can avoid this `if` block when `CompactStrings == true`. The `!java_lang_String::is_latin1(found)` check below can be changed to an assert. Also, I think we need an explicit test case where you'd return `true` at line 564. I can write a new test case and email to you. I think it will involve dumping an archive with `-XX:-CompactStrings`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3662