On Wed, 28 Apr 2021 00:44:19 GMT, Ioi Lam <ik...@openjdk.org> wrote: >> Please review this change to the String Deduplication facility. It is being >> changed to use OopStorage to hold weak references to relevant objects, >> rather than bespoke data structures requiring dedicated processing phases by >> supporting GCs. >> >> (The Shenandoah update was contributed by Zhengyu Gu.) >> >> This change significantly simplifies the interface between a given GC and >> the String Deduplication facility, which should make it much easier for >> other GCs to opt in. However, this change does not alter the set of GCs >> providing support; currently only G1 and Shenandoah support string >> deduplication. Adding support by other GCs will be in followup RFEs. >> >> Reviewing via the diffs might not be all that useful for some parts, as >> several files have been essentially completely replaced, and a number of >> files have been added or eliminated. The full webrev might be a better >> place to look. >> >> The major changes are in gc/shared/stringdedup/* and in the supporting >> collectors, but there are also some smaller changes in other places, most >> notably classfile/{stringTable,javaClasses}. >> >> This change is additionally labeled for review by core-libs, although there >> are no source changes there. This change injects a byte field of bits into >> java.lang.String, using one of the previously unused padding bytes in that >> class. (There were two unused bytes, now only one.) >> >> Testing: >> mach5 tier1-2 with and without -XX:+UseStringDeduplication >> >> Locally (linux-x64) ran all of the existing tests that use string >> deduplication with both G1 and Shenandoah. Note that >> TestStringDeduplicationInterned.java is disabled for shenandoah, as it >> currently fails, for reasons I haven't figured out but suspect are test >> related. >> >> - gc/stringdedup/ -- these used to be in gc/g1 >> - runtime/cds/SharedStringsDedup.java >> - runtime/cds/appcds/cacheObject/DifferentHeapSizes.java >> - runtime/cds/appcds/sharedStrings/* >> >> shenandoah-only: >> - gc/shenandoah/jvmti/TestHeapDump.java >> - gc/shenandoah/TestStringDedup.java >> - gc/shenandoah/TestStringDedupStress.java >> >> Performance tested baseline, baseline + stringdedup, new with stringdedup, >> finding no significant differences. > > src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp line 557: > >> 555: // non-latin1, and deduplicating if we find a match. For >> deduplication we >> 556: // only care if the arrays consist of the same sequence of bytes. >> 557: const jchar* chars = static_cast<jchar*>(value->base(T_CHAR)); > > The encoding of the shared strings always match CompactStrings. Otherwise the > CDS archive will fail to map. E.g., > > > $ java -XX:-CompactStrings -Xshare:dump > $ java -XX:+CompactStrings -Xlog:cds -version > ... > [0.032s][info][cds] UseSharedSpaces: The shared archive file's CompactStrings > setting (disabled) does not equal the current > CompactStrings setting (enabled). > [0.032s][info][cds] UseSharedSpaces: Unable to map shared spaces > ... > > > So you can avoid this `if` block when `CompactStrings == true`. The > `!java_lang_String::is_latin1(found)` check below can be changed to an assert. > > Also, I think we need an explicit test case where you'd return `true` at line > 564. I can write a new test case and email to you. I think it will involve > dumping an archive with `-XX:-CompactStrings`.
Oh, I realized that my suggestion above is wrong. When `CompactStrings==true`, you can still have a string whose coder is UTF16: https://github.com/openjdk/jdk/blob/75a2354dc276e107d64516d20fc72bc7ef3d5f86/src/java.base/share/classes/java/lang/String.java#L343-L351 ------------- PR: https://git.openjdk.java.net/jdk/pull/3662