String.intern() would seem to provide better coverage considering that some
users may not use the G1 collector.
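
For anyone comparing the two approaches, here is a minimal sketch of what
each does to object identity. The class name, the dedupe() helper, and the
"tserver1:9997" value are made up for illustration; this is not the actual
TabletLocator code, only an approximation of a WeakHashMap-based
deduplicator placed next to String.intern().

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class InternSketch {
    // Hypothetical stand-in for a WeakHashMap-based deduplicator (not the real TabletLocator code).
    // Values are WeakReferences so the map does not keep its own keys strongly reachable.
    private static final Map<String, WeakReference<String>> CACHE = new WeakHashMap<>();

    // Returns the canonical instance for s, registering s as canonical if none is cached yet.
    static synchronized String dedupe(String s) {
        WeakReference<String> ref = CACHE.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        CACHE.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        // Two equal but distinct String objects, as would arrive from separate RPC responses.
        String a = new String("tserver1:9997");
        String b = new String("tserver1:9997");

        System.out.println(a == b);                   // false: distinct objects
        System.out.println(dedupe(a) == dedupe(b));   // true: both map to the first instance seen
        System.out.println(a.intern() == b.intern()); // true: both map to the JVM string table entry
    }
}
```

Both approaches hand back a single shared reference, so callers have to use
the returned value. -XX:+UseStringDeduplication differs in that it leaves
the String references alone and only shares the backing arrays.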

On Mon, Feb 8, 2021 at 3:19 PM Keith Turner <ke...@deenlo.com> wrote:

> Recently while running some large map reduce jobs I learned that
> Hadoop uses String.intern() in its RPC code (below is a link to an
> example of one place where Hadoop does this).  I learned this because
> when I ran jstack on NN, RM, and/or AM that were under distress
> sometimes I kept seeing RPC server threads that were in
> String.intern().  I never was quite sure if it was a problem though.
> Not saying String.intern() is bad or good, just sharing something I
> observed that I was uncertain about.
>
> May make sense to create some sort of stress test that could simulate
> the usage pattern of the TabletLocator and try the different options
> and see what happens.  If any long pauses or problems happen in the
> simulation, they may happen in the real environment.
>
>
> https://github.com/apache/hadoop/blob/ba631c436b806728f8ec2f54ab1e289526c90579/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskStatus.java#L481
>
> https://github.com/apache/hadoop/blob/ba631c436b806728f8ec2f54ab1e289526c90579/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java#L67
>
> On Mon, Feb 1, 2021 at 9:55 PM Christopher <ctubb...@apache.org> wrote:
> >
> > While code reviewing, I saw that
> > core/src/main/java/org/apache/accumulo/core/clientImpl/TabletLocator.java
> > was using a WeakHashMap to deduplicate some strings.
> >
> > This code can probably be removed in favor of one of the following two
> > options:
> >
> > 1. Just explicitly use String.intern() - As of Java 7, there is no
> > longer a separate, fixed-size PermGen space, so intern'd strings will
> > be in the main heap, no longer constrained to a limited size pool.
> > These strings are still subject to garbage collection. It is
> > implemented as a hash table internally (native implementation), with a
> > default bucket count of more than 60K, plenty big enough for the
> > interning that TabletLocator is doing... but this is configurable by
> > the user with JVM flags if it's not. Interning will use less memory than
> > a WeakHashMap and have similar performance, as long as the bucket count
> > is big enough.
> >
> > 2. Just use -XX:+UseStringDeduplication JVM flag - as of Java 9, G1 is
> > the new default Java garbage collector. This garbage collector has the
> > option to automatically attempt to deduplicate all strings behind the
> > scenes, by swapping out their underlying char arrays (so, it likely
> > won't affect == equality because the String object references
> > themselves won't change, unlike option 1). This is more passive than
> > option 1, but would apply to the entire JVM. G1GC also implements some
> > heuristics to prevent too much overhead.
> >
> > With both options, it's possible to output statistics.
> >
> > If I remove the WeakHashMap for the string deduplication in
> > TabletLocator, does anybody have an opinion on which option I should
> > replace it with? I'm leaning towards option 2 (adding it to
> > assemble/conf/accumulo-env.sh as one of the default flags).
>
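
A possible starting point for the stress test suggested above, as a rough
sketch only. The class and method names, the thread and operation counts,
and the "tserverN:9997" strings are all assumptions; it does not reproduce
the real TabletLocator access pattern. It only measures the slowest single
deduplication call under concurrent load, so that different strategies can
be plugged in and compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

class InternStress {
    // Runs `threads` workers, each deduplicating `opsPerThread` freshly built strings
    // drawn from `distinct` values. Returns the slowest single call seen, in nanoseconds.
    static long maxLatencyNanos(UnaryOperator<String> dedupe, int threads,
                                int opsPerThread, int distinct) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> {
                    long worst = 0;
                    for (int i = 0; i < opsPerThread; i++) {
                        // Build a new String each time so there is a duplicate to collapse.
                        String fresh = "tserver" + (i % distinct) + ":9997";
                        long start = System.nanoTime();
                        String canonical = dedupe.apply(fresh);
                        worst = Math.max(worst, System.nanoTime() - start);
                        if (!canonical.equals(fresh)) {
                            throw new IllegalStateException("dedupe changed the string content");
                        }
                    }
                    return worst;
                };
                results.add(pool.submit(task));
            }
            long worst = 0;
            for (Future<Long> f : results) {
                worst = Math.max(worst, f.get());
            }
            return worst;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative sizes only; real runs would need tuning and several repetitions.
        long internWorst = maxLatencyNanos(String::intern, 8, 200_000, 1_000);
        long baselineWorst = maxLatencyNanos(s -> s, 8, 200_000, 1_000);
        System.out.println("String.intern() worst call (ns): " + internWorst);
        System.out.println("no dedup baseline worst call (ns): " + baselineWorst);
    }
}
```

Running it with different collectors and with or without
-XX:+UseStringDeduplication would give a first impression of whether long
pauses show up, though a single worst-case number is noisy and should be
read with care.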
