Sorry, but I have to take a step back here and withdraw from this
discussion. This nit is taking too much of my attention. It works, so
I'll leave it be, although I'm not fully comfortable with the presence
of the intern() calls. It works just fine locally without them and I
can't imagine a big speed difference compared to the rest of the layout
process. If anyone wants to take action, feel free.

On 03.07.2005 21:04:09 Andreas L. Delmelle wrote:
> > -----Original Message-----
> > From: Jeremias Maerki [mailto:[EMAIL PROTECTED]
> >
> > Thanks to Finn and Andreas for looking at this. Given what I see here
> > I'd say the two intern() calls in FOTreeBuilder had better be removed.
> > The Set the namespaces are stored in works with equals() anyway, so I
> > don't see the point of interning the Strings.
> Not exactly... Maybe they had better be removed, but let's make sure it
> isn't for the wrong reasons.
> > Or am I missing anything here?
> Could be. Ultimately, apart from the strings' lengths and the number of
> identical copies that are going to be alive at a given point, this would
> depend on:
> a) the number of times the relevant portions of code --addElementMapping() +
> conditional in findFOMaker()-- are actually executed
> b) how many times those particular strings --URIs-- are going to be compared
> to other potentially interned strings: see my addition of measurement
> results with only one of the strings interned
> To repaint the full picture:
> Writing
> static final String s1 = "some-string-value";
> is the same as
> static final String s1 = "some-string-value".intern();
> All strings that are subsequently assigned the value of s1 will be
> reference strings pointing to the same canonical string.
> String s2 = s1;
> is exactly the same as:
> String s2 = s1.intern();
> (so: (s1 == s2) == true)
> which is precisely why the following is considered Bad
> (unless you need a really long string for a very brief moment, just once)
> String s3 = new String("some-string-value");
> (so: (s1 == s3) == false; s1.equals(s3) == true)
> Effects similar to those of the latter statement --different String
> instances with the same string value-- are inevitable when the Strings
> originate from a file or database, or are built at runtime using
> StringBuffer.toString() --anywhere the value isn't known at compile-time.
> Hence the option of intern() to allow the compiler to optimize the bytecode.
> Optimizations among which you'll find: using bytecode for reference
> comparison, unless explicitly asked not to do so (by using
> only equals()).
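
The reference-versus-value distinction described above can be checked
directly. The following is only a minimal sketch: the class name
InternDemo and the helper runtimeCopy() are made up for illustration and
are not part of FOP. It shows what == and equals() return for a literal,
for a copy built at run-time, and for the interned copy; it says nothing
about what the compiler or JIT does with the comparison.

```java
public class InternDemo {
    // A compile-time literal: always the canonical (interned) instance.
    static final String S1 = "some-string-value";

    /** Hypothetical helper: builds an equal value at run-time, so the result is a separate instance. */
    static String runtimeCopy() {
        return new StringBuilder("some-string").append("-value").toString();
    }

    public static void main(String[] args) {
        String s2 = S1;             // plain reference copy of the literal
        String s3 = runtimeCopy();  // same value, different instance
        String s4 = s3.intern();    // canonical instance looked up in the pool

        System.out.println(S1 == s2);       // true
        System.out.println(S1 == s3);       // false
        System.out.println(S1.equals(s3));  // true
        System.out.println(S1 == s4);       // true
    }
}
```

So == is only a safe replacement for equals() when both sides are known
to be interned; a single non-interned operand makes it return false for
equal values.
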
> One thing I noticed was that, once both strings to be compared were
> guaranteed to be interned at compile-time it didn't even matter anymore
> whether or not the values were the same, so '== || equals()' gave exactly
> the same results as plain '=='. Apparently, the compiler could figure out
> that in this situation, equals() would never really be needed, or would lead
> to the same results anyway.
> Strange though, that it optimizes only partly when both strings are assigned
> the same hard-coded literal --in that case, when the string values are
> different, the results of || would indicate the equals() side *is* evaluated
> (maybe even the only side evaluated, because at compile-time the strings are
> known to be different, but given the source-code, the compiler somehow
> cannot exclude that they are going to be different at run-time...?)
> String.intern() should indeed be used with care. It's not a good idea to
> intern a string at random, but if that happens only in a relatively small
> number of situations, then it won't do much harm. If the call to intern() is
> going to be made many times, you have to take into account that it
> ultimately maps to a native (JNI) method.
> To be absolutely sure, it would probably be wise to check for a threshold:
> i.e. at what point does the overhead of interning really become a
> drawback --we already know from the measurements that it's still worthwhile
> to intern() once, if the number of subsequent comparisons is sufficiently
> large (10^8).
> AFAICT, the preferred option would be to introduce a layer --a pool of
> interned strings-- in between, where you go:
> if( !contains( string ) ) {
>   add( string.intern() );
> }
> ...
> return get( string );
> So, the use of equals() (via: contains()) on separate instances remains
> limited.
> The local table is guaranteed to return a reference string every time, but
> interning happens only the first time a given string value is encountered at
> run-time. Once the string is added to the local table, the call to intern()
> is avoided altogether and replaced with a faster get() --another drain
> caused by intern() is precisely the lookup in a much larger global table to
> check if an instance with that value already exists.
> The only thing to remain aware of is that, by implementing such a map, one
> might end up holding references to some string-values that are used only
> once, keeping them from being garbage-collected.
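
The local table described above could look roughly like the sketch below.
The class name LocalStringPool and its methods are assumptions made for
illustration, not existing FOP code, and the sketch is not thread-safe.
It calls intern() only the first time a value is seen and answers later
requests from a HashMap, which relies on hashCode()/equals() for the
lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalStringPool {
    // Maps each value to its canonical instance. The map holds strong
    // references, so entries are never garbage-collected while the pool lives.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance for s, interning it on first sight only. */
    public String canonical(String s) {
        String cached = pool.get(s);
        if (cached == null) {
            cached = s.intern();
            pool.put(cached, cached);
        }
        return cached;
    }

    /** Number of distinct values seen so far. */
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalStringPool uris = new LocalStringPool();
        // Two separate instances with the same value, as a parser might deliver them.
        String a = new String("http://www.w3.org/1999/XSL/Format");
        String b = new String("http://www.w3.org/1999/XSL/Format");
        System.out.println(a == b);                                  // false
        System.out.println(uris.canonical(a) == uris.canonical(b));  // true
        System.out.println(uris.size());                             // 1
    }
}
```

Whether the map lookup is actually cheaper than calling intern() every
time would have to be measured; the sketch only shows the mechanics.
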
> The real fun begins when you create Strings as substrings of interned
> Strings...
> You could build one large string containing, say, all possible names in a
> document. If you interned this string only once, and used String.substring()
> to create individual names later on, then from the POV of the compiler, all
> of them would be pointers into different parts of one and the same string,
> each of the names and all of their copies taking up the space of an int no
> matter how long they actually are. Even without knowing the exact value of
> the canonical string at compile-time, the compiler will still be able to
> generate much more efficient code in many places, at the 'cost' of only one
> intern() at run-time.
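
One thing that can be checked for the substring idea: whether a substring
shares storage with its parent is an implementation detail of the JDK in
use, but the String returned by substring() is in any case a new object
and is not itself interned. The sketch below uses a made-up class
SubstringDemo with made-up example data to show that == against a
literal still fails until the substring is interned as well.

```java
public class SubstringDemo {
    // Hypothetical example data: two names packed into one literal.
    static final String NAMES = "blockinline";

    /** Returns the name stored at [start, end) as its own String instance. */
    static String nameAt(int start, int end) {
        return NAMES.substring(start, end);
    }

    public static void main(String[] args) {
        String block = nameAt(0, 5);
        System.out.println(block.equals("block"));      // true
        System.out.println(block == "block");           // false
        System.out.println(block.intern() == "block");  // true
    }
}
```
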
> Hope any of this is useful...
> Cheers,
> Andreas

Jeremias Maerki
