DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35052>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

------- Additional Comments From [EMAIL PROTECTED]  2005-07-12 01:27 -------
(In reply to comment #10)
> Logger.getLogger("logger1");
> 
> Using a string constant here, "logger1" was already interned

true

> "logger1".equals("logger1") is fast.  No need to explicitly do an intern()

I agree that the literal strings in code are automatically interned, and the
String.equals method should be checking for identity before bothering to do a
loop checking the string content [1]. There's still the overhead of a
(non-virtual) method invocation and return though.
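
As a small illustration of the point about literals, here is a sketch (the class
and method names are made up for this example) showing that two identical
literals are the same object, while a deliberately fresh copy is not until it is
interned:

```java
class LiteralInterning {
    // True only when both arguments are the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literalA = "logger1";
        String literalB = "logger1";
        // A deliberately fresh object holding the same characters.
        String copy = new String("logger1");

        // Identical literals are interned, so they share one object.
        System.out.println(sameReference(literalA, literalB));
        // The copy is a distinct object with equal content.
        System.out.println(sameReference(literalA, copy));
        System.out.println(literalA.equals(copy));
        // intern() maps the copy back to the canonical instance.
        System.out.println(sameReference(literalA, copy.intern()));
    }
}
```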

However the main point of this is to optimise the comparisons executed as part
of Hashtable.get when looking up the logger from the map of all loggers. And
Hashtable.get may well do several *failed* comparisons before finding the right
logger to return.
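
To make the idea concrete, here is a rough sketch of a Hashtable key that wraps
an interned name so that the comparisons done inside Hashtable.get become
reference checks. NameKey, REGISTRY and lookup are hypothetical names invented
for this example; this is not claimed to be the code under discussion.

```java
import java.util.Hashtable;

class InternedKeyLookup {
    // Hypothetical key type: holds an interned name so equality is a reference check.
    static final class NameKey {
        final String name;
        final int hash;

        NameKey(String name) {
            this.name = name.intern();        // canonical instance for these characters
            this.hash = this.name.hashCode();
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            // Valid only because every NameKey stores an interned string.
            return other instanceof NameKey && name == ((NameKey) other).name;
        }
    }

    // Stand-in for the map of all loggers; values are plain strings here.
    static final Hashtable<NameKey, String> REGISTRY = new Hashtable<>();

    static String lookup(String name) {
        return REGISTRY.get(new NameKey(name));
    }

    public static void main(String[] args) {
        REGISTRY.put(new NameKey("com.example.Foo"), "logger for Foo");
        // A fresh String with the same characters still finds the entry.
        System.out.println(lookup(new String("com.example.Foo")));
        System.out.println(lookup("com.example.Bar"));
    }
}
```

Note that this sketch pays for an intern() call on every lookup, which is the
cost being debated in this thread.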

The operation string1 == string2 is always fast: it only compares the two
references, and if they are different then the result is false.

However "string1".equals("string2") has a fair bit of overhead. The
String.equals method needs to do:
 * if (param instanceof String)
 * cast param to String
 * if string lengths differ: return false
 * otherwise compare each char [2]
[2] or potentially compare hashcodes.

The reference comparison starts to look nice...
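
The steps listed above for String.equals can be sketched roughly as follows.
This is an illustrative re-implementation using only public String methods, not
the actual library source; sketchEquals is a made-up name, and the identity
shortcut at the top is the check I said the method "should" be doing.

```java
class EqualsSketch {
    // Illustrative stand-in for String.equals, written against the public API.
    static boolean sketchEquals(String self, Object param) {
        if (self == param) {
            return true;                      // identity shortcut: same object
        }
        if (!(param instanceof String)) {
            return false;                     // also covers null
        }
        String other = (String) param;        // cast param to String
        if (self.length() != other.length()) {
            return false;                     // cheap exit when lengths differ
        }
        for (int i = 0; i < self.length(); i++) {
            if (self.charAt(i) != other.charAt(i)) {
                return false;                 // first differing char ends the scan
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sketchEquals("logger1", "logger1"));
        System.out.println(sketchEquals("logger1", "logger2"));
        System.out.println(sketchEquals("logger1", "logger10"));
    }
}
```

The worst case is two different strings of the same length that share a long
common prefix, since the loop has to run almost to the end before failing.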

How many such non-equal comparisons occur in the Hashtable lookup? Well, that
depends upon the hashcode collision rate inside the hashtable; obviously two
strings whose hashcodes cause them to be allocated to different buckets never
get compared. The java.util.Hashtable class, like java.util.HashMap, has a
default load factor of 0.75. So at a wild guess I would think only about 30% of
lookups would hit a bucket with more than one entry, and in 50% of those cases
the right entry would be first.

So I agree this optimisation is not critical, but it could yield a measurable
improvement.
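
The wild guess above could be checked with a rough count like the one below. It
is only a sketch under stated assumptions: the logger names are invented, the
table size of 100 for 75 names is chosen simply to give a load of 0.75, and the
bucket index is taken as the non-negative hashcode modulo the table size. Real
numbers would depend on the actual names and table size in use.

```java
class BucketSharing {
    // Assumed indexing scheme: non-negative hashcode modulo the table size.
    static int bucketIndex(String name, int capacity) {
        return (name.hashCode() & 0x7FFFFFFF) % capacity;
    }

    // Fraction of names that land in a bucket holding more than one name.
    static double sharedFraction(String[] names, int capacity) {
        int[] counts = new int[capacity];
        for (String name : names) {
            counts[bucketIndex(name, capacity)]++;
        }
        int shared = 0;
        for (String name : names) {
            if (counts[bucketIndex(name, capacity)] > 1) {
                shared++;
            }
        }
        return (double) shared / names.length;
    }

    public static void main(String[] args) {
        // Invented, class-like logger names purely for illustration.
        String[] names = new String[75];
        for (int i = 0; i < names.length; i++) {
            names[i] = "com.example.pkg" + (i % 5) + ".Class" + i;
        }
        System.out.println("fraction in shared buckets: " + sharedFraction(names, 100));
    }
}
```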

> 2.  You're evaluating the string, you're doing:
> 
> int i = 1;
> Logger.getLogger("logger" + i);
> ...
> Logger.getLogger("logger" + i);
> 
> In which case, the cost of intern() is MUCH more expensive than equals() anyway,
> so why bother?

True. However such usage really is a little bizarre.
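
For reference, a short sketch of what happens in that scenario (loggerName is a
made-up helper): a name built by concatenation at run time is a new object each
time, so only equals() or an explicit intern() will match the two results.

```java
class DynamicNames {
    // "logger" + i is not a compile-time constant, so the string is built at run time.
    static String loggerName(int i) {
        return "logger" + i;
    }

    public static void main(String[] args) {
        String first = loggerName(1);
        String second = loggerName(1);

        // Two separately built strings: equal content, different objects.
        System.out.println(first == second);
        System.out.println(first.equals(second));
        // Interning both yields the same canonical instance.
        System.out.println(first.intern() == second.intern());
    }
}
```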

> 
> 3.  You're storing the evaluated string and a combination of above.  Again,
> intern() is more expensive than equals() so why bother.
> 
> I don't understand the need for intern() at all.  Sounds like a premature
> optimization bug.

A very common pattern is:
  Logger.getLogger(someClass.getName());

I've checked with Sun Java 1.3.1 and 1.5, and the string returned by
someClass.getName does get interned automatically, like literal strings in the
code. The question is whether that behaviour should be relied on or not. I'm not
aware of anything in the Java specs that mandates this string going into the
intern pool (unlike literals in the code, which *must* go there), but I think it
likely that all Java implementations *will* do this; when the JVM is loading the
raw .class file into memory it needs to store the literal strings somewhere, and
putting them in the intern pool seems as good a place as any.
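
A check along these lines can be repeated on any JVM. This is a minimal sketch
with made-up names; the first printed result is implementation-dependent, which
is exactly the open question, so no particular output is claimed for it.

```java
class ClassNameInterning {
    // True only when both arguments are the very same object.
    static boolean sameAsLiteral(String name, String literal) {
        return name == literal;
    }

    public static void main(String[] args) {
        // The literal is guaranteed to be the interned instance.
        String literal = "java.lang.String";
        String name = String.class.getName();

        // Implementation-dependent: true only if getName() returned the interned instance.
        System.out.println("getName() is the interned instance: " + sameAsLiteral(name, literal));
        // Always holds: explicit interning yields the canonical instance.
        System.out.println("after intern(): " + sameAsLiteral(name.intern(), literal));
    }
}
```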



So: first of all, string comparisons in the logger Hashtable lookup will not be
common (maybe 30% of lookups). And in all the scenarios where interning makes
sense, the strings seem to already be interned; comparing two equal interned
strings with .equals should be close to the speed of comparing them with ==.
Where the two strings are actually different (maybe 15% of lookups) there are
two cases:
* strings have different length (most of the time)
  --> minor win for the interned comparison
* strings have same length (pretty rare)
  --> major win for the interned comparison

Summary: From a theoretical point of view I tend to agree that this use of
String.intern won't have much effect and could have been omitted. Of course
actually testing this might be wise -- real stats win over theory any day :-).

But on the other hand, String.intern only does active harm when the user is
generating large numbers of logger names dynamically - and I can't see any sane
reason for anyone to do that. 

[1] Actually, GNU Classpath doesn't! Currently comparison of a string object
with itself does a reasonable amount of work (though not a scan of the actual
string data). I've sent off an email querying this.
