On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
Also, I'm not sure how hot this method is in practice but allocating
StringBuilder seems a bit heavy (maybe it gets partially escape
analyzed out though). Can you instead loop over all the chars in the
string and build up the hash code as you go along? If you see a % then
you handle next 2 chars specially, like you do now. Or are you trying
to take advantage of String intrinsic support in the JIT? I guess if
perf is a concern you can write a micro benchmark comparing the
approaches ...
That did occur to me, but I guess we have to be consistent with the
value that people have already been using, and that means I have
to duplicate the code in String.hashCode() (that is what the original
implementation was calling) - I was trying to avoid that. Also,
String.hashCode() uses its internal char[] - whereas charAt() will
involve several additional bound checks - but
using toCharArray() may be better. Let me take another look at this, and
get back with another webrev.
Sent from my phone
On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vita...@gmail.com
<mailto:vita...@gmail.com>> wrote:
Hi Kurchi,
In the hash method, I suggest you move handling of strings with %
into a separate method to keep the hash method small for common
case (no %). Otherwise, there's a chance this method won't get
inlined anymore due to its (newly increased) size.
- Yep, will do.
Also, I realize toLower does the same thing, but why does toUpper
return an int whereas it's really a char? Might be cleaner to
declare return type as char and do the cast inside the method as
needed.
- I followed the format of toLower(). But I agree this way it will be
cleaner.
Thanks a lot,
Kurchi
Thanks
Sent from my phone
On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
<kurchi.subhra.ha...@oracle.com
<mailto:kurchi.subhra.ha...@oracle.com>> wrote:
Hi,
According to RFC 3986[1], hexadecimal digits encoded by a
'%' should be case-insensitive, for example,%A2 and %a2 should be
considered equal. Although, URI.equals() does take this into
consideration, the implementation of URI.hashCode() does not and
returns different hashcodes for two URIs that are similar in
all respects except for the case of the percent-encoded
hexadecimal digits.
This fix attempts to construct a normalized string from the
string representing a component before calculating its hashCode.
I converted to upper case for the normalization(and not lower
case) as required by [1].
For testing the fix, I added an additional test scenario
to an existing test (jdk/test/java/net/URI/Test.java). While I
was there, I also made
minor changes to the test so that it does not produce rawtype
and other lint warnings.
Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
Webrev: http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
<http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
URI.compareTo() still suffers from the same problem - I am not
sure if it should be dealt with as a separate bug.
Thanks,
Kurchi
[1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1