Thank you for this. Much appreciated.
I for one wasn't aware of this. I had to look into the JDK source because I
refused to believe you :-)
You are right.

I've done some basic grepping for

"Map[:space:]*<[:space:]*URL"

on the NetBeans source code and as I feared this pattern appears all over
the place.

The reason why it is not as bad as it seems is that the vast majority of
these are really jar URLs, and these URLs have their own URLStreamHandler
in the JDK which overrides the sameFile() method (used in equals()) and
hashCode() methods so that there's no longer any name lookup performed.
This is also true for the one you point to, namely PatchModuleFileManager.
Out of the box with JDK, the issue you point to affects "http(s)://",
"ftp://";, "file://", "mailto://";, but not "jar://".

On top of this - and perhaps more importantly - I can see the code base
makes heavy use of its own custom URLStreamHandlers and
URLStreamHandlerFactories so I would guess that this has been taken care
of. The developers have indeed been aware of this as far as I can tell. All
in all I think this has been taken into consideration.

But you are 100% right that most people would not expect url1.equals(url2)
to trigger a name service lookup (DNS query). Ouch!

I had to go over my own contribution, PR 161, but luckily it uses caching
by URI, not caching by URL. Phew.

Thanks.



On Sat, Oct 28, 2017 at 7:16 PM, Victor Williams Stafusa da Silva <
[email protected]> wrote:

> I was looking around an issue, when I landed in the class
> PatchModuleFileManager (
> https://github.com/apache/incubator-netbeans/blob/
> master/java.source.base/src/org/netbeans/modules/java/source/parsing/
> PatchModuleFileManager.java,
> version of 20170921).
>
> This class uses java.net.URLs as keys of a HashMap. This is a really bad
> thing to do because URL has one of the worst implementations of
> equals(Object) and hashCode() ever invented.
>
> Let's see this snippet from the URL.hashCode()'s method javadoc (
> https://docs.oracle.com/javase/9/docs/api/java/net/URL.html#hashCode--):
>
> > The hash code is based upon all the URL components relevant for URL
> comparison. As such, this operation is a blocking operation.
>
> The equals(Object) method is also similarly bad:
>
> > Two hosts are considered equivalent if both host names can be resolved
> into the same IP addresses; else if either host name can't be resolved, the
> host names must be equal without regard to case; or both host names equal
> to null.
>
> > Since hosts comparison requires name resolution, this operation is a
> blocking operation.
>
> > Note: The defined behavior for equals is known to be inconsistent with
> virtual hosting in HTTP.
> So, both equals(Object) and hashCode() method are blocking operations which
> do DNS lookups dependent on the state of the internet. Also, it is at least
> debatable the fact that hashCode() method is synchronized. At least, it
> caches the hashCode() method results. I'm not sure, but I would not be
> surprised if it is possible to construct two different instances of URLs
> from the same String with different hashCodes()'s due to a network topology
> change in the meantime.
>
> This all means that the URL class, by not having a sane equals(Object) and
> hashCode() implementation shouldn't be used as a key to a HashMap or
> LinkedHashMap nor be used in a HashSet or LinkedHashSet.
>
> In fact, I'm in favor of putting a very big and loud @Deprecated in
> java.net.URL. It is an innocent-looking, but actually very dangerous,
> performance bottleneck and bug factory. Although it is actually possible to
> use it safely if you never call equals(Object) and hashCode(), this is
> something extremely hard and error-prone to do in practice, since
> equals(Object) and hashCode() usage is widespread as it should be. And
> arguably, a class that features ill-conceived equals(Object) and hashCode()
> methods are itself ill-conceived. However, that is something that should be
> done with the JDK guys.
>
> Now, back to the PatchModuleFileManager, it uses URL as a key to a HashMap.
> That is nasty, so I propose that it should be replaced with URI or String.
> The PatchModuleFileManager uses the FileObjects class in the same package,
> which also uses the URL.equals(object) method. The codebase is huge, and
> those are just the first two classes that I checked, so it is very likely
> that the URL class is misused in a lot of other places in the code base,
> deserving a major effort to check for that.
>
> I'm not familiar with the internal details of netbeans nor is that anything
> that I could quicky learn, so I'm not in position of creating a PR to fix
> that, so I will just raise the alarm here.
>
> Victor Williams Stafusa da Silva
>

Reply via email to