GitHub user justinleet commented on the issue:
https://github.com/apache/incubator-metron/pull/402
@cestella
Do you know of an example URL that Java doesn't accept as a URL, e.g. one that
hits the square brackets issue? I didn't see one in that thread, but I could
have just missed it. Are we sure commons validator wouldn't have the same
issue? I'm fine with validating them first and then parsing if we're stuck
dealing with Java implementation quirks.
Specifically on the square brackets issue, it sounds like (and this is way
out of my wheelhouse, so forgive my potential ignorance) these characters
(and others) qualify as "unwise" in RFC 2396
(https://www.ietf.org/rfc/rfc2396.txt) and should be escaped, which is why
Java is strict about them.
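A minimal sketch of how that strictness might show up, assuming a made-up URL
(`http://example.com/a[1]`, a constructed input rather than one taken from the
thread) and hypothetical helper names. `java.net.URL` does little validation
beyond the protocol, while `java.net.URI` enforces its RFC 2396-based grammar
and rejects raw brackets in the path:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UnwiseCharCheck {

    // True if java.net.URL accepts the string (it mostly checks the protocol).
    static boolean parsesAsUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // True if java.net.URI accepts the string under its stricter grammar.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String candidate = "http://example.com/a[1]"; // hypothetical input
        System.out.println("URL accepts: " + parsesAsUrl(candidate)); // true
        System.out.println("URI accepts: " + parsesAsUri(candidate)); // false
    }
}
```

The same path with the brackets percent-encoded (`/a%5B1%5D`) parses as a URI.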
Could we appropriately escape the offending characters and end up with a
correct URI? If commons has something to validate a URL, is there something to
do that escaping for us? If that works, do we potentially want to be doing
operations on URIs instead of URLs, e.g. `URI_TO_PATH`, `URI_TO_HOST`?
Obviously, this raises the question of what we do with the existing URL*
functions and how the two sets would coexist.
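One possible way to do the escaping with only the JDK, sketched under
assumptions: the multi-argument `java.net.URI` constructors quote characters
that are illegal in each component, so a URL can be parsed leniently and then
rebuilt as a URI. The class and method names here are hypothetical, and this
is not a claim about what commons validator offers:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlEscaper {

    // Hypothetical helper: parse with the lenient java.net.URL, then rebuild
    // through the 7-argument URI constructor, which quotes illegal characters.
    static URI toEscapedUri(String raw) {
        try {
            URL url = new URL(raw);
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                           url.getPort(), url.getPath(), url.getQuery(),
                           url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Made-up input with raw square brackets in the path.
        URI uri = toEscapedUri("http://example.com/a[1]");
        System.out.println(uri);           // http://example.com/a%5B1%5D
        System.out.println(uri.getPath()); // /a[1] (decoded form)
    }
}
```

One caveat with this approach: these constructors always quote `%`, so input
that is already percent-encoded would end up double-encoded (`%5B` becomes
`%255B`). It would only be safe on raw, unescaped input.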
In terms of test cases, I'd want to see something with the unwise character
issue (but I don't have an example of that myself). I can help you try to dig
one up if you want. In addition, I'd like to see something with a fragment
defined (e.g. `http://java.sun.com/index.html#chapter1`) and make sure that the
functions work appropriately (mostly just tests for `TO_PATH`).
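For the fragment case, a small sketch of what the JDK itself returns, which
could serve as the expected values in such a test. It only exercises
`java.net.URL` directly; how the Stellar functions are implemented is not
shown here, and the helper names are hypothetical:

```java
import java.net.MalformedURLException;
import java.net.URL;

class FragmentCheck {

    // Parse with java.net.URL, rethrowing as unchecked to keep callers simple.
    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a URL: " + s, e);
        }
    }

    // The path component; the fragment is not part of it.
    static String pathOf(String s) {
        return parse(s).getPath();
    }

    // The fragment (called the "ref" in java.net.URL), or null if absent.
    static String fragmentOf(String s) {
        return parse(s).getRef();
    }

    public static void main(String[] args) {
        String url = "http://java.sun.com/index.html#chapter1";
        System.out.println(pathOf(url));     // /index.html
        System.out.println(fragmentOf(url)); // chapter1
    }
}
```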