GitHub user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/402
  
    @cestella
    Do you know of an example of a real-world URL that Java refuses to parse, 
e.g. one with the square brackets issue?  I didn't see one in that thread, but 
I could have just missed it.  Are we sure commons validator wouldn't have the 
same issue?  I'm fine with validating URLs first and then parsing them if we're 
stuck dealing with Java implementation quirks.
    
    Specifically on the square brackets issue (and this is way out of my 
wheelhouse, so forgive my potential ignorance), it sounds like these characters 
(and others) qualify as "unwise" under RFC 2396 
(https://www.ietf.org/rfc/rfc2396.txt) and should be escaped, which is why Java 
is strict about them.
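
    A minimal sketch of the strictness in question, assuming the strict parser 
is `java.net.URI` (the class name, method name, and `example.com` URLs are made 
up for illustration and are not from the PR).  It only checks whether a string 
parses as-is:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UnwiseCharDemo {
    // Returns true if java.net.URI accepts the string without any escaping.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped square brackets in the path: rejected by java.net.URI.
        System.out.println(parsesAsUri("http://example.com/a[1]"));
        // The same path with the brackets percent-escaped: accepted.
        System.out.println(parsesAsUri("http://example.com/a%5B1%5D"));
    }
}
```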
    
    Could we appropriately escape the offending characters and end up with a 
correct URI? If commons has something to validate a URL, is there something to 
do that escaping for us?  If that works, do we potentially want to be doing 
operations on URIs instead of URLs, e.g. `URI_TO_PATH`, `URI_TO_HOST`?  
Obviously, this raises the question of what we do with the URL* functions and 
how the two sets would coexist.
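
    One way the escaping could work, as a rough sketch rather than a proposal 
for the actual implementation: split the string leniently with `java.net.URL`, 
then rebuild it through the multi-argument `java.net.URI` constructor, which 
quotes illegal characters in each component.  The `EscapeSketch` class and the 
`uriToHost`/`uriToPath` helpers are hypothetical stand-ins for the functions 
named above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class EscapeSketch {
    // Split with the lenient URL parser, then let the multi-argument URI
    // constructor percent-escape whatever is illegal in each component.
    // Note: new URL(String) is deprecated on recent JDKs but still available.
    static URI toEscapedUri(String raw) {
        try {
            URL url = new URL(raw);
            return new URI(url.getProtocol(), url.getAuthority(),
                           url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + raw, e);
        }
    }

    // Hypothetical helpers mirroring URI_TO_HOST and URI_TO_PATH.
    static String uriToHost(String raw) {
        return toEscapedUri(raw).getHost();
    }

    static String uriToPath(String raw) {
        return toEscapedUri(raw).getPath(); // decoded path; getRawPath() keeps the escapes
    }

    public static void main(String[] args) {
        String raw = "http://example.com/a[1]/b?x=1";
        System.out.println(toEscapedUri(raw));
        System.out.println(uriToHost(raw));
        System.out.println(uriToPath(raw));
    }
}
```

    One caveat with this approach: the multi-argument constructors always quote 
`%`, so input that is already percent-escaped would end up double-escaped 
unless it is decoded first.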
    
    In terms of test cases, I'd want to see something with the unwise character 
issue (but I don't have an example of that myself).  I can help you try to dig 
one up if you want.  In addition, I'd like to see something with a fragment 
defined (e.g. `http://java.sun.com/index.html#chapter1`) and make sure that the 
functions work appropriately (mostly just tests for `TO_PATH`).
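
    For the fragment case, the expectation I have in mind looks roughly like 
this.  It uses plain `java.net.URI` as a stand-in, since I'm only illustrating 
what the path should be, not how the Stellar functions are implemented; the 
`FragmentCheck` class is made up.

```java
import java.net.URI;

final class FragmentCheck {
    // URI.create throws IllegalArgumentException on malformed input,
    // which keeps this sketch free of checked exceptions.
    static URI parse(String s) {
        return URI.create(s);
    }

    public static void main(String[] args) {
        URI uri = parse("http://java.sun.com/index.html#chapter1");
        // The fragment should not leak into the path.
        System.out.println(uri.getHost());     // java.sun.com
        System.out.println(uri.getPath());     // /index.html
        System.out.println(uri.getFragment()); // chapter1
    }
}
```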


