We've been using an RFC3986 URI implementation for over a decade. There were formatting issues we had to work around, so we provided static methods to address them. Strict normalization also brings significant performance benefits when URIs are used to determine identity.

Java doesn't implement RFC2396 strictly: it has an expanded character set that doesn't require escaping, which can result in more than one normalized form. My understanding is that it's these corner cases regarding character escaping that prevented Java's URI implementation from being upgraded to RFC3986.
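A small illustration of the "more than one normalized form" problem, using only java.net.URI. The class and method names below are made up for the example. java.net.URI accepts "other" characters (non-ASCII, non-control, non-space) unescaped, so the same path can be written with a raw `é` or with its UTF-8 escape `%C3%A9`, and `equals` treats the two as different; `toASCIIString()` percent-encodes the "other" characters and so yields a single form.

```java
import java.net.URI;

final class OtherCharsDemo {
    // Compares two URIs with java.net.URI.equals, i.e. on their textual form.
    static boolean rawEquals(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // toASCIIString() encodes "other" characters as UTF-8 %XX escapes,
    // so both spellings collapse to the same string.
    static boolean asciiEquals(String a, String b) {
        return URI.create(a).toASCIIString().equals(URI.create(b).toASCIIString());
    }
}
```

With `a = "http://example.com/caf\u00e9"` and `b = "http://example.com/caf%C3%A9"`, `rawEquals` returns false while `asciiEquals` returns true.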

We use RFC3986 URIs for identity and URLs for connections; currently the JDK still depends on URL for identity, which generally incurs the cost of a network DNS lookup (at least once per URL). Using a URI for identity allows for server replication, for example; when a URL is used, identity depends on the resolved IP address, and a number of replicating servers may resolve from the same URI to a range of addresses. In reality, identity should be determined by authentication. Using DNS to determine identity is costly when a suitable RFC3986 URI normalization implementation can infer it without incurring network calls.
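To make the identity point concrete: the java.net.URL Javadoc states that two hosts are considered equivalent in `equals` if both host names can be resolved into the same IP addresses, so using URL as a map or set key can block on name resolution. java.net.URI's `equals` and `hashCode` work on the text alone (scheme and host compared case-insensitively), so deduplicating by normalized URI needs no network access. This is a minimal sketch using java.net.URI rather than our RFC3986 `Uri`; `UriIdentityDemo` and `dedupe` are hypothetical names.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

final class UriIdentityDemo {
    // Collects URIs into a set keyed on java.net.URI identity.
    // normalize() removes "." and ".." path segments; equals/hashCode then
    // compare scheme and host ignoring case. No DNS lookup happens here,
    // unlike a Set<java.net.URL>, whose equals/hashCode may resolve hosts.
    static Set<URI> dedupe(String... uris) {
        Set<URI> set = new HashSet<>();
        for (String s : uris) {
            set.add(URI.create(s).normalize());
        }
        return set;
    }
}
```

For example, `dedupe("HTTP://Example.COM/a/./b", "http://example.com/a/b", "http://example.com/a/c")` holds two entries, because the first two strings identify the same resource.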

Perhaps a provider mechanism, allowing an RFC version to be selected, might be an option?
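One possible shape for such a provider mechanism, sketched with java.util.ServiceLoader. The `UriSyntaxProvider` interface, its methods and the fallback class are hypothetical names for illustration only, not an existing JDK API; the fallback simply delegates to java.net.URI.

```java
import java.net.URI;
import java.util.ServiceLoader;

final class UriProviderSketch {
    /** Hypothetical SPI: each provider implements one URI RFC. */
    public interface UriSyntaxProvider {
        String rfc();                 // e.g. "RFC2396" or "RFC3986"
        String normalize(String uri); // throws IllegalArgumentException if invalid
    }

    /** Built-in fallback backed by java.net.URI (RFC 2396 plus Java's extensions). */
    static final class Rfc2396Provider implements UriSyntaxProvider {
        public String rfc() { return "RFC2396"; }
        public String normalize(String uri) {
            return URI.create(uri).normalize().toString();
        }
    }

    /** Returns the first installed provider for the requested RFC, else the built-in one. */
    static UriSyntaxProvider select(String rfc) {
        for (UriSyntaxProvider p : ServiceLoader.load(UriSyntaxProvider.class)) {
            if (p.rfc().equalsIgnoreCase(rfc)) {
                return p;
            }
        }
        return new Rfc2396Provider();
    }
}
```

An RFC3986 implementation would then be registered under `META-INF/services` (or a module `provides` clause) and picked up by `select("RFC3986")`; with nothing registered, callers keep today's behaviour.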

Our implementation also has a utility method to return a URL instance for connections.

Javadoc from our RFC3986 Uri implementation (AL2.0):

/**
 * This class represents an immutable instance of a URI as defined by RFC 3986.
 * <p>
 * This class replaces java.net.URI functionality.
 * <p>
 * Unlike java.net.URI this class is not Serializable, and hashCode and
 * equality are governed by strict RFC3986 normalisation. In addition, "other"
 * characters allowed in java.net.URI as specified by its javadoc, but not
 * specifically allowed by RFC3986, are illegal and must be escaped. This strict
 * adherence is essential to eliminate false negative or positive matches.
 * <p>
 * In addition to RFC3986 normalisation, on OS platforms with a \ file separator
 * the path is converted to UPPER CASE for comparison for the file: scheme during
 * equals and hashCode calls.
 * <p>
 * IPv6 and IPvFuture host addresses must be enclosed in square brackets as per
 * RFC3986. A zone delimiter %, if present, must be represented in escaped %25
 * form as per RFC6874.
 * <p>
 * In addition to RFC3986 normalization, IPv6 host addresses will be normalized
 * to comply with RFC 5952 A Recommendation for IPv6 Address Text Representation.
 * This is to ensure consistent equality between identical IPv6 addresses.
 *
 * @since 3.0.0
 */
public final class Uri implements Comparable<Uri> {
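The RFC 5952 normalisation mentioned in the Javadoc can be sketched as follows. This is a minimal, hypothetical helper (`Rfc5952.canonical` is not part of our Uri API): it expands a plain hex IPv6 literal to eight groups, prints each group in lower case without leading zeros, and replaces the longest run of two or more zero groups (the leftmost on a tie) with "::". Embedded IPv4 forms and %25 zone IDs are deliberately not handled.

```java
import java.util.regex.Pattern;

final class Rfc5952 {
    private static final Pattern GROUP = Pattern.compile("[0-9A-Fa-f]{1,4}");

    // Sketch only: plain hex IPv6 literals, no embedded IPv4, no zone ID.
    static String canonical(String addr) {
        String[] halves = addr.split("::", -1);
        if (halves.length > 2) throw new IllegalArgumentException("more than one '::': " + addr);
        String[] head = halves[0].isEmpty() ? new String[0] : halves[0].split(":", -1);
        String[] tail = (halves.length == 2 && !halves[1].isEmpty())
                ? halves[1].split(":", -1) : new String[0];
        int given = head.length + tail.length;
        if (halves.length == 1 ? given != 8 : given > 7) {
            throw new IllegalArgumentException("wrong number of groups: " + addr);
        }
        int[] groups = new int[8]; // groups elided by "::" stay 0
        for (int i = 0; i < head.length; i++) groups[i] = parseGroup(head[i], addr);
        for (int i = 0; i < tail.length; i++) groups[8 - tail.length + i] = parseGroup(tail[i], addr);

        // Longest run of zero groups, leftmost on a tie; a run of one is never
        // compressed (RFC 5952 section 4.2.2), hence bestLen starts at 1.
        int bestStart = -1, bestLen = 1;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) { i++; continue; }
            int j = i;
            while (j < 8 && groups[j] == 0) j++;
            if (j - i > bestLen) { bestStart = i; bestLen = j - i; }
            i = j;
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the compressed groups
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') sb.append(':');
            sb.append(Integer.toHexString(groups[i])); // lower case, no leading zeros
        }
        return sb.toString();
    }

    private static int parseGroup(String g, String addr) {
        if (!GROUP.matcher(g).matches()) {
            throw new IllegalArgumentException("bad group '" + g + "' in " + addr);
        }
        return Integer.parseInt(g, 16);
    }
}
```

For instance, `2001:0DB8:0000:0000:0000:0000:0000:0001` and `2001:db8::1` both map to `2001:db8::1`, so comparing canonical strings gives consistent equality.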

<SNIP>  Static factory methods for various cases:

    /**
     * Parses the given argument {@code rfc3986compliantURI} and creates an appropriate URI
     * instance.
     *
     * The parameter string is checked for compliance; an IllegalArgumentException
     * is thrown if the string is non-compliant.
     *
     * @param rfc3986compliantURI
     *            the string which has to be parsed to create the URI instance.
     * @return the created instance representing the given URI.
     * @throws IllegalArgumentException if the string is not RFC3986 compliant.
     */
    public static Uri create(String rfc3986compliantURI) {
        try {
            return new Uri(rfc3986compliantURI);
        } catch (URISyntaxException e) {
            // Keep the original exception as the cause for diagnostics.
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    /**
     * The parameter string is assumed not to contain any existing escape
     * sequences; any escape character % found is encoded as %25. Illegal
     * characters are escaped if possible.
     *
     * The Uri is normalised according to RFC3986.
     *
     * @param unescapedString URI in un-escaped string form
     * @return an RFC3986 compliant Uri.
     * @throws java.net.URISyntaxException if string cannot be escaped.
     */
    public static Uri escapeAndCreate(String unescapedString) throws URISyntaxException {
        return new Uri(quoteComponent(unescapedString, allLegalUnescaped));
    }

    /**
     * The parameter string may already contain escape sequences; any illegal
     * characters are escaped and any that shouldn't be escaped are un-escaped.
     *
     * The escape character % is not re-encoded.
     * @param nonCompliantEscapedString URI in string form.
     * @return an RFC3986 compliant Uri.
     * @throws java.net.URISyntaxException if string cannot be escaped.
     */
    public static Uri parseAndCreate(String nonCompliantEscapedString) throws URISyntaxException {
        return new Uri(quoteComponent(nonCompliantEscapedString, allLegal));
    }

<SNIP>
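Since `quoteComponent`, `allLegal` and `allLegalUnescaped` are snipped above, here is a hypothetical stand-in that illustrates the difference between the two factory methods; it is not our actual implementation. It works on UTF-8 bytes, treats RFC 3986 unreserved and reserved characters as legal, and percent-encodes everything else. With `keepEscapes == false` every `%` is data and becomes `%25` (the `escapeAndCreate` case); with `keepEscapes == true` existing `%XX` escapes are kept, escapes of unreserved characters are decoded, and the rest are upper-cased (the `parseAndCreate` case). Rejecting a malformed escape is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

final class EscapeSketch {
    // RFC 3986 gen-delims and sub-delims.
    private static final String RESERVED = ":/?#[]@!$&'()*+,;=";

    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
    }

    static String quote(String s, boolean keepEscapes) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(b.length);
        for (int i = 0; i < b.length; i++) {
            int c = b[i] & 0xFF;
            if (keepEscapes && c == '%') {
                // Existing escape: needs two hex digits after the '%'.
                int hi = i + 2 < b.length ? Character.digit(b[i + 1] & 0xFF, 16) : -1;
                int lo = i + 2 < b.length ? Character.digit(b[i + 2] & 0xFF, 16) : -1;
                if (hi < 0 || lo < 0) throw new IllegalArgumentException("malformed escape at byte " + i);
                int v = hi * 16 + lo;
                if (isUnreserved(v)) out.append((char) v);   // %7E -> ~
                else out.append(String.format("%%%02X", v)); // %2f -> %2F
                i += 2;
            } else if (isUnreserved(c) || RESERVED.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));      // ' ' -> %20, é -> %C3%A9
            }
        }
        return out.toString();
    }
}
```

For example, `quote("file:/C:/My Documents/100%.txt", false)` gives `file:/C:/My%20Documents/100%25.txt`, while `quote("/%7euser/a%2fb/caf\u00e9", true)` gives `/~user/a%2Fb/caf%C3%A9`.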


    /**
     * Fixes a Windows file URI string by converting backslashes to forward
     * slashes and inserting a forward slash before the drive letter if it is
     * missing. No normalisation or modification of case is performed.
     * @param uri String representation of URI
     * @return fixed URI String
     */
    public static String fixWindowsURI(String uri) {
        if (uri == null) return null;
        if (File.separatorChar != '\\') return uri;
        // RFC3986 schemes are case-insensitive, so match "file:" in any case.
        if (uri.regionMatches(true, 0, "file:", 0, 5)) {
            char [] u = uri.toCharArray();
            int l = u.length;
            StringBuilder sb = new StringBuilder(uri.length()+1);
            for (int i=0; i<l; i++){
                // Ensure we use forward slashes
                if (u[i] == File.separatorChar) {
                    sb.append('/');
                    continue;
                }
                if (i == 5 && uri.startsWith(":", 6 )) {
                    // Windows drive letter without leading slashes doesn't comply
                    // with URI spec, fix it here
                    sb.append("/");
                }
                sb.append(u[i]);
            }
            return sb.toString();
        }
        return uri;
    }


<SNIP>
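Because `fixWindowsURI` only acts when `File.separatorChar` is `\`, it is hard to try on other platforms. The sketch below is a hypothetical, OS-independent copy of the same loop with the separator passed in, alongside what java.net.URI makes of the results: a back slash is rejected outright, and `file:C:/...` parses as an opaque URI with no path, whereas `file:/C:/...` is hierarchical with path `/C:/...`.

```java
import java.net.URI;

final class FixWindowsUriDemo {
    // Hypothetical copy of fixWindowsURI with the separator as a parameter
    // instead of File.separatorChar, so it can be exercised on any OS.
    static String fixWindowsUri(String uri, char separator) {
        if (uri == null || separator != '\\') return uri;
        if (!(uri.startsWith("file:") || uri.startsWith("FILE:"))) return uri;
        StringBuilder sb = new StringBuilder(uri.length() + 1);
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == separator) { sb.append('/'); continue; }      // back slash -> forward slash
            if (i == 5 && uri.startsWith(":", 6)) sb.append('/'); // "file:C:" -> "file:/C:"
            sb.append(c);
        }
        return sb.toString();
    }

    // java.net.URI.create wraps URISyntaxException in IllegalArgumentException.
    static boolean parses(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Here `fixWindowsUri("file:C:\\dir\\x.jar", '\\')` yields `file:/C:/dir/x.jar`, which java.net.URI parses with path `/C:/dir/x.jar`.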


    public static Uri filePathToUri(String path) throws URISyntaxException {
        String forwardSlash = "/";
        if (path == null || path.length() == 0) {
            // codebase is "file:"
            path = "*";
        }
        // Ensure compatibility with URLClassLoader, when directory
        // character is dropped by File.
        boolean directory = false;
        if (path.endsWith(forwardSlash)) directory = true;
        path = new File(path).getAbsolutePath();
        if (directory) {
            if (!(path.endsWith(File.separator))){
                path = path + File.separator;
            }
        }
        if (File.separatorChar == '\\') {
            path = path.replace(File.separatorChar, '/');
        }
        path = fixWindowsURI("file:" + path);
        return Uri.escapeAndCreate(path); //$NON-NLS-1$
    }

<SNIP>
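The trailing-slash handling in `filePathToUri` matters because URLClassLoader treats a URL ending in `/` as a directory and anything else as a JAR file, while `java.io.File` drops a trailing separator. `File.toURI()` only adds the slash back if it can determine that the directory exists on disk. The following is a hypothetical sketch of the same idea using java.net.URI's multi-argument constructor, which quotes illegal characters and always quotes `%`, but leaves "other" (non-ASCII) characters unescaped, which is one reason our `escapeAndCreate` is used instead. UNC paths are not handled here.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class FilePathToUriSketch {
    // Hypothetical helper: absolute file: URI that keeps a trailing '/' from the input.
    static URI toFileUri(String path) {
        boolean directory = path.endsWith("/") || path.endsWith(File.separator);
        String p = new File(path).getAbsolutePath();     // drops the trailing separator
        if (File.separatorChar == '\\') p = p.replace('\\', '/');
        if (!p.startsWith("/")) p = "/" + p;              // "C:/x" -> "/C:/x"
        if (directory && !p.endsWith("/")) p = p + "/";   // restore directory marker
        try {
            return new URI("file", null, p, null);        // (scheme, host, path, fragment)
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }
}
```

On a Unix-like system `toFileUri("lib/")` ends in `/lib/` while `toFileUri("lib")` ends in `/lib`, and a space in a file name comes out as `%20` in the raw path.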

    /**
     * Converts this URI instance to a URL.
     *
     * @return the created URL representing the same resource as this URI.
     * @throws MalformedURLException
     *             if an error occurs while creating the URL or no protocol
     *             handler could be found.
     * @throws IllegalArgumentException
     *             if this URI is not absolute.
     */
    public URL toURL() throws MalformedURLException {
        if (!absolute) {
            throw new IllegalArgumentException(Messages.getString("luni.91") + ": " //$NON-NLS-1$//$NON-NLS-2$
                    + toString());
        }
        if (opaque) return new URL(toString()); // Let the Handler parse it.
        String hst = host;
        StringBuilder sb = new StringBuilder();
        // userinfo will be rare; utilise sb, then clear it.
        if (userinfo != null){
            sb.append(userinfo).append('@').append(hst);
            hst = sb.toString();
            sb.setLength(0); // clear the whole buffer before reuse
        }
        // now let's create the file section of the URL.
        sb.append(path);
        if (query != null) sb.append('?').append(query);
        if (fragment != null) sb.append('#').append(fragment);
        String file = sb.toString(); //for code readability
        // deprecated to provide a warning against misuse, not for removal.
        @SuppressWarnings("deprecation")
        URL url = new URL(scheme, hst, port, file, null);
        return url;
    }
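`toURL()` reuses one StringBuilder, first for the `userinfo@host` string and then for the file part, so the buffer has to be emptied completely in between. `StringBuilder.delete(start, end)` treats `end` as exclusive, so `delete(0, length() - 1)` leaves the last character behind, whereas `setLength(0)` or `delete(0, length())` empties it. This hypothetical helper isolates just that buffer handling; it does not construct a URL.

```java
final class UrlPartsSketch {
    // Hypothetical helper mirroring the buffer reuse in toURL():
    // returns {host part, file part} built with a single StringBuilder.
    static String[] hostAndFile(String userinfo, String host, String path,
                                String query, String fragment) {
        StringBuilder sb = new StringBuilder();
        String hst = host;
        if (userinfo != null) {
            sb.append(userinfo).append('@').append(host);
            hst = sb.toString();
            sb.setLength(0); // empties the buffer completely
        }
        sb.append(path);
        if (query != null) sb.append('?').append(query);
        if (fragment != null) sb.append('#').append(fragment);
        return new String[] { hst, sb.toString() };
    }

    // delete's end index is exclusive, so this keeps the final character.
    static String deleteAllButLast(String s) {
        StringBuilder sb = new StringBuilder(s);
        sb.delete(0, sb.length() - 1);
        return sb.toString();
    }
}
```

For example, `hostAndFile("user", "example.com", "/a", "q=1", "top")` returns `user@example.com` and `/a?q=1#top`, while `deleteAllButLast("user@example.com")` returns `m`.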

--
Regards,
Peter
