On Sat, Jul 13, 2013 at 12:46:17PM -0700, Kyle J. McKay wrote:
> I expect it will be easier just to normalize the URL without
> splitting. That is, lowercase the parts that are case-insensitive
> (scheme and host name) and adjust the URL-escaping to remove URL
> escaping (%xx) from characters that don't need it but add it to any
> for which it is required that are not escaped (according to RFC
I think you are suggesting doing better than this, but just to be clear,
we cannot treat the URL as a simple string and just decode and
Among the things that get encoded are the delimiting characters. So if
I have the URL:
you would "canonicalize" it into:
But those are two different URLs entirely; the first has the username
"foo:bar", and the second has the username "foo" and the password "bar".
I admit that these are unlikely to come up in practice, but I am worried
that there is some room for mischief here. For example:
If we canonicalize that into:
and do a lookup, we think we are hitting example.com, but we are
actually hitting example.comtricky.host (i.e., that is how curl will
interpret it). If we were deciding to use a stored credential based on
that information, it would be quite bad (we would leak credentials to
the owner of comtricky.host). I know your patch does not impact the
credential lookup behavior, but it would be nice in the long run if the
two lookups followed the same rules.
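
Both hazards can be sketched in a few lines of Python (standard library
only). The URLs, the function names, and the STORED table are made up
for illustration; they are not the examples from this thread, and how
curl or any other HTTP client finally interprets such a host is outside
what this shows. The only point is that the order of "decode" and
"split" changes which components we think we have:

```python
from urllib.parse import urlsplit, unquote

def split_then_decode(url):
    """Split on the literal delimiters first, then decode each part."""
    p = urlsplit(url)
    user = unquote(p.username) if p.username is not None else None
    password = unquote(p.password) if p.password is not None else None
    host = unquote(p.hostname) if p.hostname is not None else None
    return user, password, host

def decode_then_split(url):
    """Naive: decode the whole string, so escaped delimiters turn into real ones."""
    p = urlsplit(unquote(url))
    return p.username, p.password, p.hostname

# Hypothetical credential store keyed by host name.
STORED = {"example.com": "secret"}

def lookup(url, parse):
    """Look up a stored credential using the host reported by `parse`."""
    return STORED.get(parse(url)[2])

# Hypothetical inputs: an escaped ':' in the username, an escaped '/' in the host.
user_url = "https://foo%3Abar@example.com/repo.git"
host_url = "https://example.com%2Ftricky.host/repo.git"

print(split_then_decode(user_url))   # username keeps its ':'; no password
print(decode_then_split(user_url))   # username and password get split apart
print(lookup(host_url, split_then_decode))
print(lookup(host_url, decode_then_split))
```

With the naive order, the lookup for the second URL finds the credential
stored for example.com even though the URL as written does not name that
host.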
So I think the three options are basically:

  1. No decoding, require the user to use a consistent prefix between
     config and other uses of the URL. I.e., your current patch. The
     downside is that it doesn't handle any variation of input.

  2. Full decoding into constituent parts. This handles canonicalization
     of encoding, and also allows "wildcard" components (e.g., a URL
     with username can match the generic "https://example.com" in the
     config). The downside is that you cannot do a "longest prefix wins"
     rule for overriding.

  3. Full decoding as in (2), but then re-assemble into a canonicalized
     encoded URL. The upside is that you get to do "longest prefix
     wins", but you can no longer have wildcard components. I think this
     is what you are suggesting in your mail.

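
To make (3) concrete, here is a rough Python sketch, not a proposal for
the actual implementation. The names canonicalize and
longest_prefix_match are hypothetical, and I am assuming we can ignore
query strings, fragments, and IPv6 literals for the sake of brevity.
Each component is decoded and re-encoded on its own, so an escaped
delimiter never turns into a real one:

```python
from urllib.parse import urlsplit, quote, unquote

def canonicalize(url):
    """Split, normalize each component separately, then re-assemble."""
    p = urlsplit(url)  # scheme and hostname come back lowercased
    userinfo = ""
    if p.username is not None:
        userinfo = quote(unquote(p.username), safe="")
        if p.password is not None:
            userinfo += ":" + quote(unquote(p.password), safe="")
        userinfo += "@"
    host = quote(unquote(p.hostname or ""), safe="")
    port = ":%d" % p.port if p.port is not None else ""
    # Normalize per path segment so an escaped '/' stays escaped.
    path = "/".join(quote(unquote(seg), safe="") for seg in p.path.split("/"))
    return "%s://%s%s%s%s" % (p.scheme, userinfo, host, port, path)

def longest_prefix_match(url, config_keys):
    """Return the config key whose canonical form is the longest prefix of url.

    Plain string prefixes are used here; a real matcher would also have to
    respect component boundaries.
    """
    canon = canonicalize(url)
    hits = [k for k in config_keys if canon.startswith(canonicalize(k))]
    return max(hits, key=lambda k: len(canonicalize(k)), default=None)

keys = ["https://example.com", "https://example.com/project"]
print(canonicalize("HTTPS://Example.COM/%41bc/a%2fb"))
print(longest_prefix_match("https://EXAMPLE.com/project/repo.git", keys))
print(longest_prefix_match("https://user@example.com/project", keys))
```

The last call finds nothing: once the username is part of the
re-assembled string, the generic keys are no longer prefixes of it,
which is the loss of wildcard components mentioned in (3).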
I'm still in favor of (2), because I think the wildcard components are
important (and while I agree that the "longest prefix wins" is nicer, we
already have "last one wins" for the rest of the config, including the
credential URL matcher). But I certainly think (3) is better than (1).
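
For comparison, a minimal sketch of what I mean by (2), again in Python
and again with hypothetical names (components, matches, last_match). I
am assuming the config is an ordered list of (pattern, value) pairs and
that a pattern without a username acts as a wildcard for that
component; this is an illustration, not how the credential code is
actually written:

```python
from urllib.parse import urlsplit, unquote

def components(url):
    """Split first, then decode each component on its own."""
    p = urlsplit(url)
    user = unquote(p.username) if p.username is not None else None
    host = unquote(p.hostname or "")
    segments = tuple(unquote(s) for s in p.path.split("/") if s)
    return p.scheme, host, user, segments

def matches(pattern, url):
    """True if every component present in pattern agrees with url."""
    p_scheme, p_host, p_user, p_path = components(pattern)
    u_scheme, u_host, u_user, u_path = components(url)
    if (p_scheme, p_host) != (u_scheme, u_host):
        return False
    if p_user is not None and p_user != u_user:
        return False  # a missing username in the pattern is a wildcard
    return u_path[:len(p_path)] == p_path

def last_match(config, url):
    """'Last one wins': later matching entries override earlier ones."""
    result = None
    for pattern, value in config:
        if matches(pattern, url):
            result = value
    return result

config = [
    ("https://example.com", "generic"),
    ("https://alice@example.com", "alice-specific"),
]
print(last_match(config, "https://alice@example.com/repo.git"))
print(last_match(config, "https://bob@example.com/repo.git"))
```

Note that swapping the two config entries would make the generic one
win for alice as well, since only order decides, not specificity.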