On 2012.7.25 12:14 AM, Junio C Hamano wrote:
>> Nothing, because paths are not URI escaped. :)
>> You probably meant svn_uri_canonicalize().  And no, it does not double
>> escape, so it's safe to escape as early as possible.
> Are you saying that the function assumes that a local pathname would

URI and path canonicalization are done differently and by different functions:
svn_uri_canonicalize() vs. svn_dirent_canonicalize().  Or maybe you're
referring to the path portion of the URL?  I don't think that makes a
difference for what you're asking, but it's important to keep in mind.

> not have '%' in it, returns its input as-is when it sees one, and if
> the caller really needs to express a path with '%' in it, it is the
> responsibility of the caller to escape it?

It appears that if the % is followed by two hex digits, it assumes it's an
escape (and upper-cases the hex).  Otherwise it escapes the % itself as %25.
Thus...

   http://www.google.com/per%%nt -> http://www.google.com/per%25%25nt
   http://www.google.com/per%ant -> http://www.google.com/per%25ant
   http://www.google.com/per%cent -> http://www.google.com/per%CEnt

That makes sense if the idea is to avoid double escaping.
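Here's a rough Perl sketch of the rule as I read it from the examples above.
canonicalize_percent() is a hypothetical name, and this is not SVN's
implementation; it only mimics the % and space handling observed here.

```perl
use strict;
use warnings;

# Hypothetical approximation of the observed '%' handling in
# svn_uri_canonicalize(); NOT the real SVN code.
sub canonicalize_percent {
    my ($uri) = @_;
    # '%' + two hex digits: treat as an existing escape, upper-case the hex.
    # Any other '%': escape it as '%25'.
    # A space: escape it as '%20'.
    $uri =~ s{(%[0-9A-Fa-f]{2})|(%)|( )}{
        defined $1 ? uc($1) : defined $2 ? '%25' : '%20'
    }ge;
    return $uri;
}

for my $uri ('http://www.google.com/per%%nt',
             'http://www.google.com/per%ant',
             'http://www.google.com/per%cent') {
    print $uri, ' -> ', canonicalize_percent($uri), "\n";
}
```

Because an existing %XX is left alone, running it twice gives the same result
as running it once, which is the "no double escaping" property.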

> That makes it even more confusing....

Straight out of RFC 3986 (section 2.4):

    Implementations must not percent-encode or decode the same string more
    than once, as decoding an already decoded string might lead to
    misinterpreting a percent data octet as the beginning of a percent-
    encoding, or vice versa in the case of percent-encoding an already
    percent-encoded string.

It makes the function far simpler to use.  You can't read the user's mind, but
it's a fair guess that they're not thinking too deeply about how escaping
works.  It makes URI and path canonicalization safer and simpler.  Otherwise
you'd need to keep track of whether a thing was already escaped or not, which
is just begging for loads of bugs.  (If SVN used URI and path objects, they'd
just take care of it and none of this would be a problem in the first place.)

This way you have no double-escaping concerns.  No need to track whether a thing
is already canonicalized.  Do it as often and as early as you like.  Making a
corner case a little harder is a small price to pay for making the common case
much, much easier.

This also appears to be what Firefox does.

>>     my $uri = "http://www.example.com/ foo";
>>     print SVN::_Core::svn_uri_canonicalize(
>>         SVN::_Core::svn_uri_canonicalize($uri)
>>     );
>> That produces "http://www.example.com/%20foo".
> In other words, if your DocumentRoot was /var/www and you have a
> directory /var/www/per%cent you want to expose to the outside world,
> you have to say "http://www.example.com/per%25cent" yourself and the
> "canonicalize" function will be an identity function?

Yes.  It can be made to work better.

There are a number of places in the code which effectively do this:

    my $full_url = $url . '/' . $path;

And I was canonicalizing them like this:

    my $full_url = canonicalize_url($url . '/' . $path);

I'd been pondering whether it would be worthwhile to have a function which
added a path to a base URL and canonicalized the result.  Now I see that yes,
it would be, if only to deal with this corner case.

    my $full_url = append_path_to_url($url, $path);

That would properly URI-escape any % in $path before appending it, and then
canonicalize the whole thing as a URI.
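A rough sketch of what append_path_to_url() might look like.  Both subs below
are hypothetical: canonicalize_url() here is a plain-Perl stand-in that mimics
only the % and space handling of svn_uri_canonicalize(); real code would call
the SVN API instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for svn_uri_canonicalize(): keeps existing %XX
# escapes (upper-cased), escapes any other '%' as %25 and spaces as %20.
sub canonicalize_url {
    my ($url) = @_;
    $url =~ s{(%[0-9A-Fa-f]{2})|(%)|( )}{
        defined $1 ? uc($1) : defined $2 ? '%25' : '%20'
    }ge;
    return $url;
}

# Hypothetical helper: a local path has no URI escapes, so every '%' in it
# is literal and gets escaped *before* joining; only then is the whole
# thing canonicalized as a URI.
sub append_path_to_url {
    my ($url, $path) = @_;
    (my $escaped = $path) =~ s/%/%25/g;
    return canonicalize_url($url . '/' . $escaped);
}

print append_path_to_url('http://www.example.com', 'per%cent'), "\n";
print canonicalize_url('http://www.example.com' . '/' . 'per%cent'), "\n";
```

With the helper, a directory named per%cent comes out as per%25cent, whereas
canonicalizing the naive concatenation turns it into per%CEnt.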

I'm pretty sure the code in master doesn't handle this at all.

> I have this vague suspicion that Jonathan was asking about what your
> Git::SVN::Utils::canonicalize_path() sub does, so all of the above
> might be moot, though...

It's just a pass-through to the SVN API.

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
