On Tue, 30 Jun 2020 15:56:49 +0200, Manuel Jacob wrote:
> On 2020-06-30 14:02, Yuya Nishihara wrote:
> > On Tue, 30 Jun 2020 08:45:44 +0200, Manuel Jacob wrote:
> >> # HG changeset patch
> >> # User Manuel Jacob <m...@manueljacob.de>
> >> # Date 1593435816 -7200
> >> #      Mon Jun 29 15:03:36 2020 +0200
> >> # Branch stable
> >> # Node ID 7bd48930ea77337213859f562e2fc0abd6734830
> >> # Parent  4d00ac33053273ed2b9a6431800d59df94adcfc3
> >> # EXP-Topic svn_encoding
> >> convert: correctly convert paths to UTF-8 for Subversion
> > 
> >> @@ -117,7 +151,7 @@
> >>              path = b'/' + util.normpath(path)
> >>          # Module URL is later compared with the repository URL returned
> >>          # by svn API, which is UTF-8.
> >> -        path = encoding.tolocal(path)
> >> +        path = fs2svn(path)
> >>          path = b'file://%s' % quote(path)
> >>      return svn.core.svn_path_canonicalize(path)
> > 
> > It's better to document that geturl() may raise UnicodeDecodeError.
> 
> I think we should handle all cases where it could fail and bail out with 
> a useful warning.
> 
> * This is ideally the case for everything handled by issvnurl() (see my 
> replies on patch 6).

I know, but the geturl() function itself can't handle arbitrary bytes. It's
obvious that fs2svn() may raise a Unicode exception, but at the geturl() level
that becomes unclear. Since functions operating on bytes don't generally raise
Unicode exceptions in the Mercurial codebase, I think it's better to document
this aspect of geturl()'s design.
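
To illustrate what I mean by documenting it, here is a minimal standalone
sketch. It is not the actual convert code: local_to_utf8() and file_url() are
hypothetical stand-ins for fs2svn() and geturl(), and the locale encoding is
passed as an explicit parameter purely for illustration.

```python
from urllib.parse import quote


def local_to_utf8(path: bytes, local_encoding: str = "ascii") -> bytes:
    """Re-encode a path from the local encoding to UTF-8.

    Hypothetical stand-in for fs2svn(). Raises UnicodeDecodeError if
    ``path`` is not valid in ``local_encoding``.
    """
    return path.decode(local_encoding).encode("utf-8")


def file_url(path: bytes, local_encoding: str = "ascii") -> bytes:
    """Build a file:// URL from a local byte path.

    Hypothetical stand-in for geturl(). Unlike most bytes-in/bytes-out
    helpers, this may raise UnicodeDecodeError, because the path has to
    be converted to UTF-8 before it is percent-encoded.
    """
    utf8_path = local_to_utf8(path, local_encoding)
    # quote() accepts bytes and returns an ASCII-only str.
    return b"file://" + quote(utf8_path).encode("ascii")
```

With the docstring in place, a caller can see that it has to catch the
exception (or pre-validate the path) without reading the helper's body.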

> >> --- a/tests/test-convert-svn-encoding.t
> >> +++ b/tests/test-convert-svn-encoding.t
> >> @@ -163,6 +163,26 @@
> >>    abort: http://localhost:$HGPORT/\xff: missing or unsupported repository (esc)
> >>    [255]
> >> 
> >> +In Subversion, paths are Unicode (encoded as UTF-8). Therefore paths that can't
> >> +be converted between UTF-8 and the locale encoding (which is always ASCII in
> >> +tests) don't work.
> >> +
> >> +  $ cp -R svn-repo $XFF
> > 
> > I suspect this would fail on Windows depending on system locale.
> 
> It should be encodable by the current code page, but not be ASCII. Is 
> that even possible in general?

I don't know. I can only say that 0xff can't be decoded in *some* variants
of Shift_JIS. I have no idea how the Win32 API layer will handle that.
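
For what it's worth, a quick way to probe this from Python is sketched below.
can_decode() is a hypothetical helper, and the result only reflects Python's
own codec tables; it says nothing about what the Win32 API does with the same
byte in a file name.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if ``data`` is valid in ``encoding`` per Python's codecs."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Probe the single byte 0xff against a few encodings, including two
# Shift_JIS variants, and print whether each codec accepts it.
for name in ("ascii", "utf-8", "shift_jis", "cp932", "latin-1"):
    print(name, can_decode(b"\xff", name))
```

The variants don't necessarily agree with each other, which is why I can't
say anything general about non-ASCII code pages.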