On 2020-06-30 15:25, Manuel Jacob wrote:
On 2020-06-30 14:24, Yuya Nishihara wrote:
On Tue, 30 Jun 2020 08:45:47 +0200, Manuel Jacob wrote:
# HG changeset patch
# User Manuel Jacob <m...@manueljacob.de>
# Date 1593494609 -7200
#      Tue Jun 30 07:23:29 2020 +0200
# Branch stable
# Node ID 9915fdff8d1732ce62b6df69b50106384d4ad4d1
# Parent  e1a4c7f23e804f37c3848fc408607af916d619d1
# EXP-Topic svn_encoding
convert: handle percent-encoded bytes in file URLs like Subversion

 def issvnurl(ui, url):
     try:
         proto, path = url.split(b'://', 1)
@@ -361,7 +387,7 @@
             ):
                 path = path[:2] + b':/' + path[6:]
             try:
-                path.decode(fsencoding)
+                unicodepath = path.decode(fsencoding)
             except UnicodeDecodeError:
                 ui.warn(
                     _(
@@ -371,28 +397,17 @@
                     % pycompat.sysbytes(fsencoding)
                 )
                 return False
- # FIXME: The following reasoning and logic is wrong and will be
-            # fixed in a following changeset.
- # pycompat.fsdecode() / pycompat.fsencode() are used so that bytes - # in the URL roundtrip correctly on Unix. urlreq.url2pathname() on - # py3 will decode percent-encoded bytes using the utf-8 encoding - # and the "replace" error handler. This means that it will not - # preserve non-UTF-8 bytes (https://bugs.python.org/issue40983). - # url.open() uses the reverse function (urlreq.pathname2url()) and
-            # has a similar problem
- # (https://bz.mercurial-scm.org/show_bug.cgi?id=6357). It makes - # sense to solve both problems together and handle all file URLs
-            # consistently. For now, we warn.
- unicodepath = urlreq.url2pathname(pycompat.fsdecode(path)) - if pycompat.ispy3 and u'\N{REPLACEMENT CHARACTER}' in unicodepath:
+            try:
+ unicodepath = url2pathname_like_subversion(unicodepath)
+            except NonUtf8PercentEncodedBytes:
                 ui.warn(
                     _(
- b'on Python 3, we currently do not support non-UTF-8 ' - b'percent-encoded bytes in file URLs for Subversion '
-                        b'repositories\n'
+                        b'Subversion does not support non-UTF-8 '
+                        b'percent-encoded bytes in file URLs\n'
                     )
                 )
-            path = pycompat.fsencode(unicodepath)
+                return False
+            path = unicodepath.encode(fsencoding)

I think pycompat.fsencode() is more correct since the path will be later
tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
encoding.unitolocal() is okay.

It was deliberate to use something that fails when it can’t be
encoded. However, I thought about failing in case of a logic error,
not about that the URL could contain a (UTF-8-encoded) code point that
is not available in `fsencoding`.

On Windows, Subversion will be able to handle all code points. We’re
unnecessarily restricting ourselves by doing the rest of the function
on bytes. Although not entirely Mercurial-idiomatic, we should maybe
consider doing the rest of the function in Unicode. We can also bail
out, but we have to make sure the warning mentions that it’s not the
user’s fault.

On Unix, Subversion will try later to convert the path to the locale
encoding, which fails if the code point is not encodable. We should
bail out in this case.

Let me add that with "bail out", I included that a warning describing the problem is shown.

Since what I described is mostly about showing nicer messages, should they be added in an amended patch or as follow-ups?
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to