Re: [PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion

Manuel Jacob Tue, 30 Jun 2020 06:47:01 -0700

On 2020-06-30 15:25, Manuel Jacob wrote:

On 2020-06-30 14:24, Yuya Nishihara wrote:
On Tue, 30 Jun 2020 08:45:47 +0200, Manuel Jacob wrote:
# HG changeset patch
# User Manuel Jacob <m...@manueljacob.de>
# Date 1593494609 -7200
#      Tue Jun 30 07:23:29 2020 +0200
# Branch stable
# Node ID 9915fdff8d1732ce62b6df69b50106384d4ad4d1
# Parent  e1a4c7f23e804f37c3848fc408607af916d619d1
# EXP-Topic svn_encoding
convert: handle percent-encoded bytes in file URLs like Subversion
 def issvnurl(ui, url):
     try:
         proto, path = url.split(b'://', 1)
@@ -361,7 +387,7 @@
             ):
                 path = path[:2] + b':/' + path[6:]
             try:
-                path.decode(fsencoding)
+                unicodepath = path.decode(fsencoding)
             except UnicodeDecodeError:
                 ui.warn(
                     _(
@@ -371,28 +397,17 @@
                     % pycompat.sysbytes(fsencoding)
                 )
                 return False
- # FIXME: The following reasoning and logic is wrong andwill be
-            # fixed in a following changeset.
- # pycompat.fsdecode() / pycompat.fsencode() are used sothat bytes- # in the URL roundtrip correctly on Unix.urlreq.url2pathname() on- # py3 will decode percent-encoded bytes using the utf-8encoding- # and the "replace" error handler. This means that itwill not- # preserve non-UTF-8 bytes(https://bugs.python.org/issue40983).- # url.open() uses the reverse function(urlreq.pathname2url()) and
-            # has a similar problem
- # (https://bz.mercurial-scm.org/show_bug.cgi?id=6357).It makes- # sense to solve both problems together and handle allfile URLs
-            # consistently. For now, we warn.
- unicodepath =urlreq.url2pathname(pycompat.fsdecode(path))- if pycompat.ispy3 and u'\N{REPLACEMENT CHARACTER}' inunicodepath:
+            try:
+ unicodepath =url2pathname_like_subversion(unicodepath)
+            except NonUtf8PercentEncodedBytes:
                 ui.warn(
                     _(
- b'on Python 3, we currently do not supportnon-UTF-8 '- b'percent-encoded bytes in file URLs forSubversion '
-                        b'repositories\n'
+                        b'Subversion does not support non-UTF-8 '
+                        b'percent-encoded bytes in file URLs\n'
                     )
                 )
-            path = pycompat.fsencode(unicodepath)
+                return False
+            path = unicodepath.encode(fsencoding)
I think pycompat.fsencode() is more correct since the path will belater
tested by os.path.*() functions. On Python 2, I'm not sure. Maybe our
encoding.unitolocal() is okay.
It was deliberate to use something that fails when it can’t be
encoded. However, I thought about failing in case of a logic error,
not about that the URL could contain a (UTF-8-encoded) code point that
is not available in `fsencoding`.

On Windows, Subversion will be able to handle all code points. We’re
unnecessarily restricting ourselves by doing the rest of the function
on bytes. Although not entirely Mercurial-idiomatic, we should maybe
consider doing the rest of the function in Unicode. We can also bail
out, but we have to make sure the warning mentions that it’s not the
user’s fault.

On Unix, Subversion will try later to convert the path to the locale
encoding, which fails if the code point is not encodable. We should
bail out in this case.

Let me add that with "bail out", I included that a warning describingthe problem is shown.

Since what I described is mostly about showing nicer messages, shouldthey be added in an amended patch or as follow-ups?

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH 6 of 6 stable] convert: handle percent-encoded bytes in file URLs like Subversion

Reply via email to