John Keeping <[email protected]> writes:
> When this change was originally made (0846b0c - git-remote-testpy: hash
> bytes explicitly), I didn't realise that the "hex" encoding we chose is
> a "bytes to bytes" encoding so it just fails with an error on Python 3
> in the same way as the original code.
>
> It is not possible to provide a single code path that works on Python 2
> and Python 3 since Python 2.x will attempt to decode the string before
> encoding it, which fails for strings that are not valid in the default
> encoding. Python 3.1 introduced the "surrogateescape" error handler
> which handles this correctly and permits a bytes -> unicode -> bytes
> round-trip to be lossless.
>
> At this point Python 3.0 is unsupported so we don't go out of our way to
> try to support it.
>
> Helped-by: Michael Haggerty <[email protected]>
> Signed-off-by: John Keeping <[email protected]>
> ---

Thanks; will queue and wait for an Ack from Michael.

Does the helper function need to be named with a leading underscore,
though?

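For reference, the lossless bytes -> unicode -> bytes round-trip that the
commit message relies on can be sketched as below. This is only an
illustration for Python 3.1 or later; the sample path and the variable
names are made up for the example and do not come from the patch.

```python
# Bytes that are not valid UTF-8 (0xff never appears in well-formed UTF-8).
# The path itself is a made-up example.
raw = b"/tmp/repo-\xff"

# Decoding with "surrogateescape" maps each undecodable byte to a lone
# surrogate code point (0xff becomes U+DCFF) instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the round-trip is lossless.
roundtrip = text.encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate, which is why the error
# handler has to be named explicitly.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

With the handler named on both sides the bytes come back unchanged; the
strict encode fails on the same string.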
> On Sun, Jan 27, 2013 at 02:13:29PM +0000, John Keeping wrote:
>> On Sun, Jan 27, 2013 at 05:44:37AM +0100, Michael Haggerty wrote:
>> > So to handle all of the cases across Python versions as closely as
>> > possible to the old 2.x code, it might be necessary to make the code
>> > explicitly depend on the Python version number, like:
>> >
>> > hasher = _digest()
>> > if sys.hexversion < 0x03000000:
>> >     pathbytes = repo.path
>> > elif sys.hexversion < 0x03010000:
>> >     # If support for Python 3.0.x is desired (note: result can
>> >     # be different in this case than under 2.x or 3.1+):
>> >     pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> >                                  'backslashreplace')
>> > else:
>> >     pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> >                                  'surrogateescape')
>> > hasher.update(pathbytes)
>> > repo.hash = hasher.hexdigest()
>
> How about this?
>
> git-remote-testpy.py | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/git-remote-testpy.py b/git-remote-testpy.py
> index c7a04ec..16b0c52 100644
> --- a/git-remote-testpy.py
> +++ b/git-remote-testpy.py
> @@ -36,6 +36,22 @@ if sys.hexversion < 0x02000000:
>      sys.stderr.write("git-remote-testgit: requires Python 2.0 or later.\n")
>      sys.exit(1)
>
> +
> +def _encode_filepath(path):
> +    """Encodes a Unicode file path to a byte string.
> +
> +    On Python 2 this is a no-op; on Python 3 we encode the string as
> +    suggested by [1] which allows an exact round-trip from the command line
> +    to the filesystem.
> +
> +    [1] http://docs.python.org/3/c-api/unicode.html#file-system-encoding
> +
> +    """
> +    if sys.hexversion < 0x03000000:
> +        return path
> +    return path.encode('utf-8', 'surrogateescape')
> +
> +
>  def get_repo(alias, url):
>      """Returns a git repository object initialized for usage.
>      """
> @@ -45,7 +61,7 @@ def get_repo(alias, url):
>      repo.get_head()
>
>      hasher = _digest()
> -    hasher.update(repo.path.encode('hex'))
> +    hasher.update(_encode_filepath(repo.path))
>      repo.hash = hasher.hexdigest()
>
>      repo.get_base_path = lambda base: os.path.join(
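
A standalone sketch of what the patched hashing step does, for anyone who
wants to try it outside the remote helper. The names encode_filepath and
hash_path are illustrative only, and hashlib.sha1 is assumed here as a
stand-in for whatever _digest resolves to in the script.

```python
import hashlib
import sys


def encode_filepath(path):
    """Return a byte string for a file path (illustrative helper name)."""
    if sys.hexversion < 0x03000000:
        # Python 2: the path is already a byte string.
        return path
    # Python 3.1+: restore any bytes that were escaped as lone surrogates.
    return path.encode("utf-8", "surrogateescape")


def hash_path(path):
    """Hash a path the way the patched get_repo() does.

    hashlib.sha1 is an assumption standing in for _digest.
    """
    hasher = hashlib.sha1()
    hasher.update(encode_filepath(path))
    return hasher.hexdigest()


# A plain ASCII path and a path carrying an escaped non-UTF-8 byte.
plain_hash = hash_path("/tmp/repo")
if sys.hexversion >= 0x03010000:
    escaped_hash = hash_path("/tmp/repo-\udcff")
```

On Python 3 the second call hashes the original bytes (ending in 0xff)
rather than failing, which is the behaviour the "hex" codec could not
provide.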