On 2020/06/16 0:18, C. Michael Pilato wrote:
> On Mon, Jun 15, 2020 at 10:32 AM Yasuhito FUTATSUKI <futat...@yf.bsdclub.org> wrote:
> 
>> On 2020/06/15 21:38, C. Michael Pilato wrote:
>> Is something like this needed?
>>
>> [[[
>> Index: tools/backup/hot-backup.py.in
>> ===================================================================
>> --- tools/backup/hot-backup.py.in       (revision 1878855)
>> +++ tools/backup/hot-backup.py.in       (working copy)
>> @@ -218,7 +218,7 @@
>>
>>    if stderr_lines:
>>      raise Exception("Unable to find the youngest revision for repository '%s'"
>> -                    ": %s" % (repo_dir, stderr_lines[0].rstrip()))
>> +                    ": %s" % (repo_dir, stderr_lines[0].rstrip().decode()))
>>
>>    return stdout_lines[0].strip().decode();
>>
>> ]]]
>>
>> If svnlook runs in a locale other than C, this can cause a UnicodeDecodeError,
>> but without it, the output from svnlook is embedded in the exception message
>> as the representation of a bytes object, like b'...'.
>> (Although I think this script implicitly assumes the C locale.)
>>
>> So I'm not sure whether it is better to apply this or not.
>>
>> 'return stdout_lines[0].strip().decode();' is okay (apart from the extra ';',
>> which is not critical), because stdout_lines[0] is always an ASCII string
>> in this context.
>>
> 
> I removed a couple of stray semicolons in r1878859 -- thanks for catching
> that.
> 
> As for your question:  if I force "svnlook" to create errors (by setting
> the svnlook variable to "/usr/bin/svn"), today I see an error message with
> the b'...' formatting.  Adding .decode() as you suggested makes the b'...'
> go away and I see what I'd expect to see.  As far as I can tell, I'm using
> the "en-US.UTF-8" locale, though -- not the "C" one.  But maybe I'm just
> getting lucky because the locale encoding is UTF-8 and not, say, Shift-JIS
> or something?  I dunno.

I confirmed that the code with .decode() works well in the ja_JP.UTF-8 locale
on Python 3.6 and Python 3.7 (though I did get an error like "'ascii' codec
can't decode byte 0xe3 in position 18: ordinal not in range(128)").

With a non-UTF-8, non-ASCII locale, calling .decode() without specifying an
encoding raises UnicodeDecodeError from the 'utf-8' codec on Python 3.6 and
3.7, because in Python 3 bytes.decode() defaults to UTF-8 regardless of the
locale.
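A minimal sketch of that failure, independent of svnlook: a fragment of the
Japanese message is encoded as EUC-JP to stand in for svnlook's stderr under
ja_JP.eucJP, then decoded with Python 3's default codec. (In the real message
the first non-ASCII byte follows the 18-byte ASCII prefix
"svnlook: E000002: ", which is why the error below reports position 18.)

```python
# Python 3: bytes.decode() with no arguments always uses UTF-8,
# whatever the current locale is.
text = "ファイル"               # "file", taken from the Japanese svnlook message
raw = text.encode("euc_jp")   # the bytes svnlook emits under ja_JP.eucJP
print(raw)                    # b'\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb'

err = None
try:
    raw.decode()              # equivalent to raw.decode("utf-8")
except UnicodeDecodeError as e:
    err = e                   # keep a reference; 'e' is cleared after except
    print(e)  # 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

# Decoding with the codec that actually produced the bytes recovers the text.
print(raw.decode("euc_jp") == text)  # True
```

The bytes printed here are the same ones that appear in the b"..." output
for the ja_JP.eucJP run.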

Without .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': 
b"svnlook: E000002: \xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb 
'/home/futatuki/tmp/t/format' \xa4\xf2\xb3\xab\xa4\xb1\xa4\xde\xa4\xbb\xa4\xf3: 
\xa4\xbd\xa4\xce\xa4\xe8\xa4\xa6\xa4\xca\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb\xa4\xde\xa4\xbf\xa4\xcf\xa5\xc7\xa5\xa3\xa5\xec\xa5\xaf\xa5\xc8\xa5\xea\xa4\xcf\xa4\xa2\xa4\xea\xa4\xde\xa4\xbb\xa4\xf3"
]]]
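The b"..." wrapper comes from Python 3 itself: formatting a bytes object with
"%s" calls str() on it, and for bytes that returns the repr(). A minimal sketch
with a made-up repository path and a hypothetical ASCII stderr line; because
the line contains single quotes, the repr switches to double quotes, just as in
the output above.

```python
# Hypothetical values: a repository path and one stderr line from svnlook,
# read from a pipe as bytes.
repo_dir = "/tmp/t"
stderr_lines = [b"svnlook: E000002: '/tmp/t/format'\n"]

# Same format string as hot-backup.py; "%s" on bytes yields repr(bytes).
msg = ("Unable to find the youngest revision for repository '%s'"
       ": %s" % (repo_dir, stderr_lines[0].rstrip()))
print(msg)
# Unable to find the youngest revision for repository '/tmp/t': b"svnlook: E000002: '/tmp/t/format'"

# Non-ASCII bytes (here EUC-JP) are shown as \x escapes.
print("%s" % (b"\xa5\xd5",))
# b'\xa5\xd5'
```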

With .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
'utf-8' codec can't decode byte 0xa5 in position 18: invalid start byte
]]]

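cf. With .decode(), UTF-8 locale (Japanese):
(This is also what we want to see on a ja_JP.eucJP terminal with the
ja_JP.eucJP locale. In English the message reads "svnlook: E000002: Can't open
file '/home/futatuki/tmp/t/format': No such file or directory".)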
[[[
$ env LC_MESSAGES=ja_JP.UTF-8 LC_CTYPE=ja_JP.UTF-8 /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': 
svnlook: E000002: ファイル '/home/futatuki/tmp/t/format' を開けません: 
そのようなファイルまたはディレクトリはありません
]]]

So if we want this script to run in a non-UTF-8, non-ASCII locale with
Python 3, it needs additional code to choose the encoding used to decode
svnlook's output.
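A minimal sketch of what that additional code might look like, assuming
svnlook writes its messages in the encoding of the LC_CTYPE locale it inherits
from the script. `decode_output` is a hypothetical helper name, not something
in hot-backup.py; it relies on `locale.getpreferredencoding(False)`, which on
Python 3 normally follows LC_CTYPE from the environment because the
interpreter sets that category at startup.

```python
import locale


def decode_output(data, encoding=None):
    """Decode a line of svnlook output (bytes) to str.

    Hypothetical helper, not part of hot-backup.py. Assumes svnlook
    writes its messages in the encoding of the LC_CTYPE locale it
    inherits from this script. errors="replace" turns undecodable
    bytes into U+FFFD instead of raising.
    """
    if encoding is None:
        # Reports the locale's codeset, e.g. "UTF-8", or "eucJP" /
        # "EUC-JP" depending on the platform; Python accepts both spellings.
        encoding = locale.getpreferredencoding(False)
    return data.decode(encoding, errors="replace")


# Sketch of use in place of stderr_lines[0].rstrip().decode():
#   ": %s" % (repo_dir, decode_output(stderr_lines[0]).rstrip())
print(decode_output("ファイル".encode("euc_jp"), "euc_jp"))  # ファイル
print(decode_output(b"svnlook: E000002: \xa5\xd5", "utf-8"))
```

Using errors="replace" is a judgment call: a few garbled characters in an
error message seem preferable to a second exception hiding the first one.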

Cheers,
-- 
Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>
