On 2020/06/16 0:18, C. Michael Pilato wrote:
> On Mon, Jun 15, 2020 at 10:32 AM Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>
> wrote:
>
>> On 2020/06/15 21:38, C. Michael Pilato wrote:
>>
>> Is it needed something like this?
>>
>> [[[
>> Index: tools/backup/hot-backup.py.in
>> ===================================================================
>> --- tools/backup/hot-backup.py.in (revision 1878855)
>> +++ tools/backup/hot-backup.py.in (working copy)
>> @@ -218,7 +218,7 @@
>>
>>    if stderr_lines:
>>      raise Exception("Unable to find the youngest revision for repository '%s'"
>> -                    ": %s" % (repo_dir, stderr_lines[0].rstrip()))
>> +                    ": %s" % (repo_dir, stderr_lines[0].rstrip().decode()))
>>
>>    return stdout_lines[0].strip().decode();
>>
>> ]]]
>>
>> If svnlook runs on locale other than C, this can cause UnicodeDecodeError,
>> but without applying it, the output from svnlook is embeded as a
>> representaion of bytes object to the exception message, like b'...'.
>> (Although I think this script assuming C locale implicitly.)
>>
>> So I'm not sure which is better applying this or not.
>>
>> 'return stdout_lines[0].strip().decode();' is okey (except an extra ';',
>> but it is not critical), because stdout_lines[0] is always ascii string
>> in this context.
>>
>
> I removed a couple of stray semicolons in r1878859 -- thanks for catching
> that.
>
> As for your question: if I force "svnlook" to create errors (by setting
> the svnlook variable to "/usr/bin/svn"), today I see an error message with
> the b'...' formatting. Adding .decode() as you suggested makes the b'...'
> go away and I see what I'd expect to see. As far as I can tell, I'm using
> the "en-US.UTF-8" locale, though -- not the "C" one. But maybe I'm just
> getting lucky because the locale encoding is UTF-8 and not, say, Shift-JIS
> or something? I dunno.

I confirmed that the code with .decode() works well in the ja_JP.UTF-8
locale on Python 3.6 and Python 3.7 (although I did get an error like
"'ascii' codec can't decode byte 0xe3 in position 18: ordinal not in
range(128)").

In a non-UTF-8, non-ASCII locale, calling .decode() without specifying an
encoding raises UnicodeDecodeError from the 'utf-8' codec on Python 3.6
and 3.7.

Without .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': b"svnlook: E000002: \xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb '/home/futatuki/tmp/t/format' \xa4\xf2\xb3\xab\xa4\xb1\xa4\xde\xa4\xbb\xa4\xf3: \xa4\xbd\xa4\xce\xa4\xe8\xa4\xa6\xa4\xca\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb\xa4\xde\xa4\xbf\xa4\xcf\xa5\xc7\xa5\xa3\xa5\xec\xa5\xaf\xa5\xc8\xa5\xea\xa4\xcf\xa4\xa2\xa4\xea\xa4\xde\xa4\xbb\xa4\xf3"
]]]

With .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
'utf-8' codec can't decode byte 0xa5 in position 18: invalid start byte
]]]

For comparison, with .decode() in a UTF-8 locale (Japanese). This is also
what we would want to see on a ja_JP.eucJP terminal with the ja_JP.eucJP
locale:
[[[
$ env LC_MESSAGES=ja_JP.UTF-8 LC_CTYPE=ja_JP.UTF-8 /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': svnlook: E000002: ファイル '/home/futatuki/tmp/t/format' を開けません: そのようなファイルまたはディレクトリはありません
]]]

So if we want to allow this script to run in a non-UTF-8, non-ASCII locale
with Python 3, it needs additional code to set the encoding.

Cheers,
--
Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>
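
For illustration only, the "additional code to set encoding" could look
something like the sketch below. decode_tool_output is a hypothetical
helper name, not something that exists in hot-backup.py.in, and the sketch
assumes svnlook writes its messages in the encoding of the current
LC_CTYPE locale. It decodes with that encoding and uses errors="replace"
so that a mismatch degrades the message instead of raising
UnicodeDecodeError.

```python
import locale


def decode_tool_output(raw, encoding=None):
    """Decode bytes read from a child process into str.

    Hypothetical helper for illustration; not part of hot-backup.py.in.
    """
    if encoding is None:
        # Assumption: the child process writes in the locale's encoding
        # (e.g. UTF-8 or eucJP, depending on LC_CTYPE).
        encoding = locale.getpreferredencoding(False) or "utf-8"
    # "replace" substitutes U+FFFD for undecodable bytes rather than raising.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # First bytes of the eucJP message shown in the output above.
    sample = b"\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb"
    print(decode_tool_output(sample, "euc_jp"))
```

In the script, this would take the place of the bare .decode() call on
stderr_lines[0].rstrip(). Whether a replacement character is preferable to
an exception there is a design choice, not something settled in this
thread.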