** Changed in: duplicity
Status: Fix Committed => Fix Released
--
https://bugs.launchpad.net/bugs/1893481
Title:
UnicodeEncodeError when logging improperly encoded filenames
Status in Duplicity:
Fix Released
Status in duplicity package in Ubuntu:
Fix Committed
Bug description:
Attempts to log messages which contain unicode surrogate characters cause
exceptions.
(These surrogate characters arise, for example, when handling files whose
names are not properly encoded as UTF-8.)
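A minimal sketch of where these surrogates come from, assuming the undecodable bytes are decoded with the 'surrogateescape' error handler (which is what Python 3 uses for file names on Linux). The byte string is the one from the reproduction steps; the variable names are illustrative only.

```python
# File name "Fü" encoded as latin-1; the byte 0xFC is not valid UTF-8.
raw_name = b'F\xfc'

# Decoding with 'surrogateescape' maps the bad byte 0xFC to the lone
# surrogate U+DCFC instead of raising an error.
name = raw_name.decode('utf-8', 'surrogateescape')

# A strict UTF-8 encode rejects the surrogate.
try:
    name.encode('utf-8')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Encoding with 'surrogateescape' restores the original bytes.
round_trip = name.encode('utf-8', 'surrogateescape')
```

The strict encode is the step that fails inside the logging stream in the traceback below.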
NOTE: I have no idea whether this is an issue when running on python
2. (If it is, the fixes suggested below probably won't work.)
Duplicity version: 0.8.15
Python version: 3.8.5
Target filesystem: Linux
Example log output:
--- Logging error ---
Traceback (most recent call last):
  File "/opt/Python-3.8.5/lib/python3.8/logging/__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 45: surrogates not allowed
Call stack:
  File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 104, in <module>
    with_tempdir(main)
  File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 90, in with_tempdir
    fn()
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1531, in main
    do_backup(action)
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1655, in do_backup
    full_backup(col_stats)
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 559, in full_backup
    bytes_written = write_multivol(u"full", tarblock_iter,
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 417, in write_multivol
    at_end = gpg.GPGWriteFile(tarblock_iter, tdp.name, config.gpg_profile,
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/gpg.py", line 390, in GPGWriteFile
    data = block_iter.__next__().data
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 544, in __next__
    result = self.process(next(self.input_iter))  # pylint: disable=assignment-from-no-return
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 238, in get_delta_iter
    log_delta_path(delta_path, new_path, stats)
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 181, in log_delta_path
    log.Info(_(u"A %s") %
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 128, in Info
    Log(s, INFO, code, extra)
  File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 91, in Log
    _logger.log(DupToLoggerLevel(verb_level), s,
Message: 'A home/dairiki/PRCS/junk-changelog/22_Senaste\udcc4nd,v'
Arguments: ()
Steps to reproduce:
- Have a file with funny characters in its name, encoded in latin-1 encoding.
E.g. a file whose name is "Fü" encoded to latin-1 (b'F\xfc'). When duplicity
handles this file, the improperly encoded character will be replaced with a
unicode surrogate character.
- Attempt to create an archive containing this file, with verbosity set to 5.
Duplicity will try to log each file processed. When it gets to this file, an
exception will be reported (and the file will not make it into the archive.)
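The first step can be sketched as follows, assuming a Linux filesystem that accepts arbitrary bytes in file names. The helper name make_latin1_file is hypothetical and only serves this example.

```python
import os
import tempfile


def make_latin1_file(directory):
    """Create an empty file whose name is "Fü" encoded as latin-1."""
    path = os.path.join(os.fsencode(directory), b'F\xfc')
    with open(path, 'wb'):
        pass
    return path


with tempfile.TemporaryDirectory() as tmp:
    make_latin1_file(tmp)
    # Listing with a str path decodes names using the filesystem encoding.
    listed = os.listdir(tmp)
    # Listing with a bytes path returns the raw names.
    listed_raw = os.listdir(os.fsencode(tmp))
```

With a UTF-8 locale, the name in `listed` should contain a surrogate in place of the 0xFC byte, and that is the string duplicity then tries to log.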
Alternative steps to reproduce:
- If the archive is created with verbosity less than 5, the file will make it
into the archive. However, if an attempt is made to list files using
'duplicity list-current-files', an exception will be reported when it gets to
the file with the funny name.
Workaround
==========
A simple workaround is to set the environment variable
PYTHONIOENCODING="utf-8:surrogateescape" before running duplicity.
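This will set the encoding error mode for stdin and stdout to
'surrogateescape' (by default it is 'strict' for stdout; stderr always uses
'backslashreplace' and ignores the error-handler part of this variable), with
the effect that any surrogates are written back out as the original
undecodable bytes. A UTF-8 terminal will typically display those bytes as the
unicode replacement character (U+FFFD: "�").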
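The effect of the variable can be checked without duplicity. The sketch below runs a child Python interpreter that prints a name containing a surrogate, once with the error mode forced to 'strict' and once with 'surrogateescape'. The helper run_child is hypothetical, and the child command only stands in for duplicity's logging to stdout.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints a name containing a surrogate."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, '-c', "print('F\\udcfc')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


strict = run_child('utf-8:strict')
escaped = run_child('utf-8:surrogateescape')
```

The strict child is expected to exit with a UnicodeEncodeError, while the surrogateescape child exits normally and writes the raw byte 0xFC to stdout, which a UTF-8 terminal usually renders as U+FFFD.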
Possible Fix
============
A possible fix, at least for Py3K, is probably for duplicity to explicitly
set the encoding error strategy for stdout and stderr.
For python >= 3.7 this is simple:
  sys.stdout.reconfigure(errors='surrogateescape')
  sys.stderr.reconfigure(errors='surrogateescape')
For earlier pythons (>= 3), the best option might be:
  sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach(),
                                         'surrogateescape')
(and similarly for stderr)
Note that python 2 doesn't know about errors='surrogateescape'.
Errors='replace' would probably work as an alternative, but it's not
ideal as it replaces the surrogates with a plain question mark rather
than a unicode replacement character.
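A rough sketch of how both variants of the fix could be combined, not taken from duplicity itself. The helper use_surrogateescape is hypothetical, and an in-memory stream stands in for the real output stream so that the written bytes can be inspected.

```python
import io


def use_surrogateescape(stream):
    """Return a text stream like `stream` with errors='surrogateescape'."""
    if hasattr(stream, 'reconfigure'):
        # Python >= 3.7: change the error handler in place.
        stream.reconfigure(errors='surrogateescape')
        return stream
    # Older Python 3: re-wrap the underlying binary buffer.
    encoding = stream.encoding
    return io.TextIOWrapper(stream.detach(), encoding=encoding,
                            errors='surrogateescape')


# Stand-in for the real output stream (an assumption for this example).
sink = io.BytesIO()
out = use_surrogateescape(io.TextIOWrapper(sink, encoding='utf-8'))
out.write('A F\udcfc')
out.flush()
written = sink.getvalue()

# For comparison, the bytes that errors='replace' would produce.
replaced = 'A F\udcfc'.encode('utf-8', 'replace')
```

With 'surrogateescape' the original byte 0xFC comes back out, whereas 'replace' substitutes a plain question mark and loses the original name.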
Possible Similar Issue
======================
I didn't actually verify that this fails, but it appears that there
might be a similar issue when using the --log-fd command line option.
Function duplicity.log.add_fd() does a:
handler = logging.StreamHandler(os.fdopen(fd, u'w'))
In Python 3 os.fdopen (an alias for open) opens the stream with
errors='strict' by default. Either
  handler = logging.StreamHandler(os.fdopen(fd, u'w',
                                            errors='surrogateescape'))
or
  handler = logging.StreamHandler(open(fd, u'w',
                                       errors='surrogateescape'))
is probably a better choice. (But neither will work in python 2.)
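Like the report itself, this has not been verified against duplicity, but the suggested handler can be tried in isolation. In the sketch below a pipe stands in for the descriptor passed via --log-fd; the logger name and the explicit encoding='utf-8' are assumptions made to keep the result deterministic.

```python
import logging
import os

# A pipe stands in for the file descriptor given to --log-fd.
read_fd, write_fd = os.pipe()

# encoding='utf-8' is an assumption; the report's snippet omits it.
stream = os.fdopen(write_fd, 'w', encoding='utf-8',
                   errors='surrogateescape')
handler = logging.StreamHandler(stream)

logger = logging.getLogger('log_fd_sketch')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# With errors='strict' this call would report a logging error instead.
logger.info('A F\udcfc')

logger.removeHandler(handler)
stream.close()
logged = os.read(read_fd, 1024)
os.close(read_fd)
```

The message arrives on the read end of the pipe with the original 0xFC byte intact.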