Package: git-buildpackage
Version: 0.9.9
Severity: normal
Tags: patch
I ran into the following issue when importing the history of a Debian
package:
> gbp import-dscs --debsnap wireless-tools
gbp:info: Downloading snapshots of 'wireless-tools' to '/tmp/tmpkzi1rlbg'...
gbp:info: No git repository found, creating one.
Traceback (most recent call last):
  File "/usr/bin/gbp", line 149, in <module>
    sys.exit(supercommand())
  File "/usr/bin/gbp", line 145, in supercommand
    return module.main(args)
  File "/usr/lib/python3/dist-packages/gbp/scripts/import_dscs.py", line 180, in main
    if importer.importdsc(dscs[0]):
  File "/usr/lib/python3/dist-packages/gbp/scripts/import_dscs.py", line 72, in importdsc
    return import_dsc.main(['import-dsc'] + self.args + [dsc.dscfile])
  File "/usr/lib/python3/dist-packages/gbp/scripts/import_dsc.py", line 518, in main
    apply_debian_patch(repo, source, dsc, commit, options)
  File "/usr/lib/python3/dist-packages/gbp/scripts/import_dsc.py", line 174, in apply_debian_patch
    author = get_author_from_changelog(source.unpacked)
  File "/usr/lib/python3/dist-packages/gbp/scripts/import_dsc.py", line 114, in get_author_from_changelog
    dch = ChangeLog(filename=os.path.join(dir, 'debian/changelog'))
  File "/usr/lib/python3/dist-packages/gbp/deb/changelog.py", line 89, in __init__
    self._read()
  File "/usr/lib/python3/dist-packages/gbp/deb/changelog.py", line 132, in _read
    self._contents = f.read()
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 906: invalid start byte
This happened while importing version 23-2 (see
http://snapshot.debian.org/package/wireless-tools/23-2/), whose changelog
was still encoded in ISO-8859-1. I've attached a patch that falls back to
ISO-8859-1 when a changelog, or a file read through gbp's Git VFS, is not
valid UTF-8.
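The failure can be reproduced without gbp. The sketch below uses a made-up changelog line, not the actual wireless-tools 23-2 text. The line contains 'ö', which ISO-8859-1 encodes as the single byte 0xf6. Bytes 0xF5 to 0xFF never occur in valid UTF-8, so Python rejects that byte as an invalid start byte. ISO-8859-1, by contrast, maps every byte value to a character and decodes the same bytes without error.

```python
# Hypothetical changelog line, encoded the way pre-UTF-8 changelogs were.
# This is NOT the real content of the wireless-tools 23-2 changelog.
latin1_bytes = "  * Fix by A. M\u00f6ller (hypothetical entry).\n".encode("iso-8859-1")

# 'ö' becomes the single byte 0xf6 in ISO-8859-1.
assert b"\xf6" in latin1_bytes

utf8_error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xf6 can never start a UTF-8 sequence, hence "invalid start byte".
    utf8_error = err

# ISO-8859-1 assigns a character to all 256 byte values, so this cannot fail.
recovered = latin1_bytes.decode("iso-8859-1")

print(utf8_error.reason)  # invalid start byte
print(recovered, end="")
```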
-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.15.2 (SMP w/12 CPU cores)
Locale: LANG=nl_NL.utf8, LC_CTYPE=nl_NL.utf8 (charmap=UTF-8), LANGUAGE=nl_NL.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages git-buildpackage depends on:
ii devscripts 2.18.2
ii git 1:2.17.0-1
ii man-db 2.8.3-2
ii python3 3.6.5-3
ii python3-dateutil 2.6.1-1
ii python3-pkg-resources 39.1.0-1
Versions of packages git-buildpackage recommends:
ii cowbuilder 0.87+b1
ii pbuilder 0.229.2
ii pristine-tar 1.44
ii python3-requests 2.18.4-2
Versions of packages git-buildpackage suggests:
pn python3-notify2 <none>
ii sudo 1.8.23-1
ii unzip 6.0-21
-- no debconf information
From 48bc76b8a5294098548ef8c6b10e0f25b718fddf Mon Sep 17 00:00:00 2001
From: Guus Sliepen <[email protected]>
Date: Tue, 5 Jun 2018 21:41:28 +0200
Subject: [PATCH] Treat changelogs with invalid UTF-8 sequences as ISO-8859-1.
This allows import-dscs to import old versions of a package that did not
yet use UTF-8 encoding.
---
 gbp/deb/changelog.py | 8 ++++++--
 gbp/git/vfs.py       | 5 ++++-
 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/gbp/deb/changelog.py b/gbp/deb/changelog.py
index 5cfaaf79..dda9b753 100644
--- a/gbp/deb/changelog.py
+++ b/gbp/deb/changelog.py
@@ -128,8 +128,12 @@ class ChangeLog(object):
         self._cp = cp
 
     def _read(self):
-        with open(self.filename, encoding='utf-8') as f:
-            self._contents = f.read()
+        try:
+            with open(self.filename, encoding='utf-8') as f:
+                self._contents = f.read()
+        except UnicodeDecodeError:
+            with open(self.filename, encoding='iso-8859-1') as f:
+                self._contents = f.read()
 
     def __getitem__(self, item):
         return self._cp[item]
diff --git a/gbp/git/vfs.py b/gbp/git/vfs.py
index 8363f77b..ec47201a 100644
--- a/gbp/git/vfs.py
+++ b/gbp/git/vfs.py
@@ -33,7 +33,10 @@ class GitVfs(object):
             if binary:
                 self._data = io.BytesIO(content)
             else:
-                self._data = io.StringIO(content.decode())
+                try:
+                    self._data = io.StringIO(content.decode())
+                except UnicodeDecodeError:
+                    self._data = io.StringIO(content.decode("iso-8859-1"))
 
         def readline(self):
             return self._data.readline()
--
2.17.0
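A minimal regression check for the changelog half of the patch could look like the following. It is a sketch and not part of gbp's test suite. `read_changelog_text` is a hypothetical standalone copy of the patched `ChangeLog._read()` logic, and the changelog text is invented for the example.

```python
import os
import tempfile


def read_changelog_text(filename):
    """Hypothetical mirror of the patched ChangeLog._read(): UTF-8, then ISO-8859-1."""
    try:
        with open(filename, encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        # The failing read happens inside the 'with', so the first handle is
        # already closed here; reopening with another codec is safe.
        with open(filename, encoding='iso-8859-1') as f:
            return f.read()


# Invented changelog snippet containing a non-ASCII character ('ö').
text = "example (1.0-1) unstable; urgency=low\n\n  * Thanks to A. M\u00f6ller.\n"

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for codec in ('utf-8', 'iso-8859-1'):
        path = os.path.join(tmp, 'changelog.' + codec)
        with open(path, 'w', encoding=codec) as f:
            f.write(text)
        results[codec] = read_changelog_text(path)

# Both the modern and the legacy encoding should round-trip to the same text.
assert results['utf-8'] == text
assert results['iso-8859-1'] == text
```

One trade-off of this approach is that ISO-8859-1 decoding never fails. A changelog in some other legacy encoding, such as KOI8-R, would therefore be read without an error but with the wrong characters. For the ISO-8859-1 changelogs this report is about, that is the intended behaviour.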