Source: python-popcon
Version: 1.3
Severity: serious
Tags: patch
X-Debbugs-Cc: [email protected]
Hi,
Since (at least) 07-Aug-2016 14:04, the all-popcon-results.txt.gz file
contains an invalid line:
Package: libfyba0-dbg 0 2 0 0
--> Package: libf erdp-ommon 0 0 0 1
Package: libg++2.8.1.3-dbg 0 0 0 1
That's (currently) line 54869. The "space" on the second line is actually
a 0xa0 character. This is making the package unusable.
$ python3
Python 3.5.2+ (default, Aug 5 2016, 08:07:14)
[GCC 6.1.1 20160724] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import popcon
>>> popcon.package("foo")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/popcon.py", line 145, in package
raw = package_raw(*packages)
File "/usr/lib/python3/dist-packages/popcon.py", line 189, in package_raw
data = _fetch()
File "/usr/lib/python3/dist-packages/popcon.py", line 108, in _fetch
txt = _decompress(txt)
File "/usr/lib/python3/dist-packages/popcon.py", line 134, in _decompress
data = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 3574857:
invalid start byte
You can see this happening here:
https://jenkins.debian.net/view/reproducible/view/problems/job/reproducible_html_notes/
Patch attached. NB. the comment in the lookup section.
Regards,
--
,''`.
: :' : Chris Lamb
`. `'` [email protected] / chris-lamb.co.uk
`-
diff --git a/src/popcon.py b/src/popcon.py
index 4f089f0..670c514 100644
--- a/src/popcon.py
+++ b/src/popcon.py
@@ -115,7 +115,7 @@ def _parse(results):
results = results.splitlines()
for line in results:
elems = line.split()
- if elems[0] != "Package:":
+ if elems[0] != b"Package:":
continue
ans[elems[1]] = Package(*(int(i) for i in elems[2:]))
return ans
@@ -131,7 +131,6 @@ def _decompress(compressed):
gzippedstream = io.BytesIO(compressed)
gzipper = gzip.GzipFile(fileobj=gzippedstream)
data = gzipper.read()
- data = data.decode()
return data
@@ -206,8 +205,11 @@ def package_raw(*packages):
cached_timestamp = time.time()
ans = dict()
for pkg in packages:
- if pkg in data:
- ans[pkg] = data[pkg]
+ # Lookup using bytestrings, but always index results by the original so
+ # that callsites can look it up.
+ lookup = pkg if isinstance(pkg, bytes) else pkg.encode('utf-8')
+ if lookup in data:
+ ans[pkg] = data[lookup]
if KEEP_DATA:
cached_data = data
return ans