Package: urlwatch
Version: 1.11-1
Severity: normal

--- Please enter the report below this line. ---
I use the attached hooks.py script to format all html data from the watched URLs as text. URLwatch is run as a cron job:

#Checks URLs for changes -- see ~/.urlwatch/urls.txt
5,55 * * * * [my username] urlwatch

When there are no changes to the watched URLs, I get an email with the following error message:

Traceback (most recent call last):
  File "/usr/bin/urlwatch", line 232, in <module>
    data = job.retrieve(timestamp, filter, headers)
  File "/usr/share/urlwatch/urlwatch/handler.py", line 111, in retrieve
    content_unicode = content.decode(encoding, 'ignore')
LookupError: unknown encoding:

Thank you for your help.

--- System information. ---
Architecture: amd64
Kernel:       Linux 3.2.0-4-amd64

Debian Release: 7.0
  500 testing         security.debian.org
  500 testing         mirror.csclub.uwaterloo.ca
  500 testing         debian.osuosl.org

--- Package information. ---
Depends             (Version) | Installed
==============================-+-============
python               (>= 2.4) | 2.7.3~rc2-1
python-support    (>= 0.90.0) | 1.0.15

Recommends          (Version) | Installed
==============================-+-===========
python-vobject                | 0.8.1c-4
python-utidylib               | 0.2-8
lynx                          | 2.8.8dev.12-2

Suggests      (Version) | Installed
========================-+-===========
html2text               |

#
# Hooks file for urlwatch
#
# Adapted from the example file provided in
# /usr/share/doc/urlwatch/hooks.py.example by urlwatch 1.11-1

# Needed for regular expression substitutions
# import re

# Additional modules installed with urlwatch
# from urlwatch import ical2txt
from urlwatch import html2txt

def filter(url, data):
    return html2txt.html2text(data, method='lynx')
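
The message "LookupError: unknown encoding:" has nothing after the colon, which suggests that the encoding name passed to content.decode() in handler.py was an empty string. That is a guess from the traceback, not a confirmed diagnosis. As a rough sketch of one possible workaround, the decode step could check the encoding name first and fall back to a default. The helper name decode_with_fallback and the 'utf-8' default are assumptions made for illustration; neither is part of urlwatch.

```python
import codecs


def decode_with_fallback(content, encoding, fallback='utf-8'):
    """Decode a byte string, using `fallback` when `encoding` is unusable.

    Hypothetical helper, not urlwatch code. The 'utf-8' default is an
    assumption; another default may suit the watched pages better.
    """
    try:
        # codecs.lookup() raises LookupError for an empty or unknown
        # name and TypeError when the name is None.
        codecs.lookup(encoding)
    except (LookupError, TypeError):
        encoding = fallback
    # Same call as in the traceback: undecodable bytes are dropped.
    return content.decode(encoding, 'ignore')


if __name__ == '__main__':
    # An empty encoding name no longer raises LookupError.
    print(decode_with_fallback(b'<html>no changes</html>', ''))
```

With a guard of this kind, a response that carries no charset information would be decoded with the fallback instead of aborting the cron run.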