Package: urlwatch
Version: 1.11-1
Severity: normal

--- Please enter the report below this line. ---

I use the attached hooks.py script to convert the HTML of every watched
URL to plain text.

urlwatch is run as a cron job:
#Checks URLs for changes -- see ~/.urlwatch/urls.txt
5,55 * * * *    [my username]   urlwatch

When there are no changes to the watched URLs, I get an email with the
following error message:

Traceback (most recent call last):
  File "/usr/bin/urlwatch", line 232, in <module>
    data = job.retrieve(timestamp, filter, headers)
  File "/usr/share/urlwatch/urlwatch/handler.py", line 111, in retrieve
    content_unicode = content.decode(encoding, 'ignore')
LookupError: unknown encoding:
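
For what it's worth, the same exception can be reproduced outside
urlwatch by decoding with an empty encoding name. My guess, which I have
not verified against the urlwatch source, is that the encoding passed to
content.decode() at handler.py line 111 is an empty string, for example
from a server's Content-Type header with an empty charset= parameter.
The sketch below reproduces the failure and shows a hypothetical guard,
safe_decode (not part of urlwatch), that falls back to a default codec
when the name is empty or unknown. It is written for Python 3, not the
Python 2.7 installed here.

```python
import codecs


def safe_decode(content, encoding, fallback="utf-8"):
    """Hypothetical helper (not part of urlwatch): decode bytes, falling
    back to `fallback` when `encoding` is empty, None, or unknown."""
    try:
        # codecs.lookup raises LookupError for '' and for unknown names,
        # and TypeError for None.
        codecs.lookup(encoding)
    except (LookupError, TypeError):
        encoding = fallback
    return content.decode(encoding, "ignore")


# Reproduce the error from the traceback: an empty encoding name.
try:
    b"<p>hello</p>".decode("", "ignore")
except LookupError as exc:
    print("LookupError:", exc)

# With the guard, the same bytes decode using the fallback codec.
print(safe_decode(b"<p>hello</p>", ""))
```

Falling back to UTF-8 with 'ignore' mirrors the 'ignore' error handler
already used at line 111; a real fix might prefer a charset sniffed from
the document itself.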

Thank you for your help.

--- System information. ---
Architecture: amd64
Kernel:       Linux 3.2.0-4-amd64

Debian Release: 7.0
  500 testing         security.debian.org
  500 testing         mirror.csclub.uwaterloo.ca
  500 testing         debian.osuosl.org

--- Package information. ---
Depends              (Version) | Installed
==============================-+-============
python                (>= 2.4) | 2.7.3~rc2-1
python-support     (>= 0.90.0) | 1.0.15


Recommends           (Version) | Installed
==============================-+-===========
python-vobject                 | 0.8.1c-4
python-utidylib                | 0.2-8
lynx                           | 2.8.8dev.12-2


Suggests       (Version) | Installed
========================-+-===========
html2text                |

--- Attached file: hooks.py ---

#
# Hooks file for urlwatch
#
# Adapted from the example file provided in 
# /usr/share/doc/urlwatch/hooks.py.example by urlwatch 1.11-1


# Needed for regular expression substitutions
# import re

# Additional modules installed with urlwatch
# from urlwatch import ical2txt
from urlwatch import html2txt


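def filter(url, data):
    # urlwatch calls filter(url, data) for each watched URL; render the
    # page as plain text via lynx, using urlwatch's html2txt helper.
    return html2txt.html2text(data, method='lynx')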
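
If lynx ever needs to be ruled out while debugging, a filter with the
same filter(url, data) signature could strip tags with the standard
library's html.parser instead. This is a hypothetical and much cruder
substitute for html2txt's lynx method (no link list, no layout), it
assumes data arrives as a text string, the TextExtractor class is
illustrative only, and it is written for Python 3 (Python 2.7 names the
module HTMLParser).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. are decoded
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script> or <style>.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def filter(url, data):
    # Same hook signature as above; `url` is unused in this sketch.
    parser = TextExtractor()
    parser.feed(data)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)
```

Each text node ends up on its own line, which is enough to see whether
the page content changed but loses the formatting lynx provides.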
