Package: plucker
Version: 1.8-4
Severity: important
Tags: patch

in the process of scooping Bruce Schneier's monthly Crypto-Gram newsletter, plucker generated a trace-back. actually it generated two, but only because plucking something is a two-stage process (1. strip extraneous html; 2. convert html to plucker format): when the first stage fails and doesn't generate the expected temporary file, the second stage fails too. i'm including only the first trace-back below, since the second becomes irrelevant once the first is fixed.

$ plucker-build -p"/home/corey/.sitescooper/txt/Crypto-Gram_20050316" -s scoop
Pluckerdir is '/home/corey/.sitescooper/txt/Crypto-Gram_20050316'...
NOTE: db_file is a deprecated option. Please use the doc_file option instead.
NOTE: db_name is a deprecated option. Please use the doc_name option instead.
---- 0 collected, 1 to do ----
Processing plucker:/Crypto-Gram_20050316.html...
Retrieved ok.
Error: Unknown error parsing document plucker:/Crypto-Gram_20050316.html:
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/PyPlucker/Parser.py", line 28, in generic_parser
    parser = TextParser.StructuredHTMLParser (url, data, headers, config, attributes)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 1155, in __init__
    self.feed (text)
  File "/usr/lib/python2.3/site-packages/_xmlplus/parsers/sgmllib.py", line 441, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 1219, in handle_starttag
    id = _list_to_dict(attrs).get('id')
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 331, in _list_to_dict
    result[string.lower (key)] = cleanup_attribute (val)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 356, in cleanup_attribute
    content="&#%d" % val
NameError: global name 'val' is not defined
<plucker progress message & second traceback begins here>

looking at the referenced line (TextParser.py, line 356), it does use a
global variable, 'val', that is never defined.
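
to illustrate this class of failure, here is a minimal sketch in modern Python. it is NOT the actual PyPlucker code: the function names, the regular expression and the 256 threshold are all assumptions made up for the example. it only shows how a name that is never defined stays harmless until the one branch that uses it is executed.

```python
import re

# Hypothetical stand-ins for illustration only; this is not the real
# cleanup_attribute() from PyPlucker's TextParser.py.

def cleanup_attribute_buggy(content):
    """Decode numeric character references; the rarely used branch is broken."""
    def replace(match):
        value = int(match.group(1))
        if value < 256:
            return chr(value)
        # Bug: 'val' is not defined in any scope. Python only looks the
        # name up when this line runs, so common inputs never trigger it.
        return "&#%d;" % val
    return re.sub(r"&#(\d+);", replace, content)


def cleanup_attribute_fixed(content):
    """Same logic, but the rare branch uses the name that actually exists."""
    def replace(match):
        value = int(match.group(1))
        if value < 256:
            return chr(value)
        return "&#%d;" % value
    return re.sub(r"&#(\d+);", replace, content)


if __name__ == "__main__":
    print(cleanup_attribute_buggy("caf&#233;"))    # common branch works
    print(cleanup_attribute_fixed("it&#8217;s"))   # rare branch, fixed
    try:
        cleanup_attribute_buggy("it&#8217;s")      # rare branch, buggy
    except NameError as exc:
        print("NameError:", exc)
```

in this sketch the buggy version handles everyday input fine and only raises NameError when an attribute happens to hit the broken branch, which would be consistent with the error showing up for one particular document and not for others.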
i looked at plucker cvs and this bug was fixed in revision 1.67 of
TextParser.py, viewable at
<http://cvs.plkr.org/index.cgi/parser/python/PyPlucker/TextParser.py>.
the previous revision of TextParser.py, 1.66, is the one shipped in
plucker 1.8, so the fix must have been made after the last plucker release.
a patch to fix this specific bug, and only this bug, is attached.
the attached tar file should contain all the data files to recreate the
trace-back if anybody cares to investigate the matter more.
i've never encountered this error before, so maybe this bug only
deserves a severity of "normal", but without the patch it was impossible
to scoop & pluck Crypto-Gram. the severity shouldn't matter much anyway,
since with the patch this bug should be fixed and closed really quickly. ;-)
this all happened two weeks ago, but i'm just now getting around to
reporting this bug. i've scooped several web sites on a daily basis
since applying this patch two weeks ago and i haven't seen any
regressions/bugs caused by the patch.
thanks for maintaining plucker!
-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (990, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.4.27-k7+5+new
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Versions of packages plucker depends on:
ii  netpbm     2:10.0-8  Graphics conversion tools
ii  python     2.3.5-1   An interactive high-level object-o
ii  python2.3  2.3.5-1   An interactive high-level object-o
-- no debconf information
TextParser.py.patch
Description: Binary data
Crypto-Gram_20050316.tar.bz2
Description: Binary data

