Package: plucker
Version: 1.8-4
Severity: important
Tags: patch

while scooping Bruce Schneier's monthly Crypto-Gram newsletter, plucker
generated a trace-back.  actually it generated two, but only because
plucking something is a two-stage process (1. strip extraneous html;
2. convert html to plucker): when the first stage fails it doesn't
produce the expected temporary file, so the second stage fails too.
i'm including only the first trace-back below, since the second one
goes away once the first is fixed.

$ plucker-build  -p"/home/corey/.sitescooper/txt/Crypto-Gram_20050316" -s scoop
Pluckerdir is '/home/corey/.sitescooper/txt/Crypto-Gram_20050316'...
NOTE: db_file is a deprecated option. Please use the doc_file option instead.
NOTE: db_name is a deprecated option. Please use the doc_name option instead.
---- 0 collected, 1 to do ----
Processing plucker:/Crypto-Gram_20050316.html...
  Retrieved ok.
Error:  Unknown error parsing document plucker:/Crypto-Gram_20050316.html:
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/PyPlucker/Parser.py", line 28, in generic_parser
    parser = TextParser.StructuredHTMLParser (url, data, headers, config, attributes)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 1155, in __init__
    self.feed (text)
  File "/usr/lib/python2.3/site-packages/_xmlplus/parsers/sgmllib.py", line 441, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 1219, in handle_starttag
    id = _list_to_dict(attrs).get('id')
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 331, in _list_to_dict
    result[string.lower (key)] = cleanup_attribute (val)
  File "/usr/lib/python2.3/site-packages/PyPlucker/TextParser.py", line 356, in cleanup_attribute
    content="&#%d" % val
NameError: global name 'val' is not defined
<plucker progress message & second traceback begins here>

viewing the referenced line (TextParser.py, line 356), it does use a
name, val, that is never defined inside cleanup_attribute or as a
global.
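
for anyone wondering how a NameError like the one above can sit unnoticed
in released code: python only looks a name up when the line using it
actually runs, so an undefined name in a rarely taken branch stays silent
until some input reaches that branch.  the sketch below is only an
illustration of that behaviour.  the function names, the "\x80" trigger
condition and the fix shown are made up for the example; this is not the
real PyPlucker code and not the attached patch.

```python
# hypothetical stand-ins, NOT the real cleanup_attribute from TextParser.py


def cleanup_attribute_sketch(text):
    """Buggy version: the rare branch uses a name that was never assigned."""
    if text.startswith("\x80"):  # made-up trigger condition
        # 'val' is neither a local nor a global, so this raises NameError,
        # but only when an input actually reaches this line
        return "&#%d;" % val + text[1:]
    return text


def cleanup_attribute_fixed(text):
    """Same logic with the name defined before it is used."""
    if text.startswith("\x80"):
        val = ord(text[0])  # assumption: the intent was the character code
        return "&#%d;" % val + text[1:]
    return text


if __name__ == "__main__":
    print(cleanup_attribute_sketch("plain"))  # common path: no error
    try:
        cleanup_attribute_sketch("\x80abc")   # rare path: NameError
    except NameError as exc:
        print("NameError:", exc)
    print(cleanup_attribute_fixed("\x80abc"))
```

the common path never touches the broken line, so most pages would pluck
fine and only a page with the right kind of attribute value would fail.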

i looked at plucker cvs and this bug was fixed in revision 1.67 of
TextParser.py, viewable at
<http://cvs.plkr.org/index.cgi/parser/python/PyPlucker/TextParser.py>.
the previous revision, 1.66, is the one shipped in plucker 1.8, so the
fix must have been made after the last plucker release.

a patch to fix this specific bug, and only this bug, is attached.

the attached tar file should contain all the data files to recreate the
trace-back if anybody cares to investigate the matter more.

i've never encountered this error before, so maybe this bug only
deserves a severity of "normal", but without the patch it was impossible
to scoop & pluck Crypto-Gram.  the severity shouldn't matter much anyway,
as with the patch this bug should be fixed and closed really quickly. ;-)

this all happened two weeks ago, but i'm only now getting around to
reporting it.  since applying the patch i've scooped several web sites
daily and haven't seen any regressions or new bugs caused by it.

thanks for maintaining plucker!

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (990, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.4.27-k7+5+new
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

Versions of packages plucker depends on:
ii  netpbm                        2:10.0-8   Graphics conversion tools
ii  python                        2.3.5-1    An interactive high-level object-o
ii  python2.3                     2.3.5-1    An interactive high-level object-o

-- no debconf information

Attachment: TextParser.py.patch
Description: Binary data

Attachment: Crypto-Gram_20050316.tar.bz2
Description: Binary data
