I'm getting the same problem with max depth on Wired and BBC News, but I'm
also getting a problem with BBC recipes.
I won't starve, as I've got all the recipes plucked using the old PyPlucker,
but I can't seem to pluck any of the BBC recipe pages using Plucker Desktop
or the new PyPlucker.

PyPlucker crashes out with the following message:

Processing http://www.bbc.co.uk/food/recipes/print/.....brown_7685.html...
  Retrieved ok.
Error:  Unknown error parsing document http://www.bbc.co.uk/food/recipes/print/1/T/triplechocolatebrown_7685.html:
Traceback (innermost last):
  File "C:\Program Files\Plucker\PyPlucker\Parser.py", line 27, in generic_parser
    parser = TextParser.StructuredHTMLParser (url, data, headers, config, attributes)
  File "C:\Program Files\Plucker\PyPlucker\TextParser.py", line 875, in __init__
    self.feed (text)
  File "C:\Program Files\Plucker\Python\lib\sgmllib.py", line 83, in feed
    self.goahead(0)
  File "C:\Program Files\Plucker\Python\lib\sgmllib.py", line 113, in goahead
    k = self.parse_starttag(i)
  File "C:\Program Files\Plucker\Python\lib\sgmllib.py", line 258, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "C:\Program Files\Plucker\Python\lib\sgmllib.py", line 292, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "C:\Program Files\Plucker\PyPlucker\TextParser.py", line 958, in handle_starttag
    sgmllib.SGMLParser.handle_starttag(self, tag, method, attrs)
  File "C:\Program Files\Plucker\Python\lib\sgmllib.py", line 332, in handle_starttag
    method(attrs)
  File "C:\Program Files\Plucker\PyPlucker\TextParser.py", line 1128, in do_meta
    ctype, parameters = parse_http_header_value(data[1][1])
IndexError: list index out of range
  Parsing failed.
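For what it's worth, the last frame suggests do_meta picks a value out of `data` by position (`data[1][1]`), so a META tag with fewer attributes than expected (perhaps a Content-Type META with no content attribute) could raise exactly this IndexError. That is only a guess: I haven't checked the page source, and I'm assuming `data` holds the tag's attributes as (name, value) pairs, the way sgmllib passes them. Below is a minimal sketch of looking the attributes up by name instead of by position; `content_type_from_meta` is a hypothetical helper, not part of PyPlucker.

```python
def content_type_from_meta(attrs):
    """Return (ctype, params) for a <meta http-equiv="Content-Type"> tag,
    or None if the tag isn't one or has no usable content attribute.

    attrs: list of (name, value) pairs (an assumption: the shape sgmllib
    hands to do_meta). This is a hypothetical helper, not PyPlucker code.
    """
    # Index attributes by name so missing or reordered attributes
    # cannot cause an IndexError.
    lookup = {}
    for name, value in attrs:
        lookup[name.lower()] = value

    if (lookup.get("http-equiv") or "").lower() != "content-type":
        return None

    content = lookup.get("content")
    if not content:
        # e.g. <meta http-equiv="Content-Type"> with no content attribute
        return None

    # "text/html; charset=ISO-8859-1" -> ("text/html", {"charset": ...})
    parts = content.split(";")
    ctype = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return ctype, params
```

With a well-formed tag this returns the content type and its parameters; with a malformed one it returns None, so the caller can skip the tag rather than abort the whole page.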

The other thing I've noticed in Plucker Desktop: when plucking a site from
a website URL, the URL pattern filter, "stay on host", and "stay on domain"
don't seem to do anything
(Wired News again, plucking from a website URL, pluck level 5, "stay on host"
set).

Having said that, Plucker is great, and betas are betas for a reason :). Keep
up the good work.

John

PS. Running Windows 2000, Plucker Desktop 1.2.0.

