It seems that IMDb allows "plot summary" entries to have no author, but
DOMHTMLPlotParser in imdb/parser/http/movieParser.py assumes an author
string is always present and calls replace() on it, which crashes with:
<begin excerpt>
--->Best match for "run fatboy run" is "Run Fatboy Run"
Traceback (most recent call last):
  File "/home/danc/pyTivoMetaThis.py", line 597, in <module>
    main()
  File "/home/danc/pyTivoMetaThis.py", line 578, in main
    formatMovieData(title, metadataFileName)
  File "/home/danc/pyTivoMetaThis.py", line 339, in formatMovieData
    objIA.update(movie)
  File "/usr/lib/python2.5/site-packages/imdb/__init__.py", line 609, in update
    ret = method(mopID)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/__init__.py", line 398, in get_movie_plot
    return self.mProxy.plot_parser.parse(cont, getRefs=self._getRefs)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/utils.py", line 675, in parse
    data = self.parse_dom(html_string)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/utils.py", line 773, in parse_dom
    data = attr_postprocess(data)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/movieParser.py", line 1079, in <lambda>
    x.get('author').replace('{', '<').replace('}', '>'),
AttributeError: 'NoneType' object has no attribute 'replace'
<end excerpt>
This happens with movies such as:
Run Fatboy Run (third plot summary)
http://www.imdb.com/title/tt0425413/plotsummary
Wanted (first plot summary)
http://www.imdb.com/title/tt0493464/plotsummary
Looking at the code, I initially thought that adding a default to
x.get('author') would fix the problem, but that didn't work. It turns out
that in these cases x does have an 'author' key, but the corresponding
value is None, so the default passed to get() is never used.
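A minimal sketch of why the default doesn't help, assuming the parser hands
the postprocess function a plain dict. The plot text here is a made-up
sample, not real IMDb data:

```python
# Stand-in for the dict passed to postprocess when a summary has no author.
# The plot text is a made-up sample, not taken from IMDb.
x = {'plot': 'Some plot text.', 'author': None}

# The 'author' key exists, so the default given to get() is never used.
author_with_default = x.get('author', u'Anonymous')

# Falling back with "or" covers both a missing key and a None value.
author_with_fallback = x.get('author') or u'Anonymous'
```

Here author_with_default is still None, while author_with_fallback is the
placeholder string.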
So here's a proposed patch, which also includes an extra tweak to handle
email addresses that are enclosed in () rather than just {}.
<begin patch>
--- movieParser.py.orig Mon Sep 22 04:06:32 2008
+++ movieParser.py Fri Oct 10 06:59:06 2008
@@ -1055,6 +1055,14 @@
if self._is_plot_writer:
self._plot_writer += data
+def _process_plotsummary(x):
+
+    if x.get('author') is None:
+        xauthor = u'Anonymous'
+    else:
+        xauthor = x.get('author').replace('{', '<').replace('}', '>').replace('(','<').replace(')','>')
+    xplot = x.get('plot', '').strip()
+    return u'%s::%s' % (xauthor, xplot)
class DOMHTMLPlotParser(DOMParserBase):
"""Parser for the "plot summary" page of a given movie.
@@ -1068,17 +1076,14 @@
result = pparser.parse(plot_summary_html_string)
"""
_defGetRefs = True
-
+
extractors = [Extractor(label='plot',
path="//[EMAIL PROTECTED]'plotpar']",
attrs=Attribute(key='plot',
multi=True,
path={'plot': './text()',
'author': './i/a/text()'},
- postprocess=lambda x: u'%s::%s' % (
- x.get('author').replace('{', '<').replace('}', '>'),
- x.get('plot', '').strip())))]
-
+ postprocess=lambda x: _process_plotsummary(x)))]
class HTMLAwardsParser(ParserBase):
"""Parser for the "awards" page of a given person or movie.
<end patch>
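For anyone who wants to try the logic without applying the diff, here is a
standalone sketch of the same idea. The name format_plot_summary and the
sample inputs are hypothetical and not part of IMDbPY. Unlike the patch, it
also treats a None plot as an empty string, which is an extra assumption on
my part:

```python
def format_plot_summary(x):
    """Return u'author::plot' for a dict with 'author' and 'plot' keys.

    Hypothetical standalone version of the helper in the patch above.
    """
    author = x.get('author')
    if author is None:
        # No author on the IMDb page: use a placeholder instead of crashing.
        author = u'Anonymous'
    else:
        # Email addresses may be wrapped in {} or (); normalize both to <>.
        for old, new in (('{', '<'), ('}', '>'), ('(', '<'), (')', '>')):
            author = author.replace(old, new)
    # Assumption beyond the patch: a None plot is treated as empty text.
    plot = (x.get('plot') or u'').strip()
    return u'%s::%s' % (author, plot)


# Made-up sample inputs, one without an author and one with an email in ().
no_author = format_plot_summary({'author': None, 'plot': ' Some text. '})
with_email = format_plot_summary(
    {'author': 'Jane (jane@example.com)', 'plot': 'Some text.'})
```

Note that, as in the patch, every parenthesis in the author string is
replaced, not only those around an email address.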
Thanks again for a great package.
Rdian06