It seems that IMDb allows "plot summary" entries to have no author, but
DOMHTMLPlotParser in imdb/parser/http/movieParser.py assumes an author
string always exists and calls replace() on it, which crashes with:

<begin excerpt>

--->Best match for "run fatboy run" is "Run Fatboy Run"
Traceback (most recent call last):
  File "/home/danc/pyTivoMetaThis.py", line 597, in <module>
    main()
  File "/home/danc/pyTivoMetaThis.py", line 578, in main
    formatMovieData(title, metadataFileName)
  File "/home/danc/pyTivoMetaThis.py", line 339, in formatMovieData
    objIA.update(movie)
  File "/usr/lib/python2.5/site-packages/imdb/__init__.py", line 609, in update
    ret = method(mopID)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/__init__.py", line 398, in get_movie_plot
    return self.mProxy.plot_parser.parse(cont, getRefs=self._getRefs)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/utils.py", line 675, in parse
    data = self.parse_dom(html_string)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/utils.py", line 773, in parse_dom
    data = attr_postprocess(data)
  File "/usr/lib/python2.5/site-packages/imdb/parser/http/movieParser.py", line 1079, in <lambda>
    x.get('author').replace('{', '<').replace('}', '>'),
AttributeError: 'NoneType' object has no attribute 'replace'

<end excerpt>

This happens with movies such as:

Run Fatboy Run (third plot summary)
http://www.imdb.com/title/tt0425413/plotsummary

Wanted (first plot summary)
http://www.imdb.com/title/tt0493464/plotsummary

Looking at the code, I initially thought that just adding a default to
x.get('author') would fix the problem, but that didn't work.  It turns
out that in these cases x does have an 'author' key, but the
corresponding value is None, so the default passed to get() is never used.
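
To show why the default doesn't help, here is a minimal sketch. The
dictionary below is a made-up stand-in for what the parser hands to the
postprocess function, not actual IMDbPY output:

```python
# Hypothetical stand-in for the parsed data: the 'author' key is present,
# but its value is None (no author on the IMDb page).
x = {'plot': 'Some plot text. ', 'author': None}

# The key exists, so get() returns the stored None and ignores the default.
author = x.get('author', u'Anonymous')

# One way to cover both a missing key and a None value.
safe_author = x.get('author') or u'Anonymous'
```

Note that the "or" form also replaces an empty author string, which may
or may not be what you want; an explicit "is None" test avoids that.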

So here's a proposed patch including an extra tweak to deal with email
addresses that are enclosed in () rather than just {}.

<begin patch>

--- movieParser.py.orig    Mon Sep 22 04:06:32 2008
+++ movieParser.py    Fri Oct 10 06:59:06 2008
@@ -1055,6 +1055,14 @@
         if self._is_plot_writer:
             self._plot_writer += data

+def _process_plotsummary(x):
+
+    if x.get('author') is None:
+        xauthor = u'Anonymous'
+    else:
+        xauthor = x.get('author').replace('{', '<').replace('}', '>').replace('(','<').replace(')','>')
+    xplot = x.get('plot', '').strip()
+    return u'%s::%s' % (xauthor, xplot)

 class DOMHTMLPlotParser(DOMParserBase):
     """Parser for the "plot summary" page of a given movie.
@@ -1068,17 +1076,14 @@
         result = pparser.parse(plot_summary_html_string)
     """
     _defGetRefs = True
-
+
     extractors = [Extractor(label='plot',
                     path="//p[@class='plotpar']",
                     attrs=Attribute(key='plot',
                             multi=True,
                             path={'plot': './text()',
                                 'author': './i/a/text()'},
-                            postprocess=lambda x: u'%s::%s' % (
-                            x.get('author').replace('{', '<').replace('}', '>'),
-                            x.get('plot', '').strip())))]
-
+                            postprocess=lambda x: _process_plotsummary(x)))]

 class HTMLAwardsParser(ParserBase):
     """Parser for the "awards" page of a given person or movie.

<end patch>
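
For anyone who wants to try the logic outside of IMDbPY, here is a
standalone sketch of the same idea. The function name mirrors the patch,
but the sample dictionaries and email addresses are invented for
illustration, and the fallback for a None plot is an extra assumption
that the patch itself does not make:

```python
def process_plotsummary(x):
    """Build 'author::plot', tolerating a missing or None author."""
    author = x.get('author')
    if author is None:
        author = u'Anonymous'
    else:
        # Same substitutions as the patch: both {} and () become <>.
        for old, new in (('{', '<'), ('}', '>'), ('(', '<'), (')', '>')):
            author = author.replace(old, new)
    # Assumption beyond the patch: also tolerate a None plot value.
    plot = (x.get('plot') or u'').strip()
    return u'%s::%s' % (author, plot)


# Hypothetical inputs covering the three cases discussed above.
samples = [
    {'plot': ' A plot. ', 'author': None},
    {'plot': 'Another plot.', 'author': 'Jane Doe {jane@example.com}'},
    {'plot': 'Third plot.', 'author': 'John Roe (john@example.com)'},
]
results = [process_plotsummary(s) for s in samples]
```

One thing to keep in mind: the () replacement applies to every
parenthesis in the author string, not just those around an email address.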

Thanks again for a great package.

Rdian06