IMDb seems to have changed their pages slightly, causing movieParser.py to include trailing junk characters in the "plot summary".
For instance, with "Nothing Like the Holidays" (http://www.imdb.com/title/tt1151915/), the "plot summary" ends up being:

    A Puerto Rican family living in the area of Humboldt Park in west Chicago face what may be their last Christmas together. | »

I tracked this down to the following code, which only strips the | character:

    Extractor(label='h5sections',
              path="//d...@class='info']/h5/..",
              attrs=[
                  Attribute(key="plot summary",
                            path="./h5[starts-with(text(), " \
                                 "'Plot:')]/../div/text()",
                            postprocess=lambda x: \
                                x.strip().rstrip('|').rstrip()),

Changing the postprocess to the following fixes the problem: it also strips the "RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK" (U+00BB, ») and spaces, in addition to the |:

                            postprocess=lambda x: \
                                x.strip().rstrip(u'| \u00BB').rstrip()),
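To check the change outside of IMDbPY, here is a minimal standalone sketch that applies both the old and the proposed postprocess expressions to the sample text above. The helper name clean_plot_summary is hypothetical (it is not part of movieParser.py), and the sketch assumes the scraped text ends with " | »" exactly as shown in the example.

    # Hypothetical helper wrapping the proposed postprocess expression.
    def clean_plot_summary(text):
        """Strip surrounding whitespace, then any trailing run of '|', ' ' or U+00BB."""
        return text.strip().rstrip(u'| \u00BB').rstrip()

    # Assumed input: the summary text followed by the " | »" junk from the report.
    raw = (u"A Puerto Rican family living in the area of Humboldt Park in west "
           u"Chicago face what may be their last Christmas together. | \u00BB")

    # Old expression: the last character is U+00BB, so rstrip('|') removes nothing.
    old = raw.strip().rstrip('|').rstrip()
    # New expression: removes the trailing U+00BB, spaces and '|' together.
    new = clean_plot_summary(raw)

    print(old.endswith(u"| \u00BB"))   # True: the junk is still there
    print(new.endswith(u"together."))  # True: the junk is gone

Keep in mind that rstrip() treats its argument as a set of characters rather than a suffix, so the new expression would also remove a legitimate trailing | or » from a summary; for plot text that seems an acceptable trade-off.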
------------------------------------------------------------------------------
_______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel