Re: [Imdbpy-devel] Unable to get IMDb for specific entry

Jesper Noehr Sun, 06 May 2007 04:48:11 -0700

It was a bug.

IMDb redirects you to the proper page if the request specifies HTTP/1.1.
urllib uses HTTP/0.9 and HTTP/1.0, hence you get the search page instead.
Very strange.


I've written two patches that fix this.

--- /root/IMDbPY-3.0/build/lib.linux-i686-2.4/imdb/__init__.py 2007-05-03 13:10:02.000000000 +0000
+++ __init__.py 2007-05-06 11:44:28.000000000 +0000
@@ -345,7 +345,11 @@
              title = title.encode('utf-8')
          params = 'q=%s;s=pt' % str(urllib.quote_plus(title))
          content = self._searchIMDb(params)
-        if not content: return None
+        if content and "<h2>Popular Results</h2>" in content:
+            params = 's=all&q=%s' % str(urllib.quote_plus(title))
+            content = self._searchIMDb(params)
+        if not content:
+            return None
          from imdb.parser.http.searchMovieParser import BasicMovieParser
          mparser = BasicMovieParser()
          result = mparser.parse(content)

(This retries with a different query string to get more results when the
first request returns the search page instead of the title page.)
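
For illustration only, here is a minimal standalone sketch of that retry
logic in Python 3. It is not IMDbPY code: search_with_fallback and the fetch
callable are hypothetical names I made up, and fetch is assumed to take a
query string and return the page body as a string (or None on failure).

```python
from urllib.parse import quote_plus

# Marker that shows up when IMDb returns a results listing
# instead of redirecting to the title page.
POPULAR_MARKER = "<h2>Popular Results</h2>"


def search_with_fallback(title, fetch):
    """Return the page body for title, retrying with a broader query.

    fetch is a hypothetical callable: query string in, page body
    (str) or None out.
    """
    content = fetch("q=%s;s=pt" % quote_plus(title))
    # Check for an empty result first, so the marker test never
    # runs against None.
    if content and POPULAR_MARKER in content:
        content = fetch("s=all&q=%s" % quote_plus(title))
    return content or None
```

Passing the fetcher in as an argument keeps the sketch testable without
touching the network; in the real patch that role is played by
self._searchIMDb.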

...and the second one switches from urllib to urllib2 (it might have bugs in it!):

--- /root/IMDbPY-3.0/build/lib.linux-i686-2.4/imdb/parser/http/__init__.py 2007-05-03 13:10:02.000000000 +0000
+++ parser/http/__init__.py     2007-05-06 11:34:34.000000000 +0000
@@ -68,11 +68,18 @@

+import urllib2
-class IMDbURLopener(FancyURLopener):
+class IMDbURLopener:
      """Fetch web pages and handle errors."""
      def __init__(self, *args, **kwargs):
-        FancyURLopener.__init__(self, *args, **kwargs)
+        self.urlOpener = urllib2.build_opener()
+
+        # Delegate the needed attributes to self.urlOpener instead of subclassing.
+        self.addheaders = self.urlOpener.addheaders
+        self.open = self.urlOpener.open
+        self.close = self.urlOpener.close
+
          # XXX: IMDb's web server doesn't like urllib-based programs,
          #      so lets fake to be Mozilla.
          #      Wow!  I'm shocked by my total lack of ethic! <g>
@@ -89,7 +96,7 @@
          encode = None
          try:
              if size != -1:
-                self.addheader('Range', 'bytes=0-%d' % size)
+                self.addheaders.append(('Range', 'bytes=0-%d' % size))
              uopener = self.open(url)
              content = uopener.read(size=size)
              server_encode = uopener.info().getparam('charset')
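
As a rough illustration of the delegation idea in this patch, here is a
minimal Python 3 sketch (urllib2 became urllib.request in Python 3). It is
not the IMDbPY class: PageOpener, set_header and headers are hypothetical
names, and the Mozilla/5.0 user agent is just a placeholder. The point it
shows is that addheaders on an opener is a plain list of (name, value)
tuples, so headers are added by changing the list, not by calling it.

```python
import urllib.request


class PageOpener:
    """Thin wrapper that delegates to a urllib.request opener."""

    def __init__(self, user_agent="Mozilla/5.0"):
        self._opener = urllib.request.build_opener()
        # addheaders is a list of (name, value) tuples that the
        # opener sends with every request.
        self._opener.addheaders = [("User-Agent", user_agent)]

    def set_header(self, name, value):
        """Add or replace a default header (case-insensitive match)."""
        kept = [(k, v) for k, v in self._opener.addheaders
                if k.lower() != name.lower()]
        kept.append((name, value))
        self._opener.addheaders = kept

    def headers(self):
        """Return a copy of the default headers."""
        return list(self._opener.addheaders)

    def retrieve(self, url, size=-1):
        """Fetch url, reading at most size bytes when size is given."""
        if size != -1:
            self.set_header("Range", "bytes=0-%d" % size)
        with self._opener.open(url) as response:
            return response.read() if size == -1 else response.read(size)
```

Replacing an existing header instead of appending blindly avoids piling up
several Range headers when retrieve is called more than once on the same
opener.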


On May 6, 2007, at 11:44 AM, Jesper Noehr wrote:

> Bug?
>
> Using the SQL access system, get_imdbID('445950') always returns
> None. Even though it has an entry on IMDb:
> http://imdb.com/title/tt0375154/
>
> Is it because of the '+' in the title?
>
> I tried fixing it in the code, but it's just too obscure for me to
> grasp on a Sunday morning :-)
>
> -- 
> Jesper
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Imdbpy-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/imdbpy-devel


