It was a bug.
IMDb redirects you to the proper page if the request specifies HTTP/1.1.
urllib uses HTTP/0.9 and HTTP/1.0, hence you get the search page instead.
Very strange.
I've written two patches that fix this.
--- /root/IMDbPY-3.0/build/lib.linux-i686-2.4/imdb/__init__.py 2007-05-03 13:10:02.000000000 +0000
+++ __init__.py 2007-05-06 11:44:28.000000000 +0000
@@ -345,7 +345,11 @@
title = title.encode('utf-8')
params = 'q=%s;s=pt' % str(urllib.quote_plus(title))
content = self._searchIMDb(params)
- if not content: return None
+ if content and content.find("<h2>Popular Results</h2>") != -1:
+ params = 's=all&q=%s' % str(urllib.quote_plus(title))
+ content = self._searchIMDb(params)
+ if not content:
+ return None
from imdb.parser.http.searchMovieParser import BasicMovieParser
mparser = BasicMovieParser()
result = mparser.parse(content)
(This retries with a different query string in case the search page
is returned on the first try.)
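For anyone who wants to try the retry logic outside IMDbPY, here is a minimal sketch of the same idea in Python 3. The names search_with_fallback and fetch are hypothetical and not part of IMDbPY; fetch stands in for self._searchIMDb and is assumed to return the page text, or None on failure.

```python
from urllib.parse import quote_plus

# Marker the patch looks for to detect the intermediate search page.
POPULAR_MARKER = "<h2>Popular Results</h2>"


def search_with_fallback(title, fetch):
    """Query once, and retry with a broader query if the search page comes back.

    `fetch` is a hypothetical callable: it takes a query string and returns
    the page content as text, or None/'' when nothing could be retrieved.
    """
    content = fetch('q=%s;s=pt' % quote_plus(title))
    # Guard against empty content before searching it for the marker.
    if content and POPULAR_MARKER in content:
        content = fetch('s=all&q=%s' % quote_plus(title))
    return content or None
```

Passing the fetcher in as an argument keeps the sketch testable without touching the network.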
The second patch switches from urllib to urllib2 (it might have bugs in it!):
--- /root/IMDbPY-3.0/build/lib.linux-i686-2.4/imdb/parser/http/__init__.py 2007-05-03 13:10:02.000000000 +0000
+++ parser/http/__init__.py 2007-05-06 11:34:34.000000000 +0000
@@ -68,11 +68,18 @@
+import urllib2
-class IMDbURLopener(FancyURLopener):
+class IMDbURLopener:
"""Fetch web pages and handle errors."""
def __init__(self, *args, **kwargs):
- FancyURLopener.__init__(self, *args, **kwargs)
+ self.urlOpener = urllib2.build_opener()
+
+ # Delegate the needed attributes to self.urlOpener instead of subclassing.
+ self.addheaders = self.urlOpener.addheaders
+ self.open = self.urlOpener.open
+ self.close = self.urlOpener.close
+
# XXX: IMDb's web server doesn't like urllib-based programs,
# so lets fake to be Mozilla.
# Wow! I'm shocked by my total lack of ethic! <g>
@@ -89,7 +96,7 @@
encode = None
try:
if size != -1:
- self.addheader('Range', 'bytes=0-%d' % size)
+ self.addheaders.append(('Range', 'bytes=0-%d' % size))
uopener = self.open(url)
content = uopener.read(size=size)
server_encode = uopener.info().getparam('charset')
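As a rough illustration of the delegation approach in the second patch, here is a small sketch for Python 3, where urllib2 lives on as urllib.request. The class PageFetcher and its method names are hypothetical, not IMDbPY's API. The point it shows is that addheaders on an opener is a plain list of (name, value) tuples, so headers are appended to it rather than passed to a call.

```python
import urllib.request


class PageFetcher:
    """Hypothetical stand-in for IMDbURLopener, wrapping an opener."""

    def __init__(self, user_agent="Mozilla/5.0"):
        self._opener = urllib.request.build_opener()
        # addheaders is a list of (name, value) tuples sent with each request.
        self._opener.addheaders = [("User-Agent", user_agent)]

    def add_header(self, name, value):
        # Drop any earlier header of the same name so repeated calls
        # do not accumulate duplicates, then append the new value.
        kept = [(k, v) for k, v in self._opener.addheaders
                if k.lower() != name.lower()]
        kept.append((name, value))
        self._opener.addheaders = kept

    def headers(self):
        return list(self._opener.addheaders)

    def retrieve(self, url, size=-1):
        # Ask for only the first bytes when a size limit is given.
        if size != -1:
            self.add_header("Range", "bytes=0-%d" % size)
        with self._opener.open(url) as response:
            return response.read() if size == -1 else response.read(size)
```

Replacing an existing Range header instead of appending a second one is a design choice of this sketch; the patch above simply appends.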
On May 6, 2007, at 11:44 AM, Jesper Noehr wrote:
> Bug?
>
> Using the SQL access system, get_imdbID('445950') always returns
> None. Even though it has an entry on IMDb:
> http://imdb.com/title/tt0375154/
>
> Is it because of the '+' in the title?
>
> I tried fixing it in the code, but it's just too obscure for me to
> grasp on a Sunday morning :-)
>
> --
> Jesper
>
> ----------------------------------------------------------------------
> ---
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Imdbpy-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/imdbpy-devel