Re: [Imdbpy-devel] Unable to get IMDb for specific entry

Jesper Noehr Mon, 07 May 2007 23:42:21 -0700

On May 7, 2007, at 10:46 PM, Davide Alberani wrote:

> On May 07, Davide Alberani <[EMAIL PROTECTED]> wrote:
>
>> No, those were just some ideas: I've done it and the problem
>> persists, so it looks like you've spotted the bug correctly.
>
> In the CVS there is the first implementation of the switch from
> urllib to urllib2, based on your patch and hints.
>
> I've not changed the get_imdbID method, so far.
>
> I've done a fast test, and moving from urllib to urllib2 doesn't
> seem to have fixed the problem with titles with "+"; maybe there is
> something wrong in my code, or maybe the switch from HTTP/1.0 to
> HTTP/1.1 is not a solution (or maybe there's something in my
> environment: I've removed the proxy, but so far no luck...)
>
> If possible, take a look at my code, and test it.
> If really needed, your proposed fix in the get_imdbID will be
> introduced.


OK, I've done some more thorough testing on this now.

Let me demonstrate the problem:
This is the (stripped-down) version of the HTTP request that IMDbPY
sends off to imdb.com:

GET /find?q=tristan+%2B+isolde+%282006%29;s=pt HTTP/1.0
User-Agent: Mozilla/5.0

If you telnet to imdb.com port 80 and give it that, you will get a
bunch of HTML (the search results page), which is obviously not what
you want.
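
For anyone who would rather script this than type it into telnet, here
is a minimal Python 3 sketch that builds the same request text and can
send it over a plain socket. The helper names build_request and
send_raw are made up for illustration and are not part of IMDbPY; the
host and port are simply the ones mentioned above.

```python
import socket


def build_request(path, version="1.0", headers=None):
    """Return the raw bytes of a minimal GET request (hypothetical helper)."""
    lines = ["GET %s HTTP/%s" % (path, version)]
    for name, value in (headers or []):
        lines.append("%s: %s" % (name, value))
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_raw(host, request, port=80, timeout=10):
    """Send raw request bytes and return the response status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as reader:
            status_line = reader.readline()
    return status_line.decode("latin-1").strip()


# The request quoted above, rebuilt byte for byte.
request = build_request(
    "/find?q=tristan+%2B+isolde+%282006%29;s=pt",
    headers=[("User-Agent", "Mozilla/5.0")],
)
```

Calling send_raw("imdb.com", request) would return only the status
line, which is enough to tell a redirect from a normal results page.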

Previously I suggested that there's a difference between doing the
same query over HTTP/1.0 and 1.1. I was wrong. Take this request:

GET /find?s=all&q=tristan+%2B+isolde+%282006%29 HTTP/1.1
Host: imdb.com
User-Agent: Mozilla/5.0

This one will give you an HTTP 302 Found, which redirects you to
http://imdb.com/title/tt0375154/. I figured that since this was
HTTP/1.1, but otherwise more or less the same, they had some kind of
degradation of service on the old 1.0 protocol. Not so.
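
Once you have the redirect target, pulling the title ID out of it is
straightforward. The following is only a sketch: the function name is
hypothetical and this is not how get_imdbID is actually written; it
just assumes the Location value looks like the URL above.

```python
import re

# Matches the numeric part of a title URL such as /title/tt0375154/
_TITLE_RE = re.compile(r"/title/tt(\d+)")


def title_id_from_location(location):
    """Return the digits after 'tt' in a title URL, or None if absent."""
    match = _TITLE_RE.search(location or "")
    return match.group(1) if match else None
```

A None result would mean the server did not redirect to a title page,
which is the case that needs a fallback.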

This query:

GET /find?s=all&q=tristan+%2B+isolde+%282006%29 HTTP/1.0
User-Agent: Mozilla/5.0

This has the *same* query string (s=all) as the 1.1 request before, and
it *also* gives you a 302 Found! The problem is that q=<foo>;s=pt does
not always redirect you to the match. That's why my hack in get_imdbID
worked.
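
To make the two forms easy to compare side by side, here is a small
Python 3 sketch that builds both request paths from a title. The
function names are invented for this example; quote_plus lives in
urllib.parse in Python 3 (it was urllib.quote_plus in Python 2).

```python
from urllib.parse import quote_plus


def find_path_semicolon(title):
    """Path in the q=<title>;s=pt form, which does not always redirect."""
    return "/find?q=%s;s=pt" % quote_plus(title)


def find_path_all(title):
    """Path in the s=all&q=<title> form, which redirected in the tests above."""
    return "/find?s=all&q=%s" % quote_plus(title)


title = "tristan + isolde (2006)"
print(find_path_semicolon(title))
print(find_path_all(title))
```

The encoded title is identical in both paths (the literal "+" becomes
%2B), so the only difference left is the s= parameter and its
separator.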

Bottom line: you can stay with urllib, but I highly recommend
switching to urllib2 anyway. No point in using archaic modules in
there. The problem is that the query you send now does not
return the match. I'm guessing this is due to the IMDbPyWeb account
on imdb.com?

-- 
Jesper Noehr
[EMAIL PROTECTED]




