On May 7, 2007, at 10:46 PM, Davide Alberani wrote:

> On May 07, Davide Alberani <[EMAIL PROTECTED]> wrote:
>
>> No, these was just some ideas: I've done it and the problem
>> persists, so it looks like you've spotted the bug correctly.
>
> In the CVS there is the first implementation of the switch from
> urllib to urllib2, based on your patch and hints.
>
> I've not changed the get_imdbID method, so far.
>
> I've done a fast test, and moving from urllib to urllib2 doesn't
> seem to have fixed the problem with titles with "+"; maybe there is
> something wrong in my code, or maybe the switch from HTTP/1.0 to
> HTTP/1.1 is not a solution (or maybe there's something in my
> environment: I've removed the proxy, but so far no luck...)
>
> If possible, take a look at my code, and test it.
> If really needed, your proposed fix in the get_imdbID will be
> introduced.
OK, I've done some more thorough testing on this now. Let me demonstrate the problem.

This is the (stripped-down) version of the HTTP request that IMDbPY sends off to imdb.com:

    GET /find?q=tristan+%2B+isolde+%282006%29;s=pt HTTP/1.0
    User-Agent: Mozilla/5.0

If you telnet to imdb.com port 80 and give it that, you will get a bunch of HTML (the search results page), which is obviously not what you want.

Previously I suggested that there's a difference between doing the same query over HTTP/1.0 and 1.1. I was wrong.

    GET /find?s=all&q=tristan+%2B+isolde+%282006%29 HTTP/1.1
    Host: imdb.com
    User-Agent: Mozilla/5.0

This one will give you an HTTP 302 Found, which redirects you to http://imdb.com/title/tt0375154/. I figured that since this was HTTP/1.1, but otherwise more or less the same, they had some kind of degradation of service on the old 1.0 protocol. Not so. This query:

    GET /find?s=all&q=tristan+%2B+isolde+%282006%29 HTTP/1.0
    User-Agent: Mozilla/5.0

which has the *same* query string (&s=all) as the 1.1 request before, *also* gives you a 302 Found!

The problem is that q=<foo>;s=pt does not always redirect you to the match. That's why my hack in get_imdbID worked.

Bottom line: you can stay with urllib, but I highly recommend switching to urllib2 anyway. No point in using archaic modules in there. The problem is that the query you do now does not return the match. I'm guessing this is due to the IMDbPyWeb account on imdb.com?

-- 
Jesper Noehr
[EMAIL PROTECTED]

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
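
For reference, the two query-string forms compared in the message can be reproduced offline with a small sketch. This is not IMDbPY code: the helper names `build_find_path` and `build_request` are hypothetical, and it assumes Python 3's `urllib.parse` rather than the urllib/urllib2 modules discussed in the thread. It only builds the request text and does not contact imdb.com.

```python
from urllib.parse import quote_plus


def build_find_path(title, variant="all"):
    """Return the /find path for a title (hypothetical helper).

    variant="pt"  -> the q=<title>;s=pt form, reported above as not
                     always redirecting to the match
    variant="all" -> the s=all&q=<title> form, reported above as
                     answering with a 302 redirect
    """
    # quote_plus turns spaces into "+" and a literal "+" into "%2B"
    q = quote_plus(title)
    if variant == "pt":
        return "/find?q=%s;s=pt" % q
    return "/find?s=all&q=%s" % q


def build_request(path, http_version="1.0", host=None):
    """Build the raw request text one could paste into a telnet session."""
    lines = ["GET %s HTTP/%s" % (path, http_version)]
    if host is not None:
        # HTTP/1.1 requires a Host header; HTTP/1.0 does not
        lines.append("Host: %s" % host)
    lines.append("User-Agent: Mozilla/5.0")
    return "\r\n".join(lines) + "\r\n\r\n"


title = "tristan + isolde (2006)"
pt_request = build_request(build_find_path(title, "pt"))
all_request = build_request(build_find_path(title, "all"))
all_request_11 = build_request(
    build_find_path(title, "all"), http_version="1.1", host="imdb.com"
)
```

Comparing `pt_request` with `all_request` shows that the only difference between the failing and the working HTTP/1.0 requests is the query string, not the protocol version.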