On Tue, Sep 28, 2010 at 10:50 PM, Israel Fruchter
<israel.fruch...@gmail.com> wrote:
> I fixed my two issues (Person name & headshot),

Great!  Thank you very much.

I won't have time to check the patch and commit it to Mercurial
until tomorrow.  By the way, anyone should feel free to fork the
IMDbPY repository on Bitbucket (specifically
http://bitbucket.org/alberanid/imdbpy_new_search_parsers/ ),
commit their changes and send a pull request.

> BTW, I really like the parser, it's a good scraper,

Most of the credit should go to H. Turgut Uyar, who wrote that
wonderful DOM/XPath-based parser.

> tried building it myself, imdb page killed lxml, elementtree and
> BeautifulSoup,

Strange: we're based on lxml (falling back to BeautifulSoup if
lxml is not installed).
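
For illustration only, a minimal sketch of that kind of fallback. This is
not IMDbPY's actual code: the function name pick_backend is hypothetical,
and "bs4" is assumed as the importable module name of current
BeautifulSoup releases.

```python
import importlib.util


def pick_backend(candidates=("lxml", "bs4")):
    """Return the name of the first installed module in candidates, or None.

    Hypothetical helper: the candidate names are assumptions, listed in
    order of preference (lxml first, BeautifulSoup as the fallback).
    """
    for name in candidates:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
print("selected parser backend:", backend)
```

The order of the tuple encodes the preference, so the fallback is only
chosen when the preferred module is missing.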

> you think they are writing an ill-formed html deliberately to keep us
> scrapers away ??

Hmmm... I fear that, like every big portal, they have to cope with
so many browsers and environments that nice, standard HTML
can't work. :-)


Thanks again!
-- 
Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/
