On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal <look4pun...@gmail.com> wrote:

> Thanks all for the suggestions. I think I will start with BeautifulSoup
> (3.0.7a) and will experiment with the other suggested libs if it does not
> meet my requirements or if I run into issues with it.
>

You are not going to believe this, but the creator of BeautifulSoup
(Leonard Richardson) advised me to use the SGMLParser class (from the
sgmllib module) in Python for parsing HTML. This was back in 2004 (or 2005),
when I had written to him about using BeautifulSoup as the parser in
HarvestMan. He advised me to derive a wrapper from SGMLParser, and that's
what I did.
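
To give a rough idea of what "deriving a wrapper" means, here is a minimal
sketch. It is not the HarvestMan code. LinkCollector and extract_links are
made-up names for illustration, and the sketch subclasses
html.parser.HTMLParser from Python 3 (where sgmllib no longer exists)
instead of SGMLParser. The idea is the same in both: subclass the parser
and override its handler methods.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical wrapper that collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Feed an HTML string to the wrapper and return the links it found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links() on a page's HTML string returns the href values in
document order. A real crawler would also need to handle things like base
URLs, frames and images, which this sketch leaves out.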

In case you are interested, you can check out the HTML parser used in
HarvestMan. It is available at:


http://harvestman-crawler.googlecode.com/svn/trunk/HarvestMan/harvestman/lib/pageparser.py



>
> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <b.gh...@gmail.com> wrote:
>
>> > Can anyone suggest a good library for HTML parsing in Python?
>> > I googled and found a few libraries: BeautifulSoup, HTMLParser,
>> > SGMLParser, etc.
>> >
>> > Which one should I go for, based on your experience?
>>
>> BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
>>
>> http://codespeak.net/lxml/
>>
>> Regards,
>> BG
>>
>>
>> --
>> Baishampayan Ghose
>> b.ghose at gmail.com
>> _______________________________________________
>> BangPypers mailing list
>> BangPypers@python.org
>> http://mail.python.org/mailman/listinfo/bangpypers
>>


-- 
--Anand
