Re: [BangPypers] HTML Parsing in python

2009-10-20 Thread Anand Balachandran Pillai
On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal look4pun...@gmail.com wrote: Thanks all for the suggestions. I think I will start with BeautifulSoup (3.0.7a) and will experiment with the other suggested libraries if it does not fit my requirements or if I run into issues with it. You are not going

Re: [BangPypers] HTML Parsing in python

2009-10-20 Thread Yuvi Panda
I use lxml.html. Just as good, and MUCH faster. A pain to install though. On Tue, Oct 20, 2009 at 6:32 PM, Anand Balachandran Pillai abpil...@gmail.com wrote: On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal look4pun...@gmail.com wrote: Thanks all for the suggestions. I think I will start

Re: [BangPypers] HTML Parsing in python

2009-10-20 Thread srid
On Tue, Oct 20, 2009 at 6:34 PM, Yuvi Panda yuvipa...@gmail.com wrote: I use lxml.html. Just as good, and MUCH faster. A pain to install though. If you're using ActivePython, the following command is just enough to get lxml installed on Mac, Linux or Windows: $ pypm install lxml

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Anand Chitipothu
2009/9/10 Puneet Aggarwal look4pun...@gmail.com: Hi BangPypers, Can anyone suggest a good library for HTML parsing in Python? I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc. Can anyone suggest which one I should go for, based on your experience? I recommend

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Baiju M
On Thu, Sep 10, 2009 at 2:29 PM, Puneet Aggarwal look4pun...@gmail.com wrote: Hi BangPypers, Can anyone suggest a good library for HTML parsing in Python? http://code.google.com/p/html5lib/ -- Baiju M

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Noufal Ibrahim
On Thu, Sep 10, 2009 at 3:41 PM, Anand Chitipothu anandol...@gmail.com wrote: 2009/9/10 Puneet Aggarwal look4pun...@gmail.com: Hi BangPypers, Can anyone suggest a good library for HTML parsing in Python? I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc. Can

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Ramkumar R
or use cElementTree (the C implementation of ElementTree). ElementTree is an XML parser. Forget that I mentioned it if you're only going to be parsing HTML.
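To illustrate why an XML parser is a poor fit for real-world HTML, here is a minimal sketch using the standard-library `xml.etree.ElementTree` under Python 3 (where the separate `cElementTree` module no longer exists and the C accelerator is used automatically). The sample strings and the `try_parse` helper are made up for this illustration and do not come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, chosen only for illustration.
WELL_FORMED = "<p>Hello <b>world</b></p>"
MALFORMED = "<p>Hello <b>world</p>"  # the <b> tag is never closed


def try_parse(markup):
    """Return the root tag if the markup parses as XML, else None."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        # ElementTree rejects anything that is not well-formed XML.
        return None


print(try_parse(WELL_FORMED))  # p
print(try_parse(MALFORMED))    # None
```

A single unclosed tag is enough to make the parse fail, which is common in pages found in the wild.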

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Ramkumar R
+1 Beautiful Soup The author is no longer interested in maintaining BeautifulSoup (see http://www.crummy.com/software/BeautifulSoup/3.1-problems.html). The BeautifulSoup port to Python 3.x is pretty terrible, as it's based on the error-intolerant HTMLParser. While it's a fantastic library for

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread S.Ramaswamy
On Thu, Sep 10, 2009 at 2:29 PM, Puneet Aggarwal look4pun...@gmail.com wrote: Hi BangPypers, Can anyone suggest a good library for HTML parsing in Python? I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc. Can anyone suggest which one I should go for from your

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Baishampayan Ghose
Can anyone suggest a good library for HTML parsing in Python? I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc. Can anyone suggest which one I should go for, based on your experience? BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Dhananjay Nene
Do you require tolerance for non-well-formed XML/HTML? If yes, you may consider sgmlop http://effbot.org/zone/sgmlop-index.htm On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose b.gh...@gmail.com wrote: Can anyone suggest a good library for HTML parsing in Python? I googled and found a few

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Puneet Aggarwal
Thanks all for the suggestions. I think I will start with BeautifulSoup (3.0.7a) and will experiment with the other suggested libraries if it does not fit my requirements or if I run into issues with it. On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose b.gh...@gmail.com wrote: Can anyone suggest

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread Puneet Aggarwal
Hi Dhananjay, My requirement is simple: I need to extract information from a page. But the pages may contain malformed or outright junk HTML, so tolerance is required. Thanks, Puneet On Thu, Sep 10, 2009 at 7:33 PM, Dhananjay Nene dhananjay.n...@gmail.com wrote: Do you require tolerance
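As a rough sketch of this kind of tolerant extraction, the example below pulls link targets out of deliberately broken markup. It assumes a current Python 3 release, where the standard-library `html.parser.HTMLParser` is lenient (its strict mode was removed in Python 3.5); it is not one of the libraries recommended in this thread. The `LinkCollector` class, the `extract_links` helper and the sample markup are hypothetical names invented for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring broken nesting."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical junk input: unclosed <p>, <a> and <div>, a stray </b>,
# and no closing </html>.
JUNK_HTML = (
    '<html><body><p>Unclosed paragraph<a href="/one">one'
    "<div><a href='/two'>two</b></body>"
)


def extract_links(markup):
    """Feed the markup to the parser and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


print(extract_links(JUNK_HTML))
```

The parser only reports tags as it encounters them and never builds a tree, so mismatched or missing closing tags do not stop the extraction. For queries that depend on document structure, a tree-building library such as those suggested in this thread is likely the better choice.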

Re: [BangPypers] HTML Parsing in python

2009-09-10 Thread srid
On Thu, Sep 10, 2009 at 6:37 AM, Baishampayan Ghose b.gh...@gm BeautifulSoup was OK, but now it's broken. Use lxml, it's very good. http://codespeak.net/lxml/ IanB has an interesting blog post on using lxml to parse HTML: