On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal look4pun...@gmail.com wrote:
Thanks, all, for the suggestions. I think I will start with BeautifulSoup
(3.0.7a) and will experiment with the other suggested libs if it does not fit
my requirements or if I run into issues with it.
You are not going
I use lxml.html. Just as good, and MUCH faster. A pain to install though.
On Tue, Oct 20, 2009 at 6:32 PM, Anand Balachandran Pillai
abpil...@gmail.com wrote:
On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal look4pun...@gmail.com wrote:
Thanks all for the suggestions. I think I will start
On Tue, Oct 20, 2009 at 6:34 PM, Yuvi Panda yuvipa...@gmail.com wrote:
I use lxml.html. Just as good, and MUCH faster. A pain to install though.
If you're using ActivePython, the following command is just enough to
get lxml installed on Mac, Linux or Windows:
$ pypm install lxml
2009/9/10 Puneet Aggarwal look4pun...@gmail.com:
Hi BangPypers,
Can anyone suggest a good library for HTML parsing in Python?
I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc.
Can anyone suggest which one I should go for, based on your experience?
I recommend
On Thu, Sep 10, 2009 at 2:29 PM, Puneet Aggarwal look4pun...@gmail.com wrote:
Hi BangPypers,
Can anyone suggest a good library for HTML parsing in Python?
http://code.google.com/p/html5lib/
--
Baiju M
___
BangPypers mailing list
On Thu, Sep 10, 2009 at 3:41 PM, Anand Chitipothu anandol...@gmail.com wrote:
2009/9/10 Puneet Aggarwal look4pun...@gmail.com:
Hi BangPypers,
Can anyone suggest a good library for HTML parsing in Python?
I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc.
Can
or use cElementTree (the ElementTree implementation in C).
ElementTree is an XML parser. Forget that I mentioned it if you're
only going to be parsing HTML.
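
To illustrate the point about ElementTree being an XML parser, here is a minimal sketch, assuming a current Python 3 where xml.etree.ElementTree is in the standard library. The helper name try_parse and both sample strings are made up for this illustration. A void tag such as <br> that is never closed is ordinary HTML, but an XML parser rejects it:

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return the root tag if markup is well-formed XML, else None.

    try_parse is a hypothetical helper for this illustration only.
    """
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        # The XML parser gives up on the first well-formedness error.
        return None


well_formed = "<p>hello<br/></p>"  # every element is closed
tag_soup = "<p>hello<br></p>"      # <br> is never closed: fine in HTML, an error in XML

print(try_parse(well_formed))
print(try_parse(tag_soup))
```

The second call returns None because the parser raises ParseError, which is why an XML-only parser is a poor fit when the input is arbitrary HTML.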
+1 Beautiful Soup
The author is no longer interested in maintaining BeautifulSoup (see
http://www.crummy.com/software/BeautifulSoup/3.1-problems.html). The
BeautifulSoup port to Python 3.x is pretty terrible, as it's based on
the error-intolerant HTMLParser. While it's a fantastic library for
On Thu, Sep 10, 2009 at 2:29 PM, Puneet Aggarwal look4pun...@gmail.com wrote:
Hi BangPypers,
Can anyone suggest a good library for HTML parsing in Python?
I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser,
etc.
Can anyone suggest me which should I go for from your
Can anyone suggest a good library for HTML parsing in Python?
I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser, etc.
Can anyone suggest which one I should go for, based on your experience?
BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
Do you need tolerance for non-well-formed XML/HTML? If yes, you may
consider sgmlop: http://effbot.org/zone/sgmlop-index.htm
On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose b.gh...@gmail.com wrote:
Can anyone suggest a good library for HTML parsing in Python?
I googled and found a few
Thanks, all, for the suggestions. I think I will start with BeautifulSoup
(3.0.7a) and will experiment with the other suggested libs if it does not fit
my requirements or if I run into issues with it.
On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose b.gh...@gmail.com wrote:
Can anyone suggest me
Hi Dhananjay,
My requirement is simple: I need to extract information from a page. But the
pages may contain malformed or outright junk HTML, so tolerance is
required.
Thanks,
Puneet
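
The requirement above (pull information out of pages that may be malformed) can be sketched with nothing but the standard library. This is a minimal sketch, not what anyone in the thread used: it assumes a current Python 3, where html.parser.HTMLParser no longer raises on bad markup (the thread's complaints about HTMLParser date from 2009). The names LinkCollector, extract_links and the sample markup are hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and ignore everything else.

    LinkCollector is a hypothetical name used only in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Return all link targets found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Deliberately broken input: unclosed <b> and <a>, a misnested </p>,
# an unquoted attribute value, and no closing </html>.
junk = (
    '<html><body><p>unclosed <b>bold '
    '<a href="http://example.com/a">one</p>'
    '<a href=/b>two</body>'
)

print(extract_links(junk))
```

The parser here only reports tags as it sees them and does not try to repair the tree. If a repaired document tree is needed, the libraries recommended elsewhere in this thread (BeautifulSoup, lxml.html, html5lib) are the ones to look at.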
On Thu, Sep 10, 2009 at 7:33 PM, Dhananjay Nene dhananjay.n...@gmail.com wrote:
Do you require tolerance
On Thu, Sep 10, 2009 at 6:37 AM, Baishampayan Ghose b.gh...@gm
BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
http://codespeak.net/lxml/
IanB has an interesting blog post on using lxml to parse HTML: