On Thu, 24 Mar 2011 20:10:46 +0800 (CST) Whut Jia <whut_...@163.com> wrote:
> Hi all,
> I want to parse HTML content and extract some elements inside my own
> Apache handler. How can I do this? Thanks,
> Jia

I think right now the only public C library for parsing HTML is the one in the venerable and long-unmaintained libwww.

However, I wrote a quick and simple event-driven parser library a few months ago. I have been meaning to open source it on CCAN or somewhere similar but have not gotten around to it, so if you are interested, send me a message directly. I also have some basic scraper demos, etc.

It is not on the scale of libwww (it is just a low-level HTML parser), but I am sure it could do what you want. You can either compile it into an Apache module or link the module against it; it has no further dependencies.
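
To give a rough idea of what "event-driven" means here, below is a toy sketch in plain C. It is not the API of the library mentioned above (that is not shown in this message); the names `start_tag_cb`, `scan_start_tags`, `tag_counter`, and `count_tags` are all made up for illustration. The scanner walks the input once and calls a user-supplied callback for each start tag, which is the basic pattern a handler would use to pick out the elements it cares about. It deliberately ignores the hard cases (a `>` inside an attribute value, tags inside comments or script bodies).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once per start tag with its
   lowercased name and an opaque pointer owned by the caller. */
typedef void (*start_tag_cb)(const char *name, void *user_data);

/* Scan html and fire on_start for every start tag such as <a ...> or <P>.
   Anything where '<' is not followed by a letter is skipped.
   Toy scanner: no attribute parsing, no comment or script awareness. */
static void scan_start_tags(const char *html, start_tag_cb on_start,
                            void *user_data)
{
    const char *p = html;

    while ((p = strchr(p, '<')) != NULL) {
        char name[32];
        size_t n = 0;

        p++;                                  /* step past '<' */
        if (!isalpha((unsigned char)*p))
            continue;                         /* end tag, declaration, stray '<' */

        /* Copy the tag name, lowercased, truncating overly long names. */
        while (isalnum((unsigned char)*p) && n < sizeof name - 1)
            name[n++] = (char)tolower((unsigned char)*p++);
        name[n] = '\0';

        on_start(name, user_data);            /* the "event" */
    }
}

/* Example consumer: counts start tags with one particular name. */
struct tag_counter {
    const char *wanted;   /* lowercase tag name to look for */
    int count;
};

static void count_matching(const char *name, void *user_data)
{
    struct tag_counter *c = user_data;

    if (strcmp(name, c->wanted) == 0)
        c->count++;
}

/* Convenience wrapper: number of start tags named tag (lowercase) in html. */
static int count_tags(const char *html, const char *tag)
{
    struct tag_counter c = { tag, 0 };

    scan_start_tags(html, count_matching, &c);
    return c.count;
}
```

Calling `count_tags(body, "a")` on a buffered page body would count its anchor start tags. In a real handler the callback would more likely collect attribute values or copy element text, which needs a fuller tokenizer than this sketch provides.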