Jesse Pelton wrote:
> You might want to consider libxml2 (http://www.xmlsoft.org/) or its C++ 
> wrapper, libxml++.  Since you mention browsers, you might also be able to 
> tease out the parser from the source for Gecko, KHTML, or WebKit.

Thanks, Jesse, for these suggestions.

> Note that parsing the "tag soup" HTML that makes up the Web is often a matter 
> of guesswork (...)

Agreed.

Adding that sort of heuristic to Xerces would considerably complicate
the code and its maintenance.

From what I see, NekoHTML uses some of the Xerces API; it doesn't
_modify_ Xerces but _uses_ it.

I can't believe there's nothing like NekoHTML written in C++ for
Xerces-C++.

The amount of HTML is huge compared with XML, yet people are left with
no good tool to work with HTML.
Having a C++ version of NekoHTML would make Xerces-C++ even more popular
and valuable.


-- 
Piotr Dobrogost
*** curlpp.org - c++ wrapper for libcurl ***
