I am using http://htmlparser.sourceforge.net for my data mining engine.
It has a lightweight 'lexer' package, and I don't need to perform ANY
HTML/XML error checking etc. The lexer is a low-level tokenizer rather
than a full parser: it is not DOM, SAX, etc. We do not need to build a DOM
to extract Outlink[] or to extract plain text.
What about licensing?

Alternatively, we can develop our own low-level HTML (InputSource)
processing engine from scratch; we need only Outlink[] and PlainText.
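
To make the "from scratch" option concrete, here is a rough sketch of
what such a DOM-free pass could look like, using only java.util.regex.
Everything in it is an assumption for illustration: the class name
SimpleHtmlScanner and both method names are hypothetical, links are
returned as plain strings rather than real Outlink objects, and it does
no entity decoding, comment handling, or relative URL resolution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of htmlparser or any existing code base.
final class SimpleHtmlScanner {

    // href="..." or href='...' inside an <a ...> tag; group 2 is the URL.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Whole <script>...</script> and <style>...</style> elements.
    private static final Pattern SCRIPT_STYLE = Pattern.compile(
            "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Collects quoted href values of anchor tags, in document order.
    static List<String> extractOutlinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // Drops script/style elements and all tags, then collapses whitespace.
    static String extractPlainText(String html) {
        String s = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>"
                + "<a href=\"http://example.com/a\">A</a>";
        System.out.println(extractOutlinks(html));
        System.out.println(extractPlainText(html));
    }
}
```

A regex pass like this is fragile on malformed markup (unquoted
attributes, a '>' inside an attribute value), which is the main argument
for reusing a tested lexer instead of writing one.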
