Nutch Improvement - HTML Parser

Fuad Efendi Fri, 17 Feb 2006 19:13:09 -0800

I am using http://htmlparser.sourceforge.net for my Data Mining engine.
It has a 'lexer' package that is lightweight, and I don't need to perform ANY
html/xml error checking etc. It is a low-level tokenizer rather than a full
parser; it is not DOM, SAX, etc. We do not need to create a DOM to
extract Outlink[] or to extract plain text.
What about licensing?


Alternatively, we can develop our own low-level HTML (InputSource) processing
engine from scratch; we need only Outlink[] and plain text.
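
To make the idea concrete, here is a rough sketch of what such a from-scratch
scanner might look like. It is only an illustration under my own assumptions:
the class and method names (LightweightExtractor, extract, Result) are
hypothetical, it is not the htmlparser lexer API and not Nutch code, and it
returns link targets as plain strings instead of building real Outlink
objects. It does a single pass over the input, builds no DOM, does no error
recovery, and does not decode entities such as &amp;.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-pass extractor: link targets and plain text, no DOM. */
final class LightweightExtractor {

    /** The two outputs needed: href values and whitespace-collapsed text. */
    static final class Result {
        final List<String> links;
        final String text;
        Result(List<String> links, String text) { this.links = links; this.text = text; }
    }

    // href="...", href='...' or an unquoted href value inside a start tag.
    private static final Pattern HREF = Pattern.compile(
        "\\shref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))", Pattern.CASE_INSENSITIVE);

    static Result extract(String html) {
        List<String> links = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0, n = html.length();
        while (i < n) {
            int lt = html.indexOf('<', i);
            if (lt < 0) lt = n;
            appendText(text, html, i, lt);           // characters between tags are text
            if (lt == n) break;
            if (html.startsWith("<!--", lt)) {       // skip comments entirely
                int end = html.indexOf("-->", lt + 4);
                if (end < 0) break;
                i = end + 3;
                continue;
            }
            int gt = html.indexOf('>', lt);
            if (gt < 0) break;                       // unterminated tag: stop, no recovery
            String tag = html.substring(lt + 1, gt).trim();
            String name = tagName(tag);
            if (name.equals("script") || name.equals("style")) {
                // Jump straight to the matching close tag; the body is not text.
                int close = indexOfIgnoreCase(html, "</" + name, gt + 1);
                int closeGt = close < 0 ? -1 : html.indexOf('>', close);
                if (closeGt < 0) break;
                i = closeGt + 1;
                continue;
            }
            if (name.equals("a")) {
                Matcher m = HREF.matcher(tag);
                if (m.find()) {
                    String v = m.group(1) != null ? m.group(1)
                             : m.group(2) != null ? m.group(2) : m.group(3);
                    links.add(v);
                }
            }
            appendText(text, " ", 0, 1);             // treat every tag as a word break
            i = gt + 1;
        }
        return new Result(links, text.toString().trim());
    }

    /** Lower-cased tag name: everything up to the first whitespace. */
    private static String tagName(String tag) {
        int end = 0;
        while (end < tag.length() && !Character.isWhitespace(tag.charAt(end))) end++;
        return tag.substring(0, end).toLowerCase(Locale.ROOT);
    }

    /** Appends s[from, to) while collapsing whitespace runs to one space. */
    private static void appendText(StringBuilder out, String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            if (!Character.isWhitespace(c)) {
                out.append(c);
            } else if (out.length() > 0 && out.charAt(out.length() - 1) != ' ') {
                out.append(' ');
            }
        }
    }

    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int k = from; k + needle.length() <= s.length(); k++) {
            if (s.regionMatches(true, k, needle, 0, needle.length())) return k;
        }
        return -1;
    }

    public static void main(String[] args) {
        Result r = extract("<p>See <a href=\"http://example.com/\">this page</a>.</p>");
        System.out.println(r.links);
        System.out.println(r.text);
    }
}
```

Treating every tag as a word break is a simplification (inline tags like <b>
will split words), and relative hrefs are returned as found; resolving them
against a base URL would still be needed before creating outlinks.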
