I have tried both HtmlParser v1.5 and NekoHTML. About the former my
implementation doesn't work as i.e. it get text from javascripts; I
have followed the hint from
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/visitors/TextExtractingVisitor.html
The following is my NOT working implementation relying upon HtmlParser v1.5:
import org.htmlparser.visitors.TextExtractingVisitor;
import org.htmlparser.*;
import org.htmlparser.util.*;
public class HtmlFilter {
public static String getText(String html) {
Parser parser = Parser.createParser(html, "UTF-8");
TextExtractingVisitor visitor = new TextExtractingVisitor();
try {
parser.visitAllNodesWith(visitor);
} catch (ParserException e) {
e.printStackTrace();
}
String textInPage = visitor.getExtractedText();
return textInPage;
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]