Hello everyone, I am a PhD student and I am using Any23 for a project of
mine. I have embedded it into a JSP server and I used the following lines of
code:
......
some imports
<%@ page import="org.apache.any23.*"%>
<%@ page import="org.apache.any23.Any23.*"%>
<%@ page import="org.apache.any23.extractor.*"%>
<%@ page import="org.apache.any23.extractor.rdf.*"%>
<%@ page import="org.apache.any23.extractor.rdfa.*"%>
<%@ page import="org.apache.any23.extractor.xpath.*"%>
<%@ page import="org.apache.any23.extractor.html.*"%>
<%@ page import="org.apache.any23.extractor.html.annotations.*"%>
<%@ page import="org.apache.any23.http.*"%>
<%@ page import="org.apache.any23.writer.*"%>
<%@ page import="org.apache.any23.source.*"%>
<%@ page import="org.apache.any23.validator.*"%>
<%@ page import="org.apache.any23.validator.rule.*"%>
<%@ page import="org.apache.any23.vocab.*"%>
<%@ page import="org.apache.any23.util.*"%>
<%@ page import="org.apache.any23.mime.*"%>
<%@ page import="org.apache.any23.servlet.*"%>
<%@ page import="org.apache.any23.plugin.*"%>
<%@ page import="org.apache.any23.plugin.htmlscraper.*"%>
<%@ page import="net.rootdev.javardfa.*"%>
<%
Any23 runner = new Any23();
runner.setHTTPUserAgent("test-user-agent");
HTTPClient httpClient = runner.getHTTPClient();
DocumentSource source = new HTTPDocumentSource(httpClient, "http://www.contra.gr");
ByteArrayOutputStream out1 = new ByteArrayOutputStream();
TripleHandler handler = new NTriplesWriter(out1);
try {
runner.extract(source, handler);
} finally {
handler.close();
}
String n3 = out1.toString("UTF-8");
out.print(n3);
....
It works fine when I parse a direct source of RDF data, but when I parse
an HTML website (www.contra.gr, for example) I get the following
error:
java.lang.IllegalArgumentException: Illegal XPath expression:
//*/h:head/h:base[position()=1]/@href
Can you help me? Could you also send me some usage examples (or links)?
I can't find any examples of how to use it.
Thanks for your help.
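
P.S. For context on the expression in the error message: it is a namespace-prefixed XPath. Below is a minimal standalone sketch of how an expression like this evaluates with the JDK's own javax.xml.xpath API, independent of Any23. It assumes the h prefix stands for the XHTML namespace (the error message does not say so). The class name BaseHrefXPath, the method firstBaseHref, and the sample page are hypothetical, made up for illustration; this is not Any23's internal code and it does not reproduce the exception.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Any23.
class BaseHrefXPath {
    // Assumption: the "h" prefix in the error message means the XHTML namespace.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Returns the href of the first <base> in <head>, or "" if nothing matches.
    public static String firstBaseHref(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for prefixed XPath steps to match
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "h" prefix so that h:head and h:base can be resolved.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "h" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("h").iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });
            // Same expression as in the error message.
            return xpath.evaluate("//*/h:head/h:base[position()=1]/@href", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<base href=\"http://example.org/\"/><title>t</title>"
                + "</head><body/></html>";
        System.out.println(firstBaseHref(page));
    }
}
```

With the prefix bound and a namespace-aware parser, the expression selects the base href from a well-formed XHTML page, so the expression itself looks valid to me. That makes me suspect the problem is in how the XPath engine is set up inside my JSP container, but that is only a guess.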