Hello everyone, I am a PhD student and I use Any23 in my work. I have embedded it in a JSP server and used the following code:

......
(some other imports)

<%@ page import="org.apache.any23.*"%>
<%@ page import="org.apache.any23.Any23.*"%>
<%@ page import="org.apache.any23.extractor.*"%>
<%@ page import="org.apache.any23.extractor.rdf.*"%>
<%@ page import="org.apache.any23.extractor.rdfa.*"%>
<%@ page import="org.apache.any23.extractor.xpath.*"%>
<%@ page import="org.apache.any23.extractor.html.*"%>
<%@ page import="org.apache.any23.extractor.html.annotations.*"%>
<%@ page import="org.apache.any23.http.*"%>
<%@ page import="org.apache.any23.writer.*"%>
<%@ page import="org.apache.any23.source.*"%>
<%@ page import="org.apache.any23.validator.*"%>
<%@ page import="org.apache.any23.validator.rule.*"%>
<%@ page import="org.apache.any23.vocab.*"%>
<%@ page import="org.apache.any23.util.*"%>
<%@ page import="org.apache.any23.mime.*"%>
<%@ page import="org.apache.any23.servlet.*"%>
<%@ page import="org.apache.any23.plugin.*"%>
<%@ page import="org.apache.any23.plugin.htmlscraper.*"%>
<%@ page import="net.rootdev.javardfa.*"%>
<%
  // Create the extractor and set the User-Agent header it sends
  Any23 runner = new Any23();
  runner.setHTTPUserAgent("test-user-agent");
  HTTPClient httpClient = runner.getHTTPClient();

  // Fetch the page over HTTP and write the extracted triples as N-Triples into a buffer
  DocumentSource source = new HTTPDocumentSource(httpClient, "http://www.contra.gr");
  ByteArrayOutputStream out1 = new ByteArrayOutputStream();
  TripleHandler handler = new NTriplesWriter(out1);
  try {
    runner.extract(source, handler);
  } finally {
    handler.close();
  }

  // Print the extracted triples into the JSP response
  String n3 = out1.toString("UTF-8");
  out.print(n3);
....



It works fine when I parse a direct source of RDF data, but when I parse an HTML website (www.contra.gr, for example) I get the following error:

java.lang.IllegalArgumentException: Illegal XPath expression: //*/h:head/h:base[position()=1]/@href
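One way to narrow this down is to check two things: which XPath implementation JAXP actually loads, and whether that implementation accepts the expression from the error message. This is a diagnostic sketch, not a known fix. It assumes, without confirmation from the Any23 documentation, that the prefix `h` stands for the XHTML namespace `http://www.w3.org/1999/xhtml`. The class name `XPathCheck` and the sample page are hypothetical, and the sketch uses only standard JDK classes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathCheck {
    // Assumption: "h" is bound to the XHTML namespace in the failing expression.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Expression copied verbatim from the error message.
    static final String EXPR = "//*/h:head/h:base[position()=1]/@href";
    // Hypothetical test page containing a <base> element.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
        + "<base href=\"http://www.example.org/\"/></head><body/></html>";

    // Binds only the prefix "h"; any other prefix resolves to no namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }
        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("h").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    /** Returns true if the XPath engine in this JVM/classpath compiles EXPR. */
    static boolean compiles() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XHTML_CONTEXT);
        try {
            xpath.compile(EXPR);
            return true;
        } catch (XPathExpressionException e) {
            System.out.println("compile failed: " + e);
            return false;
        }
    }

    /** Parses an XHTML string namespace-aware and evaluates EXPR as a string. */
    static String baseHref(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required so h:head matches XHTML elements
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            return xpath.evaluate(EXPR, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Which XPathFactory implementation is picked up here?
        System.out.println("XPathFactory: " + XPathFactory.newInstance().getClass().getName());
        System.out.println("compiles: " + compiles());
        System.out.println("base href: " + baseHref(SAMPLE));
    }
}
```

You could run this once as a plain Java program. For comparison, print `XPathFactory.newInstance().getClass().getName()` from a scriptlet in the same container. If the two class names differ, the container may be loading a different XPath implementation (for example, an extra XML library jar in the server's classpath). That is only a hypothesis to check, not a confirmed cause.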

Can you help me? Could you also send me some usage examples (or links)? I can't find any examples of how to use it.

Thanks for your help

