Hello, I'm trying to register the crawler plugin, but I can't get it to work. I followed the instructions at http://any23.apache.org/any23-plugins.html; that is, I added *apache-any23-basic-crawler-1.0.0-incubating.jar* to the CLASSPATH_PREFIX environment variable, and I added crawler4j-3.3.jar, httpclient-4.1.2.jar, httpcore-4.1.4.jar and je-4.0.92.jar to the *$HOME/.any23/plugins* directory. But when I try to execute the command:
>any23 -X crawler -s -f ntriples http://www.repubblica.it 1> out.nt 2> repubblica.log

I only get this:

------------------------------------------------------------------------
Apache Any23 :: crawler
------------------------------------------------------------------------

Jun 09, 2013 2:48:55 PM org.apache.any23.Any23 <init>
INFO: ======================= Configuration Properties =======================
any23.http.client.max.connections=5
any23.extraction.metadata.timesize=off
any23.rdfa.extractor.xslt=rdfa.xslt
any23.extraction.csv.comment=#
any23.extraction.head.meta=off
any23.extraction.csv.field=,
any23.extraction.rdfa.programmatic=on
any23.microdata.strict=off
any23.http.client.timeout=10000
any23.extraction.metadata.nesting=on
any23.core.version=0.7.0-incubating (tags/any23-0.7.0-incubating/core@r1358077; 2012-07-06 10:41:50+0200)
any23.http.user.agent.default=Any23-CLI
any23.extraction.context.uri=?
any23.extraction.metadata.domain.per.entity=off
any23.plugin.dirs=./plugins
any23.microdata.ns.default=http://rdf.data-vocabulary.org/
========================================================================
Jun 09, 2013 2:48:56 PM org.apache.any23.rdf.PopularPrefixes getPrefixes
INFO: Loading prefixes from /org/apache/any23/prefixes/prefixes.properties
------------------------------------------------------------------------
Apache Any23 FAILURE
Execution terminated with errors: java.lang.NoClassDefFoundError: edu/uci/ics/crawler4j/crawler/WebCrawler
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.any23.plugin.crawler.SiteCrawler.<clinit>(SiteCrawler.java:64)
    at org.apache.any23.cli.Crawler.run(Crawler.java:101)
    at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:136)
    at org.apache.any23.cli.ToolRunner.main(ToolRunner.java:69)
Caused by: java.lang.ClassNotFoundException: edu.uci.ics.crawler4j.crawler.WebCrawler
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 16 more
Total time: 0s
Finished at: Sun Jun 09 14:48:56 BRT 2013
Final Memory: 20M/479M
------------------------------------------------------------------------

I think the problem is that I didn't set the Apache Any23 JVM classpath variable, but what is the name of this variable?

Regards,
Humberto
