Hello,
I'm trying to register the crawler plugin, but I can't get it to work.
I followed the instructions at http://any23.apache.org/any23-plugins.html:
I added *apache-any23-basic-crawler-1.0.0-incubating.jar* to the
CLASSPATH_PREFIX environment variable, and I added crawler4j-3.3.jar,
httpclient-4.1.2.jar, httpcore-4.1.4.jar and je-4.0.92.jar to the
*$HOME/.any23/plugins* directory. But when I try to execute the command:

>any23 -X crawler -s -f ntriples http://www.repubblica.it 1> out.nt 2> repubblica.log

I only get this:

------------------------------------------------------------------------
Apache Any23 :: crawler
------------------------------------------------------------------------

Jun 09, 2013 2:48:55 PM org.apache.any23.Any23 <init>
INFO:
======================= Configuration Properties =======================
any23.http.client.max.connections=5
any23.extraction.metadata.timesize=off
any23.rdfa.extractor.xslt=rdfa.xslt
any23.extraction.csv.comment=#
any23.extraction.head.meta=off
any23.extraction.csv.field=,
any23.extraction.rdfa.programmatic=on
any23.microdata.strict=off
any23.http.client.timeout=10000
any23.extraction.metadata.nesting=on
any23.core.version=0.7.0-incubating (tags/any23-0.7.0-incubating/core@r1358077; 2012-07-06 10:41:50+0200)
any23.http.user.agent.default=Any23-CLI
any23.extraction.context.uri=?
any23.extraction.metadata.domain.per.entity=off
any23.plugin.dirs=./plugins
any23.microdata.ns.default=http://rdf.data-vocabulary.org/
========================================================================

Jun 09, 2013 2:48:56 PM org.apache.any23.rdf.PopularPrefixes getPrefixes
INFO: Loading prefixes from /org/apache/any23/prefixes/prefixes.properties

------------------------------------------------------------------------
Apache Any23 FAILURE

Execution terminated with errors:
java.lang.NoClassDefFoundError: edu/uci/ics/crawler4j/crawler/WebCrawler
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.apache.any23.plugin.crawler.SiteCrawler.<clinit>(SiteCrawler.java:64)
at org.apache.any23.cli.Crawler.run(Crawler.java:101)
at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:136)
at org.apache.any23.cli.ToolRunner.main(ToolRunner.java:69)
Caused by: java.lang.ClassNotFoundException: edu.uci.ics.crawler4j.crawler.WebCrawler
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 16 more

Total time: 0s
Finished at: Sun Jun 09 14:48:56 BRT 2013
Final Memory: 20M/479M
------------------------------------------------------------------------

I think the problem is that I didn't set the JVM classpath variable that
Apache Any23 uses, but what is the name of this variable?
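
In the trace the failing lookup goes through sun.misc.Launcher$AppClassLoader, so what I want to confirm is whether crawler4j is visible on the JVM's application classpath at all. A minimal sketch of such a check is below. The class name ClasspathCheck and the method isLoadable are made up for illustration and are not part of Any23; it would have to be run with the same classpath that the any23 launcher ends up passing to java, which I am assuming is where CLASSPATH_PREFIX goes.

```java
// Hypothetical helper (not part of Any23): reports whether a class can be
// found by the class loader that loaded this helper, i.e. the JVM's
// application class loader when started with a plain "java -cp ...".
public class ClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, as seen in the trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; another name can be
        // passed as the first argument.
        String name = args.length > 0
                ? args[0]
                : "edu.uci.ics.crawler4j.crawler.WebCrawler";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Print the effective classpath so it can be compared with what was intended.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this reports the class as not loadable while crawler4j-3.3.jar sits only in the plugins directory, that would at least be consistent with the jar not being on the application classpath.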

Regards,
Humberto
