I started a crawl after adding the example plugin from the wiki
(http://wiki.apache.org/nutch/WritingPluginExample-0%2e9).

The crawl stopped after throwing an exception. Here is what the
hadoop.log file says:

----------------------------------------------------------------------------------------------------------------
2007-10-07 16:42:25,407 INFO  crawl.Crawl - crawl started in:
/home/sagar/nutch_crawl
2007-10-07 16:42:25,422 INFO  crawl.Crawl - rootUrlDir =
/home/sagar/urls/iiitb
2007-10-07 16:42:25,422 INFO  crawl.Crawl - threads = 10
2007-10-07 16:42:25,422 INFO  crawl.Crawl - depth = 3
2007-10-07 16:42:25,608 INFO  crawl.Injector - Injector: starting
2007-10-07 16:42:25,608 INFO  crawl.Injector - Injector: crawlDb:
/home/sagar/nutch_crawl/crawldb
2007-10-07 16:42:25,608 INFO  crawl.Injector - Injector: urlDir:
/home/sagar/urls/iiitb
2007-10-07 16:42:25,626 INFO  crawl.Injector - Injector: Converting injected
urls to crawl db entries.
2007-10-07 16:42:27,207 INFO  plugin.PluginRepository - Plugins: looking in:
/home/sagar/nutch-0.9/src/plugin
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository - Plugin
Auto-activation mode: [true]
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository - Registered Plugins:
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     the nutch core
extension points (nutch-extensionpoints)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Basic Query
Filter (query-basic)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Basic URL
Normalizer (urlnormalizer-basic)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Html Parse
Plug-in (parse-html)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Basic Indexing
Filter (index-basic)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Site Query
Filter (query-site)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Basic Summarizer
Plug-in (summary-basic)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     HTTP Framework
(lib-http)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Text Parse
Plug-in (parse-text)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Regex URL Filter
(urlfilter-regex)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Pass-through URL
Normalizer (urlnormalizer-pass)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Http Protocol
Plug-in (protocol-http)
2007-10-07 16:42:27,620 INFO  plugin.PluginRepository -     Regex URL
Normalizer (urlnormalizer-regex)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     OPIC Scoring
Plug-in (scoring-opic)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     CyberNeko HTML
Parser (lib-nekohtml)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     JavaScript
Parser (parse-js)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     URL Query Filter
(query-url)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Regex URL Filter
Framework (lib-regex-filter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository - Registered
Extension-Points:
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch URL
Normalizer (org.apache.nutch.net.URLNormalizer)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Protocol (
org.apache.nutch.protocol.Protocol)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Analysis (
org.apache.nutch.analysis.NutchAnalyzer)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Indexing
Filter (org.apache.nutch.indexer.IndexingFilter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Online
Search Results Clustering Plugin (
org.apache.nutch.clustering.OnlineClusterer)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     HTML Parse
Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Content
Parser (org.apache.nutch.parse.Parser)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Scoring (
org.apache.nutch.scoring.ScoringFilter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Nutch Query
Filter (org.apache.nutch.searcher.QueryFilter)
2007-10-07 16:42:27,621 INFO  plugin.PluginRepository -     Ontology Model
Loader (org.apache.nutch.ontology.Ontology)
2007-10-07 16:42:27,625 WARN  net.URLNormalizers -
URLNormalizers:PluginRuntimeException when initializing url normalizer
plugin urlnormalizer-basic instance in getURLNormalizers function:
attempting to continue instantiating plugins
2007-10-07 16:42:27,628 WARN  net.URLNormalizers -
URLNormalizers:PluginRuntimeException when initializing url normalizer
plugin urlnormalizer-regex instance in getURLNormalizers function:
attempting to continue instantiating plugins
2007-10-07 16:42:27,632 WARN  net.URLNormalizers -
URLNormalizers:PluginRuntimeException when initializing url normalizer
plugin urlnormalizer-pass instance in getURLNormalizers function: attempting
to continue instantiating plugins
2007-10-07 16:42:27,667 WARN  mapred.LocalJobRunner - job_l8t6s1
java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:74)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:60)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
Caused by: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 9 more
----------------------------------------------------------------------------------------------------------------

How do I get past this exception? I checked the Nutch sources, and there is
no urlfilter package under src/java/org/apache/nutch.
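
To narrow this down, one possible check (a sketch only, not taken from the
wiki page) is to scan the plugin directory named in the log for any jar that
actually contains the missing class. The names FindClassInJars and
jarsContaining are hypothetical, made up for illustration; the directory and
class name are passed as arguments and would be the ones from the log above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch): lists the jars under a directory
// that contain a given fully qualified class.
class FindClassInJars {

    static List<Path> jarsContaining(Path root, String className) throws IOException {
        // A class a.b.C is stored in a jar as the entry a/b/C.class
        String entry = className.replace('.', '/') + ".class";

        // Collect every *.jar file below root
        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths.filter(p -> p.toString().endsWith(".jar"))
                        .collect(Collectors.toList());
        }

        // Keep only the jars that have the entry
        List<Path> hits = new ArrayList<>();
        for (Path jarPath : jars) {
            try (JarFile jar = new JarFile(jarPath.toFile())) {
                if (jar.getEntry(entry) != null) {
                    hits.add(jarPath);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FindClassInJars <plugin-dir> <class-name>");
            return;
        }
        List<Path> hits = jarsContaining(Paths.get(args[0]), args[1]);
        if (hits.isEmpty()) {
            System.out.println("No jar under " + args[0] + " contains " + args[1]);
        } else {
            hits.forEach(p -> System.out.println("Found in " + p));
        }
    }
}
```

It would be run with /home/sagar/nutch-0.9/src/plugin and
org.apache.nutch.urlfilter.regex.RegexURLFilter as the two arguments. If it
finds no jar, the class loader has nothing to load the class from in that
directory.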

Please advise...
