Michael, this looks like an error in your Nutch configuration, or
possibly your CLASSPATH. I'd guess it's the former. Take a look at
the following nutch-site.xml (or nutch-default.xml) properties, and make
sure they reference (a) the right place on disk, (b) plugins that
actually exist:
- plugin.folders
- plugin.includes
- urlfilter.order
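For reference, here is roughly what those three entries could look like
in nutch-site.xml. This is only a sketch: the values are assumptions
based on the paths and class names in your messages, not a known-good
config, so check each one against your own nutch-default.xml and your
actual build output before copying anything.

```xml
<!-- Place these inside the root element of your existing nutch-site.xml.
     All values below are illustrative assumptions, not a tested config. -->

<property>
  <name>plugin.folders</name>
  <!-- Directory that holds the built plugins. Assumed here to be a
       relative path; use wherever your build actually puts them. -->
  <value>plugins</value>
</property>

<property>
  <name>plugin.includes</name>
  <!-- Regular expression selecting which plugins to load. Start from the
       value in nutch-default.xml and add your plugin to it; the two names
       shown are placeholders for that combined list. -->
  <value>urlfilter-regex|WhitelistURLFilter</value>
</property>

<property>
  <name>urlfilter.order</name>
  <!-- Space-separated filter class names, in the order they should run. -->
  <value>org.apache.nutch.net.RegexURLFilter epile.crawl.plugin.WhitelistURLFilter</value>
</property>
```

Every name listed in plugin.includes needs a matching plugin directory
under plugin.folders, and every class in urlfilter.order needs to come
from a plugin that is actually loaded.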
If you're still stuck, email me privately and we'll try to work
through this.
--Matt
On Sep 13, 2005, at 7:14 PM, Michael Ji wrote:
hi Matt:
Thanks for your advice.
I can run URLFilterChecker now, but I get the following error,
which complains about the indexing filter.
Could you tell me where the problem is?
"
050921 191015 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
050921 191015 not including: E:\programs\cygwin\home\fji\versionControl\nutch_V07_P87\nutch\build\plugins\WhitelistURLFilter
050921 191015 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.net.URLFilterChecker.checkAll(URLFilterChecker.java:93)
        at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
        ... 2 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
        at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
        at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
        ... 3 more
"
thanks,
Michael Ji
--- Matt Kangas <[EMAIL PROTECTED]> wrote:
Hi Michael,
Ordinarily there's no need to edit bin/nutch to run a specific class.
If the class is in a JAR in <nutch-home>/lib, you can just say
"nutch <full class name>". For example, the following two commands are
equivalent:
$ nutch crawl
$ nutch org.apache.nutch.tools.CrawlTool
However, the situation is a little different for plugins. Ordinarily
the classes for a plugin are placed in <nutch-home>/plugins/<plugin-name>,
not <nutch-home>/lib. To instantiate the plugin class, you must run
*another* class which calls the appropriate plugin factory. For
URLFilter plugins, the factory class is org.apache.nutch.net.URLFilters.
This class does not have a main() method, but there is a helper class
to test filters, URLFilterChecker. You can run it as follows:
$ nutch org.apache.nutch.net.URLFilterChecker -allCombined < urls.txt
Hope that helps. Let me know if that doesn't work for you.
--Matt
On Sep 11, 2005, at 3:20 PM, Michael Ji wrote:
hi Matt:
I applied and compiled your patch in Nutch 0.7 successfully.
However, I ran into a runtime problem when I tried to test the
patch manually by calling its class.
I edited bin/nutch and added these lines:
"
elif [ "$COMMAND" = WhitelistFilterTester ] ; then
CLASS=epile.crawl.plugin.WhitelistURLFilter
"
But when I call it, I get this error:
"
Exception in thread "main" java.lang.NoClassDefFoundError: epile/crawl/plugin/WhitelistURLFilter
"
I guess the classpath is not defined properly.
My environment is set up as follows:
1. In the nutch build.xml, I added "<ant dir="epile" target="deploy"/>".
2. In nutch/src/plugin/, I created the directory "epile-basic/src/java",
then unzipped nutch-87 and copied epile/crawl.. into that directory.
3. I created plugin.xml in epile-basic/, the same as the one you
included in the patch, and a new build.xml:
"
<?xml version="1.0"?>
<project name="WhitelistURLFilter" default="jar">
<import file="../build-plugin.xml"/>
</project>
"
4. In nutch, "ant" runs successfully; in nutch/build/, a new
WhitelistURLFilter/ directory is created with
WhitelistURLFilter.class inside.
Did I miss something important?
thanks,
Michael Ji
--
Matt Kangas / [EMAIL PROTECTED]