Michael, this looks like an error in your Nutch configuration, or possibly your CLASSPATH. I'd guess it's the former. Take a look at the following nutch-site.xml (or nutch-default.xml) properties, and make sure they reference (a) the right place on disk and (b) plugins that actually exist:

- plugin.folders
- plugin.includes
- urlfilter.order
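
For orientation, here is a rough sketch of how those three properties might look in nutch-site.xml. Treat every value as an assumption to check against your own tree: the `build/plugins` path is only inferred from the log below, the `WhitelistURLFilter` entry assumes the plugin is matched under its directory name, and the `nutch-extensionpoints` entry assumes your checkout has such a plugin (it would be where extension points like IndexingFilter are declared). Use whatever root element your nutch-default.xml uses; I believe it is `<nutch-conf>` in the 0.7 line.

```xml
<?xml version="1.0"?>
<!-- Illustrative overrides only; all values are assumptions, not defaults. -->
<nutch-conf>

  <!-- Directory (or directories) scanned for plugins.
       Assumed here to be the build output seen in the log. -->
  <property>
    <name>plugin.folders</name>
    <value>build/plugins</value>
  </property>

  <!-- Regular expression naming the plugins to load. Anything that
       does not match is reported as "not including". The last
       alternative is a hypothetical name for the whitelist plugin. -->
  <property>
    <name>plugin.includes</name>
    <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|WhitelistURLFilter</value>
  </property>

  <!-- Order in which URL filters are applied, as class names
       separated by spaces. Class names are taken from this thread. -->
  <property>
    <name>urlfilter.order</name>
    <value>org.apache.nutch.net.RegexURLFilter epile.crawl.plugin.WhitelistURLFilter</value>
  </property>

</nutch-conf>
```

If a plugin named in plugin.includes is missing from the folder given in plugin.folders, or the other way around, it will not be loaded.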

If you're still stuck, email me privately and we'll try to work through this.

--Matt

On Sep 13, 2005, at 7:14 PM, Michael Ji wrote:

hi Matt:

Thanks for your advice.

I can launch URLFilterChecker successfully; however, I
get the following error, which complains about the indexing
filter. Could you let me know where the problem might be?

"
050921 191015 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter

050921 191015 not including: E:\programs\cygwin\home\fji\versionControl\nutch_V07_P87\nutch\build\plugins\WhitelistURLFilter

050921 191015 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.net.URLFilterChecker.checkAll(URLFilterChecker.java:93)
    at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
    ... 2 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.indexer.IndexingFilter does not exist.
    at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
    at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
    ... 3 more
"

thanks,

Michael Ji


--- Matt Kangas <[EMAIL PROTECTED]> wrote:


Hi Michael,

Ordinarily there's no need to edit bin/nutch to run a specific class.
If the class is in a JAR in <nutch-home>/lib, you can just say
"nutch <full class name>". For example, the following two commands
are equivalent:

$ nutch crawl
$ nutch org.apache.nutch.tools.CrawlTool

However, the situation is a little different for plugins. Ordinarily
the classes for a plugin are placed in
<nutch-home>/plugins/<plugin-name>, not <nutch-home>/lib. To
instantiate the plugin class, you must run *another* class which calls
the appropriate plugin factory. For URLFilter plugins, the factory
class is org.apache.nutch.net.URLFilters. This class does not have a
main() method, but there is a helper class to test filters,
URLFilterChecker. You can run it as follows:

$ nutch org.apache.nutch.net.URLFilterChecker -allCombined < urls.txt

Hope that helps. Let me know if that doesn't work for you.

--Matt

On Sep 11, 2005, at 3:20 PM, Michael Ji wrote:


hi Matt:

I applied and compiled your patch in Nutch 0.7 successfully.

However, I ran into a runtime problem when I tried to test the
patch manually by calling its class.

I edited bin/nutch and added these lines:
"
elif [ "$COMMAND" = WhitelistFilterTester ] ; then
  CLASS=epile.crawl.plugin.WhitelistURLFilter
"

But when I call it, I get this error:
"
Exception in thread "main" java.lang.NoClassDefFoundError: epile/crawl/plugin/WhitelistURLFilter
"

I guess the classpath is not defined properly.

My environment is set up as follows:

1. In nutch's build.xml, I added
"<ant dir="epile" target="deploy"/>"

2. In nutch/src/plugin/, I created the directory
"epile-basic/src/java", then copied the unzipped nutch-87
epile/crawl.. into that dir

3. I created plugin.xml in epile-basic/, the same as the one you
included in the patch, and a new build.xml:
"
<?xml version="1.0"?>

<project name="WhitelistURLFilter" default="jar">

  <import file="../build-plugin.xml"/>

</project>

"

4. In nutch, I can run "ant" successfully; in nutch/build/, a new
WhitelistURLFilter/ directory is created, with
WhitelistURLFilter.class inside.

Did I miss something important?

thanks,

Michael Ji






--
Matt Kangas / [EMAIL PROTECTED]

