Hi John,

  I posted it earlier as a .txt file, but since it's small I'll just include 
it in this email:


import java.net.URL;
import java.net.URLClassLoader;


public class test2{

    public test2(){}

    public static void main (String [] args) throws Exception{
        test2 t = new test2();
        URL [] theURLs = new URL[7];

        // Jars and class directories that the RSS parser plugin needs at runtime.
        theURLs[5] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/saxpath.jar");
        theURLs[6] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/jaxen-full.jar");
        //theURLs[5] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/jdom.jar");
        theURLs[0] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/parse-rss.jar");
        theURLs[1] = new URL("file:/home/chris/cs599-search-engines/nutch/build/classes/");
        theURLs[2] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/commons-feedparser-0.5-beta.jar");
        theURLs[3] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/log4j-1.2.6.jar");
        theURLs[4] = new URL("file:/home/chris/cs599-search-engines/nutch/build/plugins/parse-rss/commons-httpclient-3.0-beta1.jar");
        System.out.println(t.getClass().getName());
        URLClassLoader theLoader = new URLClassLoader(theURLs, t.getClass().getClassLoader());
        //      theLoader.loadClass("org.jdom.Document");

        // Load the parser through the new loader and call testMain() reflectively.
        Class c = theLoader.loadClass("net.nutch.parse.rss.RSSParser");
        Object o = (c.getConstructors()[0]).newInstance((Object[]) null);
        c.getMethod("testMain", (Class[]) null).invoke(o, (Object[]) null); //this works fine!
        
        //org.jdom.Document d = new org.jdom.Document();
    }

}

There it is. Of course, to run it you will need the jar files that I am 
dynamically loading, and you will need to make sure that the paths to those 
jar files work on whatever system you are running it on. You will also need 
the parse-rss.jar file, which contains the RSS Parser Plugin that I'm 
currently working on for Nutch. You can get that from the following URL: 
http://baron.pagemewhen.com:8080/~chris/parse-rss.zip

Then just compile it and it should work. Basically, the issue I'm having is 
that I can get the feedparser working fine in its own standalone program. I 
can run it from the command line, or from another standalone program as I'm 
demonstrating above. However, when I try to use the feedparser as part of my 
parse-rss plugin, it chokes when the Parse getParse method is called, because 
that method involves calls to the feedparser. The error message that I'm 
receiving (as shown in the original crawl log file that I posted to the group) 
says something to the effect of: ClassNotFoundException: org/jdom/Document, 
which is a class that the feedparser depends on.

The weird thing, however, is that if you look at my log file, the 
PluginRepository read the necessary jar file (jdom.jar) from the plugin.xml 
file (within the library sub-element of runtime), from the correct path on my 
system (I checked that it's the same path that works in my test2.java program 
above). I even put println's and LOG.info messages everywhere to ensure that 
the PluginClassLoader loads all the necessary jar URLs (from its constructor) 
when the getClassLoader method is called in the PluginRepository 
getPluginInstance method. I see it loading the jdom.jar file. However, when 
the feedparser gets called from my plugin while Nutch is trying to getContent 
from an RSS file, the feedparser chokes because it can't find the 
org/jdom/Document class.
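
One way to narrow this kind of problem down is to ask each class loader 
directly whether it can see the class. The following is only a rough sketch 
under my own assumptions, not anything taken from Nutch: the LoaderCheck 
class and its isVisible and describeChain helpers are hypothetical names made 
up for illustration. It uses only standard JDK calls (Class.forName with an 
explicit loader, ClassLoader.getParent, URLClassLoader.getURLs).

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical diagnostic helper; not part of Nutch or the feedparser.
class LoaderCheck {

    /** Returns true if the named class can be loaded through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Lists a loader and its parents, with the URLs of any URLClassLoader. */
    static String describeChain(ClassLoader loader) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            sb.append(cl.getClass().getName()).append('\n');
            if (cl instanceof URLClassLoader) {
                for (URL u : ((URLClassLoader) cl).getURLs()) {
                    sb.append("    ").append(u).append('\n');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "org.jdom.Document";
        ClassLoader own = LoaderCheck.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("own loader sees " + name + ": " + isVisible(name, own));
        System.out.println("context loader sees " + name + ": " + isVisible(name, context));
        System.out.print(describeChain(own));
    }
}
```

Calling something like isVisible("org.jdom.Document", getClass().getClassLoader()) 
from inside the plugin, and again with the thread context class loader, would 
show which loader (if any) can actually resolve the class at the point where 
the feedparser is invoked.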

That's basically my problem in a nutshell. Sorry for the long-windedness (is 
that a word? :-) ), but I just wanted to be as thorough as I could real fast 
when explaining the extent to which I've investigated this problem that I'm 
having.

Any help anyone could provide would be much appreciated.

Thanks,
 Chris


----- Original Message -----
From: John X <[EMAIL PROTECTED]>
Date: Saturday, March 26, 2005 2:00 pm
Subject: Re: Huge Problem trying to develop plugin for Nutch

> On Sat, Mar 26, 2005 at 01:13:33PM -0800, CHRIS A MATTMANN wrote:
> > Hi John,
> > 
> >  Thanks for your reply. Actually I already have the feedparser 
> > working from the command line. I also included a program, 
> > test2.java with my original email that shows how I can dynamically 
> > load the class and call the feedparser method. So, I actually 
> > already have that tool.
> 
> Could you post your working command line tool?
> 
> John
> 
> > 
> > Any help on this issue would be greatly appreciated.
> > 
> > Thanks,
> >   Chris
> > 
> > 
> > ----- Original Message -----
> > From: John X <[EMAIL PROTECTED]>
> > Date: Saturday, March 26, 2005 1:07 am
> > Subject: Re: Huge Problem trying to develop plugin for Nutch
> > 
> > > Why try it the hard way? You may want to
> > > create a simple tool, just calling feedparser to parse your 
> > > hi.rss? Have that work first, then worry about dynamic loading 
> > > and nutch plugin system.
> > > Let us know when you have the simple tool.
> > > 
> > > John
> > > 
> > > On Fri, Mar 25, 2005 at 06:08:50PM -0800, Chris Mattmann wrote:
> > > > Hi Folks,
> > > > 
> > > >  
> > > > 
> > > >  My name is Chris Mattmann: I work at the Jet Propulsion Laboratory in
> > > > Pasadena, CA, U.S.A. I'm new to the list. Nice to meet you all.
> > > > 
> > > >  
> > > > 
> > > > I am having some * major * trouble trying to build an RSS content
> > > > parser plugin for nutch. My plugin is based on the parse-pdf plugin
> > > > structure and uses the apache commons-feedparser library out of the
> > > > Jakarta sandbox to try and parse rss feeds and send them to nutch for
> > > > indexing. The problem that I am having is * very * strange. Basically
> > > > after about 2 days of going around the Nutch source code I've tracked
> > > > my problem down to basically the fact that for whatever reason, the
> > > > jdom.jar library the commons-feedparser relies on, is not accessible
> > > > via the Nutch Plugin runtime. I keep getting the same error whenever I
> > > > run the crawler to crawl RSS pages. I've set up a dummy web page with
> > > > a single link to an rss file. Here's the webpage:
> > > > 
> > > >  
> > > > 
> > > 
> > 
> > 
