Why do it the hard way? You might first create a simple standalone
tool that just calls feedparser to parse your hi.rss.
Get that working first, then worry about dynamic loading and the Nutch plugin system.
Let us know when you have the simple tool working.
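
As an even smaller first step, something like the sketch below could confirm
that the JDOM classes are visible on a plain classpath, outside of Nutch.
The class name ClasspathCheck is made up for illustration, and I am assuming
org.jdom.Document is a class that your jdom.jar provides (JDOM 1.x package
naming); pass a different class name as the first argument if your jar differs.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found through the given class
    // loader. The class is looked up without being initialized.
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not visible (or not linkable) from this class loader.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: org.jdom.Document is the class to probe for; override
        // it on the command line if needed.
        String name = args.length > 0 ? args[0] : "org.jdom.Document";
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        System.out.println(name + " loadable: " + isLoadable(name, loader));
    }
}
```

Run it with jdom.jar on the classpath (for example `java -cp jdom.jar:. ClasspathCheck`
on Unix). If the class loads there but not inside the plugin, the problem is
probably in how the plugin's class loader is set up rather than in the jar itself.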

John

On Fri, Mar 25, 2005 at 06:08:50PM -0800, Chris Mattmann wrote:
> Hi Folks,
>
> My name is Chris Mattmann: I work at the Jet Propulsion Laboratory in
> Pasadena, CA, U.S.A. I'm new to the list. Nice to meet you all.
>
> I am having some *major* trouble trying to build an RSS content parser
> plugin for Nutch. My plugin is based on the parse-pdf plugin structure and
> uses the Apache commons-feedparser library from the Jakarta sandbox to
> parse RSS feeds and send them to Nutch for indexing. The problem that I
> am having is *very* strange. After about 2 days of going through the
> Nutch source code, I've tracked my problem down to the fact that, for
> whatever reason, the jdom.jar library that commons-feedparser relies on
> is not accessible via the Nutch plugin runtime. I keep getting the same
> error whenever I run the crawler to crawl RSS pages. I've set up a dummy
> web page with a single link to an RSS file. Here's the web page:
>
