Hi Stefan,

 I appreciate your efforts on this. While a separate, working RSS parser
may serve the Nutch project well, unfortunately it doesn't really help me
with my project for my search engine class at USC (which was my motivation
for working on the parse-rss plugin in the first place). I need to get the
one that I was writing to work, because my teacher probably won't allow me
to use your code, since that would basically render my project
obsolete. :-)

 On another level, I think it would be important for the Nutch project to
discover why I'm receiving the error in my parse-rss plugin. As John X
seems to have discovered as well, I don't think it's a trivial error, nor
do I think it's one that a user has a low probability of encountering when
developing a plugin with Nutch. In fact, I didn't really do anything out of
the ordinary when developing my parse-rss plugin, and I think a lot of
users are going to be stumped when building plugins for Nutch if we don't
track this error down, identify its cause, and remedy it.

 Of course, this is all just my humble opinion.


Cheers,
  Chris



P.S. As John X noted, the code for my parse-rss plugin is available at:

http://baron.pagemewhen.com:8080/~chris/parse-rss.zip

It's in the standard Nutch plugin format, including source tree under "src"
and dependency jars under the "lib" directory.



______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED] 
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
Phone:  818-354-8810
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 28, 2005 9:17 AM
> To: [email protected]
> Cc: John X; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [Nutch-dev] I made parse-rss work, but ... Re: Huge Problem
> trying to develop plugin for Nutch
> 
> Chris, John,
> 
> Since I wasn't able to find the sources, I simply wrote my own RSS
> parser plugin and contributed it.
> Find it here:
> http://issues.apache.org/jira/browse/NUTCH-30
> I hope it is ok to fix your problem this way. :-)
> 
> If you like it please vote for the issue.
> 
> 
> Some comments about xml and java in general:
> XML support in Java is a pain,
> especially in container apps such as a plugin system that has a class
> loading model.
> You will find millions of postings about problems, e.g. using XML in
> webapps within JBoss or Tomcat.
> 
> Anyway, it is always related to class loading: incompatible versions of
> the libs may be in the normal JDK lib, in the endorsed folder, or inside
> any other jar that is on the class path.
> 
> From:
> http://wiki.media-style.com/pages/viewpage.action?pageId=1154
> 
> At initialization, the class-loader of a plugin is assigned all jar
> libraries that are defined in the manifest file. Besides these
> 'local' libraries, the dependency chain of a plugin is analyzed, and
> all jar libraries defined as public are assigned to the class-loader as
> well.
>   When a class then tries to load another class at runtime, we first try
> to load the class from the plugin's class-loader. If loading the
> class from the plugin's class-loader fails, we forward the class load
> request to the parent of the plugin class-loader.
> 
> This is the class-loader of the Nutch tool that started the plugin
> system.
> So for the back end it is Java's runtime class-loader; for the user
> interface it is the class-loader of the Tomcat webapp.
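
The lookup order described above (plugin jars first, then the parent) can
be sketched roughly as follows. This is only an illustration under stated
assumptions, not the actual Nutch plugin code: the class name
PluginFirstClassLoader is hypothetical, and the sketch uses
getClassLoadingLock, so it assumes Java 7 or later.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical plugin-first class-loader (an assumption for illustration,
 * not a class from Nutch). It reverses URLClassLoader's default
 * parent-first delegation.
 */
final class PluginFirstClassLoader extends URLClassLoader {

    PluginFirstClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Step 1: search the plugin's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Step 2: forward the request to the parent loader.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no plugin jars, every request falls through to the parent.
        PluginFirstClassLoader loader = new PluginFirstClassLoader(
                new URL[0], ClassLoader.getSystemClassLoader());
        Class<?> c = loader.loadClass("java.lang.String");
        System.out.println(c.getName() + " loaded by " + c.getClassLoader());
    }
}
```

With this ordering, a class that exists both in a plugin jar and on the
parent's class path is defined by the plugin loader. The same class name
can then refer to two different Class objects, which is one way
"incompatible versions" problems show up.
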
> 
> 
> Hope that helps, at least we have a working rss parser now. :-)
> 
> Stefan
> 
> 
> On 27.03.2005 at 11:12, John X wrote:
> 
> > Chris,
> >
> > I made plugin parse-rss work by
> >
> > (1) installing jdom.jar under $nutch_top/lib,
> > instead of $nutch_top/src/plugin/parse-rss/lib
> > (2) using jaxen-{core,jdom}.jar, instead of jaxen-full.jar.
> > Relatedly, there are some hacks necessary in commons-feedparser,
> > mostly reflecting API changes for XPath.
> >
> > (1) above is puzzling. I got the same error as you did
> > if jdom.jar is placed under the plugin's own lib dir.
> > I am not sure whether it is caused by a possible bug in the Nutch
> > plugin core, a namespace conflict in some jars, or something else.
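
One way to narrow down the causes listed above is to print which
class-loader and which jar a suspect class actually came from. The sketch
below uses only standard JDK calls; the class name WhereLoaded is
hypothetical, and the class to inspect is passed as an argument (for
example a JDOM class name, assuming that jar is on the class path where
the check is run).

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper, not part of Nutch. */
final class WhereLoaded {

    /** Describes the defining class-loader and code location of a class. */
    static String describe(Class<?> c) {
        // A null loader means the class came from the bootstrap loader.
        ClassLoader loader = c.getClassLoader();
        // The code source may be null, e.g. for core JDK classes.
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(no code source)"
                : src.getLocation().toString();
        String who = (loader == null) ? "bootstrap" : loader.toString();
        return c.getName() + " <- " + who + " @ " + where;
    }

    public static void main(String[] args) throws Exception {
        // Default to a JDK class so the example runs without extra jars.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(describe(Class.forName(name)));
    }
}
```

Calling describe() on the same class from inside the plugin and from the
Nutch core would show whether two different loaders, or two different
jars, are supplying it.
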
> >
> > Stefan (Groschupf): could you please enlighten us on possible causes?
> >
> > One note: there is a tool called net.nutch.parse.ParserChecker that
> > you can use to debug parser plugins. It is more convenient
> > to use it than to start a crawler.
> >
> > Will you be able to contribute this plugin after the dust settles?
> >
> > Best,
> >
> > John
> >
> > On Sat, Mar 26, 2005 at 01:32:34PM -0800, CHRIS A MATTMANN wrote:
> >> Hi John,
> >>
> >>   I posted it earlier as a .txt file, but since it's small I could
> >> just include it in this email:
> >>
> >>
> >> import java.net.URL;
> >> import java.net.URLClassLoader;
> >>
> >>
> >
> >
> > _______________________________________________
> > Nutch-developers mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/nutch-developers
> >
> >
> ---------------------------------------------------------------
> company:              http://www.media-style.com
> forum:                http://www.text-mining.org
> blog:                 http://www.find23.net
