Hi Folks,

I just wanted to let you know that I've submitted the parse-rss plugin that
I was working on to the JIRA system under issue "NUTCH-30"
(http://issues.apache.org/jira/browse/NUTCH-30). The plugin includes a patch
file (svn diff), along with the zipped-up source and runtime libraries. The
RSS parser is based on commons-feedparser from the Jakarta Sandbox, and
fully supports all of the major feed formats (Atom, RSS 1.0, RSS 2.0, etc.).
Additionally, I've included a JUnit test that runs the parser on an example
RSS file and validates the extracted outlinks and content.

I hope you will find it useful and will vote to have it included in the
Nutch distribution.

Thanks,
  Chris 

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
 
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
Phone:  818-354-8810
_______________________________________________________
 
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
 
 

