Hi folks,

I just wanted to let you know that I've submitted the parse-rss plugin I was working on to JIRA as issue "NUTCH-30" (http://issues.apache.org/jira/browse/NUTCH-30). The submission includes a patch file (svn diff) along with the zipped-up source and runtime libraries. The RSS parser is based on commons-feedparser from the Jakarta sandbox, and it fully supports all of the major feed formats (Atom, RSS 1.0, RSS 2.0, etc.). I've also included a JUnit test that runs the parser on an example RSS file and validates the extracted outlinks and content.
I hope you will find it useful and will vote to have it included in the Nutch distribution.

Thanks,
Chris

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory, Pasadena, CA
Office: 171-266B
Mailstop: 171-246
Phone: 818-354-8810
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
