Chris Mattmann wrote:
Hi Folks,
I just wanted to let you know that I've submitted the parse-rss plugin that
I was working on to the JIRA system under issue "NUTCH-30"
(http://issues.apache.org/jira/browse/NUTCH-30). The plugin includes a patch
file (svn diff), along with the zipped up source and runtime libraries. The
rss parser is based on the commons-feedparser out of the jakarta sandbox,
and fully supports all of the major rss formats (atom, rss 1.0, 2.0, etc.).
Additionally, I've included a junit test that runs the parser on an example
rss file and validates the outlinks and content extracted.
I hope that you will find it useful and vote to have it included in the
nutch distro.
+1, with some reservations (see jira).
I think it's a very useful contribution. Thank you, Chris!
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com