Chris Mattmann wrote:
Hi Folks,

I just wanted to let you know that I've submitted the parse-rss plugin that
I was working on to the JIRA system under issue "NUTCH-30"
(http://issues.apache.org/jira/browse/NUTCH-30). The plugin includes a patch
file (svn diff), along with the zipped-up source and runtime libraries. The
rss parser is based on the commons-feedparser out of the jakarta sandbox,
and fully supports all of the major rss formats (atom, rss 1.0, 2.0, etc.).
Additionally, I've included a junit test that runs the parser on an example
rss file and validates the outlinks and content extracted.

I hope that you will find it useful and vote to have it included in the
nutch distro.

+1, with some reservations (see jira).

I think it's a very useful contribution. Thank you, Chris!

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


