Andrzej Bialecki wrote:
Wow... that's GREAT. (I'm the author of the FeedParser.)
Chris Mattmann wrote:
Hi Folks,
I just wanted to let you know that I've submitted the parse-rss plugin that
I was working on to the JIRA system under issue "NUTCH-30"
(http://issues.apache.org/jira/browse/NUTCH-30). The plugin includes a patch
file (svn diff), along with the zipped-up source and runtime libraries. The
rss parser is based on the commons-feedparser out of the jakarta sandbox,
and fully supports all of the major feed formats (Atom, RSS 1.0, RSS 2.0, etc.).
Additionally, I've included a JUnit test that runs the parser on an example
rss file and validates the outlinks and content extracted.
I hope that you will find it useful and vote to have it included in the nutch distro.
+1, with some reservations (see jira).
I think it's a very useful contribution. Thank you, Chris!
BTW, it's in commons-proper now, but I just haven't had a chance to do a 0.5.0 release. We've had a release candidate, but I need to release another one to address some feedback we've received.
If you're running from a sandbox build, I'd HIGHLY recommend getting a commons-proper build of 0.5.0RC1.
http://jakarta.apache.org/commons/feedparser/
Kevin
--
Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
