[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471955 ]
Chris A. Mattmann commented on NUTCH-444:
-----------------------------------------

Hi Renaud,

In fact, Rome does appear to be quite easy to use, given the above coding example. If I recall, the main issue that I had with it before was the large number of external libraries that it required in order to run (which may not be the case anymore). Additionally, I recall there being an issue with the fact that Rome loaded the entire RSS structure into memory; on the other hand, commons-feedparser uses a SAX-based approach, which I really liked. So, those were some of the deterrents when I originally evaluated the technologies circa May 2005.

I'm not against adapting the current parse-rss plugin, or alternatively writing a parse-rss++ that utilizes a different underlying feedparser technology. I just need to be convinced that it makes sense. A lack of active development is not a valid excuse for switching libraries -- I've seen a number of really nice projects that produced an awesome piece of software only to have the developers abandon active development on it (I won't name names, but they're out there if you look). This doesn't take away from the fact that the software works, is proven, and suits the needs of the developers that use it.

In any case, I'll take the lead on shepherding anything produced out of this into the sources. Look forward to working with you all.

Cheers,
Chris

> Possibly use a different library to parse RSS feed for improved performance and compatibility
> ---------------------------------------------------------------------------------------------
>
> Key: NUTCH-444
> URL: https://issues.apache.org/jira/browse/NUTCH-444
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: Renaud Richardet
> Priority: Minor
> Fix For: 0.9.0
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current library (feedparser) has the following issues:
> - OutOfMemory when parsing > 100k feeds, since it has to convert the feed to jdom first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome
> - Informa
> - custom implementation based on Stax
> - ??

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
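
The comment above contrasts a parser that loads the whole feed into memory with a SAX-based one. As a rough illustration of the streaming style only, here is a minimal sketch that uses the SAX parser bundled with the JDK. It is not code from commons-feedparser, Rome, or the parse-rss plugin; the class name StreamingRssTitles and the method itemTitles are hypothetical names chosen for this example, and it assumes a plain RSS 2.0 layout with <item> and <title> elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects <item><title> text via SAX callbacks,
// without ever building a tree (DOM/JDOM) of the whole feed.
class StreamingRssTitles {

    static List<String> itemTitles(InputStream in) throws Exception {
        final List<String> titles = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private boolean inItem = false;     // true while inside an <item>
            private StringBuilder text = null;  // non-null while inside an item's <title>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    inItem = true;
                } else if (inItem && "title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName) && text != null) {
                    titles.add(text.toString().trim());
                    text = null;
                } else if ("item".equals(qName)) {
                    inItem = false;
                }
            }
        };

        // The default factory is not namespace-aware, so qName carries the tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
                + "<item><title>First post</title></item>"
                + "<item><title>Second post</title></item>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(feed.getBytes(StandardCharsets.UTF_8));
        System.out.println(itemTitles(in));
    }
}
```

In this sketch the only state kept between callbacks is a flag, one text buffer, and the list of titles collected so far, so memory use grows with what the handler chooses to keep rather than with the size of the feed. A tree-based parser, by contrast, has to hold every element of the document at once.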