[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471955 ]

Chris A. Mattmann commented on NUTCH-444:
-----------------------------------------

Hi Renaud,

 In fact, Rome does appear to be quite easy to use, given the above coding 
example. If I recall correctly, the main issue I had with it before was the 
large number of external libraries it required in order to run (which may no 
longer be the case). I also recall that Rome loaded the entire RSS structure 
into memory, whereas commons-feedparser uses a SAX-based approach, which I 
really liked.
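For illustration, here is a minimal sketch of the streaming style using only the JDK's built-in SAX parser (javax.xml.parsers), not commons-feedparser's own API. The class name RssSaxTitles and its titles() method are hypothetical, and the sketch assumes plain RSS 2.0 with un-namespaced <item> and <title> elements. The handler reacts to events as the parser reads the input, so no tree of the whole feed is ever built.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: streams an RSS 2.0 document with SAX and collects the
 *  <title> of each <item> without building an in-memory tree of the feed. */
public class RssSaxTitles {

    public static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                private boolean inItem = false;    // currently inside an <item>
                private StringBuilder text = null; // non-null only inside <item><title>

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    if (qName.equals("item")) {
                        inItem = true;
                    } else if (inItem && qName.equals("title")) {
                        text = new StringBuilder();
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // SAX may deliver one text node in several chunks, so accumulate.
                    if (text != null) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (qName.equals("title") && text != null) {
                        titles.add(text.toString().trim());
                        text = null;
                    } else if (qName.equals("item")) {
                        inItem = false;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>Feed</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        System.out.println(titles(rss)); // prints [First, Second]
    }
}
```

The returned list still grows with the number of items; a real plugin would more likely hand each item off as soon as its end tag is seen.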

 So, those were some of the deterrents when I originally evaluated the 
technologies circa May 2005. I'm not against adapting the current parse-rss 
plugin, or alternatively writing a parse-rss++ that uses a different 
underlying feed-parsing library. I just need to be convinced that it makes 
sense. Lack of active development is not, by itself, a valid reason to switch 
libraries. I've seen a number of really nice projects that produced an awesome 
piece of software, only to have their developers stop actively developing it 
(I won't name names, but they're out there if you look). That doesn't change 
the fact that the software works, is proven, and suits the needs of the 
developers who use it.

  In any case, I'll take the lead on shepherding whatever comes out of this 
into the sources. I look forward to working with you all.

Cheers,
  Chris



> Possibly use a different library to parse RSS feed for improved performance 
> and compatibility
> ---------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-444
>                 URL: https://issues.apache.org/jira/browse/NUTCH-444
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current 
> library (feedparser) has the following issues:
> - OutOfMemoryError when parsing > 100k feeds, since it has to convert the feed 
> to JDOM first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome 
> - Informa
> - custom implementation based on StAX
> - ??
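The StAX-based option in the list above could look roughly like the following sketch. It pull-parses a feed with the JDK's javax.xml.stream API and recognizes both RSS 2.0 <item> and Atom 1.0 <entry> elements (Atom 1.0 being one of the gaps listed). StaxFeedTitles and titles() are hypothetical names, and the sketch assumes titles are plain text: an Atom title with type="xhtml" markup would make getElementText() throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example: pull-parses RSS 2.0 <item> or Atom 1.0 <entry>
 *  elements one event at a time and returns the text of each entry's <title>. */
public class StaxFeedTitles {

    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    public static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTD processing for untrusted fetched content.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            boolean inEntry = false; // inside an RSS <item> or Atom <entry>
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    String ns = r.getNamespaceURI();
                    boolean rssItem = name.equals("item") && (ns == null || ns.isEmpty());
                    boolean atomEntry = name.equals("entry") && ATOM_NS.equals(ns);
                    if (rssItem || atomEntry) {
                        inEntry = true;
                    } else if (inEntry && name.equals("title")) {
                        // Reads text-only content up to the matching </title>.
                        titles.add(r.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("item") || name.equals("entry")) {
                        inEntry = false;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String atom = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>Feed</title>"
                + "<entry><title>X</title></entry><entry><title>Y</title></entry></feed>";
        System.out.println(titles(atom)); // prints [X, Y]
    }
}
```

Unlike SAX, the caller drives the loop here, which makes it easy to stop early or to emit each entry as soon as it is complete.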

-- 
This message is automatically generated by JIRA.