[ https://issues.apache.org/jira/browse/NUTCH-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898620#action_12898620 ]
Chris A. Mattmann commented on NUTCH-887:
-----------------------------------------
Hey Julien:
+1 to relying on Tika for RSS parsing. If there's something missing that Nutch
needs, we'll add it to Tika and roll it into 0.8.
{quote}
There is also the parse-rss plugin in Nutch which is quite similar - what's the
difference with the feed one again? Since the Tika parser would handle all
sorts of feed formats why not simply rely on it?
{quote}
I wrote parse-rss back in 2005, using commons-feedparser from Kevin Burton
and his crew. At the time it was well developed, and a little more flexible and
easier for me to pick up than Rome. Since then, however, its development has
stagnated and it is no longer maintained.
Functionally, the two plugins are roughly equivalent, so there isn't much
difference between them. I would suggest we move forward with the feed parser
in Tika and roll it back into Nutch.
> Delegate parsing of feeds to Tika
> ---------------------------------
>
> Key: NUTCH-887
> URL: https://issues.apache.org/jira/browse/NUTCH-887
> Project: Nutch
> Issue Type: Wish
> Components: parser
> Affects Versions: 2.0
> Reporter: Julien Nioche
> Fix For: 2.0
>
>
> [Starting a new thread from https://issues.apache.org/jira/browse/NUTCH-874]
> One of the plugins which hasn't been ported yet is the feed parser. We could
> rely on the one we recently added to Tika, bearing in mind one substantial
> difference: the Tika feed parser generates a simple XHTML representation of
> the document in which the feeds are represented as anchors, whereas the Nutch
> version created a new document for each feed.
> There is also the parse-rss plugin in Nutch which is quite similar - what's
> the difference with the feed one again? Since the Tika parser would handle
> all sorts of feed formats why not simply rely on it?
> Any thoughts on this?
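The representational difference described in the issue (one XHTML document with entries as anchors, versus one new document per feed entry) can be illustrated with a minimal, JDK-only sketch. This is not Tika's feed parser or Nutch's feed plugin: the class and method names (`FeedRepresentations`, `parseItems`, `asSingleXhtml`, `asDocumentPerEntry`) are hypothetical, and the sketch assumes a plain RSS 2.0 feed whose entries are `<item>` elements with `<title>` and `<link>` children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the Tika or Nutch API.
public class FeedRepresentations {

    // One feed entry, reduced to the two fields this sketch uses.
    public record Entry(String title, String link) {}

    // Parses an RSS 2.0 string (assumed format) and collects each <item>'s <title> and <link>.
    public static List<Entry> parseItems(String rss) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<Entry> entries = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                entries.add(new Entry(childText(item, "title"), childText(item, "link")));
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable RSS document", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Minimal escaping so titles and links are safe inside markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Style (a): the whole feed becomes ONE XHTML-like document,
    // with each entry represented only as an anchor.
    public static String asSingleXhtml(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<body>");
        for (Entry e : entries) {
            sb.append("<a href=\"").append(escape(e.link())).append("\">")
              .append(escape(e.title())).append("</a>");
        }
        return sb.append("</body>").toString();
    }

    // Style (b): each entry becomes its OWN document, keyed by its link.
    public static Map<String, String> asDocumentPerEntry(List<Entry> entries) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (Entry e : entries) {
            docs.put(e.link(), e.title());
        }
        return docs;
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><title>First post</title><link>http://example.com/1</link></item>"
                + "<item><title>Second post</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        List<Entry> entries = parseItems(rss);
        System.out.println(asSingleXhtml(entries));
        asDocumentPerEntry(entries).forEach((url, title) -> System.out.println(url + " -> " + title));
    }
}
```

Running `main` prints a single `<body>` string containing two anchors (style a), followed by two separate `link -> title` lines (style b). In style (a) the feed stays one parsed document whose entries survive only as links; in style (b) each entry is a unit of its own.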
--
This message is automatically generated by JIRA.