Feed Plugin Crawl Links

Richard Bergmann Wed, 07 Aug 2013 09:56:56 -0700

Nutch 1.7

I am using the feed parse plugin to consume and index an RSS feed.  While the 
content (title and description) of each feed item is indexed, what I would 
*really* like is to crawl and index the content of the page that each item 
links to.


Is this something that is supposed to happen but is not for some reason (i.e., 
I have it configured improperly)?  Or is the plugin not designed to crawl the 
link?  If the latter, is there some way to *make* it crawl that link?

FYI, the parse-plugins.xml file has the following for the relevant RSS entries:

<mimeType name="application/xml">
  <plugin id="parse-tika" />
  <plugin id="feed" />
</mimeType>

With this configuration the feed parser plugin is NEVER invoked, but my links 
are crawled.  Switching the order of the two plugins results in only the title 
and description of each feed item being indexed, and the link is not crawled 
(perhaps because the parse-tika plugin is then not called?).
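
For clarity, the switched order described above would presumably look like 
this (a sketch that assumes only the two plugin lines are swapped and nothing 
else in parse-plugins.xml changes):

```xml
<!-- Assumed variant: same entry as above, with "feed" listed first -->
<mimeType name="application/xml">
  <plugin id="feed" />
  <plugin id="parse-tika" />
</mimeType>
```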

Rich Bergmann
