Goldschmidt, Dave wrote:
Hello,

Is there an API of some sort for injecting content into Nutch *without*
using Nutch's crawler?  Or does anyone have ideas as to how to approach
this problem?  I.e. given a URL, a page of content, metadata about the
page, links, etc., how can I inject this into Nutch without Nutch
performing the crawl?

Thanks in advance for your ideas and insights,

DaveG

You may want to open the source of Fetcher.java and look at handleFetch. There you'll see how content is parsed and how it is written to a segment. From there you can discern how to use the API and whether it fits your needs.

-j
