Dave, you don't want to "inject" anything per se, at least not in
Nutch terminology. Instead, you'll want to create your own synthetic
crawler. Nutch's crawler outputs one "segment file" (actually a
directory of files) per crawler pass. It is this segment that is
processed by the "nutch index" stage.
So, create a program that iterates through your content and writes it
to a segment file, simulating the crawler's output. Just read the
source for Fetcher.java to see how it uses
org.apache.nutch.segment.SegmentWriter and mimic that. Then follow
the rest of the tutorial as if your segment files had fallen out of
the real crawler.
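As a rough illustration of the shape such a program might take, here is
a minimal sketch in plain Java. It deliberately does not call any Nutch
classes: Page, SegmentSink, InMemorySink and writeSegment are all
hypothetical names made up for this example. In a real program the sink
would be a thin adapter around org.apache.nutch.segment.SegmentWriter,
written after checking in Fetcher.java how that class is actually used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SyntheticCrawler {

    /** One item of your own content (hypothetical type, not a Nutch class). */
    static final class Page {
        final String url;
        final String content;
        final Map<String, String> metadata;
        final List<String> outlinks;

        Page(String url, String content,
             Map<String, String> metadata, List<String> outlinks) {
            this.url = url;
            this.content = content;
            this.metadata = metadata;
            this.outlinks = outlinks;
        }
    }

    /**
     * Hypothetical stand-in for the segment-writing step. A real
     * implementation would delegate to Nutch's SegmentWriter and would
     * also need to deal with I/O failures.
     */
    interface SegmentSink {
        void append(Page page);
    }

    /** Sink that only collects pages in memory, used here for demonstration. */
    static final class InMemorySink implements SegmentSink {
        final List<Page> written = new ArrayList<Page>();

        public void append(Page page) {
            written.add(page);
        }
    }

    /** Iterates over the content and hands each page to the sink. */
    static int writeSegment(Iterable<Page> pages, SegmentSink sink) {
        int count = 0;
        for (Page page : pages) {
            sink.append(page);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page("http://example.com/a", "First page body",
                     Collections.singletonMap("author", "dave"),
                     Arrays.asList("http://example.com/b")),
            new Page("http://example.com/b", "Second page body",
                     Collections.<String, String>emptyMap(),
                     Collections.<String>emptyList()));

        InMemorySink sink = new InMemorySink();
        int n = writeSegment(pages, sink);
        System.out.println("wrote " + n + " pages");
    }
}
```

Keeping the sink behind a small interface means the loop over your
content can be exercised without Nutch on the classpath; only the
adapter needs to know about the segment format.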
--Matt
On Sep 26, 2005, at 2:32 PM, Goldschmidt, Dave wrote:
Hello,
Is there an API of some sort for injecting content into Nutch *without*
using Nutch's crawler? Or does anyone have ideas as to how to approach
this problem? I.e., given a URL, a page of content, metadata about the
page, links, etc., how can I inject this into Nutch without Nutch
performing the crawl?
Thanks in advance for your ideas and insights,
DaveG
--
Matt Kangas / [EMAIL PROTECTED]