I have been trying to get to grips with
org.apache.nutch.crawl.Injector to help with a requirement on the
project I'm working on, and I'm a little confused about one thing.
On lines 120-121, any existing CrawlDatum is used instead of the
newly injected one. This doesn't make sense from my point of view,
though I'm guessing I just can't see the issue from the other side.
My scenario is this: when I inject a URL, it is because I want it to
be re-indexed, perhaps because it has changed. I don't care whether
that URL is already in the crawldb; I want it re-indexed. As far as I
can see, if that weren't the case I wouldn't be trying to inject it.
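
To make the question concrete, here is a minimal sketch of the two
merge policies as I understand them. It is not the actual Nutch code:
Entry, keepExisting and preferInjected are hypothetical names made up
for illustration, and Entry is only a stand-in for CrawlDatum.

```java
public class InjectMergeSketch {

    // Hypothetical stand-in for a crawldb entry; NOT the real CrawlDatum.
    static final class Entry {
        final String status;
        final long fetchTime;

        Entry(String status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // The behaviour as I read it: an entry already in the crawldb wins,
    // and the injected one is used only when nothing exists yet.
    static Entry keepExisting(Entry existing, Entry injected) {
        return existing != null ? existing : injected;
    }

    // The behaviour I expected: a freshly injected entry wins, so that
    // re-injecting a URL forces it to be fetched and indexed again.
    static Entry preferInjected(Entry existing, Entry injected) {
        return injected != null ? injected : existing;
    }

    public static void main(String[] args) {
        Entry existing = new Entry("fetched", 1000L);
        Entry injected = new Entry("unfetched", 2000L);

        // Same inputs, different winner depending on the policy.
        System.out.println(keepExisting(existing, injected).status);
        System.out.println(preferInjected(existing, injected).status);
    }
}
```

With the first policy, re-injecting a known URL leaves the crawldb
unchanged, which is the effect I am seeing.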

What am I missing here? Why is the existing CrawlDatum used instead of
the newly injected one?

Cheers
Rob
