Pages that do a server-side forward are refetched every time
------------------------------------------------------------
Key: NUTCH-353
URL: http://issues.apache.org/jira/browse/NUTCH-353
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Stefan Groschupf
Priority: Blocker
Fix For: 0.8.1
Attachments: doNotRefecthForwarderPagesV1.patch
Pages that do a server-side forward are not written back into the crawlDb
with a status change, and their nextFetchTime is not updated either.
As a result, the same page is refetched again and again. Nutch is therefore
not polite: it refetches both the forwarding page and the target page in each
segment iteration. This also affects scoring, since the forwarding page
contributes its score to all of its outlinks.
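
The effect can be illustrated with a small toy model. This is only a sketch
under stated assumptions, not Nutch code: Entry, run_iterations,
FETCH_INTERVAL, the example URLs, and the one-iteration-per-day schedule are
all hypothetical and chosen for illustration. It compares fetch counts when
the forwarding page's entry is left untouched with the case where it is
written back.

```python
from dataclasses import dataclass

FETCH_INTERVAL = 30  # days until a fetched page is due again (assumed value)


@dataclass
class Entry:
    """Toy stand-in for a crawl database record (not Nutch's own class)."""
    status: str = "unfetched"
    next_fetch_time: int = 0  # day on which the page becomes due


def run_iterations(db, forwards, days, write_back_forwards):
    """Run one generate/fetch/update cycle per day; return fetch counts per URL."""
    fetch_counts = {}
    for day in range(days):
        # Generate step: every entry whose next fetch time has passed is due.
        due = [url for url, entry in db.items() if entry.next_fetch_time <= day]
        for url in due:
            fetch_counts[url] = fetch_counts.get(url, 0) + 1
            target = forwards.get(url)
            if target is not None:
                # Following the forward fetches the target in the same pass.
                fetch_counts[target] = fetch_counts.get(target, 0) + 1
                if not write_back_forwards:
                    # Behaviour described in this issue: the entry keeps its
                    # old status and next fetch time, so it stays due.
                    continue
            # Update step: record the fetch and schedule the next one.
            db[url].status = "fetched"
            db[url].next_fetch_time = day + FETCH_INTERVAL
    return fetch_counts


def make_db():
    return {"http://example.com/old": Entry(), "http://example.com/plain": Entry()}


forwards = {"http://example.com/old": "http://example.com/new"}
print(run_iterations(make_db(), forwards, 5, write_back_forwards=False))
print(run_iterations(make_db(), forwards, 5, write_back_forwards=True))
```

In this model, over five iterations the forwarding page and its target are
each fetched five times when the entry is not written back, and once each
when it is. The ordinary page is fetched once in both cases.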
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira