[ 
https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669947#action_12669947
 ] 

Andrzej Bialecki  commented on NUTCH-353:
-----------------------------------------

Actually, the problem in the issue description is solved now. I'm closing this 
one, and the remaining functionality should be tracked as an enhancement in a 
separate issue.

> pages that serverside forwards will be refetched every time
> -----------------------------------------------------------
>
>                 Key: NUTCH-353
>                 URL: https://issues.apache.org/jira/browse/NUTCH-353
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: doNotRefecthForwarderPagesV1.patch
>
>
> Pages that do a server-side forward are not written back into the crawlDb 
> with a status change, and their nextFetchTime is not updated. 
> As a result, the same page is refetched again and again: Nutch is not 
> polite, because it refetches both the forwarding page and the target page in 
> each segment iteration. This also affects scoring, since the forwarding page 
> contributes its score to all outlinks.
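
The refetch loop described above can be illustrated with a toy model. This is only a sketch, not Nutch code: `CrawlEntry`, `generate`, `update`, `crawl_rounds`, the `record_redirects` flag, the example URLs and the 30-day interval are all hypothetical names and values chosen for illustration. It assumes that an entry is selected for fetching whenever its next fetch time has been reached.

```python
from dataclasses import dataclass

FETCH_INTERVAL = 30  # hypothetical refetch interval, in days


@dataclass
class CrawlEntry:
    """Toy stand-in for a crawlDb record; not Nutch's actual data type."""
    url: str
    status: str = "unfetched"
    next_fetch_time: int = 0  # day on which the page becomes due


def generate(crawl_db, now):
    """Select every entry whose next fetch time has been reached."""
    return [e for e in crawl_db.values() if e.next_fetch_time <= now]


def update(crawl_db, fetched, now, record_redirects):
    """Write fetch results back; `fetched` maps url -> redirect target or None."""
    for url, target in fetched.items():
        entry = crawl_db[url]
        if target is None:
            entry.status = "fetched"
            entry.next_fetch_time = now + FETCH_INTERVAL
        elif record_redirects:
            entry.status = "redirected"
            entry.next_fetch_time = now + FETCH_INTERVAL
        # Otherwise the forwarding page is left untouched, so it stays due.


def crawl_rounds(record_redirects, rounds=3):
    """Count how often the forwarding page is selected over several rounds."""
    crawl_db = {
        "http://example.com/old": CrawlEntry("http://example.com/old"),
        "http://example.com/page": CrawlEntry("http://example.com/page"),
    }
    redirects = {"http://example.com/old": "http://example.com/new"}
    selections = 0
    for day in range(rounds):
        due = generate(crawl_db, now=day)
        selections += sum(e.url == "http://example.com/old" for e in due)
        fetched = {e.url: redirects.get(e.url) for e in due}
        update(crawl_db, fetched, now=day, record_redirects=record_redirects)
    return selections
```

In this model, `crawl_rounds(record_redirects=False)` selects the forwarding page in all three rounds, while `crawl_rounds(record_redirects=True)` selects it only once, because its status and next fetch time are written back after the first fetch.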

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
