[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-578:
--------------------------------
Fix Version/s: 1.6 (was: 1.5)
> URL fetched with 403 is generated over and over again
> -----------------------------------------------------
>
> Key: NUTCH-578
> URL: https://issues.apache.org/jira/browse/NUTCH-578
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 1.0.0
> Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I
> have checked out the most recent version of the trunk as of Nov 20, 2007
> Reporter: Nathaniel Powell
> Assignee: Markus Jelsma
> Fix For: 1.6
>
> Attachments: NUTCH-578.patch, NUTCH-578_v2.patch, NUTCH-578_v3.patch,
> NUTCH-578_v4.patch, crawl-urlfilter.txt, nutch-site.xml, regex-normalize.xml,
> urls.txt
>
>
> I have not changed the following parameter in the nutch-default.xml:
> <property>
>   <name>db.fetch.retry.max</name>
>   <value>3</value>
>   <description>The maximum number of times a url that has encountered
>   recoverable errors is generated for fetch.</description>
> </property>
> However, one URL on the site I'm crawling, www.teachertube.com, keeps
> being generated for almost every segment (many more than 3 times):
>
> fetch of http://www.teachertube.com/images/ failed with: Http code=403,
> url=http://www.teachertube.com/images/
> This is a bug, right?
> Thanks.
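
For illustration only, the behaviour the reporter appears to expect from db.fetch.retry.max can be sketched as below. This is a toy model and not Nutch code: UrlRecord, record_failed_fetch, should_generate and simulate are hypothetical names, and it assumes that a 403 counts as a recoverable error and that a URL is dropped from generation once its failure count reaches the cap. Whether Nutch actually treats a 403 that way is the open question in this issue.

```python
from dataclasses import dataclass

# Stand-in for the db.fetch.retry.max property quoted above (default 3).
DB_FETCH_RETRY_MAX = 3


@dataclass
class UrlRecord:
    """Hypothetical per-URL state; not a Nutch class."""
    url: str
    retries: int = 0
    gone: bool = False


def record_failed_fetch(rec, retry_max=DB_FETCH_RETRY_MAX):
    """Count one recoverable failure and give up once the cap is reached."""
    rec.retries += 1
    if rec.retries >= retry_max:
        rec.gone = True


def should_generate(rec):
    """A generator that honours the cap skips URLs that were given up on."""
    return not rec.gone


def simulate(url, segments, retry_max=DB_FETCH_RETRY_MAX):
    """Return how often `url` is generated when every fetch attempt fails."""
    rec = UrlRecord(url)
    generated = 0
    for _ in range(segments):
        if should_generate(rec):
            generated += 1
            record_failed_fetch(rec, retry_max)
    return generated


if __name__ == "__main__":
    # Ten segments, every fetch failing, as in the report.
    print(simulate("http://www.teachertube.com/images/", segments=10))
```

Under these assumptions the URL would be generated for the first three segments and skipped afterwards, which is the opposite of the behaviour described in the report.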