[ 
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdy Galema updated NUTCH-1448:
--------------------------------

    Attachment: nutch-1448.txt

Thank you for your interest, Christian. This issue should indeed prevent that 
problem. (Note that it does not fix corrupt entries already present in the 
table; you should remove those by hand or resolve them some other way.)

Here is the patch. I have been running this functionality for quite some time 
now. If anyone has suggestions, please share them.
                
> Redirected urls should be handled more cleanly (more like an outlink url)
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-1448
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1448
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ferdy Galema
>             Fix For: 2.1
>
>         Attachments: nutch-1448.txt
>
>
> This is specifically for Nutch 2.x. Handling a redirect URL like an outlink 
> is much cleaner because it makes it simpler to trace how new URLs 
> are added to the webpage database. Instant fetching of redirects won't work, 
> but this is a small price to pay. (Note that this currently does not work at 
> all, because the http.max.redirect property has no effect.) I will attach 
> a patch in the coming days.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
