[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181937#comment-17181937
]
Sebastian Nagel commented on NUTCH-1150:
----------------------------------------
This is solved by NUTCH-2776 using a cache limited by size and time. Thanks
for the patch, [~vijithv], nice solution using a Bloom filter. Of course, while
memory-efficient, the Bloom filter has the disadvantage that redirects which
hit a false positive are not followed (or only followed later).
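
To illustrate the idea of a cache limited by size and time, here is a minimal
sketch. It is not the code from NUTCH-2776: the class name RedirectDedupCache,
the method shouldFollow and both limits are hypothetical, and timestamps are
assumed to be non-decreasing. Because membership is exact, there are no false
positives, at the cost of more memory than a Bloom filter.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not the NUTCH-2776 implementation.
class RedirectDedupCache {
  private final int maxSize;
  private final long ttlMillis;
  // Insertion-ordered map: redirect target -> time it was recorded.
  // Entries are only added when absent, so the oldest entry is always first
  // (assuming callers pass non-decreasing timestamps).
  private final LinkedHashMap<String, Long> seen = new LinkedHashMap<>();

  RedirectDedupCache(int maxSize, long ttlMillis) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("maxSize must be >= 1");
    }
    this.maxSize = maxSize;
    this.ttlMillis = ttlMillis;
  }

  /** Returns true if the target was not recorded recently and should be followed. */
  synchronized boolean shouldFollow(String url, long nowMillis) {
    // Time limit: drop expired entries, oldest first.
    Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().getValue() >= ttlMillis) {
        it.remove();
      } else {
        break;
      }
    }
    if (seen.containsKey(url)) {
      return false; // already followed within the time window
    }
    // Size limit: evict the oldest entry when the cache is full.
    if (seen.size() >= maxSize) {
      Iterator<String> oldest = seen.keySet().iterator();
      oldest.next();
      oldest.remove();
    }
    seen.put(url, nowMillis);
    return true;
  }
}
```

With such a cache, several URLs from the same fetch that redirect to a shared
location would trigger only one fetch and parse of that location, as long as
its entry has neither expired nor been evicted.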
> http.redirect.max can lead to multiple parses of the same url
> -------------------------------------------------------------
>
> Key: NUTCH-1150
> URL: https://issues.apache.org/jira/browse/NUTCH-1150
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.3, 1.4
> Reporter: Markus Jelsma
> Priority: Major
> Attachments: NUTCH-1150.patch
>
>
> With http.redirect.max > 0 it's possible that a document is parsed multiple
> times. This is the case when several URLs from the same fetch redirect to a
> shared location.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)