[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849143#comment-13849143
]
Markus Jelsma commented on NUTCH-1325:
--------------------------------------
Hi Tejas,
(1):
Current mapper is:
{code}
if (datum.numFailures() >= failureThreshold) {
  // TODO: also write to external storage, i.e. memcache
  context.write(key, emptyText);
}
{code}
If we change this to:
{code}
context.write(key, datum.numFailures());
{code}
Then the reducer can check whether all hosts have failed and, if so, emit only
the domain name. If at least one host has not failed, we have to emit each
failed host name individually.
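A rough sketch of that reducer decision, written as plain Java outside Hadoop so it can be run on its own. It assumes the mapper emits a failure count for every host (not only those over the threshold) and that the values are grouped per domain; the class FailedHostAggregator and the method namesToEmit are hypothetical names for illustration and not part of the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FailedHostAggregator {

  // Hypothetical helper (not Nutch API): given the failure count of every
  // known host of one domain, return the names that should be emitted.
  static List<String> namesToEmit(String domain,
                                  Map<String, Integer> failuresByHost,
                                  int failureThreshold) {
    List<String> failedHosts = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : failuresByHost.entrySet()) {
      if (entry.getValue() >= failureThreshold) {
        failedHosts.add(entry.getKey());
      }
    }
    // Every host failed: the domain name alone covers all of them.
    if (!failuresByHost.isEmpty() && failedHosts.size() == failuresByHost.size()) {
      return Collections.singletonList(domain);
    }
    // At least one host is still fine: list the failed hosts individually.
    return failedHosts;
  }

  public static void main(String[] args) {
    Map<String, Integer> hosts = new LinkedHashMap<>();
    hosts.put("www.example.org", 5);
    hosts.put("blog.example.org", 0);
    System.out.println(namesToEmit("example.org", hosts, 3));
  }
}
```

The catch is that the reducer only sees what the mapper wrote, so this only works if hosts below the threshold are written as well.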
(2):
Perhaps we can retry with https:// and other schemes if the first attempt with
http:// fails. It is ugly but should work.
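A minimal sketch of such a fallback, assuming the schemes are simply tried in a fixed order. The class SchemeFallback, the method firstReachable and the isReachable check are hypothetical placeholders for whatever resolving or fetching the tool actually does; none of them exist in Nutch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class SchemeFallback {

  // Hypothetical helper: try each scheme in order and return the first URL
  // that the supplied check accepts, or empty if every scheme fails.
  static Optional<String> firstReachable(String host,
                                         List<String> schemes,
                                         Predicate<String> isReachable) {
    for (String scheme : schemes) {
      String url = scheme + "://" + host + "/";
      if (isReachable.test(url)) {
        return Optional.of(url);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Stand-in check for the example: pretend only https:// URLs respond.
    Predicate<String> onlyHttps = url -> url.startsWith("https://");
    System.out.println(
        firstReachable("example.org", Arrays.asList("http", "https"), onlyHttps));
  }
}
```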
Cheers,
> HostDB for Nutch
> ----------------
>
> Key: NUTCH-1325
> URL: https://issues.apache.org/jira/browse/NUTCH-1325
> Project: Nutch
> Issue Type: New Feature
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.9
>
> Attachments: NUTCH-1325-1.6-1.patch, NUTCH-1325.trunk.v2.path
>
>
> A HostDB for Nutch and associated tools to create and read a database
> containing information on hosts.