[ 
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849143#comment-13849143
 ] 

Markus Jelsma commented on NUTCH-1325:
--------------------------------------

Hi Tejas,

(1):
The current mapper is:
{code}
        if (datum.numFailures() >= failureThreshold) {

          // TODO: also write to external storage, i.e. memcache
          context.write(key, emptyText);
        }
{code}

If we change this to:
{code}
          context.write(key, datum.numFailures());
{code}

Then, in the reducer, we can check whether all hosts of a domain have failed 
and, if so, emit only the domain name. If at least one host hasn't failed, we 
have to emit each failed host name instead.
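
A rough sketch of that reducer decision, in plain Java without the Hadoop types. It assumes the reducer sees the failure counts of all hosts of one domain together. The class name, method name and parameters are made up for illustration and are not part of the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Nutch code: decides what to emit for one domain,
// given the failure count reported for each of its hosts.
class DomainFailureSketch {

  // Returns the domain alone if every host reached the threshold,
  // otherwise the names of the hosts that did (possibly none).
  static List<String> namesToEmit(String domain,
                                  Map<String, Integer> failuresByHost,
                                  int failureThreshold) {
    List<String> failedHosts = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : failuresByHost.entrySet()) {
      if (entry.getValue() >= failureThreshold) {
        failedHosts.add(entry.getKey());
      }
    }
    boolean allFailed = !failuresByHost.isEmpty()
        && failedHosts.size() == failuresByHost.size();
    return allFailed ? Collections.singletonList(domain) : failedHosts;
  }

  public static void main(String[] args) {
    Map<String, Integer> failures = new LinkedHashMap<>();
    failures.put("a.example.org", 5);
    failures.put("b.example.org", 0);
    // Only one of the two hosts failed, so the failed host is listed by name.
    System.out.println(namesToEmit("example.org", failures, 3));
  }
}
```

In a real job the map would be built from the values passed to the reducer, and each returned name would go to context.write.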

(2):
Perhaps we can retry with https:// and other schemes if the first attempt with 
http:// fails. It is ugly but should work.
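
A minimal sketch of that fallback, again in plain Java. The reachability check is passed in as a predicate because how the check is actually done is left open here. All names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch, not Nutch code: tries each scheme in order and
// returns the first URL that the supplied check accepts.
class SchemeFallbackSketch {

  static Optional<String> firstReachable(String host,
                                         List<String> schemes,
                                         Predicate<String> isReachable) {
    for (String scheme : schemes) {
      String url = scheme + "://" + host + "/";
      if (isReachable.test(url)) {
        return Optional.of(url);
      }
    }
    // No scheme worked, so the host counts as failed.
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Stand-in check that pretends only https:// URLs respond.
    Predicate<String> httpsOnly = url -> url.startsWith("https://");
    System.out.println(
        firstReachable("example.org", Arrays.asList("http", "https"), httpsOnly));
  }
}
```

The order of the scheme list decides which one is tried first, so http can stay the default and https is only attempted after it fails.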

Cheers,

> HostDB for Nutch
> ----------------
>
>                 Key: NUTCH-1325
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1325
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.9
>
>         Attachments: NUTCH-1325-1.6-1.patch, NUTCH-1325.trunk.v2.path
>
>
> A HostDB for Nutch and associated tools to create and read a database 
> containing information on hosts.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
