Markus Jelsma created NUTCH-2445:
------------------------------------

             Summary: Fetcher following outlinks to keep track of already fetched items
                 Key: NUTCH-2445
                 URL: https://issues.apache.org/jira/browse/NUTCH-2445
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.13
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.14


When fetcher.follow.outlinks.depth is non-zero, the fetcher follows outlinks. 
This patch keeps track of already fetched URLs and thus avoids fetching the 
same URL twice.

A Set is used to keep track of them; it stores the URLs' hash codes rather than 
the URLs themselves to reduce memory usage. The Set is not used if the fetcher 
doesn't follow outlinks.
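
The idea can be illustrated with a minimal, self-contained sketch. This is not 
the code from the patch: the class name FetchedUrlTracker, the method 
shouldFetch and the boolean constructor argument are hypothetical, and the use 
of a concurrent Set is an assumption made because the fetcher runs several 
threads. Storing only int hash codes also means that two different URLs with 
the same hash code would be treated as the same URL.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not the class used in the patch.
public class FetchedUrlTracker {

  // Hash codes of URLs seen so far; null when outlinks are not followed,
  // so no memory is spent in that case.
  private final Set<Integer> seenHashes;

  // followOutlinks is assumed to be derived by the caller from
  // fetcher.follow.outlinks.depth.
  public FetchedUrlTracker(boolean followOutlinks) {
    if (followOutlinks) {
      // Assumption: a thread-safe Set, since fetcher threads run concurrently.
      this.seenHashes = ConcurrentHashMap.newKeySet();
    } else {
      this.seenHashes = null;
    }
  }

  // Returns true if the URL has not been recorded before and records it.
  // Always returns true when tracking is disabled.
  public boolean shouldFetch(String url) {
    if (seenHashes == null) {
      return true;
    }
    // Set.add returns false when the hash code is already present.
    return seenHashes.add(url.hashCode());
  }
}
```

A caller would check shouldFetch(url) before queueing an outlink and skip the 
URL when it returns false.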



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
