[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018232#comment-15018232 ]
Julien Nioche commented on NUTCH-2069:
--------------------------------------

no probs. Would be good to find a way to format based on the Eclipse XML config with an ANT task. There is a way to do it with Maven but haven't seen one for ANT.

> Ignore external links based on domain
> -------------------------------------
>
>                 Key: NUTCH-2069
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2069
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, parser
>    Affects Versions: 1.10
>            Reporter: Julien Nioche
>             Fix For: 1.11
>
>         Attachments: NUTCH-2069.patch, NUTCH-2069.v2.patch
>
>
> We currently have `db.ignore.external.links` which is a nice way of
> restricting the crawl based on the hostname. This adds a new parameter
> 'db.ignore.external.links.domain' to do the same based on the domain.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
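
For readers unfamiliar with the distinction in the issue description, here is a rough sketch of how a hostname-based check and a domain-based check can differ. It is not taken from the attached patches. The helper names (`host_of`, `naive_domain_of`, `is_external`) are hypothetical, and the "last two labels" rule for the domain is a deliberate simplification that gives wrong answers for suffixes such as `co.uk`; a real crawler would most likely consult a public-suffix list instead.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def naive_domain_of(url):
    """ASSUMPTION: treat the last two hostname labels as the domain.

    This is only an illustration; it mishandles multi-part suffixes
    such as 'co.uk'.
    """
    labels = host_of(url).split(".")
    return ".".join(labels[-2:])


def is_external(from_url, to_url, by_domain=False):
    """Decide whether a link leaves the 'site' of the page it was found on.

    by_domain=False compares full hostnames; by_domain=True compares
    the (naively derived) domains, so sibling subdomains count as internal.
    """
    key = naive_domain_of if by_domain else host_of
    return key(from_url) != key(to_url)
```

Under these assumptions, a link from `http://www.example.com/` to `http://blog.example.com/` is external when compared by hostname but internal when compared by domain, which appears to be the behaviour the new parameter is meant to enable.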