[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157816#comment-15157816 ]
Sebastian Nagel commented on NUTCH-2221:
----------------------------------------

+1

Just to consider: the additional argument to ParseOutputFormat.filterNormalize(...) may conflict with changes for NUTCH-2144.

> Introduce db.ignore.internal.links to FetcherThread
> ---------------------------------------------------
>
>                 Key: NUTCH-2221
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2221
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>             Fix For: 1.12
>
>         Attachments: NUTCH-2216-NUTCH-2220-NUTCH-2221.patch, NUTCH-2221.patch
>
>
> FetcherThread has support for db.ignore.external.links. In config you can
> find db.ignore.internal.links as well, but it only operates on LinkDB, which
> is confusing. This patch will introduce db.ignore.internal.links to
> FetcherThread, similar to db.ignore.external.links. With both parameters set
> to true you can limit the crawl to the injected seed list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
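To illustrate why setting both flags limits the crawl to the seed list, here is a minimal sketch of the decision applied to each outlink. It is not the code from NUTCH-2221.patch or FetcherThread. The class OutlinkFilterSketch and its keep() method are hypothetical, and the sketch assumes "internal" means "same host as the page the link was found on" and "external" means "different host".

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

/**
 * Hypothetical sketch, not Nutch's FetcherThread code: decides whether an
 * outlink is kept, given the two ignore flags. Assumes "internal" means the
 * same host as the source page and "external" means a different host.
 */
public class OutlinkFilterSketch {
    private final boolean ignoreInternal; // stands in for db.ignore.internal.links
    private final boolean ignoreExternal; // stands in for db.ignore.external.links

    public OutlinkFilterSketch(boolean ignoreInternal, boolean ignoreExternal) {
        this.ignoreInternal = ignoreInternal;
        this.ignoreExternal = ignoreExternal;
    }

    /** Returns true if the outlink toUrl, found on fromUrl, should be kept. */
    public boolean keep(String fromUrl, String toUrl) {
        String fromHost;
        String toHost;
        try {
            // Host names are compared case-insensitively.
            fromHost = new URL(fromUrl).getHost().toLowerCase(Locale.ROOT);
            toHost = new URL(toUrl).getHost().toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            return false; // drop links that cannot be parsed
        }
        boolean internal = fromHost.equals(toHost);
        if (internal && ignoreInternal) {
            return false; // same-host link ignored
        }
        if (!internal && ignoreExternal) {
            return false; // cross-host link ignored
        }
        return true;
    }

    public static void main(String[] args) {
        OutlinkFilterSketch both = new OutlinkFilterSketch(true, true);
        System.out.println(both.keep("http://a.example/", "http://a.example/page")); // false
        System.out.println(both.keep("http://a.example/", "http://b.example/"));     // false

        OutlinkFilterSketch externalOnly = new OutlinkFilterSketch(false, true);
        System.out.println(externalOnly.keep("http://a.example/", "http://a.example/page")); // true
    }
}
```

With both flags true, every outlink is either internal or external, so every one is dropped and only the injected URLs are ever fetched. With only the external flag set, the crawl stays on the seed hosts but still follows same-host links.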