[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-2220.
----------------------------------
    Resolution: Fixed

Committed to trunk in revision 1731831. Thanks for your comments, Sebastian!

> Rename db.* options used only by the linkdb to linkdb.*
> -------------------------------------------------------
>
>                 Key: NUTCH-2220
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2220
>             Project: Nutch
>          Issue Type: Task
>          Components: linkdb
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.12
>
>         Attachments: NUTCH-2220.patch
>
>
> We need an option db.ignore.internal.links that operates in FetcherThread,
> just like db.ignore.external.links. It already exists, but it is only used by
> the LinkDB and defaults to true, which is not a good default for
> FetcherThread.
> I propose to make a clear distinction between options that are used by the
> LinkDB and those that are not. Most options used by the LinkDB already have
> the linkdb.* prefix, but db.ignore.*.links, db.max.inlinks and
> db.max.anchor.length do not yet.
> This patch renames those options to the linkdb.* prefix, so that afterwards
> we can implement a db.ignore.internal.links that operates in FetcherThread,
> just like db.ignore.external.links.
> This introduces a change in the default parameters. Please comment.
> h2. How to upgrade from earlier releases
> * replace your old conf/nutch-default.xml with the conf/nutch-default.xml
> from the Nutch 1.12 release
> * if you use the LinkDB (e.g. invertlinks) and have modified the parameters
> {{db.max.inlinks}}, {{db.max.anchor.length}} and/or
> {{db.ignore.internal.links}}, rename them to {{linkdb.max.inlinks}},
> {{linkdb.max.anchor.length}} and {{linkdb.ignore.internal.links}}
> respectively
> * {{db.ignore.internal.links}} and {{db.ignore.external.links}} now operate 
> on the CrawlDB only
> * {{linkdb.ignore.internal.links}} and {{linkdb.ignore.external.links}} now 
> operate on the LinkDB only
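
The renaming step in the upgrade notes can be sketched as a small key migration. This is a minimal illustration only, assuming your site overrides (e.g. from conf/nutch-site.xml) have already been read into a plain map of property name to value; `LinkDbPropertyMigration` and `migrate` are hypothetical helpers, not part of Nutch, and no Nutch or Hadoop API is used. It moves exactly the three keys named in the upgrade notes and lets an explicitly set linkdb.* value win over an old db.* value. db.ignore.external.links is deliberately left untouched because the notes do not list it for renaming, even though it now applies to the CrawlDB only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkDbPropertyMigration {

    // Old db.* names and their linkdb.* replacements, as listed in the
    // NUTCH-2220 upgrade notes.
    static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("db.max.inlinks", "linkdb.max.inlinks");
        RENAMED.put("db.max.anchor.length", "linkdb.max.anchor.length");
        RENAMED.put("db.ignore.internal.links", "linkdb.ignore.internal.links");
    }

    // Hypothetical helper: returns a copy of the site overrides with each
    // renamed db.* key moved to its linkdb.* name. The old key is removed,
    // since db.ignore.internal.links now has a different (CrawlDB) meaning.
    // If a linkdb.* key is already set explicitly, its value is kept.
    public static Map<String, String> migrate(Map<String, String> site) {
        Map<String, String> out = new LinkedHashMap<>(site);
        for (Map.Entry<String, String> e : RENAMED.entrySet()) {
            String value = out.remove(e.getKey());
            if (value != null) {
                out.putIfAbsent(e.getValue(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("db.max.inlinks", "5000");
        site.put("db.ignore.external.links", "true");
        // prints {db.ignore.external.links=true, linkdb.max.inlinks=5000}
        System.out.println(migrate(site));
    }
}
```

The migrated map would then be written back as `<property>` entries in nutch-site.xml; that step is omitted here.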



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)