[ 
https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601847#comment-14601847
 ] 

Sebastian Nagel commented on NUTCH-1730:
----------------------------------------

* there is a typo in the property name ("exteral")
{code}
ignoreExternal = conf.getBoolean("scoring.depth.ignore.exteral", false);
{code}
* the property should be described in nutch-default.xml
* what's the aim of
{code}
-    if (curDepth >= curMaxDepth) {
+    if (curMaxDepth > 0 && curDepth >= curMaxDepth) {
{code}
A curMaxDepth of 0 (or -1) would mean: accept everything up to an unlimited 
linkage depth. Right? Since -1 is often used to mean "unlimited", 
that's a good idea. But we should make this explicit (add it to the Javadoc 
and nutch-default.xml), and use it also for DEFAULT_MAX_DEPTH.
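
To illustrate what "explicit" could look like, here is a minimal standalone 
sketch of the check. It is not taken from the patch: the class name, the 
UNLIMITED constant and the exceedsDepth helper are made up for this example, 
and it assumes that any value <= 0 is treated as "no limit".

```java
// Hypothetical sketch, not the plugin code: names are illustrative only.
class DepthLimit {
    /** Assumed sentinel: any max depth <= 0 means "no depth limit". */
    static final int UNLIMITED = -1;

    /**
     * Returns true if a page at curDepth has reached the limit,
     * i.e. its outlinks should no longer be followed.
     */
    static boolean exceedsDepth(int curDepth, int curMaxDepth) {
        if (curMaxDepth <= 0) {
            return false; // unlimited: never cut off
        }
        return curDepth >= curMaxDepth;
    }

    public static void main(String[] args) {
        System.out.println(exceedsDepth(5, UNLIMITED)); // no limit applies
        System.out.println(exceedsDepth(3, 3));         // limit reached
        System.out.println(exceedsDepth(2, 3));         // still below limit
    }
}
```

Keeping the sentinel in a named constant would make it easy to reference 
from both the Javadoc and the property description.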

Just curious: what's the use case? Doesn't the depth easily get out of control? 
E.g., if a seed document links to an external page which links back to a page 
deep on the first site, the deep page gets the same depth as the seed doc.
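
The scenario can be played through with a small toy model. This is only a 
sketch under assumptions: outlinkDepth is a hypothetical helper, hosts are 
compared as plain strings, the seed is assumed to start at depth 1, and 
"ignore external" is assumed to mean that depth is not incremented whenever 
a link crosses hosts.

```java
// Hypothetical toy model of depth propagation, not the plugin code.
class DepthDemo {
    /** Depth given to an outlink under the assumed "ignore external" rule. */
    static int outlinkDepth(int parentDepth, String fromHost, String toHost,
                            boolean ignoreExternal) {
        boolean external = !fromHost.equals(toHost);
        // Assumption: cross-host links keep the parent's depth when ignored.
        return (ignoreExternal && external) ? parentDepth : parentDepth + 1;
    }

    /** seed on host A -> external page on host B -> deep page back on host A. */
    static int deepPageDepth(boolean ignoreExternal) {
        int seed = 1; // assumed depth of the seed document
        int ext = outlinkDepth(seed, "a.example", "b.example", ignoreExternal);
        return outlinkDepth(ext, "b.example", "a.example", ignoreExternal);
    }

    public static void main(String[] args) {
        System.out.println("ignoreExternal=false: " + deepPageDepth(false));
        System.out.println("ignoreExternal=true:  " + deepPageDepth(true));
    }
}
```

In this model the deep page ends up at the seed's depth when external links 
are ignored, so everything reachable from it is crawled as if it were a seed.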

> Scoring-depth optionally not to increment depth for external hosts
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1730
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1730
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.11
>
>         Attachments: NUTCH-1730-trunk.patch, NUTCH-1730.patch
>
>
> Currently, the plugin always increments depth, even when coming or going to 
> external hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
