[
https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601847#comment-14601847
]
Sebastian Nagel commented on NUTCH-1730:
----------------------------------------
* there is a typo in the property name ("exteral")
{code}
ignoreExternal = conf.getBoolean("scoring.depth.ignore.exteral", false);
{code}
* the property should be described in nutch-default.xml
* what's the aim of
{code}
- if (curDepth >= curMaxDepth) {
+ if (curMaxDepth > 0 && curDepth >= curMaxDepth) {
{code}
A curMaxDepth of 0 (or -1) would mean: accept everything up to an unlimited
linkage depth. Right? Since -1 is quite often used to mean "unlimited",
that's a good idea. But we should make this explicit (add it to the Javadoc
and nutch-default.xml), and also use it for DEFAULT_MAX_DEPTH.
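
To make the intended semantics concrete, here is a minimal standalone sketch of the patched condition. The class name, method name and the UNLIMITED constant are made up for illustration and are not taken from the plugin; only the boolean expression mirrors the patch.

```java
// Hypothetical sketch, not the actual scoring-depth plugin code.
final class DepthLimitSketch {

    /** Assumed sentinel: any max depth <= 0 is treated as "no limit". */
    static final int UNLIMITED = -1;

    /** Same condition as in the patch: a limit applies only if it is positive. */
    static boolean exceedsMaxDepth(int curDepth, int curMaxDepth) {
        return curMaxDepth > 0 && curDepth >= curMaxDepth;
    }

    public static void main(String[] args) {
        System.out.println(exceedsMaxDepth(3, 3));           // limit reached
        System.out.println(exceedsMaxDepth(2, 3));           // still below the limit
        System.out.println(exceedsMaxDepth(100, UNLIMITED)); // -1: never cut off
        System.out.println(exceedsMaxDepth(100, 0));         // 0 behaves like -1
    }
}
```

Note that with this condition 0 and -1 are equivalent, which is why the documentation should say which value is the recommended one.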
Just curious: what's the use case? Doesn't the depth easily get out of control?
E.g., if a seed document links to an external page which links back to a page
deep on the first site, the deep page becomes equivalent to the seed doc.
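
The scenario can be sketched as follows. This is a simplified model, not the plugin code: it assumes that a seed starts at depth 1, that "external" means a different host name, and that the option skips the increment for every link crossing hosts. The URLs and all names are hypothetical.

```java
import java.net.URI;

// Hypothetical model of depth propagation, for illustration only.
final class ExternalDepthSketch {

    /** Depth assigned to an outlink, given the depth of the page linking to it. */
    static int outlinkDepth(String fromUrl, String toUrl, int fromDepth,
                            boolean ignoreExternal) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        boolean external = !fromHost.equalsIgnoreCase(toHost);
        // With the option on, links crossing hosts do not increment the depth.
        return (ignoreExternal && external) ? fromDepth : fromDepth + 1;
    }

    public static void main(String[] args) {
        String seed = "http://a.example/";
        String ext = "http://b.example/page";
        String deep = "http://a.example/x/y/z";
        int seedDepth = 1; // assumed starting depth for seeds

        for (boolean ignore : new boolean[] {false, true}) {
            int extDepth = outlinkDepth(seed, ext, seedDepth, ignore);
            int deepDepth = outlinkDepth(ext, deep, extDepth, ignore);
            System.out.println("ignoreExternal=" + ignore
                + " -> depth of deep page: " + deepDepth);
        }
    }
}
```

In this model the deep page ends up at the seed's depth when the option is on, so a max depth no longer bounds how far the crawl reaches into the first site.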
> Scoring-depth optionally not to increment depth for external hosts
> ------------------------------------------------------------------
>
> Key: NUTCH-1730
> URL: https://issues.apache.org/jira/browse/NUTCH-1730
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.7
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.11
>
> Attachments: NUTCH-1730-trunk.patch, NUTCH-1730.patch
>
>
> Currently, the plugin always increments depth, even when coming or going to
> external hosts.