[ https://issues.apache.org/jira/browse/NUTCH-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171632#comment-14171632 ]
Sebastian Nagel commented on NUTCH-1872:
----------------------------------------
The way the injected URL is set for the prefix rule does not seem correct: it's
set in distributeScoreToOutlinks() to fromUrl if it's not in parse meta data.
But that's always the case, because it's never added to parse meta data. We must
pass the injected URL the same way the metatags are passed: via
passScoreBeforeParsing(), passScoreAfterParsing(), and
distributeScoreToOutlinks(). It would be most transparent to set it initially in
injectedScore(). If we then find no injected URL in parse meta data, that's most
likely an error (for example, because of a configuration change).
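
To illustrate the proposed flow, here is a minimal sketch that does not use the real Nutch classes. Plain maps stand in for the CrawlDatum, Content and parse meta data, the method signatures are simplified, and the key name "urlmeta.injected.url" is a hypothetical placeholder, not something taken from the patch.

```java
import java.util.List;
import java.util.Map;

final class InjectedUrlPropagation {
    // Hypothetical metadata key; the actual patch may use a different name.
    static final String INJECTED_URL_KEY = "urlmeta.injected.url";

    // Stand-in for injectedScore(): record the injected URL once, at inject time.
    static void injectedScore(String url, Map<String, String> datumMeta) {
        datumMeta.put(INJECTED_URL_KEY, url);
    }

    // Stand-in for passScoreBeforeParsing(): datum meta data -> content meta data.
    static void passScoreBeforeParsing(Map<String, String> datumMeta,
                                       Map<String, String> contentMeta) {
        copyInjectedUrl(datumMeta, contentMeta);
    }

    // Stand-in for passScoreAfterParsing(): content meta data -> parse meta data.
    static void passScoreAfterParsing(Map<String, String> contentMeta,
                                      Map<String, String> parseMeta) {
        copyInjectedUrl(contentMeta, parseMeta);
    }

    // Stand-in for distributeScoreToOutlinks(): parse meta data -> outlink meta data.
    // Returns false instead of falling back to fromUrl when the value is missing,
    // because a missing injected URL most likely indicates an error.
    static boolean distributeScoreToOutlinks(Map<String, String> parseMeta,
                                             List<Map<String, String>> targets) {
        String injected = parseMeta.get(INJECTED_URL_KEY);
        if (injected == null) {
            System.err.println("No injected URL in parse meta data; skipping propagation");
            return false;
        }
        for (Map<String, String> targetMeta : targets) {
            targetMeta.put(INJECTED_URL_KEY, injected);
        }
        return true;
    }

    // Copy the injected URL from one meta data map to the next, if present.
    private static void copyInjectedUrl(Map<String, String> from, Map<String, String> to) {
        String injected = from.get(INJECTED_URL_KEY);
        if (injected != null) {
            to.put(INJECTED_URL_KEY, injected);
        }
    }
}
```

With this shape, the value set in injectedScore() reaches every outlink unchanged, and a missing value is reported rather than silently replaced by fromUrl.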
> enables control over how injected metadata is propagated
> --------------------------------------------------------
>
> Key: NUTCH-1872
> URL: https://issues.apache.org/jira/browse/NUTCH-1872
> Project: Nutch
> Issue Type: New Feature
> Reporter: Jonathan Cooper-Ellis
> Priority: Minor
> Attachments: urlmeta_propagation.diff, urlmeta_propagation2.diff
>
>
> This builds on NUTCH-655 and NUTCH-855, allowing users some control over
> which outlinks receive injected metadata. A new configuration property
> "urlmeta.rule" has been introduced, with a default value of "all".
> The value "all" indicates that "urlmeta.tags" should be propagated to all
> outlinks. Other options include: "host" (propagated to outlinks with the same
> host as the url with which the metadata was injected), "domain" (same, except
> with the same domain), "prefix" (treats the injected url as a prefix, so
> metadata is only propagated to urls that extend the injected url).
> Would appreciate feedback on whether you think this is a useful feature, and
> whether it's implemented properly.
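
For reference, the four rule values described above could be checked roughly as follows. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the "domain" rule uses a naive last-two-labels heuristic as a stand-in for whatever domain lookup the patch actually uses, so it will be wrong for suffixes such as co.uk.

```java
import java.net.URI;
import java.util.Locale;

final class UrlMetaRule {
    // Decide whether metadata injected with injectedUrl should reach outlinkUrl
    // under the given urlmeta.rule value ("all", "host", "domain" or "prefix").
    static boolean shouldPropagate(String rule, String injectedUrl, String outlinkUrl) {
        switch (rule) {
            case "all":
                return true;
            case "host":
                return hostOf(injectedUrl).equals(hostOf(outlinkUrl));
            case "domain":
                return domainOf(hostOf(injectedUrl)).equals(domainOf(hostOf(outlinkUrl)));
            case "prefix":
                // The injected URL is treated as a plain string prefix.
                return outlinkUrl.startsWith(injectedUrl);
            default:
                throw new IllegalArgumentException("Unknown urlmeta.rule: " + rule);
        }
    }

    // Lower-cased host of a URL, or an empty string if the URL has none.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    // Naive domain heuristic (assumption): keep only the last two host labels.
    static String domainOf(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }
}
```

Note that the "prefix" check depends on knowing the original injected URL at the time outlinks are processed.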