[ 
https://issues.apache.org/jira/browse/NUTCH-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168785#comment-14168785
 ] 

Jonathan Cooper-Ellis edited comment on NUTCH-1872 at 10/12/14 9:05 PM:
------------------------------------------------------------------------

Hi Sebastian, thanks for the feedback!

My reasoning for keeping the injected URL was a case like this:

1) www.whatever.com/foo is injected with some metadata and "prefix" as the 
propagation rule
2) www.whatever.com/foo contains an outlink to www.whatever.com/foo/bar, which 
fits the rule so the metadata is propagated (www.whatever.com/foo does not 
contain an outlink to www.whatever.com/foo/baz)
3) www.whatever.com/foo/bar contains an outlink to www.whatever.com/foo/baz, 
which does fit the rule for the injected prefix (www.whatever.com/foo), but 
would fail it if www.whatever.com/foo/bar (the page the outlink was found on) 
were used as the prefix instead

Does that make sense, or am I confusing myself?

I'm still on the fence about it, though: it seems like a rare case, but I 
suspect there are situations where something like that could show up. I'd 
have no issue with dropping the injected URL in favor of making the code a 
little cleaner. What do you think?
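
To make the case above concrete, here is a rough sketch in Python. It is not 
the code from the patch; fits_prefix_rule is a made-up helper that assumes the 
"prefix" rule boils down to a plain string-prefix test.

```python
def fits_prefix_rule(prefix: str, outlink: str) -> bool:
    """Hypothetical helper: the outlink fits if it extends the prefix."""
    return outlink.startswith(prefix)


injected = "www.whatever.com/foo"      # URL the metadata was injected with
parent = "www.whatever.com/foo/bar"    # page on which the outlink was found
outlink = "www.whatever.com/foo/baz"   # outlink being checked in step 3

# Checked against the injected URL, the outlink fits the rule.
print(fits_prefix_rule(injected, outlink))

# Checked against the parent page's URL, the same outlink does not fit.
print(fits_prefix_rule(parent, outlink))
```

So whether /foo/baz gets the metadata depends on which URL is carried along 
as the prefix, which is why the injected URL is kept.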



> enables control over how injected metadata is propagated
> --------------------------------------------------------
>
>                 Key: NUTCH-1872
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1872
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Jonathan Cooper-Ellis
>            Priority: Minor
>         Attachments: urlmeta_propagation.diff
>
>
> This builds on NUTCH-655 and NUTCH-855, allowing users some control over 
> which outlinks receive injected metadata. A new configuration property 
> "urlmeta.rule" has been introduced, with a default value of "all".
> The value "all" indicates that "urlmeta.tags" should be propagated to all 
> outlinks. Other options include: "host" (propagated to outlinks with the same 
> host as the URL with which the metadata was injected), "domain" (same, except 
> with the same domain), "prefix" (treats the injected URL as a prefix, so 
> metadata is only propagated to URLs that extend the injected URL).
> Would appreciate feedback on whether you think this is a useful feature, and 
> whether it's implemented properly.
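
As a rough illustration of the four "urlmeta.rule" values described in the 
issue, the decision could be sketched in Python as below. This is not the 
attached patch: should_propagate, host_of, and naive_domain are hypothetical 
names, and the domain check is an assumption that simply takes the last two 
labels of the host, which is wrong for suffixes such as "co.uk".

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, with or without a scheme."""
    if "//" not in url:
        # urlsplit only recognizes a host when the URL contains "//"
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def naive_domain(host: str) -> str:
    """Assumption: the domain is the last two dot-separated labels."""
    return ".".join(host.split(".")[-2:])


def should_propagate(rule: str, injected_url: str, outlink: str) -> bool:
    """Decide whether metadata injected with injected_url goes to outlink."""
    if rule == "all":
        return True
    if rule == "host":
        return host_of(injected_url) == host_of(outlink)
    if rule == "domain":
        return (naive_domain(host_of(injected_url))
                == naive_domain(host_of(outlink)))
    if rule == "prefix":
        return outlink.startswith(injected_url)
    raise ValueError("unknown rule: " + rule)
```

For example, with "http://www.whatever.com/foo" as the injected URL, an 
outlink to "http://blog.whatever.com/x" would pass under "domain" but not 
under "host" or "prefix".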



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
