[ https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987520#comment-13987520 ]

Julien Nioche commented on NUTCH-1622:
--------------------------------------

Hi Daniel

Sorry for not commenting on your patch earlier; I hadn't seen it. We need a more
generic mechanism than the HTMLParser for this, as it would potentially have to
be done for every flavour of parser that can exist (e.g. the Tika one).
Nutch 2.x does not have a ParseOutputFormat. Wouldn't it be better to write the
metadata for the outlinks as part of the DbUpdate* code?

> Create Outlinks with metadata
> -----------------------------
>
>                 Key: NUTCH-1622
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1622
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.7, 2.2.1
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.8, 2.4
>
>         Attachments: NUTCH-1622-2.x.patch, NUTCH-1622.patch
>
>
> Being able to specify metadata when creating an outlink is extremely
> useful, as it allows information to be passed from a source page to the
> pages it links to. We use this routinely in our custom parsers, in
> combination with the url-meta plugin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
