[jira] [Commented] (NUTCH-1622) Create Outlinks with metadata

Daniel Kugel (JIRA) Tue, 06 May 2014 03:39:04 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990521#comment-13990521
 ]


Daniel Kugel commented on NUTCH-1622:
-------------------------------------

I don't have strong feelings about where this code should live, so feel free to 
move it around. :-)

To my understanding, the content should only be parsed in the parsing phase, so 
any metadata should be extracted at that stage.
Are you suggesting that the DbUpdate code parse the content again?
Metadata extraction seems like a parser feature, because the parser is the only 
component that should read ("parse") the content, and it seems reasonable to 
have both metadata-aware and metadata-ignorant parsers.
When adding a metadata element, the parser is the only component that knows 
what type of data it is currently parsing.
Perhaps we can add some form of hook methods or plugins that let the parsers 
themselves control what to do with each element they encounter, deciding 
whether it is metadata and, if so, what to do with it? I agree it seems 
complicated, but on the other hand, what else is eligible to parse content 
other than the parser?
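
As a rough illustration of the hook idea, here is a minimal sketch in plain 
Java. Every name in it (OutlinkHookSketch, Element, Link, ElementHook, 
META_TAG_HOOK) is hypothetical and made up for this example; none of them are 
existing Nutch classes, and a real version would have to fit the actual parser 
plugin interfaces. It only shows the shape: the parser asks each hook about 
every element it encounters, and whatever metadata the hooks return is attached 
to the outlinks during the same parse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these types are NOT part of Nutch.
public class OutlinkHookSketch {

    /** An element seen by the parser: a tag name plus its attributes. */
    static final class Element {
        final String name;
        final Map<String, String> attributes;
        Element(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    /** An outlink that carries the metadata gathered while parsing. */
    static final class Link {
        final String url;
        final Map<String, String> metadata;
        Link(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Hook the parser consults for every element it encounters. */
    interface ElementHook {
        /** Returns metadata for this element, or an empty map if it is not metadata. */
        Map<String, String> extract(Element element);
    }

    /** Example hook (an assumption): treat <meta name=.. content=..> as metadata. */
    static final ElementHook META_TAG_HOOK = e ->
        "meta".equals(e.name)
                && e.attributes.containsKey("name")
                && e.attributes.containsKey("content")
            ? Map.of(e.attributes.get("name"), e.attributes.get("content"))
            : Map.of();

    /** Single pass over the content: collect metadata via hooks and collect link targets. */
    static List<Link> parse(List<Element> elements, List<ElementHook> hooks) {
        Map<String, String> pageMetadata = new LinkedHashMap<>();
        List<String> urls = new ArrayList<>();
        for (Element e : elements) {
            for (ElementHook hook : hooks) {
                pageMetadata.putAll(hook.extract(e));
            }
            if ("a".equals(e.name) && e.attributes.containsKey("href")) {
                urls.add(e.attributes.get("href"));
            }
        }
        // Attach a copy of the collected metadata to every outlink.
        List<Link> links = new ArrayList<>();
        for (String url : urls) {
            links.add(new Link(url, new LinkedHashMap<>(pageMetadata)));
        }
        return links;
    }

    /** Builds a tiny fake page and parses it with the example hook. */
    static List<Link> demo() {
        List<Element> page = List.of(
            new Element("meta", Map.of("name", "lang", "content", "en")),
            new Element("a", Map.of("href", "http://example.com/a")),
            new Element("p", Map.of()),
            new Element("a", Map.of("href", "http://example.com/b")));
        return parse(page, List.of(META_TAG_HOOK));
    }

    public static void main(String[] args) {
        for (Link link : demo()) {
            System.out.println(link.url + " " + link.metadata);
        }
    }
}
```

A metadata-ignorant parser would simply be one that is given an empty hook 
list, so nothing outside the parsing phase needs to read the content again.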

> Create Outlinks with metadata
> -----------------------------
>
>                 Key: NUTCH-1622
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1622
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.7, 2.2.1
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.8, 2.4
>
>         Attachments: NUTCH-1622-2.x.patch, NUTCH-1622.patch
>
>
> The ability to specify metadata when creating an outlink is extremely 
> useful, as it allows information to be passed from a source page to the 
> pages it links to. We use that routinely within our custom parsers in 
> combination with the url-meta plugin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
