Julien,

I am trying to save myself a bit of time here by asking you this question (and 
making all subscribers listen!) before digging into the code:

Based on this patch (which I have applied), where will the metadata show up 
when it gets to my IndexingFilter extension?  CrawlDatum.getMetaData()?  
Somewhere else?  Do I have to modify an HTML parser to ensure the metadata gets 
to my IndexingFilter?

With the current "feed" Parser and IndexingFilter, the metadata I am interested 
in is stuffed into the parse metadata: Parse.getData().getParseMeta().

Thank you!

Rich Bergmann

-----Original Message-----
From: Julien Nioche (JIRA) [mailto:j...@apache.org] 
Sent: Thursday, August 08, 2013 11:07 AM
To: dev@nutch.apache.org
Subject: [jira] [Updated] (NUTCH-1622) Create Outlinks with metadata


     [ https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1622:
---------------------------------

    Attachment: NUTCH-1622.patch
    
> Create Outlinks with metadata
> -----------------------------
>
>                 Key: NUTCH-1622
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1622
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.7, 2.2.1
>            Reporter: Julien Nioche
>         Attachments: NUTCH-1622.patch
>
>
> Being able to specify metadata when creating an outlink is extremely useful, 
> as it allows information to be passed from a source page to the pages it 
> links to. We use that routinely within our custom parsers in combination 
> with the url-meta plugin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
