Julien, I am trying to save myself a bit of time here by asking you this question (and making all subscribers listen!) before digging into the code:
Based on this patch (which I have applied), where will the metadata show up when it gets to my IndexingFilter extension? CrawlDatum.getMetaData()? Somewhere else? Do I have to modify an Html parser to ensure the metadata gets to my IndexingFilter?

With the current "feed" Parser and IndexingFilter the metadata I am interested in is stuffed into the parse metadata: Parse.getData().getParseMeta().

Thank you!

Rich Bergmann

-----Original Message-----
From: Julien Nioche (JIRA) [mailto:j...@apache.org]
Sent: Thursday, August 08, 2013 11:07 AM
To: dev@nutch.apache.org
Subject: [jira] [Updated] (NUTCH-1622) Create Outlinks with metadata

[ https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1622:
---------------------------------
    Attachment: NUTCH-1622.patch

> Create Outlinks with metadata
> -----------------------------
>
>                 Key: NUTCH-1622
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1622
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.7, 2.2.1
>            Reporter: Julien Nioche
>         Attachments: NUTCH-1622.patch
>
>
> Having the possibility to specify metadata when creating an outlink is
> extremely useful as it allows to pass information from a source page to the
> pages it links to. We use that routinely within our custom parsers in
> combination with the url-meta plugin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira