[ 
https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002927#comment-14002927
 ] 

Markus Jelsma commented on NUTCH-1757:
--------------------------------------

Hi, metadata is passed via the CrawlDatum, yet a CrawlDatum never reaches a 
parser filter or a parser implementation. What am I missing?

By the way, it may be a good idea to have the CrawlDatum passed to (and 
optionally returned from) a parser filter. That would make it possible to 
change the CrawlDatum status in a parser or parser filter. This is useful when 
you want to run JavaScript to detect redirects, or use a classifier for soft 
404s.
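
To make the idea concrete, here is a rough sketch of a filter that receives a 
datum and may change its status. Every type below (Status, Datum, 
DatumAwareFilter, SoftNotFoundFilter) is a hypothetical stand-in and not 
Nutch's actual API; the soft-404 check is a naive substring match standing in 
for a real classifier.

```java
import java.util.Locale;

public class SoftNotFoundSketch {

  /** Hypothetical stand-in for a crawl status; not Nutch's status codes. */
  public enum Status { FETCHED, GONE }

  /** Hypothetical stand-in for a CrawlDatum, holding only a status. */
  public static final class Datum {
    public Status status = Status.FETCHED;
  }

  /** Hypothetical filter contract: the datum is passed in and returned. */
  public interface DatumAwareFilter {
    Datum filter(String parsedText, Datum datum);
  }

  /** Naive soft-404 detector: marks the datum GONE on a telltale phrase. */
  public static final class SoftNotFoundFilter implements DatumAwareFilter {
    @Override
    public Datum filter(String parsedText, Datum datum) {
      String text = parsedText.toLowerCase(Locale.ROOT);
      if (text.contains("page not found")) {
        datum.status = Status.GONE;
      }
      return datum;
    }
  }

  public static void main(String[] args) {
    Datum datum = new SoftNotFoundFilter()
        .filter("Sorry, Page Not Found", new Datum());
    System.out.println(datum.status);
  }
}
```

Returning the datum, rather than only mutating it, is what "optionally 
returned" would look like in this sketch: a filter could hand back a different 
instance if it needed to.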


> ParserChecker to take custom metadata as input
> ----------------------------------------------
>
>                 Key: NUTCH-1757
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1757
>             Project: Nutch
>          Issue Type: Improvement
>          Components: nutchNewbie, parser
>    Affects Versions: 1.8
>            Reporter: Julien Nioche
>            Priority: Minor
>             Fix For: 1.9
>
>         Attachments: NUTCH-1757.patch
>
>
> The attached patch allows custom metadata to be passed on the command line 
> (-md key=value) to the ParserChecker. This gives behaviour similar to 
> injecting metadata via the seed files. Some custom parser implementations 
> rely on such metadata, which is why the ParserChecker must allow it to be 
> passed in.
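
As a rough illustration of the "-md key=value" option described in the issue, 
the sketch below collects such pairs from an argument array into a map. The 
class and method names (MetadataArgs, parse) are hypothetical and this is not 
the code from NUTCH-1757.patch; it only shows one plausible way to read the 
option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataArgs {

  /**
   * Collects every "-md key=value" pair found in args, in order.
   * Other arguments (such as the URL to check) are ignored here.
   */
  public static Map<String, String> parse(String[] args) {
    Map<String, String> metadata = new LinkedHashMap<>();
    for (int i = 0; i < args.length; i++) {
      if (!"-md".equals(args[i])) {
        continue;
      }
      if (i + 1 >= args.length) {
        throw new IllegalArgumentException("-md requires key=value");
      }
      String pair = args[++i];
      // Split on the first '=' only, so values may themselves contain '='.
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value, got: " + pair);
      }
      metadata.put(pair.substring(0, eq), pair.substring(eq + 1));
    }
    return metadata;
  }

  public static void main(String[] args) {
    System.out.println(parse(args));
  }
}
```

The resulting map would then have to be copied into whatever metadata object 
the parser reads from; that step depends on the patch and is left out here.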



--
This message was sent by Atlassian JIRA
(v6.2#6252)
