Injecting Crawl metadata
------------------------

                 Key: NUTCH-655
                 URL: https://issues.apache.org/jira/browse/NUTCH-655
             Project: Nutch
          Issue Type: Improvement
          Components: injector
            Reporter: julien nioche
            Priority: Minor
         Attachments: Injector.patch

The attached patch allows metadata to be injected into the crawlDB. The input 
file must contain tab-separated fields, with the URL in the first column. Each 
subsequent field holds a metadata name and value separated by '='. An input 
line might look like this:
http://www.myurl.com  \t  categ=value1 \t categ2=value2
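
To make the line format concrete, here is a minimal Python sketch of how such a line could be split into a URL and its metadata. It is an illustration only, not the code in Injector.patch: the function name parse_inject_line is hypothetical, and trimming whitespace around fields and skipping fields without '=' are assumptions the description above does not state.

```python
def parse_inject_line(line):
    """Split one tab-separated seed line into (url, metadata dict)."""
    # Assumption: whitespace around each tab-separated field is not significant.
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    url = fields[0]
    metadata = {}
    for field in fields[1:]:
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = field.partition("=")
        if not sep or not name.strip():
            # Assumption: fields that are not name=value pairs are ignored.
            continue
        metadata[name.strip()] = value.strip()
    return url, metadata


url, metadata = parse_inject_line(
    "http://www.myurl.com\tcateg=value1\tcateg2=value2"
)
print(url)
print(metadata)
```

Splitting on the first '=' only is a deliberate choice in this sketch, so that a value containing '=' is kept whole rather than truncated.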

This can be useful for storing external knowledge and indexing it with a 
custom plugin.
