[ 
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1478:
----------------------------------------

    Attachment: NUTCH-1478v3.patch

The previous patch did not compile.
This patch adds the index-metadata plugin as per the original patch and 
corrects the formatting. In addition to the existing patch, I've added a small 
improvement which checks that the metatags string array has more than one value 
before adding \t.
If you apply the patch you will see the test failing for TestMetatagsParser. 
This needs to be fixed, but I won't be able to do it right now.
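
For anyone reviewing without the patch at hand, the check described above can be sketched roughly as follows. This is an illustration only, not code taken from NUTCH-1478v3.patch: the class name MetatagJoinSketch and the helper joinValues are hypothetical, and the sketch assumes the values of one metatag arrive as a plain String array.

```java
public class MetatagJoinSketch {

    // Hypothetical helper (not from the patch): joins the values of one
    // metatag, adding '\t' only when the array holds more than one value.
    static String joinValues(String[] values) {
        if (values == null || values.length == 0) {
            return "";
        }
        if (values.length == 1) {
            // Single value: return it unchanged, with no separator added.
            return values[0];
        }
        StringBuilder sb = new StringBuilder(values[0]);
        for (int i = 1; i < values.length; i++) {
            sb.append('\t').append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single-valued tag: no tab is added.
        System.out.println(joinValues(new String[] {"nutch"}));
        // Multi-valued tag: values are tab-separated (tab shown escaped).
        System.out.println(
            joinValues(new String[] {"nutch", "solr"}).replace("\t", "\\t"));
    }
}
```

The point of the guard is that a single-valued tag is stored without a separator, so nothing downstream has to strip a stray \t.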

> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>
>                 Key: NUTCH-1478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1478
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, 
> Nutch1478.patch, Nutch1478.zip, metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x 
> series. They take multiple values of the same tag and index them in Solr, as 
> in my earlier patch (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here 
> (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no 
> need to put the 'metatag' keyword before metatag names. For example, my 
> configuration looks like this 
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml)
>  
> This is only the first version and does not include the JUnit test. I will 
> upload a new version soon.
> This will parse the tags and index them in Solr. Make sure every field listed 
> under 'index.parse.md' in nutch-site.xml is also defined in Solr's schema.xml.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
