[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541320#comment-13541320 ]
J. Gobel commented on NUTCH-1478:
---------------------------------
Hi Kiran,
I unpacked the zip file in my plugin folder. Then I used wget to download the
patch into my /src/plugin folder and applied it with "patch -p0 < Nutch1478.patch".
I used your XML file, changed a few things, and rebuilt the runtime with ant. I
use MySQL, for example, and I changed the path to my plugins folder.
I checked with parsechecker and this is the result:
:~/nutch2/nutch/runtime/local# bin/nutch parsechecker http://www.google.nl
---------
Url
---------------
http://www.google.nl
---------
Metadata
---------
I emptied my SQL database to start from scratch and ran a crawl, but the
Metadata field still shows 'garbage'. My Nutch 2.1 is configured according to:
http://nlp.solutions.asia/?p=180
Perhaps you could share your schema.xml file as well? Maybe I am doing
something wrong in there.
Thanks in advance,
Jaap
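Since the open question is whether schema.xml matches the 'index.parse.md' setting in nutch-site.xml (which the issue description quoted below asks users to keep in sync), a quick consistency check can help rule that out. This is a minimal sketch under stated assumptions: 'index.parse.md' holds a comma-separated list of field names (as it does for the Nutch 1.x index-metadata plugin), and Solr fields are declared as <field name="..."> elements. The helper names are hypothetical, and Solr <dynamicField> patterns are not considered.

```python
import xml.etree.ElementTree as ET


def parse_md_fields(nutch_site_xml: str) -> list[str]:
    """Return the names listed under index.parse.md.

    Assumption: the value is a comma-separated list, as in Nutch 1.x.
    """
    root = ET.fromstring(nutch_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "index.parse.md":
            value = prop.findtext("value") or ""
            return [f.strip() for f in value.split(",") if f.strip()]
    return []


def solr_field_names(schema_xml: str) -> set[str]:
    """Collect every <field name="..."> declared anywhere in schema.xml.

    <dynamicField> patterns are deliberately ignored in this sketch.
    """
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(nutch_site_xml: str, schema_xml: str) -> list[str]:
    """Fields Nutch would try to index that Solr does not declare."""
    declared = solr_field_names(schema_xml)
    return [f for f in parse_md_fields(nutch_site_xml) if f not in declared]


# Illustrative inputs only; these are not anyone's real configuration files.
NUTCH_SITE = """<configuration>
  <property>
    <name>index.parse.md</name>
    <value>description,keywords</value>
  </property>
</configuration>"""

SCHEMA = """<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="description" type="text" stored="true" indexed="true"/>
  </fields>
</schema>"""

print(missing_fields(NUTCH_SITE, SCHEMA))  # ['keywords']
```

An empty list means every configured field has a matching Solr declaration; any name printed would need a <field> entry in schema.xml.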
> Parse-metatags and index-metadata plugin for Nutch 2.x series
> --------------------------------------------------------------
>
> Key: NUTCH-1478
> URL: https://issues.apache.org/jira/browse/NUTCH-1478
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.1
> Reporter: kiran
> Attachments: Nutch1478.patch, Nutch1478.zip
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x
> series. They take multiple values of the same tag and index them all in Solr,
> as in my earlier patch (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here
> (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no
> need to put the 'metatag' keyword before metatag names. For example, my
> configuration looks like this
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml)
>
> This is only the first version and does not include JUnit tests. I will
> post a new version soon.
> The plugins parse the tags and index them in Solr. Make sure that every field
> listed in 'index.parse.md' in nutch-site.xml is also created in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.
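The description says the port collects multiple values of the same tag rather than keeping only one, and that tag names are given without a 'metatag' prefix. The plugin itself is Java. What follows is only an illustrative Python sketch of that multi-value behavior, not the plugin's code. The names MetaTagCollector and collect_metatags are hypothetical, and lowercasing tag names before matching is an assumption.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name=... content=...> values, keeping repeats as a list."""

    def __init__(self, wanted):
        super().__init__()
        # Bare tag names such as "keywords", with no "metatag." prefix.
        # Assumption: matching is case-insensitive.
        self.wanted = {w.lower() for w in wanted}
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        content = a.get("content")
        if name in self.wanted and content is not None:
            # Append instead of overwrite, so a repeated tag keeps every value.
            self.values.setdefault(name, []).append(content)


def collect_metatags(html, wanted):
    """Return {tag name: [values in document order]} for the wanted tags."""
    parser = MetaTagCollector(wanted)
    parser.feed(html)
    parser.close()
    return parser.values


page = """<html><head>
<meta name="keywords" content="nutch">
<meta name="keywords" content="solr">
<meta name="description" content="Crawler notes">
<meta name="author" content="ignored">
</head></html>"""

print(collect_metatags(page, ["keywords", "description"]))
# {'keywords': ['nutch', 'solr'], 'description': ['Crawler notes']}
```

Keeping a list per tag name is what lets an indexer emit each value as a separate entry in a multi-valued Solr field instead of silently dropping all but one.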