[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939553#comment-13939553
]
Shanaka Jayasundera commented on NUTCH-1478:
--------------------------------------------
Hi All,
I've downloaded the latest code from the 2.x branch and am trying to index metadata
to Solr, but the Solr query results are not showing the metadata.
However, parsechecker works fine. Do I need any additional configuration
to get the metadata into the Solr query results?
$ ./bin/nutch parsechecker http://nutch.apache.org/
fetching: http://nutch.apache.org/
parsing: http://nutch.apache.org/
contentType: text/html
signature: b2bb805dcd51f12784190d58d619f0bc
---------
Url
---------------
http://nutch.apache.org/
---------
Metadata
---------
meta_forrest-version : 0.10-dev
meta_generator : Apache Forrest
meta_forrest-skin-name : nutch
_rs_ : �
meta_content-type : text/html; charset=UTF-8
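To compare what the parser extracted with what actually reaches Solr, it can help to pull the metadata keys out of the parsechecker output programmatically. The following is a minimal Python sketch that assumes the plain `key : value` layout shown above, with one dashed rule under the `Metadata` header; it is not part of Nutch, and `parse_metadata_section` is a hypothetical helper name.

```python
import re

# One "key : value" line as printed in parsechecker's Metadata section.
# The key cannot contain spaces, so the match splits on the first " : ".
_KV_LINE = re.compile(r"^(\S+) : (.*)$")


def parse_metadata_section(output: str) -> dict[str, str]:
    """Return the key/value pairs listed under the 'Metadata' header."""
    lines = output.splitlines()
    if "Metadata" not in lines:
        return {}  # no Metadata section in this output
    start = lines.index("Metadata") + 2  # skip the header and its dashed rule
    result: dict[str, str] = {}
    for line in lines[start:]:
        match = _KV_LINE.match(line)
        if match is None:
            break  # first line that is not "key : value" ends the section
        result[match.group(1)] = match.group(2)
    return result
```

Applied to the output above, this yields keys such as `meta_generator` and `meta_content-type`, which can then be compared with the field names in a Solr result.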
The command I'm using to crawl and index is:
bin/crawl urls/seed.txt TestCrawl3.1 http://localhost:8983/solr/ 2
I haven't made many configuration changes; I've only configured nutch-site.xml and
gora.properties to use HBase through Gora.
I'd appreciate it if anyone could help me identify the missing configuration.
Thanks in advance.
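One way to confirm whether any metadata reached the index is to ask Solr for the stored documents directly. Below is a minimal Python sketch, assuming the same Solr base URL passed to `bin/crawl` above, Solr's standard `/select` handler with `wt=json`, and that indexed metadata fields share the `meta_` prefix seen in the parsechecker output; the prefix and the helper names are assumptions, not confirmed Nutch behaviour. Note that Solr only returns stored fields, so a field that is indexed but not stored will not show up here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base: str, rows: int = 10) -> str:
    """Build a match-all query against Solr's /select handler, asking for JSON."""
    params = {"q": "*:*", "wt": "json", "rows": rows}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def metadata_fields(response: dict, prefix: str = "meta_") -> dict[str, list[str]]:
    """Map each returned document's id to its field names that start with prefix."""
    found: dict[str, list[str]] = {}
    for doc in response.get("response", {}).get("docs", []):
        found[str(doc.get("id", "?"))] = sorted(k for k in doc if k.startswith(prefix))
    return found


def fetch_metadata_fields(solr_base: str, prefix: str = "meta_") -> dict[str, list[str]]:
    """Run the query against a live Solr instance and summarise the result."""
    with urlopen(build_select_url(solr_base)) as resp:
        return metadata_fields(json.load(resp), prefix)
```

For example, `fetch_metadata_fields("http://localhost:8983/solr/")` would list, per document, which (if any) prefixed fields came back; an empty list for every document suggests the fields were never indexed or are not stored.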
> Parse-metatags and index-metadata plugin for Nutch 2.x series
> --------------------------------------------------------------
>
> Key: NUTCH-1478
> URL: https://issues.apache.org/jira/browse/NUTCH-1478
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.1
> Reporter: kiran
> Fix For: 2.3
>
> Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch,
> NUTCH-1478v4.patch, NUTCH-1478v5.1.patch, NUTCH-1478v5.patch,
> NUTCH-1478v6.patch, Nutch1478.patch, Nutch1478.zip,
> metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series.
> They take multiple values of the same tag and index them in Solr, as in my earlier
> patch (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here
> (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no need
> to prefix metatag names with the 'metatag' keyword. For example, my
> configuration looks like this
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).
>
> This is only the first version and does not include JUnit tests yet. I will
> post an updated version soon.
> This will parse the tags and index them in Solr. Make sure every field listed in
> 'index.parse.md' in nutch-site.xml is also defined in Solr's schema.xml.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.
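Keeping Solr's schema.xml in sync with `index.parse.md` is easy to get wrong and is one plausible cause of the missing fields described in the comment above. Here is a hedged Python sketch for checking it. It assumes nutch-site.xml uses the standard Hadoop `<property><name>`/`<value>` layout and that index-metadata writes each `index.parse.md` entry to a Solr field of the same name, which is how the description above reads; the helper names are hypothetical.

```python
import fnmatch
import xml.etree.ElementTree as ET


def parse_md_names(nutch_site_xml: str) -> list[str]:
    """Return the comma-separated entries of the 'index.parse.md' property."""
    root = ET.fromstring(nutch_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "index.parse.md":
            value = prop.findtext("value") or ""
            return [name.strip() for name in value.split(",") if name.strip()]
    return []  # property not set


def missing_solr_fields(names: list[str], schema_xml: str) -> list[str]:
    """Return names matching neither a <field> nor a <dynamicField> pattern."""
    root = ET.fromstring(schema_xml)
    static = {f.get("name") for f in root.iter("field")}
    patterns = [d.get("name") for d in root.iter("dynamicField") if d.get("name")]
    return [
        name for name in names
        if name not in static
        and not any(fnmatch.fnmatchcase(name, p) for p in patterns)
    ]
```

Passing the text of nutch-site.xml and schema.xml to these two functions lists the `index.parse.md` entries that still need a field definition in Solr.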
--
This message was sent by Atlassian JIRA
(v6.2#6252)