[
https://issues.apache.org/jira/browse/NUTCH-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066494#comment-13066494
]
Lewis John McGibbney commented on NUTCH-62:
-------------------------------------------
Several of the comments above create some confusion about what to do to
resolve this issue... or in fact what exactly the issue is that needs to be
resolved!
Is there a requirement to rework the HTMLMetaProcessor class to incorporate the
suggestions above, e.g. "consistent schema in both cases..."?
Protocol.metadata aside, what we are essentially talking about is picking up
all ParseData.metadata contained in meta tags, which I assume we would wish
to index at a later stage. Focusing on the HTMLMetaProcessor class, we already
acquire the name, http-equiv and content attributes from meta tags. Would an
improvement be to configure the class to pick up other attributes not already
mentioned?
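To make the question concrete, here is a minimal, standalone sketch of the kind
of extraction being discussed: collecting the name, http-equiv and content
attributes of meta tags into a flat metadata map. It is not the
HTMLMetaProcessor implementation and uses no Nutch APIs; the class name
MetaTagSketch, the method extractMeta and the "http-equiv." key prefix are all
hypothetical, and the regex-based matching is a simplification that assumes
quoted attribute values containing no '>' character.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Nutch.
final class MetaTagSketch {

    // Matches a whole <meta ...> tag and captures its attribute text.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value' (unquoted values are ignored here).
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z][\\w-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns a map from lower-cased meta name to its content value.
    // http-equiv tags are stored under an assumed "http-equiv." key prefix.
    static Map<String, String> extractMeta(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            // Collect every quoted attribute of this tag, keyed by lower-cased name.
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String content = attrs.get("content");
            if (content == null) {
                continue; // nothing to store without a content attribute
            }
            String key = attrs.get("name");
            if (key == null && attrs.get("http-equiv") != null) {
                key = "http-equiv." + attrs.get("http-equiv");
            }
            if (key != null) {
                metadata.put(key.toLowerCase(Locale.ROOT), content);
            }
        }
        return metadata;
    }
}
```

Picking up further attributes would, in a sketch like this, only mean reading
more keys from the per-tag attrs map; whether that is what the issue asks for
is exactly the open question.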
> Add html META tag information into metaData in index-more plugin
> ----------------------------------------------------------------
>
> Key: NUTCH-62
> URL: https://issues.apache.org/jira/browse/NUTCH-62
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Reporter: Jack Tang
> Priority: Trivial
> Attachments: index-more.patch.zip
>
>
> Now (version dev-0.7), only some metadata in the HTTP response, such as type,
> date and content-length, is available in the index-more plugin. And we cannot
> index/store the metadata in the HTML header (<META> exactly)
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira