Alfonso Presa created NUTCH-1553:
------------------------------------
Summary: Property 'indexer.delete.robots.noindex' not working if
using parser-html.
Key: NUTCH-1553
URL: https://issues.apache.org/jira/browse/NUTCH-1553
Project: Nutch
Issue Type: Bug
Components: indexer, parser
Affects Versions: 1.6
Reporter: Alfonso Presa
Priority: Minor
Maybe I'm doing something wrong, but it seems to me that the NUTCH-1434 patch
only works when using Tika's parser. When using parser-html, the "robots" metatag
is only populated if the parse-metatags plugin is enabled, and then it is stored
under the prefix "metatag.". So parseData.getMeta("robots") returns nothing when
not using Tika.
I suppose the simplest solution would be to provide a fallback: if
parseData.getMeta("robots") is null, read parseData.getMeta("metatag.robots")
instead.
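
Roughly what I have in mind is sketched below. This is only an illustration, not
a patch: a plain Map stands in for the parse metadata behind
parseData.getMeta(...), and the class and method names (RobotsMetaFallback,
robotsDirective, isNoIndex) are made up for the example. The case-insensitive
"noindex" check is also my assumption about how the value would be tested.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; a Map stands in for Nutch's parse metadata.
final class RobotsMetaFallback {

    // Returns the robots directive, preferring the plain "robots" key and
    // falling back to "metatag.robots" (the key written by parse-metatags).
    // Returns null when neither key is present.
    static String robotsDirective(Map<String, String> parseMeta) {
        String robots = parseMeta.get("robots");
        if (robots == null) {
            robots = parseMeta.get("metatag.robots");
        }
        return robots;
    }

    // True when the resolved directive contains "noindex" (case-insensitive).
    static boolean isNoIndex(Map<String, String> parseMeta) {
        String robots = robotsDirective(parseMeta);
        return robots != null
                && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        // Simulates parser-html with parse-metatags enabled:
        // only the prefixed key is set.
        Map<String, String> htmlMeta = new HashMap<>();
        htmlMeta.put("metatag.robots", "noindex,follow");
        System.out.println(isNoIndex(htmlMeta));
    }
}
```

With the fallback in place, a document whose only robots entry is
"metatag.robots" would be treated the same as one parsed by Tika.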
Thanks!