[ https://issues.apache.org/jira/browse/NUTCH-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Irinel updated NUTCH-2546:
--------------------------
    Description:
The parse-(metatags|html) plugins extract meta tags of the form "<meta name=", but tags of the form "<meta property=" are not processed.

HTML example:
<meta property="og:title" content="Content in this property..."/> - not extracted
<meta name="description" content="Content in this meta..."/> - OK

When the parse-tika plugin is used for parsing, meta property fields are processed.

<name>plugin.includes</name>
<value>parse-(html|tika|metatags)...</value>

  was:
The parse-metatags/html plugin extracts meta tags of the form "<meta name=", but tags of the form "<meta property=" are not processed.

HTML example:
<meta property="og:title" content="Content in this property..."/> - not extracted
<meta name="description" content="Content in this meta..."/> - OK

When the parse-tika plugin is used for parsing, meta property fields are processed.

<name>plugin.includes</name>
<value>parse-(html|tika|metatags)...</value>


> parse-(metatags|html) plugin - "meta property" not extracted, only "meta name"
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2546
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2546
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Irinel
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
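The behavior the issue asks for (keying a `<meta>` tag by `name`, and also by `property` when `name` is absent) can be illustrated with a small standalone sketch. This is not Nutch's parse-html or parse-metatags code: the class `MetaTagSketch` and its `extract` method are hypothetical, and it uses the JDK's XML DOM parser, so it assumes well-formed XHTML input rather than arbitrary real-world HTML.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch (not Nutch code): collects <meta> tags keyed by their
 * "name" attribute, falling back to the "property" attribute (e.g. Open Graph
 * "og:title") when "name" is absent. Assumes well-formed XHTML input.
 */
public class MetaTagSketch {

    /** Returns a map of meta key to content, in document order. */
    public static Map<String, String> extract(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }

        Map<String, String> tags = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // getAttribute returns "" (not null) when the attribute is missing.
            String key = meta.getAttribute("name");
            if (key.isEmpty()) {
                // The case reported as "not extracted": <meta property="...">.
                key = meta.getAttribute("property");
            }
            if (!key.isEmpty()) {
                tags.put(key, meta.getAttribute("content"));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<meta property=\"og:title\" content=\"Content in this property...\"/>"
                + "<meta name=\"description\" content=\"Content in this meta...\"/>"
                + "</head><body/></html>";
        extract(page).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running `main` prints `og:title = Content in this property...` followed by `description = Content in this meta...`, i.e. both tags from the HTML example are extracted. Falling back to `property` only when `name` is missing is one possible design choice; an implementation could instead record both attributes separately.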