Gerard Bouchar created NUTCH-2586:
-------------------------------------
Summary: Add a fallback mechanism for missing meta tags
Key: NUTCH-2586
URL: https://issues.apache.org/jira/browse/NUTCH-2586
Project: Nutch
Issue Type: New Feature
Reporter: Gerard Bouchar
While using Nutch, we ran into the following issue: some web pages lack a
"description" meta tag but include an "og:description" meta tag (using the [Open
Graph protocol|http://ogp.me/]).
Here are two examples:
* http://imagenesdelavirgenmaria.com/17-imagenes-de-la-virgen-maria-de-guadalupe/
* http://mixcdsource.com/product/dj-arson-dj-sin-cerothe-hit-list-18-5-reggaeton-edition/
It would be nice to have a configurable list of fallback meta tags to use when
the main meta tag is absent. The configuration would let us specify rules such
as: when the 'description' meta is missing, use 'og:description'; when 'title'
is missing, use 'og:title'; and so on.
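
To make the request concrete, here is a minimal sketch of the lookup we have in
mind. It is not existing Nutch code: the class name MetaTagFallback, the method
resolve, and the shape of the fallback map are all hypothetical, and we assume
the map would be filled from some configuration property that still has to be
defined.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Nutch: illustrates the requested behaviour only.
public class MetaTagFallback {

    /**
     * Returns the value of the meta tag {@code name} if it is present and
     * non-blank; otherwise the value of the first fallback tag listed for
     * {@code name} that is present and non-blank; otherwise empty.
     *
     * @param metaTags  meta tags extracted from the page (name -> content)
     * @param fallbacks assumed to come from configuration (name -> ordered fallbacks)
     */
    public static Optional<String> resolve(String name,
                                           Map<String, String> metaTags,
                                           Map<String, List<String>> fallbacks) {
        String value = metaTags.get(name);
        if (isPresent(value)) {
            return Optional.of(value);
        }
        List<String> alternatives =
            fallbacks.getOrDefault(name, Collections.<String>emptyList());
        // Try the fallbacks in the order they were configured.
        for (String alternative : alternatives) {
            String fallbackValue = metaTags.get(alternative);
            if (isPresent(fallbackValue)) {
                return Optional.of(fallbackValue);
            }
        }
        return Optional.empty();
    }

    // A tag counts as present only if it has non-whitespace content.
    private static boolean isPresent(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

With a fallback map of description -> [og:description], a page that only
declares "og:description" would yield that value for "description", while a page
that declares both would keep its own "description".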
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)