Gerard Bouchar created NUTCH-2586:
-------------------------------------

             Summary: Add a fallback mechanism for missing meta tags
                 Key: NUTCH-2586
                 URL: https://issues.apache.org/jira/browse/NUTCH-2586
             Project: Nutch
          Issue Type: New Feature
            Reporter: Gerard Bouchar


While using Nutch, we faced the following issue: some web pages lack a 
"description" meta tag but include an "og:description" meta tag (from the [Open 
Graph protocol|http://ogp.me/]).

Here are two examples:

* http://imagenesdelavirgenmaria.com/17-imagenes-de-la-virgen-maria-de-guadalupe/
* http://mixcdsource.com/product/dj-arson-dj-sin-cerothe-hit-list-18-5-reggaeton-edition/

It would be useful to have a configurable list of fallback meta tags to use when 
the main meta tag is absent, so that we could specify in the configuration: 
"when the 'description' meta tag is missing, use 'og:description'; when 'title' 
is missing, use 'og:title'; etc."
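A minimal sketch of the requested lookup, in Java since that is what Nutch is written in. It assumes a hypothetical configuration string such as `description=og:description,title=og:title|twitter:title`, where `|` separates several fallbacks in priority order. The class `MetaFallback`, its methods and the config format are illustrative only; they are not an existing Nutch API or property. Note that Open Graph tags are normally declared with a `property` attribute rather than `name`, so a real implementation would need to collect both kinds of meta tags into the map it searches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve a meta tag, falling back to alternative tag names. */
public class MetaFallback {

    /**
     * Parses a hypothetical config string like "description=og:description,title=og:title"
     * into: primary meta name -> ordered list of fallback meta names.
     */
    static Map<String, List<String>> parseFallbacks(String config) {
        Map<String, List<String>> fallbacks = new HashMap<>();
        for (String entry : config.split(",")) {
            // Split on the first '=' only, since fallback names contain ':' (e.g. og:title)
            String[] parts = entry.trim().split("=", 2);
            if (parts.length != 2) {
                continue; // skip malformed entries
            }
            List<String> alternatives = new ArrayList<>();
            for (String alt : parts[1].split("\\|")) {
                if (!alt.trim().isEmpty()) {
                    alternatives.add(alt.trim());
                }
            }
            fallbacks.put(parts[0].trim(), alternatives);
        }
        return fallbacks;
    }

    /** Returns the primary meta value if present, else the first non-empty fallback, else null. */
    static String resolve(Map<String, String> metas, Map<String, List<String>> fallbacks, String name) {
        String value = metas.get(name);
        if (value != null && !value.isEmpty()) {
            return value;
        }
        for (String alt : fallbacks.getOrDefault(name, new ArrayList<>())) {
            String altValue = metas.get(alt);
            if (altValue != null && !altValue.isEmpty()) {
                return altValue;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fallbacks =
            parseFallbacks("description=og:description,title=og:title");
        // A page with no "description" meta tag, only an Open Graph one
        Map<String, String> metas = new LinkedHashMap<>();
        metas.put("og:description", "An Open Graph description");
        metas.put("title", "Example title");
        System.out.println(resolve(metas, fallbacks, "description")); // falls back to og:description
        System.out.println(resolve(metas, fallbacks, "title"));       // primary tag is present
    }
}
```

Keeping the fallbacks as an ordered list per primary tag allows a chain such as `og:title` then `twitter:title`, while a tag with no configured fallbacks simply resolves to nothing when it is missing.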




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to