[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542898#comment-16542898 ]
Tim Allison commented on NUTCH-2586:
------------------------------------

Is this better handled at the Tika level...or is this something we should also add to Tika?

> Add a fallback mechanism for missing meta tags
> ----------------------------------------------
>
>                 Key: NUTCH-2586
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2586
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> While using nutch, we faced the following issue: some web pages miss a
> "description" meta tag, but include an "og:description" meta (using the
> [open graph protocol|http://ogp.me/]).
> Here are two examples:
> * http://imagenesdelavirgenmaria.com/17-imagenes-de-la-virgen-maria-de-guadalupe/
> * http://mixcdsource.com/product/dj-arson-dj-sin-cerothe-hit-list-18-5-reggaeton-edition/
> It would be nice to have a configurable list of fallback meta tags to use
> when the main meta tag is absent. Something that would allow us to specify,
> in the configuration, "when the 'description' meta is missing, use
> 'og:description', when 'title' is missing, use 'og:title', etc...".

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
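
For illustration only, the fallback behaviour requested in the issue could be sketched roughly as below. This is not Nutch or Tika code. It is a standalone Python sketch using only the standard library, and the names `FALLBACKS`, `MetaCollector`, and `resolve_meta` are hypothetical. It assumes the fallback list is a simple mapping from a primary meta name to an ordered list of alternatives, and that Open Graph tags may appear under either the `name` or the `property` attribute.

```python
from html.parser import HTMLParser

# Hypothetical configuration: primary meta name -> ordered list of fallbacks.
FALLBACKS = {
    "description": ["og:description"],
    "title": ["og:title"],
}


class MetaCollector(HTMLParser):
    """Collects <meta> tags keyed by their name (or property) attribute."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph tags normally use property="og:..." rather than name="...".
        key = attr.get("name") or attr.get("property")
        content = attr.get("content")
        if key and content:
            # Keep the first occurrence of each key.
            self.metas.setdefault(key.lower(), content)


def resolve_meta(html, fallbacks=FALLBACKS):
    """Return the page's meta tags, filling missing primaries from fallbacks."""
    collector = MetaCollector()
    collector.feed(html)
    resolved = dict(collector.metas)
    for primary, alternatives in fallbacks.items():
        if primary in resolved:
            continue  # The main tag is present, so no fallback is needed.
        for alt in alternatives:
            if alt in resolved:
                resolved[primary] = resolved[alt]
                break
    return resolved
```

With this sketch, a page that carries only `<meta property="og:description" content="...">` would yield a `description` entry copied from `og:description`, while a page that has both keeps its own `description`.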