Gerard Bouchar created NUTCH-2567:

             Summary: parse-metatags writes every meta tags twice
                 Key: NUTCH-2567
             Project: Nutch
          Issue Type: Bug
            Reporter: Gerard Bouchar

Using Nutch with the following configuration, MetaTagsParser writes HTML meta
tags to the metadata twice:
The problem seems to come from 

Both the meta tags from the existing ParseResult and those from HTMLMetaTags are
added to the metadata with a "metatag." prefix. However, the ParseResult object
already contains the HTML meta tags, because TikaParser has already added them.
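
To illustrate the duplication, here is a minimal standalone sketch. It is not Nutch code: plain `Map`s stand in for Nutch's multi-valued `Metadata` and for `HTMLMetaTags`, and the class and method names (`MetaTagMerge`, `merge`, `addPrefixed`) are hypothetical. Copying the same tag from both sources under a "metatag." prefix yields two identical values; skipping values already stored under that key is one possible way to avoid it, assuming both sources carry the same value for the tag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for Nutch's Metadata / HTMLMetaTags.
public class MetaTagMerge {

    // Adds each tag under "metatag.<lowercased name>".
    // When dedupe is true, a value already stored for that key is skipped.
    static void addPrefixed(Map<String, List<String>> out, Map<String, String> tags, boolean dedupe) {
        for (Map.Entry<String, String> e : tags.entrySet()) {
            String key = "metatag." + e.getKey().toLowerCase(Locale.ROOT);
            List<String> values = out.computeIfAbsent(key, k -> new ArrayList<>());
            if (dedupe && values.contains(e.getValue())) {
                continue; // same value was already contributed by the other source
            }
            values.add(e.getValue());
        }
    }

    // Merges tags from the ParseResult (already filled by Tika) and from the HTML meta tags.
    static Map<String, List<String>> merge(Map<String, String> fromParseResult,
                                           Map<String, String> fromHtmlMetaTags,
                                           boolean dedupe) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        addPrefixed(out, fromParseResult, dedupe);
        addPrefixed(out, fromHtmlMetaTags, dedupe);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fromTika = Map.of("description", "A test page");
        Map<String, String> fromHtml = Map.of("description", "A test page");
        // prints {metatag.description=[A test page, A test page]}
        System.out.println(merge(fromTika, fromHtml, false));
        // prints {metatag.description=[A test page]}
        System.out.println(merge(fromTika, fromHtml, true));
    }
}
```

Deduplicating by value is only a sketch; skipping one of the two sources entirely when the parser is known to have populated the ParseResult would be an equally plausible design.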


This bug is a concern because it makes segments unnecessarily large, especially
if we want to store all meta tags (by default, only metatag.description and
metatag.keywords are stored, and thus duplicated).

This message was sent by Atlassian JIRA
