Francesco Capponi created NUTCH-2275:
----------------------------------------

             Summary: MD5Signature by default doesn't take in account parse
                 Key: NUTCH-2275
                 URL: https://issues.apache.org/jira/browse/NUTCH-2275
             Project: Nutch
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.11
            Reporter: Francesco Capponi


I'm testing Apache Nutch with the feed plugin. I've noticed that it generates 
the same digest/signature for every page, so the dedup job removes 
everything from the database.
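
A minimal sketch of why this may happen, assuming the feed plugin produces 
one parse per feed entry while all entries share the raw content of the feed 
they came from. The class and method names here are hypothetical and are not 
Nutch's API; they only contrast hashing the raw content with hashing the 
parsed text:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the Nutch Signature implementation.
public class SignatureSketch {

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Signature computed from the raw fetched bytes (assumed to be shared
    // by every entry extracted from the same feed).
    public static String rawContentSignature(byte[] rawContent) {
        return md5Hex(rawContent);
    }

    // Signature computed from the text of one parsed entry.
    public static String parseTextSignature(String parseText) {
        return md5Hex(parseText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // One feed document containing two entries (made-up sample data).
        byte[] feed = "<rss><item>first</item><item>second</item></rss>"
                .getBytes(StandardCharsets.UTF_8);
        String firstEntryText = "first";
        String secondEntryText = "second";

        // Both entries hash the same raw bytes, so the signatures collide.
        System.out.println(rawContentSignature(feed)
                .equals(rawContentSignature(feed)));           // prints true

        // Hashing each entry's own text keeps the signatures distinct.
        System.out.println(parseTextSignature(firstEntryText)
                .equals(parseTextSignature(secondEntryText))); // prints false
    }
}
```

If the assumption holds, every entry of a feed looks like a duplicate of the 
others to any deduplication step that compares raw-content signatures.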

I'm wondering why the class MD5Signature is the default one instead of 
TextMD5Signature.

Anyhow, I've now slightly modified MD5Signature so that it works with the 
feed plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
