[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609976#comment-14609976 ]
Tim Allison commented on TIKA-1663:
-----------------------------------
For those curious, I found no speed hit from adding MD5 hashing to a batch run
against the ~1 million documents in govdocs1. Admittedly, I didn't do thorough
benchmarking, but the one digesting run I did with trunk was a little bit
faster than the one non-digesting run, where "little bit faster" =
"the difference was small enough to be in the noise."
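
For anyone who wants to try a similar comparison outside of Tika, here is a
minimal sketch of digesting a stream while it is being read. It is not the
code from the attached patch: it uses the JDK's MessageDigest and
DigestInputStream rather than commons' DigestUtils to stay dependency-free, a
plain Map stands in for Tika's Metadata object, and the class name
DigestSketch and the key "digest:MD5" are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class DigestSketch {

    // Hypothetical metadata key; not taken from the patch or from Tika.
    static final String MD5_KEY = "digest:MD5";

    // Reads the stream to the end and returns the digest as lowercase hex.
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            // Each read updates the digest as a side effect.
            while (dis.read(buf) != -1) {
                // nothing else to do; a real parser would consume the bytes here
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // A Map stands in for the Metadata object in this sketch.
        Map<String, String> metadata = new HashMap<>();
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        metadata.put(MD5_KEY, hexDigest(new ByteArrayInputStream(doc), "MD5"));
        System.out.println(metadata.get(MD5_KEY));
        // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

Passing "SHA-256" instead of "MD5" as the algorithm name gives the SHA-X
variant; both names are standard MessageDigest algorithms in the JDK.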
> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> -------------------------------------------------------------------
>
> Key: TIKA-1663
> URL: https://issues.apache.org/jira/browse/TIKA-1663
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate Commons Codec's DigestUtils and allow users
> to easily add the MD5 or other supported hashes to the Metadata object.
> Would anyone else find this useful?