[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609976#comment-14609976 ]

Tim Allison commented on TIKA-1663:
-----------------------------------

For those curious, I found no speed hit from adding MD5 hashing to a batch run 
against the ~1 million documents in govdocs1. Admittedly, I didn't do thorough 
benchmarking: the one digesting run I did with trunk was slightly faster than 
the one non-digesting run, where "slightly faster" means "the difference was 
small enough to be noise."

> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-1663
>                 URL: https://issues.apache.org/jira/browse/TIKA-1663
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate commons' DigestUtils and allow users to 
> easily add the MD5 or other supported hashes to the Metadata object.
> Anyone else find this of use?
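A minimal sketch of the idea in plain Java, assuming only the JDK. It swaps in java.security.MessageDigest and DigestInputStream for Commons Codec's DigestUtils, and a plain Map stands in for Tika's Metadata object. The class name, method names, and metadata keys (X-Digest-MD5, X-Digest-SHA256) are hypothetical illustrations, not the API of the attached patch. Each requested digest is updated as the bytes are read, so the document is traversed only once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DigestSketch {

    // Lower-case hex encoding of a digest (same format as DigestUtils' *Hex methods).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Reads the stream once through a chain of DigestInputStreams, so every
    // requested algorithm sees the same bytes, then stores each hex digest in
    // a Map under a hypothetical key such as "X-Digest-MD5".
    static Map<String, String> digestStream(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest[] digests = new MessageDigest[algorithms.length];
        InputStream wrapped = in;
        for (int i = 0; i < algorithms.length; i++) {
            digests[i] = MessageDigest.getInstance(algorithms[i]);
            wrapped = new DigestInputStream(wrapped, digests[i]);
        }
        byte[] buffer = new byte[8192];
        while (wrapped.read(buffer) != -1) {
            // In a real parser, the parse itself would consume the bytes here.
        }
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 0; i < algorithms.length; i++) {
            String key = "X-Digest-" + algorithms[i].replace("-", "");
            metadata.put(key, toHex(digests[i].digest()));
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        Map<String, String> md = digestStream(new ByteArrayInputStream(doc), "MD5", "SHA-256");
        md.forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Running main prints one line per algorithm, e.g. the MD5 entry is 5d41402abc4b2a76b9719d911017c592 for the input "hello". Chaining the wrappers is one design choice among several; an implementation could instead spool the stream to a temporary file and hash it before parsing.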



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
