[ 
https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589252#comment-15589252
 ] 

Hudson commented on TIKA-2123:
------------------------------

SUCCESS: Integrated in Jenkins build tika-2.x #162 (See 
[https://builds.apache.org/job/tika-2.x/162/])
TIKA-2123: digester fails with multiple digests on large files (tallison: rev 
7e66e49797b7bfcfe7928d442e1d04b924bf2b6c)
* (edit) tika-app/src/test/java/org/apache/tika/parser/DigestingParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/digesting/CommonsDigester.java


> CommonsDigester calculates wrong hashes on large files
> ------------------------------------------------------
>
>                 Key: TIKA-2123
>                 URL: https://issues.apache.org/jira/browse/TIKA-2123
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.13
>            Reporter: Yahav Amsalem
>             Fix For: 2.0, 1.14
>
>
> When more than one algorithm is passed to the CommonsDigester constructor and
> a file larger than 7.5 MB is then digested, the hashes are calculated
> incorrectly for every algorithm except the first.
> The following code reproduces the bug:
> // The file used was a simple plain-text file larger than 7.5 MB
> File file = new File("testLargeFile.txt");
> BufferedInputStream bufferedInputStream =
>         new BufferedInputStream(new FileInputStream(file));
> Metadata metadata = new Metadata();
> CommonsDigester digester = new CommonsDigester(20000000,
>                 CommonsDigester.DigestAlgorithm.MD5,
>                 CommonsDigester.DigestAlgorithm.SHA1,
>                 CommonsDigester.DigestAlgorithm.SHA256);
> digester.digest(bufferedInputStream, metadata, null);
> // Prints the correct MD5 but wrong SHA1 and SHA256 values
> System.out.println(metadata);
> Initial direction: it seems that the inner buffered stream being used is not
> reset to position 0 after the first algorithm runs.
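
For cross-checking the values reported above, the following is a minimal sketch that computes several digests in a single pass using only the JDK's MessageDigest. It is not the change made to CommonsDigester in this commit; the class name MultiDigest and the method digestAll are hypothetical names chosen for illustration. Because every chunk read is fed to all digests, the stream never has to be rewound between algorithms.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika.
public class MultiDigest {

    // Reads the stream exactly once and updates every digest with each chunk,
    // so no mark/reset is needed between algorithms.
    public static Map<String, String> digestAll(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        Map<String, MessageDigest> digests = new LinkedHashMap<>();
        for (String algorithm : algorithms) {
            digests.put(algorithm, MessageDigest.getInstance(algorithm));
        }

        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (MessageDigest md : digests.values()) {
                md.update(buffer, 0, read);
            }
        }

        // Convert each finished digest to a lowercase hex string,
        // keeping the order in which the algorithms were requested.
        Map<String, String> hex = new LinkedHashMap<>();
        for (Map.Entry<String, MessageDigest> e : digests.entrySet()) {
            hex.put(e.getKey(), toHex(e.getValue().digest()));
        }
        return hex;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Calling MultiDigest.digestAll(new FileInputStream("testLargeFile.txt"), "MD5", "SHA-1", "SHA-256") on the same file gives reference values to compare against the metadata printed by the reproduction code.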



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)