[ https://issues.apache.org/jira/browse/TIKA-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Libert updated TIKA-1103:
---------------------------------

    Environment: 
Windows 7 x64
Java 6 Update 33

  was:Windows 7 x64

    
> Tika.parseToString(InputStream) does not output the same content as parseToString(File)
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-1103
>                 URL: https://issues.apache.org/jira/browse/TIKA-1103
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.1, 1.2, 1.3
>         Environment: Windows 7 x64
> Java 6 Update 33
>            Reporter: Antoine Libert
>
> Tika.parseToString(...) returns different results for the following PDF file depending on
> whether it is given an InputStream or a File (the German iPhone user guide; the bug also occurs with the French one).
> http://manuals.info.apple.com/de_DE/iphone_benutzerhandbuch.pdf
> 1.3 parseToString(File) : actual content (good)
> 1.2 parseToString(File) : actual content (good)
> 1.1 parseToString(File) : actual content (good)
> 1.3 parseToString(InputStream) : empty
> 1.2 parseToString(InputStream) : PDF binary shown as text
> 1.1 parseToString(InputStream) : PDF binary shown as text
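> Simple test case:
> import static org.junit.Assert.assertTrue;
> import java.io.File;
> import org.apache.tika.Tika;
> import org.apache.tika.io.TikaInputStream;
>
> Tika tika = new Tika();
> File f = new File("iphone_benutzerhandbuch.pdf");
> // Parse the same file once through a TikaInputStream and once through the File overload
> TikaInputStream is2 = TikaInputStream.get(f);
> String st2 = tika.parseToString(is2); // InputStream overload
> String stt2 = tika.parseToString(f); // File overload
> assertTrue(st2.equals(stt2)); // fails: the two outputs differ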
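The bare assertTrue in the test case only reports that the two outputs differ. A small stdlib-only helper could report where the InputStream and File outputs first diverge and how long each one is. This is a diagnostic sketch, not a fix for the bug; the class OutputDiff and its methods are hypothetical and are not part of Tika.

```java
// Hypothetical diagnostic helper (not part of Tika): compares the text
// extracted via the InputStream path with the text extracted via the File path.
final class OutputDiff {

    private OutputDiff() {}

    /** Returns the index of the first differing char, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // No mismatch in the shared prefix: equal if same length, otherwise
        // the shorter string ends at index n.
        return a.length() == b.length() ? -1 : n;
    }

    /** Builds a short human-readable summary of the mismatch, if any. */
    static String describe(String fromStream, String fromFile) {
        int idx = firstDifference(fromStream, fromFile);
        if (idx < 0) {
            return "identical (" + fromFile.length() + " chars)";
        }
        return "differ at index " + idx
                + ": stream length=" + fromStream.length()
                + ", file length=" + fromFile.length();
    }

    public static void main(String[] args) {
        // An empty stream result (as reported for 1.3) differs at index 0.
        System.out.println(describe("", "Willkommen"));
    }
}
```

With the strings from the test case, OutputDiff.describe(st2, stt2) would report a difference at index 0 for the 1.3 result, where the stream output is empty (assuming the File output is non-empty).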

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
