[ https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156920#comment-15156920 ]
Tim Allison commented on TIKA-1863:
-----------------------------------

Ok, y, I'm able to reproduce this with {{--text-main}}, but the text all comes through in the GUI and with the other options: {{-t}}, {{-x}}, {{-h}}, {{-J}}. I'll take a look. No need to share another file.

> --text-main content missing in output file
> ------------------------------------------
>
>                 Key: TIKA-1863
>                 URL: https://issues.apache.org/jira/browse/TIKA-1863
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.12
>         Environment: Windows 10 64
>            Reporter: Marcin Gil
>
> When converting both PDF and DOC files to text with the following command:
> java -jar tika.jar --text-main --encoding=UTF-8 input.pdf > output.txt
> the output file is missing a random number of the LAST and FIRST lines of the
> input file.
> Example file:
> https://dl.dropboxusercontent.com/u/11435743/tika-issue-1.pdf
> Text starting from "15 Akt oskarżenia" is missing (at the bottom of the file).