[jira] [Commented] (TIKA-1863) --text-main content missing in output file

Marcin Gil (JIRA) Mon, 22 Feb 2016 02:33:52 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156746#comment-15156746
 ]


Marcin Gil commented on TIKA-1863:
----------------------------------

Hi! Sorry, I do not know what that is. I can only confirm that it happens not 
only with PDF files but also with DOC files.

> --text-main content missing in output file
> ------------------------------------------
>
>                 Key: TIKA-1863
>                 URL: https://issues.apache.org/jira/browse/TIKA-1863
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.12
>         Environment: Windows 10 64
>            Reporter: Marcin Gil
>
> When converting both PDF and DOC files to text with the following command:
> java -jar tika.jar --text-main --encoding=UTF-8 input.pdf > output.txt
> the output file is missing a random number of the first and last lines of the 
> input file.
> Example file:
> https://dl.dropboxusercontent.com/u/11435743/tika-issue-1.pdf
> Text starting from "15 Akt oskarżenia" is missing (at the bottom of the file).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
