[ 
https://issues.apache.org/jira/browse/TIKA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin updated TIKA-1552:
-----------------------------
    Attachment: 2014_US_Federal_Budget.pdf

> Pdf document parser
> -------------------
>
>                 Key: TIKA-1552
>                 URL: https://issues.apache.org/jira/browse/TIKA-1552
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>            Reporter: Konstantin
>         Attachments: 2014_US_Federal_Budget.pdf
>
>
> Hello,
> We found that when a PDF document has marked text inside a frame (table),
> Tika inserts tabs between the words after parsing.
> Original text:
> Provides $17.7 billion in discretionary funding for the National Aeronautics 
> and Space
> Parsed text (JIRA removed the tabs, so I show each one as a -> symbol):
> •        Provides -> $17.7 -> 
> billion->in->discretionary->funding->for->the->National->Aeronautics->and->Space
> Thank you.
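
The symptom described above (a tab where a single space is expected) could be masked on the caller's side by collapsing tabs in the extracted string. What follows is only a minimal sketch under that assumption: `TabNormalizer` and `normalize` are hypothetical names that are not part of Tika, the sample string is shortened from the report, and the parser itself is left untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: post-processes text that has
// already been extracted from a PDF.
public class TabNormalizer {
    // A tab together with any spaces or tabs directly around it.
    private static final Pattern TAB_RUN = Pattern.compile("[ \\t]*\\t[ \\t]*");

    // Replaces every run that contains at least one tab with a single space.
    public static String normalize(String extracted) {
        return TAB_RUN.matcher(extracted).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Shortened sample modeled on the parsed text in the report.
        String parsed = "Provides\t$17.7\tbillion\tin\tdiscretionary\tfunding";
        System.out.println(normalize(parsed));
    }
}
```

This would also flatten tabs that legitimately separate table columns, so it may only suit callers who want running text rather than tabular layout.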



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
