[ https://issues.apache.org/jira/browse/TIKA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin updated TIKA-1552:
-----------------------------
Description:
Hello,
We found that when a PDF document has marked text inside a frame (table),
Tika inserts tabs between words after parsing.
Original text:
Provides $17.7 billion in discretionary funding for the National Aeronautics
and Space
Parsed text (copy the lines below into a text editor and you will see the tabs):
• Provides $17.7 billion in discretionary funding for
the National Aeronautics and Space
Thank you.
was:
Hello,
We found that when a PDF document has marked text inside a frame (table),
Tika inserts tabs between words after parsing.
Original text:
Provides $17.7 billion in discretionary funding for the National Aeronautics
and Space
Parsed text:
• Provides $17.7 billion in discretionary funding for
the National Aeronautics and Space
Thank you.
> Pdf document parser
> -------------------
>
> Key: TIKA-1552
> URL: https://issues.apache.org/jira/browse/TIKA-1552
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.7
> Reporter: Konstantin
>
> Hello,
> We found that when a PDF document has marked text inside a frame (table),
> Tika inserts tabs between words after parsing.
> Original text:
> Provides $17.7 billion in discretionary funding for the National Aeronautics
> and Space
> Parsed text (copy the lines below into a text editor and you will see the tabs):
> • Provides $17.7 billion in discretionary funding for
> the National Aeronautics and Space
> Thank you.
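
Because the tabs described above are invisible in most mail clients, a small check can make them explicit. The following is only a sketch using the Java standard library. It assumes the extracted text is already available as a String. The class name TabCheck, its methods, and the sample string are hypothetical, and the tab positions in the sample are illustrative rather than taken from the reported document.

```java
import java.util.regex.Pattern;

public class TabCheck {
    // Matches any run of spaces and/or tabs.
    private static final Pattern HORIZONTAL_WS = Pattern.compile("[ \\t]+");

    /** Replaces each tab with the visible two-character sequence backslash + t. */
    public static String showTabs(String text) {
        return text.replace("\t", "\\t");
    }

    /** Counts the tab characters in the text. */
    public static long countTabs(String text) {
        return text.chars().filter(c -> c == '\t').count();
    }

    /** Possible workaround: collapse each run of spaces/tabs into one space. */
    public static String collapseTabs(String text) {
        return HORIZONTAL_WS.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Hypothetical sample; tab positions are assumed for illustration only.
        String parsed = "Provides\t$17.7\tbillion in discretionary funding";
        System.out.println(showTabs(parsed));
        System.out.println(countTabs(parsed));
        System.out.println(collapseTabs(parsed));
    }
}
```

Collapsing whitespace after extraction only hides the symptom. It does not address why the parser emits tabs for text inside a table frame.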