David Pilato created TIKA-2030:
----------------------------------
Summary: A space is suppressed when parsing ODT file
Key: TIKA-2030
URL: https://issues.apache.org/jira/browse/TIKA-2030
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.13
Environment: MacOS X
Reporter: David Pilato
Priority: Minor
I have an ODT sample file which contains:
{code}
This is a sample text available in page 1
{code}
When I extract its content with Tika, I get:
{code}
This isa sample text available in page 1
{code}
Note the missing space between {{is}} and {{a}}.
I'll link to an example ODT file which reproduces this issue.
Note that I generated this ODT file from MS Word. The original MS Word file is
correctly parsed by Tika.
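
To pin down where the extracted text diverges from the expected text, a small comparison helper can be useful in a regression test. The sketch below is only an illustration: {{TextDiff}} and {{firstDifference}} are hypothetical names, not part of Tika, and the {{actual}} string is hard-coded from this report. In a real check it would presumably come from Tika's extraction of the ODT file (for example via {{new Tika().parseToString(file)}}).

```java
/** Hypothetical helper (not part of Tika) that locates where two strings diverge. */
final class TextDiff {

    private TextDiff() {}

    /**
     * Returns the index of the first character at which the two strings differ,
     * or -1 if they are identical.
     */
    static int firstDifference(String expected, String actual) {
        int common = Math.min(expected.length(), actual.length());
        for (int i = 0; i < common; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other: they differ where the shorter one ends.
        return expected.length() == actual.length() ? -1 : common;
    }

    public static void main(String[] args) {
        String expected = "This is a sample text available in page 1";
        // Assumption: hard-coded from the report; normally this is the parser's output.
        String actual = "This isa sample text available in page 1";

        int index = firstDifference(expected, actual);
        if (index < 0) {
            System.out.println("Extracted text matches the expected text.");
        } else {
            System.out.println("Texts diverge at index " + index
                    + ": expected '" + expected.charAt(index) + "'");
        }
    }
}
```

With the two strings from this report, {{firstDifference}} returns 7, the position of the space that is dropped between {{is}} and {{a}}.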