feng ye created TIKA-2734:
-----------------------------
Summary: Tika adds extra characters at the end of text extracted
from an Excel file
Key: TIKA-2734
URL: https://issues.apache.org/jira/browse/TIKA-2734
Project: Tika
Issue Type: Bug
Components: handler
Affects Versions: 1.18
Reporter: feng ye
Attachments: AIRPORTSOK.xls
When extracting text from some relatively large Excel files (about 9,000 rows),
I found that an extra string, "&A PAGE &P", is appended to the end of the
resulting text when Tika.parseToString is called. Is this a known issue? Is
there any configuration option that would stop Tika from outputting these extra
characters?
I did not find a good answer via Google.
The input Excel spreadsheet is attached.
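The string "&A PAGE &P" looks like an unexpanded Excel header/footer template. In Excel's header/footer syntax, &A stands for the sheet name and &P for the page number, so the text may come from the sheet's page setup rather than from cell data. Until the cause is confirmed, a minimal post-processing sketch could trim it from the output of Tika.parseToString. It assumes the stray text only ever appears at the very end of the extracted string. The class StripFooterCodes and its strip method are hypothetical names and are not part of Tika.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical workaround, not part of Tika: removes a trailing "&A PAGE &P"
 * run, which appears to be an unexpanded Excel header/footer template
 * (&A = sheet name, &P = page number).
 * Assumption: the stray text only ever appears at the end of the string.
 */
public final class StripFooterCodes {
    // Matches the literal codes (case-insensitive) plus surrounding
    // whitespace, anchored to the very end of the input (\z).
    private static final Pattern TRAILING_FOOTER =
            Pattern.compile("\\s*&A\\s+PAGE\\s+&P\\s*\\z", Pattern.CASE_INSENSITIVE);

    private StripFooterCodes() {}

    /** Returns the extracted text with the trailing footer codes removed, if present. */
    public static String strip(String extracted) {
        return TRAILING_FOOTER.matcher(extracted).replaceFirst("");
    }

    public static void main(String[] args) {
        // Illustrative input only; in practice this would be the string
        // returned by Tika.parseToString for the spreadsheet.
        String sample = "cell A1\tcell B1\n&A PAGE &P\n";
        System.out.println(strip(sample));
    }
}
```

Text without the trailing codes is returned unchanged, so the helper is safe to apply to every extracted document while the root cause is investigated.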
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)