[ https://issues.apache.org/jira/browse/TIKA-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3630:
------------------------------
Description:
When I was respinning 2.2.1-rc3 this morning, I noticed that we were getting
slightly different results on {{testEXCEL_charts.xlsx}} depending on whether we
parsed an InputStream created by Files.newInputStream(path) or by
TikaInputStream.get(path).
Specifically, there was an extra tab that broke "May" into two words
({{Ma<tab/>y}}) if we used TikaInputStream.get(path), but not if we used
Files.newInputStream(path).
This difference exists on the 1.x branch as well.
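To pin down where two extractions of the same file diverge, a small JDK-only helper along these lines might be enough. It is only a sketch: the class name {{StreamParseDiff}} and its methods are hypothetical and not part of Tika, and the two sample strings are simply the "May" and {{Ma<tab/>y}} fragments reported above rather than real parser output.

```java
public class StreamParseDiff {

    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    public static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same common prefix: either identical, or one string is longer.
        return a.length() == b.length() ? -1 : n;
    }

    /** Makes tabs visible, using the same <tab/> notation as the report. */
    public static String showTabs(String s) {
        return s.replace("\t", "<tab/>");
    }

    /** Returns a short window of text around an index, with tabs made visible. */
    public static String context(String s, int index, int radius) {
        int start = Math.max(0, index - radius);
        int end = Math.min(s.length(), index + radius + 1);
        return showTabs(s.substring(start, end));
    }

    public static void main(String[] args) {
        // Assumption: stand-ins for the two extracted texts, taken from the report.
        String plain = "May";     // reported for Files.newInputStream(path)
        String wrapped = "Ma\ty"; // reported for TikaInputStream.get(path)

        int at = firstDifference(plain, wrapped);
        System.out.println("first difference at index " + at);
        System.out.println("plain:   " + context(plain, at, 5));
        System.out.println("wrapped: " + context(wrapped, at, 5));
    }
}
```

With Tika on the classpath, the two strings would presumably come from parsing the same file twice, for instance with {{new Tika().parseToString(...)}}, once on Files.newInputStream(path) and once on TikaInputStream.get(path). That wiring is left out here so the sketch needs only the JDK.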
was:
When I was respinning 2.2.1-rc3 this morning, I noticed that we were getting
slightly different results on {{testEXCEL_charts.xlsx}} if we parsed an
InputStream created by Files.newInputStream(path) or TikaInputStream.get(path).
Specifically, there was an extra tab in the word April (IIRC) if we used
Files.newInputStream
This difference exists on the 1.x branch as well.
> Weird extra tab in parsing charts in xlsx depending on construction of InputStream
> ----------------------------------------------------------------------------------
>
> Key: TIKA-3630
> URL: https://issues.apache.org/jira/browse/TIKA-3630
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial