[
https://issues.apache.org/jira/browse/TIKA-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462816#comment-17462816
]
Tim Allison commented on TIKA-3630:
-----------------------------------
In {{OOXMLWordAndPowerPointTextHandler}}, in the {{else if (inV)}} branch, we're
adding a tab after the characters we receive, but there's no guarantee that a
single batch of characters is the full value of the <c:v/> element.
We should see if adding the tab in the
{{else if (V.equals(localName) && C_NS.equals(uri))}} branch ("in value in a
chart") of the endElement call instead fixes this and causes no surprises.
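
To make the suspected cause concrete, here is a minimal sketch, not the actual Tika code. The names {{ChartValueTabSketch}}, {{ChartValueHandler}} and {{extract}} are hypothetical, and the handler callbacks are invoked by hand to simulate a parser that happens to deliver "May" in two batches of characters. It compares appending the tab per batch of characters with appending it once when the element ends.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class ChartValueTabSketch {

    // DrawingML chart namespace (the "c:" prefix in <c:v/>).
    static final String C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart";
    static final String V = "v";

    /** Hypothetical stand-in for the real handler; it only tracks <c:v/> text. */
    static class ChartValueHandler extends DefaultHandler {
        private final boolean tabInEndElement;
        private final StringBuilder out = new StringBuilder();
        private boolean inV;

        ChartValueHandler(boolean tabInEndElement) {
            this.tabInEndElement = tabInEndElement;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (V.equals(localName) && C_NS.equals(uri)) {
                inV = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inV) {
                out.append(ch, start, length);
                if (!tabInEndElement) {
                    // Problem case: one tab per characters() call, not per value.
                    out.append('\t');
                }
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (V.equals(localName) && C_NS.equals(uri)) { // in value in a chart
                inV = false;
                if (tabInEndElement) {
                    // Proposed case: exactly one tab once the value is complete.
                    out.append('\t');
                }
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Simulates a parser that delivers the <c:v/> text in the given chunks. */
    static String extract(boolean tabInEndElement, String... chunks) {
        ChartValueHandler h = new ChartValueHandler(tabInEndElement);
        h.startElement(C_NS, V, "c:v", new AttributesImpl());
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            h.characters(ch, 0, ch.length);
        }
        h.endElement(C_NS, V, "c:v");
        return h.text();
    }

    public static void main(String[] args) {
        // Tab appended per batch of characters: prints Ma<tab/>y<tab/>
        System.out.println(extract(false, "Ma", "y").replace("\t", "<tab/>"));
        // Tab appended in endElement: prints May<tab/>
        System.out.println(extract(true, "Ma", "y").replace("\t", "<tab/>"));
    }
}
```

If the text arrives in a single batch, both variants produce the same output, which would be consistent with the result depending on how the underlying stream is buffered. That last part is an assumption and still needs to be checked against the real parser.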
> Weird extra tab in parsing charts in xlsx depending on construction of
> InputStream
> ----------------------------------------------------------------------------------
>
> Key: TIKA-3630
> URL: https://issues.apache.org/jira/browse/TIKA-3630
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>
> When I was respinning 2.2.1-rc3 this morning, I noticed that we were getting
> slightly different results on {{testEXCEL_charts.xlsx}} depending on whether
> we parsed an InputStream created by Files.newInputStream(path) or one created
> by TikaInputStream.get(path).
> Specifically, there was an extra tab that broke "May" into two words
> ({{Ma<tab/>y}}) if we used TikaInputStream.get(path), but not if we used
> Files.newInputStream(path).
> This difference exists on the 1.x branch as well.