[ 
https://issues.apache.org/jira/browse/TIKA-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462816#comment-17462816
 ] 

Tim Allison commented on TIKA-3630:
-----------------------------------

In {{OOXMLWordAndPowerPointTextHandler}}, in the {{else if (inV)}} branch, we're 
adding a tab at the end of the characters we receive, but there's no guarantee 
that those characters are the full value of the <c:v/> element (SAX can deliver 
an element's text in more than one chunk).

We should see if adding a tab in the 
{{} else if (V.equals(localName) && C_NS.equals(uri)) { // in value in a chart}} 
branch in the endElement call fixes this and causes no surprises.
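
A minimal sketch of the difference, not Tika's actual code: {{ValueHandler}} and 
{{feedSplitValue}} are hypothetical stand-ins, and the handler is driven by hand 
so that the value "May" arrives as two chunks ("Ma" and "y"), the way a SAX 
parser is allowed to deliver it. {{C_NS}} is assumed to be the DrawingML chart 
namespace.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class ChartValueTabSketch {

    // Assumed values for the constants named in the comment above.
    static final String C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart";
    static final String V = "v";

    /** Hypothetical stand-in for the real handler; only tracks <c:v/> text. */
    static final class ValueHandler extends DefaultHandler {
        private final boolean tabInEndElement;
        private final StringBuilder out = new StringBuilder();
        private boolean inV = false;

        ValueHandler(boolean tabInEndElement) {
            this.tabInEndElement = tabInEndElement;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (V.equals(localName) && C_NS.equals(uri)) {
                inV = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inV) {
                out.append(ch, start, length);
                if (!tabInEndElement) {
                    // Suspected behavior: one tab per chunk, not per value.
                    out.append('\t');
                }
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (V.equals(localName) && C_NS.equals(uri)) {
                inV = false;
                if (tabInEndElement) {
                    // Proposed behavior: one tab once the whole value is seen.
                    out.append('\t');
                }
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Feeds <c:v>May</c:v> to the handler with the text split into "Ma" and "y". */
    static String feedSplitValue(boolean tabInEndElement) {
        ValueHandler h = new ValueHandler(tabInEndElement);
        h.startElement(C_NS, V, "c:v", new AttributesImpl());
        char[] first = "Ma".toCharArray();
        char[] second = "y".toCharArray();
        h.characters(first, 0, first.length);
        h.characters(second, 0, second.length);
        h.endElement(C_NS, V, "c:v");
        return h.text();
    }

    public static void main(String[] args) {
        System.out.println(feedSplitValue(false).replace("\t", "<tab/>")); // Ma<tab/>y<tab/>
        System.out.println(feedSplitValue(true).replace("\t", "<tab/>"));  // May<tab/>
    }
}
```

If the chunk boundaries depend on how the underlying stream is buffered, that 
would be consistent with the two InputStream constructions giving different 
output, but that part is a guess until it is checked against the real parser.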

> Weird extra tab in parsing charts in xlsx depending on construction of 
> InputStream
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-3630
>                 URL: https://issues.apache.org/jira/browse/TIKA-3630
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>
> When I was respinning 2.2.1-rc3 this morning, I noticed that we were getting 
> slightly different results on {{testEXCEL_charts.xlsx}} depending on whether 
> we parsed an InputStream created by Files.newInputStream(path) or by 
> TikaInputStream.get(path).
> Specifically, there was an extra tab that breaks "May" into two words 
> ({{Ma<tab/>y}}) if we used TikaInputStream.get(path), but not if we used 
> Files.newInputStream(path).
> This difference exists on the 1.x branch as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)