[ https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348442#comment-15348442 ]

Hudson commented on TIKA-2019:
------------------------------

SUCCESS: Integrated in tika-2.x #112 (See [https://builds.apache.org/job/tika-2.x/112/])
TIKA-2019 -- fix WordMLParser and SpreadsheetMLParser (tallison: rev 1ce93ed9ece3b93ff28e532a76ce3b326d734593)
* tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/xml/AbstractXML2003Parser.java
* tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/xml/XML2003ParserTest.java
* tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java
* tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java


> WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2019
>                 URL: https://issues.apache.org/jira/browse/TIKA-2019
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>             Fix For: 2.0, 1.14
>
>
> The XML generated by these parsers was good, but when using the
> ToTextHandler, spaces/tabs were not added correctly.  This leads to
> incorrectly concatenated strings.  Further, because we are extending the
> XMLParser, the metadata is extracted but isn't well represented in the XML.
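
The concatenation problem described above can be illustrated with a minimal sketch using only the JDK's SAX classes. It is not the Tika code and not the actual fix in the commit: TextOnlyHandler is a hypothetical stand-in for a text-only handler such as ToTextHandler, and emitRow is a hypothetical stand-in for a parser emitting table cells. The sketch assumes that a text-only handler keeps character data and ignores element boundaries, so adjacent cells run together unless the parser emits whitespace between them.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TokenConcatenationSketch {

    /** Hypothetical text-only handler: keeps character data, ignores all tags. */
    static class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    /**
     * Hypothetical parser side: emits one table row as SAX events.
     * With addSeparators set, a tab follows each cell and a newline follows the row.
     */
    static void emitRow(ContentHandler handler, String[] cells, boolean addSeparators)
            throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startElement("", "tr", "tr", noAttrs);
        for (String cell : cells) {
            handler.startElement("", "td", "td", noAttrs);
            handler.characters(cell.toCharArray(), 0, cell.length());
            handler.endElement("", "td", "td");
            if (addSeparators) {
                // Whitespace is sent as character data, so a text-only handler sees it.
                handler.characters(new char[] {'\t'}, 0, 1);
            }
        }
        handler.endElement("", "tr", "tr");
        if (addSeparators) {
            handler.characters(new char[] {'\n'}, 0, 1);
        }
    }

    /** Runs emitRow against a fresh TextOnlyHandler and returns the collected text. */
    static String rowAsText(String[] cells, boolean addSeparators) throws SAXException {
        TextOnlyHandler handler = new TextOnlyHandler();
        emitRow(handler, cells, addSeparators);
        return handler.toString();
    }

    public static void main(String[] args) throws SAXException {
        String[] cells = {"alpha", "beta"};
        System.out.println("[" + rowAsText(cells, false) + "]");
        System.out.println("[" + rowAsText(cells, true) + "]");
    }
}
```

In this sketch the markup is identical in both runs; only the extra whitespace events change what a text-only consumer receives.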



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
