[ 
https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348378#comment-15348378
 ] 

Hudson commented on TIKA-2019:
------------------------------

SUCCESS: Integrated in Tika-trunk #1069 (See 
[https://builds.apache.org/job/Tika-trunk/1069/])
TIKA-2019 -- parsers for 2003 MS xml files fail to add spaces/tabs
(tallison: rev 7ae760e29ad3ed5874f7f50c27c6f850ab1d8025)
* tika-parsers/src/test/java/org/apache/tika/parser/microsoft/xml/XML2003ParserTest.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/AbstractXML2003Parser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
TIKA-2019 -- clean up -- move state variables to inner classes, convert
(tallison: rev 2031de70c117fdabf793008fe22dd9c97c82d2c9)
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/AbstractXML2003Parser.java


> WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with 
> ToTextHandler
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2019
>                 URL: https://issues.apache.org/jira/browse/TIKA-2019
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>             Fix For: 2.0, 1.14
>
>
> The XML generated by these parsers was good, but when using the 
> ToTextHandler, spaces/tabs were not added correctly.  This led to 
> incorrectly concatenated strings.  Further, because we are extending the 
> XMLParser, while the metadata is extracted, it isn't well represented in 
> the XML.
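
For readers unfamiliar with the failure mode described above, here is a minimal sketch of how a text-only SAX handler ends up concatenating tokens. It uses only the JDK's SAX API, not Tika; the class name TextJoinSketch, the addSeparators flag, and the bare Row/Cell element names are assumptions made for illustration and are not taken from the actual commits.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the code from the TIKA-2019 fix.
class TextJoinSketch {

    /** Collects character data and optionally emits separators when elements end. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private final boolean addSeparators;

        TextCollector(boolean addSeparators) {
            this.addSeparators = addSeparators;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A text-only handler sees just the character runs, with no markup between them.
            out.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!addSeparators) {
                return; // nothing is written between cells, so their text runs together
            }
            if ("Cell".equals(qName)) {
                out.append('\t'); // tab after each cell
            } else if ("Row".equals(qName)) {
                out.append('\n'); // newline after each row
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Parses the XML string and returns the text the collector accumulated. */
    static String extract(String xml, boolean addSeparators) throws Exception {
        TextCollector handler = new TextCollector(addSeparators);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Row><Cell>alpha</Cell><Cell>beta</Cell></Row>";
        System.out.println(extract(xml, false)); // cells run together
        System.out.println(extract(xml, true));  // cells separated by tabs
    }
}
```

Without the separators the two cell values come out as a single token; with them, each cell is followed by a tab and each row by a newline. A real parser would presumably write such whitespace through the content handler it was given, so that any downstream text handler sees it.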



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
