[ https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348378#comment-15348378 ]
Hudson commented on TIKA-2019:
------------------------------
SUCCESS: Integrated in Tika-trunk #1069 (See [https://builds.apache.org/job/Tika-trunk/1069/])
TIKA-2019 -- parsers for 2003 MS xml files fail to add spaces/tabs
(tallison: rev 7ae760e29ad3ed5874f7f50c27c6f850ab1d8025)
* tika-parsers/src/test/java/org/apache/tika/parser/microsoft/xml/XML2003ParserTest.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/AbstractXML2003Parser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
TIKA-2019 -- clean up -- move state variables to inner classes, convert
(tallison: rev 2031de70c117fdabf793008fe22dd9c97c82d2c9)
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/SpreadsheetMLParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/AbstractXML2003Parser.java
> WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with
> ToTextHandler
> --------------------------------------------------------------------------------------
>
> Key: TIKA-2019
> URL: https://issues.apache.org/jira/browse/TIKA-2019
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Fix For: 2.0, 1.14
>
>
> The XML generated by these parsers was good, but when using the
> ToTextHandler, spaces/tabs were not added correctly, which led to
> incorrectly concatenated strings. Further, because we are extending the
> XMLParser, the metadata is extracted but isn't well represented in the XML.
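
The concatenation described above can be reproduced without Tika. The following is only a rough sketch using the JDK's SAX parser: the class TextConcatDemo, its TextCollector handler, and the element names "row" and "cell" are all hypothetical and are not taken from the Tika code or from the actual fix. It shows that a handler which keeps only characters() events runs adjacent cells together unless something emits a separator when an element closes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika and not the committed fix.
class TextConcatDemo {

    // Keeps only character data, roughly what a text-only handler sees.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean separateCells;

        TextCollector(boolean separateCells) {
            this.separateCells = separateCells;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!separateCells) {
                return; // no separators: adjacent tokens get glued together
            }
            // Assumed element names, chosen only for this illustration.
            if ("cell".equals(qName)) {
                text.append('\t');
            } else if ("row".equals(qName)) {
                text.append('\n');
            }
        }

        String text() {
            return text.toString();
        }
    }

    // Parses the XML string and returns the collected text.
    static String extract(String xml, boolean separateCells) throws Exception {
        TextCollector handler = new TextCollector(separateCells);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row><cell>foo</cell><cell>bar</cell></row>";
        System.out.println(extract(xml, false)); // tokens run together
        System.out.println(extract(xml, true));  // tab after each cell, newline after the row
    }
}
```

In this sketch the separators are added in the handler purely for brevity; a parser could equally emit the whitespace itself as character events so that any downstream text handler receives it.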