[ https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2019:
------------------------------
    Description:
The XML generated by these parsers was good, but when using the ToTextHandler, spaces/tabs were not added correctly. This leads to incorrectly concatenated strings. Further, because we are extending the XMLParser, while the metadata is extracted, it isn't well represented in the XML.

  (was: The XML generated by these parsers was good, but when using the ToTextHandler, spaces/tabs were not added correctly. This leads to incorrectly concatenated strings.)

> WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2019
>                 URL: https://issues.apache.org/jira/browse/TIKA-2019
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>
> The XML generated by these parsers was good, but when using the
> ToTextHandler, spaces/tabs were not added correctly. This leads to
> incorrectly concatenated strings. Further, because we are extending the
> XMLParser, while the metadata is extracted, it isn't well represented in the XML.
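
For readers unfamiliar with the failure mode in the description, here is a minimal sketch that uses only the JDK's SAX API. It is not Tika's code and does not use ToTextHandler, WordMLParser, or SpreadsheetMLParser; the class names and the `p` and `cell` element names are hypothetical stand-ins. It only shows why a handler that collects character data alone runs adjacent tokens together, and how emitting a separator when an element closes avoids that.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextConcatDemo {

    // Collects character data only and ignores element boundaries
    // (an assumption about what a text-only handler effectively does).
    static class PlainTextCollector extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    // Same collector, but appends a separator when a hypothetical
    // paragraph ("p") or cell ("cell") element closes.
    static class SeparatingCollector extends PlainTextCollector {
        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("p")) {
                out.append('\n');
            } else if (qName.equals("cell")) {
                out.append('\t');
            }
        }
    }

    // Parses the XML string with the given handler and returns the collected text.
    static String extract(String xml, PlainTextCollector handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>first</p><p>second</p></doc>";
        // Without separators the two paragraphs merge into "firstsecond".
        System.out.println(extract(xml, new PlainTextCollector()));
        // With separators each paragraph ends with a newline.
        System.out.println(extract(xml, new SeparatingCollector()));
    }
}
```

In this sketch the XML itself is well formed in both runs; only the text view differs, which mirrors the report that the generated XML was good while the extracted text was not.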