[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966888#comment-14966888 ]
Tim Allison edited comment on TIKA-1707 at 10/21/15 2:41 PM:
-------------------------------------------------------------

[~kiwiwings], we found a regression in spacing around differently formatted runs in ppt (TIKA-1778). Do you see any problems if we don't {{trim}} here:

{noformat}
for (HSLFTextRun htr : htp.getTextRuns()) {
    String line = htr.getRawText();
    if (line != null) {
        for (String fragment : line.split("\\u000b")) {
            ....
            xhtml.characters(fragment.trim());
            ...
{noformat}

If we drop it, will we get spaces where we shouldn't? Perhaps we should only replace the trailing carriage return instead:

{noformat}fragment.replaceFirst("\r$", " "){noformat}

> Upgrade to Apache POI 3.13 Beta 2
> ---------------------------------
>
>                 Key: TIKA-1707
>                 URL: https://issues.apache.org/jira/browse/TIKA-1707
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.9
>            Reporter: Andreas Beeker
>            Assignee: Tim Allison
>         Attachments: common_sl.diff
>
>
> In the not so far future, POI 3.13 Beta 2 will be available.
> This contains quite a big change to the PowerPoint modules XSLF/HSLF, but
> thankfully Tika isn't much affected.
> Please try the patch on our trunk and post side-effects.
> As the work on the common_sl API hasn't been finished yet, there might be
> another patch for the next POI beta version.
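To make the {{trim}} question concrete, here is a minimal sketch that runs without POI. The run strings are hypothetical stand-ins for {{HSLFTextRun.getRawText()}} output, and it assumes, as the question implies, that the last run of a paragraph ends in a carriage return. The class and method names ({{RunJoinDemo}}, {{joinTrimmed}}, {{joinUntrimmed}}) are illustrative only, not Tika code; the fragments are simply concatenated here instead of being passed to {{xhtml.characters}}.

```java
import java.util.Arrays;
import java.util.List;

public class RunJoinDemo {

    // Hypothetical raw texts of three differently formatted runs in one paragraph.
    // The spaces sit at the run boundaries; the last run ends in \r (assumption).
    static final List<String> RUNS = Arrays.asList("Plain text ", "bold text", " and more\r");

    // Mirrors the current loop: every vertical-tab-separated fragment is trimmed.
    static String joinTrimmed(List<String> runs) {
        StringBuilder sb = new StringBuilder();
        for (String line : runs) {
            if (line == null) {
                continue;
            }
            for (String fragment : line.split("\\u000b")) {
                // trim() removes the boundary spaces as well as the trailing \r
                sb.append(fragment.trim());
            }
        }
        return sb.toString();
    }

    // Proposed variant: keep all whitespace, only turn a trailing \r into a space.
    static String joinUntrimmed(List<String> runs) {
        StringBuilder sb = new StringBuilder();
        for (String line : runs) {
            if (line == null) {
                continue;
            }
            for (String fragment : line.split("\\u000b")) {
                sb.append(fragment.replaceFirst("\r$", " "));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + joinTrimmed(RUNS) + "]");   // [Plain textbold textand more]
        System.out.println("[" + joinUntrimmed(RUNS) + "]"); // [Plain text bold text and more ]
    }
}
```

The trimmed version glues the words at the run boundaries together, which matches the kind of spacing regression described in TIKA-1778. The untrimmed version keeps them apart but leaves a trailing space where the carriage return was; whether such extra spaces are harmless in the XHTML output is exactly the open question.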