[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969173#comment-14969173 ]
Tim Allison commented on TIKA-1707:
-----------------------------------
That was a bad idea.
The issue was that the first run ended with \u000b, and the split was hiding
that "paragraph break" before the next run, because {{String.split}} drops
trailing empty strings.
So, how about adding an {{if (line.endsWith("\u000b"))}} check after the loop:
{noformat}
if (line != null) {
    boolean isfirst = true;
    for (String fragment : line.split("\\u000b")) {
        if (!isfirst) {
            xhtml.startElement("br");
            xhtml.endElement("br");
        }
        isfirst = false;
        xhtml.characters(removePBreak(fragment));
    }
    if (line.endsWith("\u000b")) {
        xhtml.startElement("br");
        xhtml.endElement("br");
    }
}
{noformat}
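To show why the extra check matters, here is a minimal standalone sketch of the same loop. It is not the Tika code: the class name {{VerticalTabSplit}} and the {{render}} helper are hypothetical, the XHTML handler calls are replaced by a list of tokens, and {{removePBreak}} is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class VerticalTabSplit {

    // Hypothetical stand-in for the handler calls in the snippet above:
    // text fragments and "<br/>" tokens are collected in a list instead
    // of being written to an XHTML content handler.
    static List<String> render(String line) {
        List<String> out = new ArrayList<>();
        if (line == null) {
            return out;
        }
        boolean isFirst = true;
        // String.split drops trailing empty strings, so a vertical tab
        // at the very end of the line produces no extra fragment here.
        for (String fragment : line.split("\\u000b")) {
            if (!isFirst) {
                out.add("<br/>");
            }
            isFirst = false;
            out.add(fragment);
        }
        // Restore the break that the split swallowed.
        if (line.endsWith("\u000b")) {
            out.add("<br/>");
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "a\u000bb\u000b";
        // Two fragments only: the trailing empty string is discarded.
        System.out.println(line.split("\\u000b").length); // prints 2
        System.out.println(render(line)); // prints [a, <br/>, b, <br/>]
    }
}
```

Without the {{endsWith}} check, the last {{<br/>}} token would be missing and the next run would follow "b" directly.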
> Upgrade to Apache POI 3.13 Beta 2
> ---------------------------------
>
> Key: TIKA-1707
> URL: https://issues.apache.org/jira/browse/TIKA-1707
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.9
> Reporter: Andreas Beeker
> Assignee: Tim Allison
> Attachments: 075166.ppt, common_sl.diff, dont_trim_and_bullets.patch
>
>
> In the not-too-distant future, POI 3.13 Beta 2 will be available.
> This contains quite a big change to the PowerPoint modules XSLF/HSLF, but
> thankfully Tika isn't much affected.
> Please try the patch on our trunk and post any side effects.
> As the work on the common_sl API hasn't been finished yet, there might be
> another patch for the next POI beta version.