Dear POI developers,

just FYI: the problem lies in the fact that the paragraph.getText() method relies on each run's toString() without checking whether a deletion is associated with that run. I have now written a workaround that fixes the issue for me; feel free to adapt it and include it in the code if you like:
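import org.apache.poi.xwpf.usermodel.IRunElement;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFSDT;

// Returns the paragraph text, leaving out runs that carry a deletion rsid.
public static String getBasicTextFromParagraphWithoutDeletions(
        XWPFParagraph par) {
    StringBuilder out = new StringBuilder();
    for (IRunElement run : par.getIRuns()) {
        if (run instanceof XWPFRun) {
            XWPFRun xRun = (XWPFRun) run;
            // Keep the run only if no deletion is recorded for it.
            if (xRun.getCTR().getRsidDel() == null) {
                out.append(xRun.toString());
            }
        } else if (run instanceof XWPFSDT) {
            // Structured document tags expose their text via the content object.
            out.append(((XWPFSDT) run).getContent().getText());
        } else {
            out.append(run.toString());
        }
    }
    return out.toString();
}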
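
For comparison, the same idea can be sketched directly on the paragraph XML, without POI. This is only a rough illustration using the JDK's DOM parser: it assumes that tracked deletions are wrapped in w:del elements and that surviving text sits in w:t elements. The class AcceptedTextSketch and its methods are hypothetical names made up for this example and are not part of POI; moved text and other revision types are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Apache POI.
public class AcceptedTextSketch {
    // WordprocessingML main namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Walks the subtree, skipping w:del and appending the content of w:t.
    static void collect(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            boolean inWordNs = W_NS.equals(child.getNamespaceURI());
            String name = child.getLocalName();
            if (inWordNs && "del".equals(name)) {
                continue; // tracked deletion: drop the whole subtree
            }
            if (inWordNs && "t".equals(name)) {
                out.append(child.getTextContent());
            } else {
                collect(child, out); // e.g. w:r, w:ins, w:hyperlink
            }
        }
    }

    // Parses one paragraph given as an XML string and returns its text
    // as if tracked deletions had been accepted.
    static String acceptedText(String paragraphXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        paragraphXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        collect(root, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>"
                + "<w:del w:id=\"1\" w:author=\"a\">"
                + "<w:r><w:delText>old</w:delText></w:r></w:del>"
                + "<w:ins w:id=\"2\" w:author=\"a\">"
                + "<w:r><w:t>new</w:t></w:r></w:ins>"
                + "</w:p>";
        System.out.println(acceptedText(xml)); // prints "Hello new"
    }
}
```

Inserted runs (w:ins) are kept simply because the walk descends into them, which matches the "all changes accepted" view for insertions and deletions.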
Best,
Henning
-------------------------------------------------------------------------------------------------
Technische Universität München
Henning Femmer . Phone +49 (89) 289-17080
Faculty of Informatics . Software & Systems Engineering
Room 00.11.064 . [email protected] <mailto:[email protected]> .
http://www4.in.tum.de/~femmer <http://www4.in.tum.de/~femmer>
-------------------------------------------------------------------------------------------------
> On 08 Apr 2015, at 16:35, Henning Femmer <[email protected]> wrote:
>
> Dear all,
>
> I’m looking for a simple solution to parse only the newest version of an XWPF
> file (as if all changes are accepted or so). As far as I could google and
> browse through the javadoc there is no such functionality in apache poi, is
> that correct?
> If I wanted to access this information, how would I have to approach the
> problem?
> I would be willing to contribute a patch, if that is necessary.
>
> Best,
> Henning
