https://issues.apache.org/bugzilla/show_bug.cgi?id=50972

           Summary: XWPFWordExtractor ignores <w:br/> entries
           Product: POI
           Version: 3.8-dev
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
        AssignedTo: [email protected]
        ReportedBy: [email protected]


Created an attachment (id=26797)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26797)
Test document

Two words separated by a line break character are glued together.

I debugged the issue and found the following code in the XWPFRun.toString()
method:

if (o instanceof CTEmpty) {
   // Some inline text elements get returned not as
   //  themselves, but as CTEmpty, owing to some odd
   //  definitions around line 5642 of the XSDs
   String tagName = o.getDomNode().getNodeName();
   if ("w:tab".equals(tagName)) {
      text.append("\t");
   }
   if ("w:br".equals(tagName)) {
      text.append("\n");
   }
   <...>
}

The issue is that for a <w:br/> element "o" is an instance of CTBrImpl, not
CTEmpty. The instanceof check therefore fails, the "w:br" branch is never
reached, and the element is ignored.
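
For illustration, here is a minimal standalone sketch of dispatching on the
element name alone, so that the Java class of the binding object no longer
matters. This is not the POI code and not a proposed patch: it uses only the
JDK DOM parser, and the names RunTextSketch, runText and extract are
hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RunTextSketch {

    /** Hypothetical helper: collects the text of a single w:r element. */
    static String runText(Node run) {
        StringBuilder text = new StringBuilder();
        NodeList children = run.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Decide by qualified element name only, never by binding class.
            String tagName = child.getNodeName();
            if ("w:t".equals(tagName)) {
                text.append(child.getTextContent());
            } else if ("w:tab".equals(tagName)) {
                text.append('\t');
            } else if ("w:br".equals(tagName)) {
                text.append('\n');
            }
        }
        return text.toString();
    }

    /** Hypothetical helper: parses an XML fragment whose root is a w:r. */
    static String extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return runText(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse run XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<w:r xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:t>first</w:t><w:br/><w:t>second</w:t></w:r>";
        // Show the line break as a visible escape sequence.
        System.out.println(extract(xml).replace("\n", "\\n"));
    }
}
```

With this kind of check the two words in the run come out separated by a
newline instead of being glued together.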

Attached a test document.

