christianAppl opened a new pull request, #511:
URL: https://github.com/apache/poi/pull/511

   **TL;DR:**
   I suggest that POI not create artificial rows and cells for tables when
reading existing documents, so that those documents are not misrepresented.
   
   **Problematic behaviour:**
   In org.apache.poi.xwpf.usermodel.XWPFTable, the constructor assumes that a
table should contain at least one row. If a table contains no 'tr'
elements, one is created artificially:
   
   ```Java
   public XWPFTable(CTTbl table, IBody part) {
           this.part = part;
           this.ctTbl = table;
   
           // is an empty table: I add one row and one column as default
           if (table.sizeOfTrArray() == 0) {
               createEmptyTable(table);
           }
           ...
   ```
   
   **Claims and intentions:**
   I can see how this could be useful when creating a document.
However, I would prefer that POI not create such table rows when
reading a preexisting document.
   
   This is especially problematic because the check does not even cover all
possible contents.
   According to: "ECMA-376-1:2016 Office Open XML File Formats — Fundamentals 
and Markup Language Reference"
   Chapter "17.5.2.30 sdt (Row-Level Structured Document Tag)"
   A table need not contain its rows directly, but may contain row-level
sdts instead.
   
   This behaviour of POI led to the creation of an artificial and unwanted
row for the following table:
   ```
       <w:tbl>
         <w:tblPr>
           <w:tblStyle w:val="Tabellenraster"/>
           <w:tblW w:w="0" w:type="auto"/>
           <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" 
w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
         </w:tblPr>
         <w:tblGrid>
           <w:gridCol w:w="9062"/>
         </w:tblGrid>
         <w:sdt>
           <w:sdtContent>
             <w:tr>
               <w:sdt>
                 <w:sdtContent>
                   <w:tc>
                     <w:p>
                       <w:r>
                         <w:t>Test</w:t>
                       </w:r>
                     </w:p>
                   </w:tc>
                 </w:sdtContent>
               </w:sdt>
             </w:tr>
           </w:sdtContent>
         </w:sdt>
       </w:tbl>
   ```
   This table contains no table rows as direct children, but it obviously
cannot be treated as "empty".
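
   The point about row-level sdts can be illustrated with a small sketch that
is independent of POI. It assumes the table is available as raw XML and uses
only the JDK DOM parser; the class `TableRowProbe` and its methods are
hypothetical names, not part of POI or of this PR. It treats a table as having
rows if a `w:tr` is found either directly or inside `w:sdt` / `w:sdtContent`
wrappers:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not POI API.
final class TableRowProbe {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** True if the element holds a w:tr directly or inside sdt wrappers. */
    static boolean hasRowContent(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip text nodes and elements from other namespaces.
            if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) {
                continue;
            }
            Element e = (Element) n;
            String name = e.getLocalName();
            if ("tr".equals(name)) {
                return true;
            }
            // Descend through w:sdt and w:sdtContent; sdts may be nested.
            // Other possible wrappers are not handled in this sketch.
            if (("sdt".equals(name) || "sdtContent".equals(name))
                    && hasRowContent(e)) {
                return true;
            }
        }
        return false;
    }

    /** Parses a namespace-aware XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

   For the table above, such a check would report that rows are present,
whereas a test that only counts direct 'tr' children would not.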
   
   This PR contains my suggestion for how to separate the reading and writing
of tables and prevent the creation of superfluous content that would
misrepresent the original document.
   
   Even if a table was empty in the original, I would prefer that the
resulting POI representation reflect that.
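
   As a rough illustration of the separation (not the code in this PR), the
toy model below keeps the "read" and "create" paths apart. `SketchTable`,
`fromExisting` and `createNew` are hypothetical names, and rows are modelled
as plain lists of strings rather than POI objects:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table wrapper; these names are not POI API.
final class SketchTable {
    private final List<List<String>> rows;

    private SketchTable(List<List<String>> rows) {
        this.rows = rows;
    }

    /** Read path: wrap exactly what the document contains, adding nothing. */
    static SketchTable fromExisting(List<List<String>> parsedRows) {
        List<List<String>> copy = new ArrayList<>();
        for (List<String> row : parsedRows) {
            copy.add(new ArrayList<>(row));
        }
        return new SketchTable(copy);
    }

    /** Write path: a new table starts with one row holding one empty cell. */
    static SketchTable createNew() {
        List<String> row = new ArrayList<>();
        row.add("");
        List<List<String>> rows = new ArrayList<>();
        rows.add(row);
        return new SketchTable(rows);
    }

    int rowCount() {
        return rows.size();
    }
}
```

   With this split, an empty table that is read stays empty, and only a newly
created table receives the default row and cell.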
   
   **Further suggestion:**
   I would suggest removing this feature entirely, as I can imagine it being
counterproductive during writing as well. However, since I am mostly
interested in reading such documents, I have not outright removed:
   ```
   if (table.sizeOfTrArray() == 0) {
     createEmptyTable(table);
   }
   ```



