cowwoc opened a new issue, #951:
URL: https://github.com/apache/poi/issues/951

   # Feature Request: Add XSSFOptimiser (XSSF equivalent of HSSFOptimiser)
   
   ## Summary
   
   HSSFOptimiser provides methods to optimise cell styles and fonts in HSSF 
(xls) workbooks by removing duplicates and unused entries. There is no 
equivalent for XSSF (xlsx) workbooks. This feature request proposes adding 
XSSFOptimiser to provide similar optimization capabilities for XSSF workbooks.
   
   ## Problem Description
   
   When programmatically creating or modifying XSSF workbooks, it's easy to 
accidentally create "style explosion" - hundreds or thousands of duplicate cell 
styles and fonts. This happens when code creates new styles for each cell 
instead of reusing existing styles.
   
   For example, this common pattern causes style explosion:
   ```java
   for (int i = 0; i < 1000; i++) {
       CellStyle style = workbook.createCellStyle(); // Creates new style each 
time!
       Font font = workbook.createFont();
       font.setBold(true);
       style.setFont(font);
       cell.setCellStyle(style);
   }
   ```
   
   Style explosion causes:
   - Slow file opening in Excel
   - Excel hangs or crashes with large workbooks
   - Increased file size
   - The famous "Too many different cell formats" error
   
   HSSFOptimiser solves this for HSSF workbooks. No equivalent exists for XSSF.
   
   ## Proposed Solution
   
   Add `XSSFOptimiser` class in `org.apache.poi.xssf.usermodel` package with 
methods:
   
   1. `optimiseCellStyles(XSSFWorkbook)` - Remaps cells using duplicate styles 
to canonical (first) occurrence
   2. `optimiseFonts(XSSFWorkbook)` - Updates style font references to 
canonical fonts
   
   ### Implementation Notes
   
   Unlike HSSF, XSSF's StylesTable maintains a separate Java list from the 
underlying CTStylesheet XML. Removing entries from CTStylesheet can cause 
`XmlValueDisconnectedException` on save because the Java objects become 
disconnected from their XML backing.
   
   The proposed implementation takes a **safe remapping approach**:
   - Identifies duplicate styles/fonts by comparing all properties
   - Remaps cells to use canonical (first occurrence) styles
   - Does NOT remove entries from XML (avoids disconnection issues)
   - Prevents further style explosion
   
   This approach is safe and effective:
   - Workbooks save correctly without XML exceptions
   - Cells reference deduplicated styles
   - Further modifications won't create additional duplicates if using 
canonical styles
   
   ## Usage Example
   
   ```java
   XSSFWorkbook workbook = new XSSFWorkbook(new 
FileInputStream("bloated.xlsx"));
   
   // Optimize fonts first, then styles
   XSSFOptimiser.optimiseFonts(workbook);
   XSSFOptimiser.optimiseCellStyles(workbook);
   
   workbook.write(new FileOutputStream("optimized.xlsx"));
   ```
   
   ## Test Coverage
   
   The implementation includes comprehensive tests:
   - `testOptimiseCellStylesRemapsDuplicates` - Verifies duplicate styles are 
consolidated
   - `testOptimiseCellStylesHandlesUnusedStyles` - Verifies no exception on 
unused styles
   - `testOptimiseCellStylesPreservesDifferentStyles` - Verifies distinct 
styles preserved
   - `testOptimiseFontsRemapsDuplicates` - Verifies duplicate fonts are detected
   - `testWorkbookSaveableAfterOptimization` - Verifies no save exceptions
   - `testOptimizationPreservesFormulas` - Verifies formulas and formatting 
preserved
   
   ## Contribution
   
   I have a working implementation ready to contribute as a PR. The code:
   - Follows POI coding conventions
   - Includes Apache License headers
   - Has full test coverage
   - Is modeled after HSSFOptimiser
   
   Would this contribution be welcome? Happy to submit a PR if so.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to