cowwoc opened a new pull request, #952:
URL: https://github.com/apache/poi/pull/952
## Summary
Adds `XSSFOptimiser` class to provide optimization methods for XSSF
workbooks, equivalent to the existing `HSSFOptimiser` for HSSF workbooks.
This addresses the common "style explosion" problem where programmatically
created XSSF workbooks accumulate too many duplicate cell styles and fonts,
causing:
- Slow file opening in Excel
- Excel hangs or crashes with large workbooks
- Increased file size
- The "Too many different cell formats" error
## Changes
**New files:**
- `poi-ooxml/src/main/java/org/apache/poi/xssf/usermodel/XSSFOptimiser.java`
-
`poi-ooxml/src/test/java/org/apache/poi/xssf/usermodel/TestXSSFOptimiser.java`
**Methods added:**
- `optimiseCellStyles(XSSFWorkbook)` - Remaps cells using duplicate styles
to canonical (first) occurrence
- `optimiseFonts(XSSFWorkbook)` - Updates style font references to canonical
fonts
## Implementation Notes
Unlike HSSF, XSSF's `StylesTable` maintains a separate Java list from the
underlying `CTStylesheet` XML. Removing entries from `CTStylesheet` can cause
`XmlValueDisconnectedException` on save because the Java objects become
disconnected from their XML backing.
The implementation uses a **safe remapping approach**:
- Identifies duplicate styles/fonts by comparing all properties (not indices)
- Remaps cells to use canonical (first occurrence) styles
- Does NOT remove entries from XML (avoids disconnection issues)
- Prevents further style explosion
## Test Coverage
Includes 6 comprehensive tests:
- `testOptimiseCellStylesRemapsDuplicates`
- `testOptimiseCellStylesHandlesUnusedStyles`
- `testOptimiseCellStylesPreservesDifferentStyles`
- `testOptimiseFontsRemapsDuplicates`
- `testWorkbookSaveableAfterOptimization`
- `testOptimizationPreservesFormulas`
## Usage Example
```java
XSSFWorkbook workbook = new XSSFWorkbook(new
FileInputStream("bloated.xlsx"));
XSSFOptimiser.optimiseFonts(workbook);
XSSFOptimiser.optimiseCellStyles(workbook);
workbook.write(new FileOutputStream("optimized.xlsx"));
```
Closes #951
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]