ongdisheng commented on issue #696: URL: https://github.com/apache/fesod/issues/696#issuecomment-3555057853
## Root Cause Hi @xfn-jjw, I looked into this and found what's happening. The problem is [XML 1.0 ](https://www.w3.org/TR/xml/#charsets) doesn't allow control characters like `\u0002`. It only allows tab, newline and carriage return from the control character range. When POI tries to write these characters to the XML, it has to replace them with `?` because otherwise the XML would be invalid and couldn't be parsed. You can verify this by unzipping the `.xlsx` file and checking `xl/worksheets/sheet1.xml`: ```bash <c r="A2" s="2" t="inlineStr"><is><t>Product?Code</t></is></c> ``` The `?` is already there in the file. It's not a read problem since it happens during write. ## Workaround The Office Open XML spec ([ECMA-376, Part 1, Section 22.4.2.4](https://ecma-international.org/publications-and-standards/standards/ecma-376/)) has a solution for this. You can escape control characters using `_xHHHH_` format. For example, `\u0002` becomes `_x0002_`. Here's a custom write handler you can use: ```java import org.apache.fesod.sheet.enums.CellDataTypeEnum; import org.apache.fesod.sheet.metadata.Head; import org.apache.fesod.sheet.metadata.data.WriteCellData; import org.apache.fesod.sheet.write.handler.CellWriteHandler; import org.apache.fesod.sheet.write.metadata.holder.WriteSheetHolder; import org.apache.fesod.sheet.write.metadata.holder.WriteTableHolder; import org.apache.poi.ss.usermodel.Cell; import java.util.List; public class ControlCharacterEscapeWriteHandler implements CellWriteHandler { @Override public void afterCellDataConverted(WriteSheetHolder writeSheetHolder, WriteTableHolder writeTableHolder, WriteCellData<?> cellData, Cell cell, Head head, Integer relativeRowIndex, Boolean isHead) { if (cellData != null && CellDataTypeEnum.STRING.equals(cellData.getType())) { String originalString = cellData.getStringValue(); if (originalString != null) { String escapedString = escapeControlCharacters(originalString); cellData.setStringValue(escapedString); } } } private String escapeControlCharacters(String str) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < str.length(); i++) { char c = str.charAt(i); if (c >= 0x00 && c <= 0x1F && c != '\t' && c != '\n' && c != '\r') { sb.append(String.format("_x%04X_", (int) c)); } else { sb.append(c); } } return sb.toString(); } @Override public void afterCellCreate(WriteSheetHolder writeSheetHolder, WriteTableHolder writeTableHolder, Cell cell, Head head, Integer relativeRowIndex, Boolean isHead) { } @Override public void afterCellDispose(WriteSheetHolder writeSheetHolder, WriteTableHolder writeTableHolder, List<WriteCellData<?>> cellDataList, Cell cell, Head head, Integer relativeRowIndex, Boolean isHead) { } } ``` ### Usage ```java FesodSheet.write(file, DemoData.class) .registerWriteHandler(new ControlCharacterEscapeWriteHandler()) .sheet("Sheet1") .doWrite(data); ``` This will store control characters in a format that survives the XML serialization. The file can be opened in Excel and the control characters will be preserved. Checking the XML inside the generated file where the control character is now stored as `_x0002_ ` instead of `?`: ```bash <c r="A2" s="2" t="inlineStr"><is><t>Product_x0002_Code</t></is></c> ``` [issue696_with_handler.xlsx](https://github.com/user-attachments/files/23639120/issue696_with_handler.xlsx) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
