ongdisheng commented on issue #696:
URL: https://github.com/apache/fesod/issues/696#issuecomment-3555057853

   ## Root Cause
   Hi @xfn-jjw, I looked into this and found what's happening. The problem is 
[XML 1.0 ](https://www.w3.org/TR/xml/#charsets) doesn't allow control 
characters like `\u0002`. It only allows tab, newline and carriage return from 
the control character range. When POI tries to write these characters to the 
XML, it has to replace them with `?` because otherwise the XML would be invalid 
and couldn't be parsed. You can verify this by unzipping the `.xlsx` file and 
checking `xl/worksheets/sheet1.xml`:
   ```bash
   <c r="A2" s="2" t="inlineStr"><is><t>Product?Code</t></is></c>
   ```
   The `?` is already there in the file. It's not a read problem since it 
happens during write.
   
   ## Workaround
   The Office Open XML spec ([ECMA-376, Part 1, Section 
22.4.2.4](https://ecma-international.org/publications-and-standards/standards/ecma-376/))
 has a solution for this. You can escape control characters using `_xHHHH_` 
format. For example, `\u0002` becomes `_x0002_`. Here's a custom write handler 
you can use:
   ```java
   import org.apache.fesod.sheet.enums.CellDataTypeEnum;
   import org.apache.fesod.sheet.metadata.Head;
   import org.apache.fesod.sheet.metadata.data.WriteCellData;
   import org.apache.fesod.sheet.write.handler.CellWriteHandler;
   import org.apache.fesod.sheet.write.metadata.holder.WriteSheetHolder;
   import org.apache.fesod.sheet.write.metadata.holder.WriteTableHolder;
   import org.apache.poi.ss.usermodel.Cell;
   
   import java.util.List;
   
   public class ControlCharacterEscapeWriteHandler implements CellWriteHandler {
   
       @Override
       public void afterCellDataConverted(WriteSheetHolder writeSheetHolder, 
WriteTableHolder writeTableHolder,
               WriteCellData<?> cellData, Cell cell, Head head, Integer 
relativeRowIndex, Boolean isHead) {
           if (cellData != null && 
CellDataTypeEnum.STRING.equals(cellData.getType())) {
               String originalString = cellData.getStringValue();
               if (originalString != null) {
                   String escapedString = 
escapeControlCharacters(originalString);
                   cellData.setStringValue(escapedString);
               }
           }
       }
   
       private String escapeControlCharacters(String str) {
           StringBuilder sb = new StringBuilder();
           for (int i = 0; i < str.length(); i++) {
               char c = str.charAt(i);
               if (c >= 0x00 && c <= 0x1F && c != '\t' && c != '\n' && c != 
'\r') {
                   sb.append(String.format("_x%04X_", (int) c));
               } else {
                   sb.append(c);
               }
           }
           return sb.toString();
       }
   
       @Override
       public void afterCellCreate(WriteSheetHolder writeSheetHolder, 
WriteTableHolder writeTableHolder, Cell cell,
               Head head, Integer relativeRowIndex, Boolean isHead) {
       }
   
       @Override
       public void afterCellDispose(WriteSheetHolder writeSheetHolder, 
WriteTableHolder writeTableHolder,
               List<WriteCellData<?>> cellDataList, Cell cell, Head head, 
Integer relativeRowIndex, Boolean isHead) {
       }
   }
   ```
   
   ### Usage
   ```java
   FesodSheet.write(file, DemoData.class)
       .registerWriteHandler(new ControlCharacterEscapeWriteHandler())
       .sheet("Sheet1")
       .doWrite(data);
   ```
   This will store control characters in a format that survives the XML 
serialization. The file can be opened in Excel and the control characters will 
be preserved. Checking the XML inside the generated file where the control 
character is now stored as `_x0002_ ` instead of `?`:
   ```bash
   <c r="A2" s="2" t="inlineStr"><is><t>Product_x0002_Code</t></is></c>
   ```
   
   
[issue696_with_handler.xlsx](https://github.com/user-attachments/files/23639120/issue696_with_handler.xlsx)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to