Hello, I'm katahiromz. Thank you for your great software. I want to add UTF-16/UTF-32 support to your C preprocessor.
This patch (attached) adds automatic character encoding detection to `libcpp/files.cc` by examining the first 4 bytes of each input file. I hope it helps.

---

Technical information:

**Detection logic in `read_file_guts`:**

- Binary files (all zeros in first 4 bytes) --> error
- BOM detection:
  - `FF FE 00 00` --> UTF-32LE
  - `00 00 FE FF` --> UTF-32BE
  - `FF FE` --> UTF-16LE
  - `FE FF` --> UTF-16BE
  - `EF BB BF` --> UTF-8 (handled by existing code)
- Null byte pattern inference (no BOM):
  - `bytes[1]==0 && bytes[3]==0` --> UTF-16LE
  - `bytes[0]==0 && bytes[2]==0` --> UTF-16BE
  - `bytes[1,2,3]==0` --> UTF-32LE
  - `bytes[0,1,2]==0` --> UTF-32BE

**Changes:**

- Added `detect_encoding()` function for BOM/pattern detection
- Modified `read_file_guts()` to auto-detect the encoding and strip the BOM before conversion

Files shorter than 4 bytes are processed normally without inference.

---

Best regards,
Katayama Hirofumi MZ <[email protected]>
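
For readers who have not opened the attachment, the detection rules listed above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the names `Encoding`, `Detection`, and `detect_encoding_sketch` are hypothetical, and the sketch makes one assumption of its own, namely that the UTF-32 null patterns are tested before the UTF-16 ones, because three zero bytes also satisfy the two-zero UTF-16 test.

```cpp
#include <cstddef>

// Hypothetical names for illustration; the attached patch may differ.
enum class Encoding { Unknown, Binary, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

struct Detection {
  Encoding encoding;
  std::size_t bom_length;  // number of leading BOM bytes to strip
};

inline Detection detect_encoding_sketch(const unsigned char *b, std::size_t len)
{
  // Files shorter than 4 bytes: no inference, process normally.
  if (len < 4)
    return {Encoding::Unknown, 0};

  // All zeros in the first 4 bytes: treated as a binary file (an error).
  if (b[0] == 0 && b[1] == 0 && b[2] == 0 && b[3] == 0)
    return {Encoding::Binary, 0};

  // BOMs. UTF-32LE must be tested before UTF-16LE because
  // FF FE is a prefix of FF FE 00 00.
  if (b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
    return {Encoding::Utf32LE, 4};
  if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
    return {Encoding::Utf32BE, 4};
  if (b[0] == 0xFF && b[1] == 0xFE)
    return {Encoding::Utf16LE, 2};
  if (b[0] == 0xFE && b[1] == 0xFF)
    return {Encoding::Utf16BE, 2};
  // UTF-8 BOM: reported with bom_length 0 here, since the existing
  // code is said to handle it already.
  if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
    return {Encoding::Utf8, 0};

  // No BOM: infer from the positions of null bytes.
  // Assumption: the stricter UTF-32 patterns are checked first.
  if (b[1] == 0 && b[2] == 0 && b[3] == 0)
    return {Encoding::Utf32LE, 0};
  if (b[0] == 0 && b[1] == 0 && b[2] == 0)
    return {Encoding::Utf32BE, 0};
  if (b[1] == 0 && b[3] == 0)
    return {Encoding::Utf16LE, 0};
  if (b[0] == 0 && b[2] == 0)
    return {Encoding::Utf16BE, 0};

  return {Encoding::Unknown, 0};
}
```

A caller would skip `bom_length` bytes and then convert the rest of the buffer from the reported encoding. An `Unknown` result means the file goes through the normal path unchanged.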
cpp-utf16.patch
Description: Binary data
