https://issues.dlang.org/show_bug.cgi?id=15949
Issue ID: 15949
Summary: Improve readtext handling of byte order mark (BOM)
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nob...@puremagic.com
Reporter: jesse.k.phillip...@gmail.com

Problem:

I've hit this many times on Windows. I try to read in a file with std.file.readText and then get this error when parsing the result:

"Syntax error at line 0"

This is because some Microsoft program has decided to insert a UTF-8 byte order mark (BOM) at the beginning of the file (0xEF 0xBB 0xBF). But readText really shouldn't automatically convert a file's content based on the BOM specified.

Suggested fix:

I think readText should validate and skip the BOM. It should check that the BOM is not UTF-16LE (0xFF 0xFE), UTF-16BE (0xFE 0xFF), UTF-32LE (0xFF 0xFE 0x00 0x00), or UTF-32BE (0x00 0x00 0xFE 0xFF). If it is one of those, readText should throw an exception stating that the file uses that encoding and will not be converted to a UTF-8 string.

The corresponding std.file.readText!wstring and std.file.readText!dstring should perform equivalent validation. If changing the byte order costs nothing, that should be done.

1. https://en.wikipedia.org/wiki/Byte_order_mark
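
The suggested check could look roughly like the sketch below. It is an illustration only, not a proposed patch: the names hasPrefix, decodeUtf8SkippingBom and readTextSkippingBom are hypothetical and do not exist in Phobos, and the choice to throw a plain Exception via enforce is an assumption. The BOM byte values are the ones listed in the report.

```d
import std.exception : enforce;
import std.file : read;
import std.utf : validate;

// Hypothetical helper: true if `data` begins with the bytes in `mark`.
private bool hasPrefix(const(ubyte)[] data, const(ubyte)[] mark)
{
    return data.length >= mark.length && data[0 .. mark.length] == mark;
}

// Hypothetical helper, not part of Phobos: rejects UTF-16/UTF-32 BOMs,
// skips a UTF-8 BOM, and validates the remaining bytes as UTF-8.
string decodeUtf8SkippingBom(const(ubyte)[] data)
{
    // UTF-32LE is tested before UTF-16LE because 0xFF 0xFE is a prefix
    // of 0xFF 0xFE 0x00 0x00.
    enforce(!hasPrefix(data, [0xFF, 0xFE, 0x00, 0x00]),
            "UTF-32LE BOM found; file will not be converted to UTF-8");
    enforce(!hasPrefix(data, [0x00, 0x00, 0xFE, 0xFF]),
            "UTF-32BE BOM found; file will not be converted to UTF-8");
    enforce(!hasPrefix(data, [0xFF, 0xFE]),
            "UTF-16LE BOM found; file will not be converted to UTF-8");
    enforce(!hasPrefix(data, [0xFE, 0xFF]),
            "UTF-16BE BOM found; file will not be converted to UTF-8");

    // Skip the UTF-8 BOM if one is present.
    if (hasPrefix(data, [0xEF, 0xBB, 0xBF]))
        data = data[3 .. $];

    auto text = cast(string) data.idup;
    validate(text); // throws UTFException on malformed UTF-8
    return text;
}

// Hypothetical wrapper: reads the raw bytes with std.file.read and
// applies the check above.
string readTextSkippingBom(string path)
{
    return decodeUtf8SkippingBom(cast(const(ubyte)[]) read(path));
}

unittest
{
    // A UTF-8 BOM is skipped; the remaining bytes decode to "hi".
    assert(decodeUtf8SkippingBom([0xEF, 0xBB, 0xBF, 0x68, 0x69]) == "hi");
    // Input without a BOM passes through unchanged.
    assert(decodeUtf8SkippingBom([0x68, 0x69]) == "hi");
}
```

The unittest block doubles as a usage example and runs when the module is built with unit tests enabled. A file whose first bytes are 0xFF 0xFE 0x00 0x00 is reported as UTF-32LE, although the same bytes could also start a UTF-16LE file whose first character is U+0000. The input is rejected in both cases.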