On Tuesday, 17 June 2014 at 12:54:39 UTC, Marc Schütz wrote:
On Tuesday, 17 June 2014 at 02:27:43 UTC, jicman wrote:
Greetings!
I have a bunch of plain ASCII, UTF8 and UTF16 files, with and
without a BOM (Byte Order Mark). I had, I thought, a nice way
of figuring out which encoding a file uses (ASCII, UTF8 or
UTF16) when the BOM is missing: read the content and apply
std.utf.validate to the char[] or wchar[] string. The problem
is that lately I keep hitting a wall with an "array cast
misalignment" error when casting to wchar[].
i.e.

auto text = cast(string) file.read();  // raw file contents, cast straight to string
wchar[] temp = cast(wchar[]) text;     // "array cast misalignment" here when text.length is odd
If the length of the data is odd, it cannot be valid UTF16.
You can check for that and skip the UTF16 test in that case.
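A minimal sketch of that check, with `looksLikeUtf16` as an illustrative helper name (not a Phobos function); it assumes native byte order and does not handle byte-swapped BOM-less UTF16:

import std.utf : validate, UTFException;

bool looksLikeUtf16(const(ubyte)[] data)
{
    // An odd byte count can never be valid UTF16, so skip the cast entirely
    // and avoid the "array cast misalignment" runtime error.
    if (data.length % 2 != 0)
        return false;
    try
    {
        validate(cast(const(wchar)[]) data);  // throws UTFException on invalid UTF16
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}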
Another thing: it is better not to cast the data to `string`
before you know it is actually UTF8. Make it `ubyte[]`
instead; that way you don't need all the casts inside the
if-blocks.
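A sketch of that approach, reusing the hypothetical `looksLikeUtf16` helper above; `loadAsUtf8` and `fileName` are illustrative names:

import std.conv : to;
import std.file : read;
import std.utf : validate;

string loadAsUtf8(string fileName)
{
    auto raw = cast(ubyte[]) read(fileName);   // std.file.read returns void[]

    if (looksLikeUtf16(raw))                   // length already checked inside the helper
        return (cast(wchar[]) raw).to!string;  // transcode UTF16 -> UTF8

    auto s = cast(string) raw;                 // only now treat the bytes as char data
    validate(s);                               // throws UTFException if not valid UTF8/ASCII
    return s;
}

Keeping the data as ubyte[] until the encoding is known means the cast to string or wchar[] happens exactly once, after validation.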
Indeed. Thanks.