On Monday, 1 October 2018 at 15:21:24 UTC, helxi wrote:
> I tried out https://dlang.org/library/std/utf/validate.html before manually checking for encoding myself so I ended up with the code below. I was fairly surprised that "*.o" (object) files are UTF encoded! Is it normal?

Yes. Any sequence of bytes that are all <= 127 is valid UTF-8, since plain ASCII is a subset of UTF-8. On top of that, reading line by line stops at the first byte 10 ('\n') and cuts the line off there.
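A quick sketch of the first point with std.utf.validate itself. The sample bytes are made up (an ELF-style magic number followed by a few small control bytes), and passesValidate is a hypothetical helper, not something in Phobos:

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `bytes` passes std.utf.validate.
bool passesValidate(const(ubyte)[] bytes)
{
    try
    {
        // validate throws UTFException on malformed UTF-8 and returns nothing otherwise.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Made-up sample: ELF-style magic (0x7F 'E' 'L' 'F') followed by small
    // control bytes such as 0, 1 and 3. Every byte is <= 127, i.e. plain ASCII.
    const(ubyte)[] objectLike = [0x7F, 0x45, 0x4C, 0x46, 0x01, 0x01, 0x00, 0x03];
    assert(passesValidate(objectLike)); // ASCII is always valid UTF-8

    // 0xFF never appears in well-formed UTF-8, so this buffer is rejected.
    const(ubyte)[] notUtf8 = [0x7F, 0xFF, 0x01];
    assert(!passesValidate(notUtf8));
}
```

So control bytes alone tell validate nothing; it only complains about bytes above 127 that don't form proper multi-byte sequences.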

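Quite a few binary file formats put a byte 10 near the start of the header precisely so that corruption from text-mode (as opposed to binary) transfers can be detected; the PNG signature is one example. Even in formats that don't, 10 is a common enough byte value that one usually turns up before long, and that cuts off your scan before it reaches the later bytes.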
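To illustrate the cut-off, here is an in-memory sketch. firstLine is a hypothetical stand-in for a line-oriented reader such as std.stdio's File.byLine (which splits on '\n' and drops it by default), and the buffer contents are made up:

```d
import std.algorithm.searching : countUntil;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical helper: the bytes before the first '\n' (byte 10), roughly
/// what a line-oriented reader hands back as the first line.
const(ubyte)[] firstLine(const(ubyte)[] data)
{
    immutable idx = data.countUntil(cast(ubyte) '\n');
    return idx < 0 ? data : data[0 .. idx];
}

void main()
{
    // Made-up header: a few ASCII bytes, a byte 10, then 0xFF (never valid UTF-8).
    const(ubyte)[] data = [0x7F, 0x45, 0x4C, 0x46, 0x01, 0x0A, 0xFF, 0x02];

    auto line = firstLine(data);
    assert(line.length == 5);

    // The first line is pure ASCII, so validate accepts it...
    validate(cast(const(char)[]) line);
    // ...even though the buffer as a whole is not valid UTF-8.
    assertThrown!UTFException(validate(cast(const(char)[]) data));
}
```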


You really are better off looking for bytes below 32, as I described earlier. A .o file will likely have some 1s and 3s early on, which that check will catch quickly, but those same bytes also pass the validate test.
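A minimal sketch of that kind of check. It assumes tab, line feed and carriage return are the only control bytes you want to treat as text, and the 4096-byte cutoff is an arbitrary choice; both are assumptions to adjust, not a fixed rule:

```d
import std.file : read;
import std.stdio : writeln;

/// Heuristic: data looks binary if it contains any byte below 32 other than
/// tab, line feed or carriage return (an assumed whitelist; adjust as needed).
bool looksBinary(const(ubyte)[] data)
{
    foreach (b; data)
    {
        if (b < 32 && b != '\t' && b != '\n' && b != '\r')
            return true;
    }
    return false;
}

void main(string[] args)
{
    foreach (path; args[1 .. $])
    {
        // std.file.read's second argument caps how many bytes are read;
        // only the first 4096 bytes (arbitrary cutoff) are examined.
        auto head = cast(const(ubyte)[]) read(path, 4096);
        writeln(path, looksBinary(head) ? ": binary" : ": text");
    }
}
```

Unlike validate on a single line, this scans the raw bytes, so a newline before the first 1 or 3 doesn't hide anything.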
