Hi Sean,

On Wed, Nov 07, 2018 at 11:00:24PM +0000, Sean Brogan via edk2-devel wrote:
> Is there a definitive answer for the file encoding for all UNI files in edk2?
> If not I would like to propose one.  Incorrect encoding causes tool
> issues and is something we can easily check for and fix.
> 
> Proposal: All UNI files in edk2 should be
> 
>   1.  UTF-8
> Or
> 
>   1.  Use a BOM and be UTF-16
> 
> https://en.wikipedia.org/wiki/Byte_order_mark
> 
> Results from searching edk2:
> 1 - UTF-16 LE BOM file: 
> EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni

Which is going to be deleted at some point anyway.

> 919 - Without BOM and decoded as UTF-8
> 
> Thoughts?

I would be quite happy to make UTF-8 the official norm if that doesn't
severely impact others.

(As a sidenote, the 'file' command gives the following summaries:
      2 ASCII text
    815 ASCII text, with CRLF line terminators
     72 ASCII text, with very long lines, with CRLF line terminators
      3 C source, ASCII text, with CRLF line terminators
      1 Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
     26 UTF-8 Unicode text, with CRLF line terminators
      1 UTF-8 Unicode text, with very long lines, with CRLF line terminators

I expect "ASCII text" simply means "doesn't contain any bytes > 127".)
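
For anyone who wants to reproduce a similar summary without 'file', here
is a rough sketch in Python. It is not an existing BaseTools check; the
function names (classify_encoding, summarize) and the category labels are
made up for illustration, and it only looks at BOMs, the "no byte above
127" rule, and whether the bytes decode as UTF-8.

```python
import codecs
from collections import Counter
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Roughly classify raw file bytes (hypothetical helper, labels are ad hoc)."""
    # UTF-16 BOMs: FF FE (little-endian) or FE FF (big-endian).
    # Caveat: a UTF-32 LE BOM also starts with FF FE and would land here.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le-bom"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be-bom"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    # "ASCII" here means no byte has the high bit set.
    if all(b < 128 for b in data):
        return "ascii"
    # Otherwise see whether the bytes form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


def summarize(root: str, pattern: str = "*.uni") -> Counter:
    """Count classifications for every file under root matching pattern."""
    return Counter(
        classify_encoding(path.read_bytes())
        for path in Path(root).rglob(pattern)
        if path.is_file()
    )
```

Running summarize() on a checkout of the tree and printing the resulting
Counter should give a breakdown comparable to the one above, minus the
line-terminator details.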

> Future question:  Can we make rule for all other standard file types
> (c, h, dec, dsc, fdf, inf,)?

I think c and h have toolchain implications that would need to be
investigated in greater detail (i.e., it is possible we would need to
retire some profiles from BaseTools that would no longer be able to
compile new code). But as long as we don't permit characters above 127
(i.e., non-ASCII) in the C code, we probably wouldn't see build failures.
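
If we did adopt an "ASCII only in C code" rule, it would be cheap to
check. A minimal sketch, assuming Python and a made-up helper name
(non_ascii_lines); it is not part of any existing edk2 tooling:

```python
from typing import List


def non_ascii_lines(data: bytes) -> List[int]:
    """Return 1-based numbers of lines containing any byte above 127."""
    # bytes.splitlines() handles LF, CR and CRLF terminators alike.
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte > 127 for byte in line)
    ]
```

Feeding it the raw bytes of a .c or .h file gives the lines a reviewer
would need to look at; an empty list means the file is pure ASCII.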

Other than that, I'd be happy to go full UTF-8.

Regards,

Leif
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel
