This information is also partly covered in the coding standards:
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications 

5.1.3 Files may only contain the ASCII characters 0x0A, 0x0D, and 0x20 through 
0x7E
        Files should be saved using either ASCII or UTF8 encoding.

It would be good for one of you who knows the detailed differences to clarify 
that text and link to the UNI spec as appropriate.
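
As a rough sketch of what checking rule 5.1.3 could look like (the function 
name below is made up for illustration and is not an existing edk2 tool), 
a file's bytes can be tested against the allowed set directly:

```python
from typing import List, Tuple

# Allowed bytes per the 5.1.3 text quoted above: 0x0A, 0x0D, and 0x20..0x7E.
ALLOWED_BYTES = frozenset({0x0A, 0x0D}) | frozenset(range(0x20, 0x7F))


def find_disallowed_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every byte outside the allowed set.

    Hypothetical helper for illustration only.
    """
    return [(i, b) for i, b in enumerate(data) if b not in ALLOWED_BYTES]
```

Read literally, the rule also excludes tab characters (0x09), so a check 
like this would flag them.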

Regards,
Isaac

-----Original Message-----
From: edk2-devel [mailto:[email protected]] On Behalf Of Kinney, 
Michael D
Sent: Thursday, November 8, 2018 8:46 AM
To: Sean Brogan <[email protected]>; Gao, Liming 
<[email protected]>; [email protected]
Subject: Re: [edk2] Edk2 uni file encoding

Sean,

As a clarification, the UNI spec does list two on-disk formats.
This was done so tools could support both in the transition from UTF-16LE with 
BOM to UTF-8 without BOM.

The strong recommendation is for all EDK II open source packages to use UTF-8 
without a BOM.  Since platform packages not maintained in EDK II could be 
pulling forward UNI files in UTF-16LE, we have not changed the UNI spec or 
tools to consider UTF-16LE as unsupported.

Reviewing patch emails that contain UNI files in UTF-16LE is a challenge, so 
requiring UTF-8 without a BOM makes this much easier.

The EDK II open source package conversion to UTF-8 without a BOM was performed 
in late 2015.  Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A helper Python script was added to perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py
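
For illustration only, the core of such a conversion can be sketched as 
below.  This is not the code in ConvertUni.py; the function name is 
hypothetical, and the sketch handles only input in UTF-16LE with a BOM.

```python
import codecs


def utf16le_bom_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE-with-BOM bytes to UTF-8 bytes without a BOM.

    Hypothetical helper; raises ValueError if the UTF-16LE BOM is missing.
    """
    if not data.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("input does not start with a UTF-16LE BOM")
    # Drop the 2-byte BOM, decode the remainder, and re-encode as plain UTF-8.
    text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return text.encode("utf-8")
```

Line endings pass through unchanged, which should keep the resulting diff 
limited to the encoding change.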

At some point, it may make sense to *require* UTF-8 without a BOM for all UNI 
files and all tools and for tools to reject UNI files that are not in UTF-8 
without a BOM format.
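
A minimal sketch of what such a rejection check could look like is below.  
The function name is hypothetical, and reading "UCS-2" as code points up to 
U+FFFF is my interpretation of the spec text quoted further down in this 
thread, not a description of what BaseTools does today.

```python
import codecs

# Byte order marks a strict checker would refuse at the start of a UNI file.
_BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def check_uni_bytes(data: bytes) -> None:
    """Raise ValueError unless data is BOM-less UTF-8 with UCS-2-range content."""
    if data.startswith(_BOMS):
        raise ValueError("file starts with a byte order mark")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8: {exc}") from exc
    for offset, ch in enumerate(text):
        # Assumption: "UCS-2" is read here as code points U+0000..U+FFFF.
        if ord(ch) > 0xFFFF:
            raise ValueError(
                f"character {offset} (U+{ord(ch):06X}) is outside UCS-2"
            )
```

A check like this could run over every .uni file in a tree and report the 
offenders before a patch is sent for review.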

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-
> [email protected]] On Behalf Of Sean Brogan via edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming <[email protected]>
> Cc: [email protected]
> Subject: Re: [edk2] Edk2 uni file encoding
> 
> Liming,
> That was exactly what I was looking for.
> 
> Thanks
> Sean
> 
> 
> 
> 
> -----Original Message-----
> From: Gao, Liming <[email protected]>
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan <[email protected]>
> Cc: [email protected]
> Subject: RE: Edk2 uni file encoding
> 
> Sean:
>   EDKII UNI spec
> (https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications)
> Chapter 2 defines UNI file format.
> EdkCompatibilityPkg is obsolete. BZ
> https://bugzilla.tianocore.org/show_bug.cgi?id=1103
> is submitted to delete EdkCompatibilityPkg from edk2/master. We 
> will work on it.
> 
> EDK II Unicode files are used for mapping token names to localized 
> strings that are identified by an RFC4646 language code. The format 
> for storing EDK II Unicode files on disk is UTF-8 (without a BOM 
> character) or UTF-16LE (with a BOM character). The character content 
> must be UCS-2.
> 
> Thanks
> Liming
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-
> [email protected]] On Behalf Of
> >Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: [email protected]
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for all UNI files in edk2?
> >If not I would like to propose one.  Incorrect encoding causes tool
> >issues and is something we can easily check for and fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >
> >  1.  UTF-8
> >Or
> >
> >  1.  Use a BOM and be UTF-16
> >
> >https://en.wikipedia.org/wiki/Byte_order_mark
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file:
> >EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question:  Can we make a rule for all other standard file types
> >(c, h, dec, dsc, fdf, inf)?
> >
> >Thanks
> >Sean
> >
> >
> >
> >_______________________________________________
> >edk2-devel mailing list
> >[email protected]
> >https://lists.01.org/mailman/listinfo/edk2-devel