Sean,

As a clarification: the UNI spec does list two on-disk formats. This was done so tools could support both during the transition from UTF-16LE with a BOM to UTF-8 without a BOM.
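To illustrate the two on-disk formats, here is a minimal Python sketch of how a tool might tell them apart and normalize to UTF-8 without a BOM. It is not taken from the UNI spec or from BaseTools; the function names `detect_uni_encoding` and `to_utf8_without_bom` are hypothetical, and the UCS-2 check simply assumes "no code point above U+FFFF".

```python
import codecs


def detect_uni_encoding(data: bytes) -> str:
    """Classify raw UNI file bytes (hypothetical helper, not a BaseTools API).

    Returns "utf-16le-bom" or "utf-8". Raises ValueError (or its subclass
    UnicodeDecodeError) if the bytes match neither listed format.
    """
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 *with* a BOM is not one of the two formats discussed here.
        raise ValueError("UTF-8 with a BOM is not an accepted format")

    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        encoding = "utf-16le-bom"
    else:
        # No BOM: the content has to decode cleanly as UTF-8.
        text = data.decode("utf-8")
        encoding = "utf-8"

    # Assumption: "character content must be UCS-2" is approximated as
    # "every code point fits in 16 bits".
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):06X} is outside UCS-2")
    return encoding


def to_utf8_without_bom(data: bytes) -> bytes:
    """Return the same content encoded as UTF-8 without a BOM."""
    if detect_uni_encoding(data) == "utf-16le-bom":
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        return text.encode("utf-8")
    return data  # already UTF-8 without a BOM
```

A tool that eventually wants to reject UTF-16LE could keep the same detection step and turn the "utf-16le-bom" result into an error instead of converting.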
The strong recommendation is for all EDK II open source packages to use UTF-8 without a BOM. Since platform packages not maintained in EDK II could be pulling forward UNI files in UTF-16LE, we have not changed the UNI spec or tools to treat UTF-16LE as unsupported.

Reviewing patches to UNI files in UTF-16LE by email is a challenge, so requiring UTF-8 without a BOM makes this much easier.

The EDK II open source packages were converted to UTF-8 without a BOM in late 2015. Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A helper Python script was added to help perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py

At some point, it may make sense to *require* UTF-8 without a BOM for all UNI files and all tools, and for tools to reject UNI files that are not in UTF-8 without a BOM.

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming <liming....@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Edk2 uni file encoding
>
> Liming,
> That was exactly what I was looking for.
>
> Thanks
> Sean
>
> -----Original Message-----
> From: Gao, Liming <liming....@intel.com>
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan <sean.bro...@microsoft.com>
> Cc: edk2-devel@lists.01.org
> Subject: RE: Edk2 uni file encoding
>
> Sean:
> EDKII UNI spec (https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications) Chapter 2 defines UNI file format.
> EdkCompatibilityPkg is obsolete. BZ https://bugzilla.tianocore.org/show_bug.cgi?id=1103 is submitted to delete EdkCompatibilityPkg from edk2/master. We will work on it.
>
> EDK II Unicode files are used for mapping token names to localized strings that are identified by an RFC4646 language code. The format for storing EDK II Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE (with a BOM character). The character content must be UCS-2.
>
> Thanks
> Liming
>
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: edk2-devel@lists.01.org
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for all UNI files in edk2?
> >If not I would like to propose one. Incorrect encoding causes tool issues and is something we can easily check for and fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >  1. UTF-8
> >Or
> >  1. Use a BOM and be UTF-16
> >
> >https://en.wikipedia.org/wiki/Byte_order_mark
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file: EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question: Can we make rule for all other standard file types (c, h, dec, dsc, fdf, inf,)?
> >
> >Thanks
> >Sean
> >
> >_______________________________________________
> >edk2-devel mailing list
> >edk2-devel@lists.01.org
> >https://lists.01.org/mailman/listinfo/edk2-devel