Jordan,

I did some investigations into this issue a while ago and even prototyped some 
backwards compatible BaseTools changes.  

Looks like a problem several of us have evaluated.

The .uni file extension is used for a few purposes today.  One is generation of 
HII String Packages and another is providing localized strings to describe 
packages, modules, and PCDs.  The UEFI Specification uses the UCS-2 character 
representation for the HII String Packages, and the binary encoding for the HII 
String Packages uses 16 bits per character.  Many UEFI APIs and data structures 
also use CHAR16 strings, which are 16 bits per character and also use the UCS-2 
character representation.
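
To illustrate the 16-bits-per-character constraint, here is a minimal Python 
sketch.  The function names is_ucs2 and to_char16 are hypothetical and are not 
part of BaseTools; the sketch only assumes that a string must stay within the 
Basic Multilingual Plane to fit the UCS-2 representation.

```python
def is_ucs2(text: str) -> bool:
    """Return True if every character fits in a single 16-bit UCS-2 unit.

    Code points above U+FFFF need surrogate pairs in UTF-16, and the
    surrogate range itself (U+D800..U+DFFF) does not encode characters,
    so both are rejected here.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return False
    return True


def to_char16(text: str) -> bytes:
    """Encode text as a NUL-terminated little-endian CHAR16 string (hypothetical helper)."""
    if not is_ucs2(text):
        raise ValueError("string contains characters outside UCS-2")
    # For UCS-2-safe text, UTF-16-LE produces exactly two bytes per character.
    return text.encode("utf-16-le") + b"\x00\x00"
```

For a UCS-2-safe string, the result is always 2 * (len(text) + 1) bytes long, 
which matches the fixed-width assumption described above.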

The initial build tools used to generate UEFI conformant HII String Packages 
required a source file that used the UTF16-LE file format with a BOM.  Because 
this source file format has been used for a long time, there may be a large 
amount of platform specific UNI file content in the UTF16-LE file format, and 
there may also be custom tools for managing UNI source file contents that 
assume UNI source files are in the UTF16-LE file format.  As a result, I think 
it is very important for BaseTools to continue to provide backwards 
compatibility with UNI source files in the UTF16-LE file format.

I do not think a separate file extension is required because the current UNI 
source files are required to have the BOM for UTF16-LE.  BaseTools could be 
extended in a backwards compatible manner to support additional source file 
formats (e.g. UTF-8 with BOM, UTF-8 without a BOM, UTF16-BE, and even the UTF32 
file formats if there was a request) using the same .uni file extension.  The 
BOMs make the detection of the source file encoding unambiguous.  The BOM 
character can only be removed if the source and target of the string content 
can already assume the format.  Since we are discussing extensions to BaseTools, 
we can define what assumptions are allowed.  I would prefer that only UTF-8 
encoded files be allowed to drop the BOM.
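
A minimal Python sketch of the BOM-based detection described above.  This is 
not the prototype I mentioned and not existing BaseTools code; the names 
detect_uni_encoding and decode_uni are hypothetical, and the rule that a file 
without a BOM is treated as UTF-8 is the assumption proposed in this mail.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) starts with the same two
# bytes as the UTF-16 LE BOM (FF FE), so the UTF-32 entries are tested first.
# Caveat: a UTF-16 LE file whose first character is U+0000 would be
# misread as UTF-32 LE by this table.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_uni_encoding(data: bytes):
    """Return (encoding_name, bom_length) for the raw bytes of a .uni file."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # Assumption from this proposal: only UTF-8 may omit the BOM.
    return "utf-8", 0


def decode_uni(data: bytes) -> str:
    """Decode .uni file bytes to text, dropping the BOM if one is present."""
    encoding, bom_length = detect_uni_encoding(data)
    # Strict decoding raises UnicodeDecodeError for a BOM-less file
    # that is not valid UTF-8.
    return data[bom_length:].decode(encoding)
```

Existing UTF16-LE files with a BOM take the same path as before, so current 
UNI content would not need to change under this scheme.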

I tend to agree with the observation that there are more editors, source 
control systems, and diff/merge utilities that tend to work better with UTF-8.

If there are really good reasons for the .uni file format for the EDK II open 
source project to prefer UTF-8 due to requirements from development tools the 
EDK II community prefers, then an additional helper tool may be required to 
convert all .uni files in a source tree to UTF16-LE.  This helper tool could be 
used when UDK releases are made or could be used by developers that use custom 
tools that assume .uni source files are UTF16-LE. 
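
As a rough illustration of what such a helper tool might look like, here is a 
Python sketch.  No such tool exists in BaseTools today as far as this mail is 
concerned; the names to_utf16le and convert_tree are hypothetical, and the 
sketch assumes that a file without a BOM is UTF-8.

```python
import codecs
import os

# UTF-32 BOMs are listed first because the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf16le(data: bytes) -> bytes:
    """Re-encode .uni file bytes as UTF-16 LE with a leading BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            text = data[len(bom):].decode(name)
            break
    else:
        # Assumption: a file without a BOM is UTF-8.
        text = data.decode("utf-8")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def convert_tree(root: str) -> int:
    """Convert every .uni file under root in place; return how many changed."""
    converted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(".uni"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                original = f.read()
            result = to_utf16le(original)
            if result != original:  # already UTF-16 LE with BOM: leave untouched
                with open(path, "wb") as f:
                    f.write(result)
                converted += 1
    return converted
```

Running convert_tree on a source tree a second time should report zero 
converted files, since files already in UTF16-LE with a BOM are left alone.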

Best regards,

Mike




-----Original Message-----
From: Justen, Jordan L 
Sent: Monday, May 04, 2015 3:02 PM
To: edk2-devel@lists.sourceforge.net; Brian J. Johnson; Kinney, Michael D
Subject: Re: [edk2] [PATCH 0/9] Support UTF-8 (.utf8) string files

On 2015-05-04 14:32:48, Brian J. Johnson wrote:
> On 05/04/2015 02:11 PM, Laszlo Ersek wrote:
> > I do think such files should be distinguished with a separate filename
> > suffix.
> 
> Yes.  Otherwise developers will get confused why some ".uni" files work 
> with their tools, and some do not.

Mike, what do you think about Laszlo and Brian's feedback that a
separate extension should be used?

Laszlo mentioned that we can't use the lack of the UTF-16-LE BOM
because that is supposed to be interpreted as a UTF16 file with BE
characters.

Brian mentioned that developers may try to use utf-8 .uni files with
older tools, and get errors.

> Nice work, Jordan!

Thanks Brian. Any Reviewed-by from you for the series? :)

-Jordan
