Jordan,

I investigated this issue a while ago and even prototyped some backward-compatible BaseTools changes. It looks like a problem several of us have evaluated.

The .uni file extension is used for a few purposes today. One is generating HII String Packages; another is providing localized strings that describe packages, modules, and PCDs. The UEFI Specification uses the UCS-2 character representation for HII String Packages, and their binary encoding uses 16 bits per character. Many UEFI APIs and data structures also use CHAR16 strings, which are 16 bits per character and likewise use the UCS-2 character representation. The initial build tools that generated UEFI-conformant HII String Packages required a source file in the UTF-16LE file format with a BOM.

Because this source file format has been in use for a long time, there may be a large amount of platform-specific UNI file content in the UTF-16LE format, and there may also be custom tools for managing UNI source files that assume those files are UTF-16LE. As a result, I think it is very important for BaseTools to remain backward compatible with UNI source files in the UTF-16LE file format.

I do not think a separate file extension is required, because current UNI source files are already required to carry the UTF-16LE BOM. BaseTools could be extended in a backward-compatible manner to support additional source file formats (e.g. UTF-8 with a BOM, UTF-8 without a BOM, UTF-16BE, and even UTF-32 if there were a request) using the same .uni extension. The BOMs make detection of the source file encoding unambiguous. The BOM can only be omitted if both the producer and the consumer of the string content can already assume the encoding. Since we are discussing extensions to BaseTools, we can define which assumptions are allowed. I would prefer that only UTF-8 encoded files be allowed to drop the BOM.
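To make the BOM-based detection concrete, here is a minimal Python sketch. Much of BaseTools is written in Python, but the function names `detect_uni_encoding`, `read_uni_text`, and `encode_ucs2_le` are hypothetical and are not existing BaseTools APIs. The sketch assumes the policy proposed above: a recognized BOM selects the encoding, and a file with no BOM is treated as UTF-8. It also checks that decoded text stays within UCS-2 (no code points above U+FFFF) before it is emitted as 16-bit HII string data.

```python
import codecs
from typing import Tuple

# BOMs are checked longest-first: the UTF-32LE BOM (FF FE 00 00) begins with
# the UTF-16LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_uni_encoding(data: bytes) -> Tuple[str, int]:
    """Return (codec name, BOM length) for the raw bytes of a .uni file.

    Hypothetical helper: assumes the proposed rule that only UTF-8 files
    may omit the BOM.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: under the assumed policy, the file must be UTF-8.
    return "utf-8", 0


def read_uni_text(data: bytes) -> str:
    """Decode .uni file contents, stripping the BOM if present.

    Strict decoding means a BOM-less file that is not valid UTF-8
    raises UnicodeDecodeError instead of being silently misread.
    """
    name, bom_len = detect_uni_encoding(data)
    return data[bom_len:].decode(name)


def encode_ucs2_le(text: str) -> bytes:
    """Encode text as UCS-2 little-endian (no BOM), 16 bits per character.

    Characters above U+FFFF would need a surrogate pair in UTF-16 and
    cannot be represented in UCS-2, so they are rejected.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the UCS-2 range")
    return text.encode("utf-16-le")
```

Because every supported encoding is identified either by its BOM or by the single "no BOM means UTF-8" rule, existing UTF-16LE files decode exactly as before. Note that the no-BOM rule is a project convention rather than a general Unicode default; as raised in the quoted thread, a BOM-less file could otherwise be interpreted as UTF-16BE.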
I tend to agree with the observation that more editors, source control systems, and diff/merge utilities work better with UTF-8. If there are really good reasons for the EDK II open source project to prefer UTF-8 for .uni files, driven by requirements of the development tools the EDK II community prefers, then an additional helper tool may be required to convert all .uni files in a source tree to UTF-16LE. This helper tool could be used when UDK releases are made, or by developers whose custom tools assume .uni source files are UTF-16LE.

Best regards,

Mike

-----Original Message-----
From: Justen, Jordan L
Sent: Monday, May 04, 2015 3:02 PM
To: edk2-devel@lists.sourceforge.net; Brian J. Johnson; Kinney, Michael D
Subject: Re: [edk2] [PATCH 0/9] Support UTF-8 (.utf8) string files

On 2015-05-04 14:32:48, Brian J. Johnson wrote:
> On 05/04/2015 02:11 PM, Laszlo Ersek wrote:
> > I do think such files should be distinguished with a separate filename
> > suffix.
>
> Yes. Otherwise developers will get confused why some ".uni" files work
> with their tools, and some do not.

Mike, what do you think about Laszlo and Brian's feedback that a separate extension should be used?

Laszlo mentioned that we can't use the lack of the UTF-16-LE BOM because that is supposed to be interpreted as a UTF16 file with BE characters.

Brian mentioned that developers may try to use utf-8 .uni files with older tools, and get errors.

> Nice work, Jordan!

Thanks Brian. Any Reviewed-by from you for the series? :)

-Jordan
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel