Larry -
Thanks for the cleanup.
Let me see whether our current extensions would be appropriate for a proposal.
If not, I'll find something that is appropriate.
Tim
From: Hauch, Larry [mailto:larry.ha...@intel.com]
Sent: Tuesday, August 19, 2014 2:54 PM
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] [RFC] EDK II UNI Unicode File Specification
Hi Tim,
I agree with your proposal for #1; see the highlighting (additions and
deletions) in the original RFC text.
For #2 and #3: after looking through the UEFI specifications, I found that the
escape character sequences are not defined there, so I have removed most of
them. The only ones now listed in the EBNF are those currently processed by
the EDK II build system (see Source\Python\AutoGen\UniClassObject.py).
Actually, I would like to see a proposal that updates both this spec and our
tools to provide alternate methods for specifying the control codes listed in
UEFI Spec v2.4B (section 29.2.6.2.4). One option is something like
"\" (a-zA-Z)+ ";", which uses the semicolon as the terminator for the control
code; another is hypertext-style markup such as "<" ["\"] (a-zA-Z)+ ">". Either
would be an additional method that must be supported by the EDK II build
tools. Since the tools have to convert these into the double-byte encodings
specified in the UEFI Spec, the codes do need to be defined in a table.
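A minimal Python sketch of the semicolon-terminated form "\" (a-zA-Z)+ ";"
described above. The CONTROL_CODES dictionary is a small hypothetical subset
(values taken from Table 1 of the RFC text), the function name
parse_terminated is illustrative and not part of the EDK II BaseTools, and the
legacy unterminated escapes (\n, \r, \t, \wide, ...) are deliberately ignored.

```python
import re

# Hypothetical subset of control-code names mapped to UCS-2 code units
# (values follow Table 1 of the UNI specification draft).
CONTROL_CODES = {
    "bold": 0xF620,
    "endbold": 0xF621,
    "ensp": 0x2002,
}

# A backslash, one or more ASCII letters, then the ";" terminator.
TERMINATED = re.compile(r"\\([A-Za-z]+);")


def parse_terminated(text: str) -> list[int]:
    """Convert text containing \\name; control codes into UCS-2 code units."""
    units: list[int] = []
    pos = 0
    for match in TERMINATED.finditer(text):
        # Copy the literal characters that precede this control code.
        units.extend(ord(ch) for ch in text[pos:match.start()])
        name = match.group(1)
        if name not in CONTROL_CODES:
            raise ValueError(f"unknown control code: \\{name};")
        units.append(CONTROL_CODES[name])
        pos = match.end()
    # Copy whatever follows the last control code.
    units.extend(ord(ch) for ch in text[pos:])
    return units
```

Because the terminator marks exactly where the name ends, "\bold;er" is
always "\bold;" followed by "er", and "\bolder;" is rejected as an unknown
code instead of being silently split.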
I would also suggest that we not add any unterminated control code sequences
for new content that must be supported. (Only \wide, \narrow and \nbr, along
with the standard \n, \r and \t escape sequences, would ever be unterminated.)
Cheers,
Larry
From: Tim Lewis [mailto:tim.le...@insyde.com]
Sent: Monday, August 18, 2014 4:17 PM
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] [RFC] EDK II UNI Unicode File Specification
Larry -
The descriptions of the extensions for module/package abstracts and
descriptions are much better.
Here are a few comments that are not specific to your update (although they
concern the text below):
1. It is readable. I do think that adding <> terminals for single
characters makes it harder to read, but otherwise the text is clear. Why not
"/" instead of <FS> and "(" instead of <LP>?
2. I don't think there is any UEFI spec requirement that a \endbold be
preceded by a \bold. Since the font for any string may include the bold
attribute, a \endbold on its own might be desirable. This is further
complicated by the fact that the .UNI specification does not provide
font-select capabilities.
3. The current escape character mechanism prevents future expansion,
because the escape sequences are neither fixed length nor well-delimited.
Consider what would happen if someone wanted to add \bolder to the grammar.
This would make older strings suspect, since "\bolder" could be interpreted
either as "\bold" followed by "er" or as "\bolder". I mentioned this long ago.
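The ambiguity in point 3 can be reproduced with a greedy longest-match
tokenizer. This is one plausible way to read unterminated escapes, assumed
here for illustration rather than taken from UniClassObject.py; the function
and grammar names are hypothetical. Adding a single keyword changes how an
existing string is tokenized:

```python
def tokenize(text: str, keywords: set[str]) -> list[str]:
    """Split text into escape tokens and single characters.

    After a backslash, the longest matching keyword is taken
    (greedy longest match); all other characters are emitted as-is.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # Try longer keywords first so "bolder" wins over "bold".
            for kw in sorted(keywords, key=len, reverse=True):
                if text.startswith(kw, i + 1):
                    tokens.append("\\" + kw)
                    i += 1 + len(kw)
                    break
            else:
                raise ValueError(f"unknown escape at offset {i}")
        else:
            tokens.append(text[i])
            i += 1
    return tokens


old_grammar = {"bold", "endbold"}
new_grammar = old_grammar | {"bolder"}
```

With old_grammar the string \bolder yields ["\\bold", "e", "r"]; with
new_grammar the same bytes yield ["\\bolder"], so a string written against the
old grammar silently changes meaning.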
Tim
From: Hauch, Larry [mailto:larry.ha...@intel.com]
Sent: Monday, August 18, 2014 3:54 PM
To: edk2-devel@lists.sourceforge.net
Subject: [edk2] [RFC] EDK II UNI Unicode File Specification
Hi Folks,
Here are the proposed changes to the EDK II UNI Unicode File Specification.
Hopefully the HTML format for the chapters will make it easier to review and
respond with feedback.
Please provide feedback by the end of this week (22 Aug. 2014).
Updates:
* Updated EBNF to follow syntax specified in EBNF by the ANTLR project
* Added content related to EDK II Meta-Data Unicode files
* Restructured document
* Removed security and C format GUID definitions, which are not required for
HII or other UNI files.
Cheers,
Larry
2 Unicode Strings File Format
EDK II Unicode files are used for mapping token names to localized strings that
are identified by an RFC4646 language code. The format for storing EDK II
Unicode files is UTF-16LE. The character content must be UCS-2.
The end of a string is determined by the first of the following items found:
* a control character
* a comment
* the end of the file
* a blank line
Comments may appear anywhere within the string file.
All the files must begin with a Unicode BOM character.
Note: Please make sure you select an editor that supports UCS-2 characters
and can store them in a UTF-16LE file.
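The encoding rules above (leading BOM, UTF-16LE storage, UCS-2 content) can be
checked with a short Python sketch. It assumes the whole file fits in memory
and treats any UTF-16 surrogate code unit (0xD800-0xDFFF), which UCS-2 cannot
represent, as an error; the function name decode_uni is hypothetical.

```python
def decode_uni(data: bytes) -> str:
    """Decode raw UNI file bytes, enforcing the BOM, UTF-16LE and UCS-2 rules."""
    # A UTF-16LE byte order mark is the byte sequence FF FE.
    if not data.startswith(b"\xff\xfe"):
        raise ValueError("missing UTF-16LE byte order mark")
    if len(data) % 2:
        raise ValueError("odd number of bytes; not valid UTF-16")
    # Inspect raw 16-bit code units so surrogates are caught before decoding.
    for offset in range(2, len(data), 2):
        unit = int.from_bytes(data[offset:offset + 2], "little")
        if 0xD800 <= unit <= 0xDFFF:
            raise ValueError(f"surrogate code unit 0x{unit:04X} is not UCS-2")
    # Strip the BOM and decode the remaining little-endian code units.
    return data[2:].decode("utf-16-le")
```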
2.1 Common EBNF
The following EBNF uses characters enclosed in double quotes to represent UCS-2
string literals. In the following definitions, the semicolon is used to denote
a comment.
<US> ::= \u0020 ; Space Character
<FW> ::= \u002F ; Forward Slash, /
<LP> ::= \u0028 ; Left Parenthesis, (
<RP> ::= \u0029 ; Right Parenthesis, )
<Letter> ::= {(\u0041-\u005A)} ; Characters A - Z
{(\u0061-\u007A)} ; Characters a - z
<Digit> ::= (\u0030-\u0039) ; Characters 0 - 9
<UN> ::= \u005F ; Underscore Character, _
<MS> ::= <US>+
<ME> ::= {<MS>} {<EOL>}
<CommentLine> ::= "//" <US>* <PChars>* <EOL>
<BlankLine> ::= <EOL>
<Chars> ::= (\u0001-\uF6FF)
<PChars> ::= (\u0020-\uF6FF)
<VChars> ::= (\u0021-\uF6FF)
<UnicodeLines> ::= <Token> <ME>
[<Ldef> [<String> <ME>]+]+
<Ldef> ::= <CtrlChar> "language" <MS> <LangCode> <ME>
<HexDigitU> ::= {<Digit>}
{(\u0041-\u0046)} ; Characters A - F
{(\u0061-\u0066)} ; Characters a - f
<CtrlChar> ::= \u0023 ; Hash Character, #
<Token> ::= <CtrlChar> "string" <MS> <Identifier>
<Identifier> ::= <Letter> [{<Letter>} {<Digit>} {<UN>}]*
<LangCode> ::= <RFC4646>
<DH> ::= \u002D ; Dash Character, -
<RFC4646> ::= <Letter>{2,8} [<ShortExt> <LongExt>*]
<ShortExt> ::= "-" [{<Letter>} {<Digit>}]{1,8}
<LongExt> ::= "-" [{<Letter>} {<Digit>}]{1,}
<UDblQuote> ::= \u0022 ; Double Quote Character, "
<String> ::= <UDblQuote> <SContent>* <UDblQuote>
<SContent> ::= {<PChars>} {<Attributes>} {<CtrlCode>}
<Attributes> ::= <StartAttribute> <SContent>*
[<StopAttribute>]
<StartAttribute> ::= <AttrCtrlChar> <FontAttr>
<AttrCtrlChar> ::= \u005C ; Backslash Character, \
<StopAttribute> ::= <AttrCtrlChar> "end" <FontAttr>
<FontAttr> ::= {<SimpleAttrs>} {<StandardAttrs>}
<SimpleAttrs> ::= {"narrow"} {"wide"} {<UDblQuote>}
{"n"} {"r"} {"t"} {"nbr"} {"\"} {"'"}
<StandardAttrs> ::= {"normal"} {"bold"} {"italic"}
{"emboss"}
{"shadow"} {"underline"} {"dblunder"}
<CtrlCode> ::= <EscChar> {"n"} {"f"} {"r"} {"p"}
{"ospace"} {"enquad"} {"emquad"}
{"ensp"} {"emsp"} {"em3sp"} {"em4sp"}
{"em6sp"} {"usp"} {"tsp"} {"hsp"}
{"msp"} {"!bsp"} {"!nbsp"}
{"zsp"} {"ah"} { "hy"} { "df"} {"den"}
{"dem"} {"!bh"} {"g"} {"osp"} {"k"}
<EscChar> ::= \u005C ; Backslash Character, \
2.1.1 Definitions
LanguageCodes
The language code must be a valid RFC4646 language code.
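The <RFC4646> production in Section 2.1 is a simplified subset of RFC 4646. A
regular expression that follows that simplified production (it is not a full
RFC 4646 validator) might look like this; is_lang_code is a hypothetical
helper name.

```python
import re

# Mirrors the simplified EBNF: <Letter>{2,8} [<ShortExt> <LongExt>*]
# <ShortExt> is "-" plus 1-8 letters/digits; <LongExt> is "-" plus 1+ letters/digits.
LANG_CODE = re.compile(r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]+)*)?")


def is_lang_code(code: str) -> bool:
    """Return True if code matches the simplified <RFC4646> production."""
    return LANG_CODE.fullmatch(code) is not None
```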
EscChar
In order to include certain standard characters, such as the backslash "\",
within a string, the character must be prefixed with the escape character.
Characters that may require the escape prefix are the backslash "\", the
single quote "'", the double quote '"' and the forward slash "/". The
backslash always requires the escape character.
StandardAttrs
The standard font attribute "normal" is not defined in the UEFI
Specification; however, it has been proposed and is included here. Other
attributes defined in the UEFI Specification, such as double underline
(dblunder), do not have a double-byte encoding for the character mapping;
however, recommended encodings for these attributes are given (see
below).
Token
The token (string identifier) may only contain numbers, upper and lower case
letters, the underscore character, and the dash character.
Include
An include line causes another file, also compliant with this specification,
to be parsed as if its contents were part of the including file. For a given
language, the same token should not be defined in both files.
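A minimal sketch of the include rule: the token table of an included file is
merged into the including file, and a token defined for the same language in
both is reported. The data structure (token name mapped to {language: string})
and the function name merge_included are assumptions for illustration, raising
an error is one possible reading of "should not overlap", and parsing the
files themselves is left out.

```python
StringTable = dict[str, dict[str, str]]  # token -> {language code -> string}


def merge_included(parent: StringTable, included: StringTable) -> StringTable:
    """Merge an included file's strings into a copy of the parent's table."""
    # Copy the parent so the caller's table is not modified.
    merged: StringTable = {tok: dict(langs) for tok, langs in parent.items()}
    for token, langs in included.items():
        target = merged.setdefault(token, {})
        for lang, text in langs.items():
            if lang in target:
                raise ValueError(f"{token} already defined for {lang}")
            target[lang] = text
    return merged
```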
Table 1 HII Double-Byte Encoding Map

String                                   Double-Byte Encoding
-------------------------------------    --------------------
\bold                                    0xF620
\endbold                                 0xF621
\italic                                  0xF622
\enditalic                               0xF623
\underline                               0xF624
\endunderline                            0xF625
\dblunder                                0xF62A
\enddblunder                             0xF62B
\emboss                                  0xF626
\endemboss                               0xF627
\shadow                                  0xF628
\endshadow                               0xF629
\n (newline)                             0x2028
\f (formfeed)                            0x000C
\r (carriage return)                     0x000D
\p (paragraph separator)                 0x2029
\ospace (Ogham space mark)               0x1680
\enquad (en quad)                        0x2000
\emquad (em quad)                        0x2001
\ensp (en space)                         0x2002
\emsp (em space)                         0x2003
\em3sp (three-per-em space)              0x2004
\em4sp (four-per-em space)               0x2005
\em6sp (six-per-em space)                0x2006
\usp (punctuation space)                 0x2008
\tsp (thin space)                        0x2009
\hsp (hair space)                        0x200A
\msp (medium math space)                 0x205F
\!bsp (no-break space)                   0x00A0
\!nbsp (narrow no-break space)           0x202F
\zsp (zero width space)                  0x200B
\ah (Armenian hyphen)                    0x058A
\hy (hyphen)                             0x2010
\df (figure dash)                        0x2012
\den (en dash)                           0x2013
\dem (em dash)                           0x2014
\!bh (non-breaking hyphen)               0x2011
\g (Tibetan mark intersyllabic tsheg)    0x0F0B
\osp (Ethiopic wordspace)                0x1361
\k (Khmer sign bariyoosan)               0x17D5
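Table 1 maps naturally onto a lookup from escape name to code unit. The sketch
below uses a handful of entries from the table and assumes each escape is a
backslash followed by the name, matched against the longest known name. This
illustrates the mapping only; it is not the EDK II build tool's algorithm, and
to_code_units is a hypothetical name.

```python
import re

# A few entries from Table 1: escape name -> UCS-2 code unit.
HII_ENCODING = {
    "bold": 0xF620,
    "endbold": 0xF621,
    "italic": 0xF622,
    "enditalic": 0xF623,
    "n": 0x2028,     # newline (line separator)
    "ensp": 0x2002,  # en space
    "dem": 0x2014,   # em dash
}

# Longest names first, so that if one name is a prefix of another,
# the longer one wins.
_ESCAPE = re.compile(
    r"\\(" + "|".join(sorted(map(re.escape, HII_ENCODING), key=len, reverse=True)) + ")"
)


def to_code_units(text: str) -> list[int]:
    """Replace table escapes with their code units; other characters map to ord()."""
    units: list[int] = []
    pos = 0
    for m in _ESCAPE.finditer(text):
        units.extend(map(ord, text[pos:m.start()]))
        units.append(HII_ENCODING[m.group(1)])
        pos = m.end()
    units.extend(map(ord, text[pos:]))
    return units
```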
3 HII String Packs
Unicode files used for creating HII String Packs have the following format:
<StringFileFormat> ::= <CommentLine>*
<LanguageDefs>
<Content>+
The following EBNF describes content that is specific to the Unicode files used
for generating HII String Packs.
<Content> ::= {<CommentLine>} {<BlankLine>}
{<UnicodeLines>} {<ControlRefactor>}
{<LanguageDefs>} {<IncludeLines>}
Additional definitions for Unicode files used to create HII String Packs:
<LanguageDefs> ::= <CtrlChar> "langdef" <MS> <LangCode> <MS>
<LangDesc> <EOL>
<LangDesc> ::= <UDblQuote> <Chars>+ <UDblQuote>
<IncludeLines> ::= <CtrlChar> "include" <MS> <UniFile> <EOL>
<UniFile> ::= <UDblQuote> <UniFilename> <UDblQuote>
<UniFilename> ::= <FilenameChars> <MoreFNameChars>* {".uni"}
{".UNI"}
<FilenameChars> ::= {<Letter>} {<Digit>}
<MoreFNameChars> ::= {<Letter>} {<Digit>} {"_"}
<DefaultCtrlChar> ::= "/"
<EQ> ::= \u003D ; Equal Character, =
<ControlRefactor> ::= <CtrlChar> "=" <NewCtrlChar> <EOL>
<NewCtrlChar> ::= (\u0021-\uF6FF)
Note: Unicode files used for generating HII String Packs are the only type of
Unicode file that allows the control character, <CtrlChar>, to be redefined
(providing backward compatibility).
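To show how these productions fit together, here is a deliberately small
Python parser for a file such as the example in Section 3.1. It assumes the
default control character is "/" until a refactor line such as "/=#" appears,
skips "//" comment lines and blank lines, and handles only langdef, string and
language lines with one quoted string per line; include lines, escapes and
multi-line strings are not handled. The function name parse_string_pack is
hypothetical.

```python
def parse_string_pack(text: str) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Return (language descriptions, token -> {language -> string})."""
    ctrl = "/"  # assumed default control character before any refactor line
    langdefs: dict[str, str] = {}
    strings: dict[str, dict[str, str]] = {}
    token = lang = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank line or comment line
        if len(line) == 3 and line[0] == ctrl and line[1] == "=":
            ctrl = line[2]  # control-character refactor, e.g. "/=#"
            continue
        if line.startswith(ctrl):
            keyword, _, rest = line[1:].partition(" ")
            rest = rest.strip()
            if keyword == "langdef":
                code, _, desc = rest.partition(" ")
                langdefs[code] = desc.strip().strip('"')
            elif keyword == "string":
                token, lang = rest, None
                strings.setdefault(token, {})
            elif keyword == "language":
                lang = rest
            continue
        # A quoted string belongs to the current token and language.
        if line.startswith('"') and line.endswith('"') and token and lang:
            strings[token][lang] = line[1:-1]
    return langdefs, strings
```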
3.1 Example file:
//
// Cpu I/O Strings
//
// Copyright (c) 2006, Intel Corporation. All rights reserved.<BR>
//
// This program and the accompanying materials are licensed and made
// available under the terms and conditions of the BSD License which
// accompanies this distribution. The full text of the license may be
// found at:
// http://opensource.org/licenses/bsd-license.php
//
// THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
// WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS
// OR IMPLIED.
//
/=#
#langdef en-US "English, US"
#langdef fr-FR "Français"
#string STR_PROCESSOR_VERSION
#language en-US
"NT32 Emulated Processor"
#language fr-FR
"Processeur Émulé par NT32"
4 Meta-Data UNI Files
To support distributions conforming to the UEFI PI Distribution Package
Specification, Unicode files may be used to hold localization content, passed
along in the XML file, that cannot be expressed using ASCII characters.
Literal strings (enclosed in double quotation marks) in the following EBNF
represent UCS-2 encoded character strings.
The format of the Unicode files that contain the optional Module and Package
localization content for distribution is as follows:
<MetaUniFile> ::= <CommentLine>*
<MetaData>+
<MetaData> ::= {<CommentLine>} {<BlankLine>} {<UnicodeLines>}
Additional definitions used for the <Identifier> entry of a <Token> in Package
Meta-Data Unicode files:
<CtrlChar> ::= "#"
<ErrorNumber> ::= <HexDigitU>{1,8}
<PcdName> ::= <CName> "_" <CName>
<CName> ::= <Letter>({<Letter>} {<Digit>})*
Refer to Section 2.1, Common EBNF, in Chapter 2, Unicode Strings File Format,
for the Extended Backus-Naur Form (EBNF) definitions of CommentLine, BlankLine
and UnicodeLines.
It is also recommended that the comment section at the start of each file
(described in the following sections of this chapter) be consistent with the
EDK II meta-data file headers: it should begin with a "// @file" start tag
line and include abstract, description, copyright and license information.
4.1 Module Meta-Data
If a Module's INF file contains a MODULE_UNI_FILE entry in its [Defines]
section, then the specified Unicode file may contain localization extensions
for the Abstract, Description, Copyright and License information in the
Module's @file header, as described in the "EDK II Module Information (INF)
File Specification".
The following <Identifier> entries are reserved for extending the Module's
Abstract and Description content.
"STR_MODULE_ABSTRACT"
"STR_MODULE_DESCRIPTION"
If a Module's INF file contains a Unicode file entry in its
[UserExtensions.TianoCore."ExtraFiles"] section, then that Unicode file may
contain a localized version of a name for the module as well as other content.
This file is used to hold content that is not required by UEFI PI Distribution
Package, but may be useful for User Interface tools.
The following <Identifier> may be used to extend the name of the module.
"STR_PROPERTIES_MODULE_NAME"
Other content may be provided in this file as the file itself will be carried
along with the Module in a UEFI PI Distribution Package.
4.2 Package Meta-Data
If a Package's DEC file contains a PACKAGE_UNI_FILE entry in its [Defines]
section, then the specified Unicode file may contain localization extensions
for the Abstract, Description, Copyright and License information in the
Package's @file header, as described in the "EDK II Package Declaration (DEC)
File Specification". It may also contain content relevant to PCDs declared in
the package.
The following <Identifier> entries are reserved for extending the Package's
Abstract and Description content.
"STR_PACKAGE_ABSTRACT"
"STR_PACKAGE_DESCRIPTION"
The following <Identifier> is reserved for extending the localization of a
Token Space GUID's error messages that are referenced by an error number. The C
Name is the Token Space GUID's C Name declared in the DEC file's [Guids]
section.
"STR_" <CName> "_ERR_" <ErrorNumber>
The following <Identifier> entries are reserved for extending the localization
of a PCD's @HELP and @PROMPT content.
"STR_" <PcdName> "_HELP"
"STR_" <PcdName> "_PROMPT"
If a Package's DEC file contains a Unicode file entry in its
[UserExtensions.TianoCore."ExtraFiles"] section, then that Unicode file may
contain a localized version of a name for the package as well as other content.
This file is used to hold content that is not required by UEFI PI Distribution
Package, but may be useful for User Interface tools.
The following <Identifier> may be used to extend the name of the package.
"STR_PROPERTIES_PACKAGE_NAME"
Other content may be provided in this file as the file itself will be carried
along with the Package in a UEFI PI Distribution Package.
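The reserved package identifier patterns in this section ("STR_" <CName>
"_ERR_" <ErrorNumber>, and "STR_" <PcdName> "_HELP" / "_PROMPT") can be
generated mechanically. A small sketch, assuming the token space GUID C name
and PCD C name are joined with a single underscore as in <PcdName>, and
formatting the error number with upper-case hexadecimal digits (a choice;
<HexDigitU> also admits lower case). The helper names and the example C names
gExampleTokenSpaceGuid and PcdExampleValue are hypothetical.

```python
def err_identifier(guid_cname: str, error_number: int) -> str:
    """Build "STR_" <CName> "_ERR_" <ErrorNumber> for a token space GUID."""
    # <ErrorNumber> allows 1 to 8 hex digits, i.e. values up to 0xFFFFFFFF.
    if not 0 <= error_number <= 0xFFFFFFFF:
        raise ValueError("error number must fit in 1-8 hex digits")
    return f"STR_{guid_cname}_ERR_{error_number:X}"


def pcd_identifiers(guid_cname: str, pcd_cname: str) -> tuple[str, str]:
    """Build the _HELP and _PROMPT identifiers for a PCD."""
    pcd_name = f"{guid_cname}_{pcd_cname}"  # <CName> "_" <CName>
    return f"STR_{pcd_name}_HELP", f"STR_{pcd_name}_PROMPT"
```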
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel