UniClassObject: Verify string data is 16-bit

Laszlo Ersek Mon, 04 May 2015 00:49:54 -0700

On 05/04/15 08:14, Jordan Justen wrote:
> With .uni UTF-16 files, it is impossible for unicode code points to be
> larger than 0xFFFF.
> 
> To support .utf8 UTF-8 unicode files, we need to also deal with the
> possibility that the UTF-8 file contains unicode code points larger
> than 16-bits. Since UEFI only supports 16-bit string data, we make
> UniClassObject generate an error if a larger code point is seen in a
> UTF-8 string value.
> 
> We only check string value data, so it is possible to use larger code
> points in comments.
> 
> Cc: Yingke D Liu <yingke.d....@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  BaseTools/Source/Python/AutoGen/UniClassObject.py | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/BaseTools/Source/Python/AutoGen/UniClassObject.py 
> b/BaseTools/Source/Python/AutoGen/UniClassObject.py
> index df30c48..fef7135 100644
> --- a/BaseTools/Source/Python/AutoGen/UniClassObject.py
> +++ b/BaseTools/Source/Python/AutoGen/UniClassObject.py
> @@ -259,6 +259,14 @@ class UniFileClassObject(object):
>                      self.OrderedStringDict[LangName][Item.StringName] = 
> len(self.OrderedStringList[LangName]) - 1
>          return True
>  
> +    def Verify16bitCodePoints(self, String):
> +        for cp in String:
> +            if ord(cp) > 0xffff:


Not sure how the current (ie. pre-patch) tree regulates surrogate pairs,
but since you're adding this check anyway, I'd propose to explicitly
reject surrogate codes.

http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane

http://en.wikipedia.org/wiki/UTF-16#U.2B10000_to_U.2B10FFFF

Thanks
Laszlo

> +                tmpl = 'The string {} defined in file {} ' + \
> +                       'contains a character with a code point above 0xFFFF.'
> +                error = tmpl.format(repr(String), self.File)
> +                EdkLogger.error('Unicode File Parser', FORMAT_INVALID, error)
> +
>      #
>      # Get String name and value
>      #
> @@ -280,6 +288,7 @@ class UniFileClassObject(object):
>                  Language = LanguageList[IndexI].split()[0]
>                  Value = 
> LanguageList[IndexI][LanguageList[IndexI].find(u'\"') + len(u'\"') : 
> LanguageList[IndexI].rfind(u'\"')] #.replace(u'\r\n', u'')
>                  Language = GetLanguageCode(Language, self.IsCompatibleMode, 
> self.File)
> +                self.Verify16bitCodePoints(Value)
>                  self.AddStringToList(Name, Language, Value)
>  
>      #
> @@ -433,6 +442,7 @@ class UniFileClassObject(object):
>                      MatchString = re.match('[A-Z0-9_]+', Name, re.UNICODE)
>                      if MatchString == None or MatchString.end(0) != 
> len(Name):
>                          EdkLogger.error('Unicode File Parser', 
> FORMAT_INVALID, 'The string token name %s defined in UNI file %s contains the 
> invalid lower case character.' %(Name, self.File))
> +                self.Verify16bitCodePoints(Value)
>                  self.AddStringToList(Name, Language, Value)
>                  continue
>  
> 


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel

Re: [edk2] [PATCH 6/9] BaseTools/UniClassObject: Verify string data is 16-bit

Reply via email to