On 06/04/15 08:42, Jordan Justen wrote:
> Surrogate code points can be encoded in UTF-8 files, but they are
> not valid UCS-2 characters.
> 
> For example, in the Python interpreter:
> >>> import codecs
> >>> codecs.encode(u'\ud801', 'utf-8')
> '\xed\xa0\x81'
> 
> However, code points in the range 0xd800 - 0xdfff should be rejected
> as Unicode characters because they are reserved for surrogate pairs
> in UTF-16 files.
> 
> We test that this case is rejected for UTF-8 with and without the
> UTF-8 BOM.
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Yingke D Liu <yingke.d....@intel.com>
> Cc: Michael D Kinney <michael.d.kin...@intel.com>
> Cc: Laszlo Ersek <ler...@redhat.com>
> ---
>  BaseTools/Tests/CheckUnicodeSourceFiles.py | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/BaseTools/Tests/CheckUnicodeSourceFiles.py b/BaseTools/Tests/CheckUnicodeSourceFiles.py
> index 102dc3c..2eeb0f5 100644
> --- a/BaseTools/Tests/CheckUnicodeSourceFiles.py
> +++ b/BaseTools/Tests/CheckUnicodeSourceFiles.py
> @@ -139,6 +139,30 @@ class Tests(TestTools.BaseToolsTest):
>  
>          self.CheckFile('utf_8', shouldPass=False, string=data)
>  
> +    def testSurrogatePairUnicodeCharInUtf8File(self):
> +        #
> +        # Surrogate Pair code points are used in UTF-16 files to
> +        # encode the Supplementary Plane characters. In UTF-8, it is
> +        # trivial to encode these code points, but they are not valid
> +        # code points for characters, since they are reserved for the
> +        # UTF-16 Surrogate Pairs.
> +        #
> +        # This test makes sure that BaseTools rejects these characters
> +        # if seen in a .uni file.
> +        #
> +        data = '\xed\xa0\x81'
> +
> +        self.CheckFile(encoding=None, shouldPass=False, string=data)
> +
> +    def testSurrogatePairUnicodeCharInUtf8FileWithBom(self):
> +        #
> +        # Same test as testSurrogatePairUnicodeCharInUtf8File, but add
> +        # the UTF-8 BOM
> +        #
> +        data = codecs.BOM_UTF8 + '\xed\xa0\x81'
> +
> +        self.CheckFile(encoding=None, shouldPass=False, string=data)
> +
>  TheTestSuite = TestTools.MakeTheTestSuite(locals())
>  
>  if __name__ == '__main__':
> 

Reviewed-by: Laszlo Ersek <ler...@redhat.com>
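
For readers following the thread, here is a rough sketch of the kind of check
these two tests expect. It is not the BaseTools implementation: the helper name
contains_surrogate is hypothetical, and the sketch assumes Python 3, where the
'surrogatepass' error handler is needed to decode the bytes at all (the strict
UTF-8 decoder already rejects them).

```python
import codecs

# Code points reserved for UTF-16 surrogate pairs.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def contains_surrogate(data):
    """Hypothetical helper: True if UTF-8 bytes hold a surrogate code point."""
    # Skip an optional UTF-8 BOM, mirroring the "with BOM" test case.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # 'surrogatepass' lets the decoder accept encoded surrogates so that
    # they can be inspected instead of raising UnicodeDecodeError.
    text = data.decode('utf-8', errors='surrogatepass')
    return any(SURROGATE_FIRST <= ord(ch) <= SURROGATE_LAST for ch in text)


if __name__ == '__main__':
    print(contains_surrogate(b'\xed\xa0\x81'))
    print(contains_surrogate(codecs.BOM_UTF8 + b'\xed\xa0\x81'))
    print(contains_surrogate('\U00010000'.encode('utf-8')))
```

A supplementary-plane character such as U+10000 is encoded in UTF-8 as a single
four-byte sequence, so it does not trip the check; only directly encoded
surrogate code points do.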

------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
