On Thu, 14 Sep 2017 09:44:54 +0200, p...@cpan.org wrote:

> > BYTE/BLOB/TEXT tests require three types of data
> > 
> > • Pure ASCII
> > • Correct UTF-8 (with complex combinations)  
> 
> subtest: Correct UTF-8 TEXT with only code points in range U+00 .. U+7F 
> (ASCII subset)
> subtest: Correct UTF-8 TEXT with only code points in range U+00 .. U+FF 
> (Latin1 subset)

ASCII:            U+000000 .. U+00007F
iso-8859-*:     + U+000080 .. U+0000FF (includes cp1252)
iso-10646:      + U+000100 .. U+0007FF
                + U+000800 .. U+00D7FF
                + U+00E000 .. U+00FFFF
utf-8 1):       + U+010000 .. U+10FFFF
                + surrogates
                + bidirectionality
                + normalization
                + collation (order by)

1) some iso-10646 implementations already support supplementary
   codepoints. Depends on the version of the standard
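
As a rough illustration of picking one test value per row of the table
above, here is a minimal sketch in Python (chosen only for the example,
the same idea carries over to Perl). The names TIERS, tier, SAMPLES and
report are hypothetical, and the sample characters are arbitrary picks
from each range, not a recommended test set.

```python
# Inclusive upper bound and label for each range in the table above.
# The labels are made up for this sketch.
TIERS = [
    (0x00007F, "ASCII"),
    (0x0000FF, "Latin-1"),
    (0x0007FF, "U+0100..U+07FF"),
    (0x00D7FF, "U+0800..U+D7FF"),
    (0x00DFFF, "surrogate"),
    (0x00FFFF, "U+E000..U+FFFF"),
    (0x10FFFF, "supplementary"),
]


def tier(ch):
    """Return the label of the range that a single character falls in."""
    cp = ord(ch)
    for upper, label in TIERS:
        if cp <= upper:
            return label
    raise ValueError("code point out of range: %#x" % cp)


# One arbitrary sample character per non-surrogate range.
SAMPLES = ["A", "\u00e9", "\u0142", "\u20ac", "\ufb01", "\U0001f600"]

# Pair each sample's range with the number of bytes it takes in UTF-8.
report = [(tier(ch), len(ch.encode("utf-8"))) for ch in SAMPLES]

for label, nbytes in report:
    print("%-16s %d byte(s) in UTF-8" % (label, nbytes))
```

The byte counts show why the ranges matter for column sizing: a field
declared in bytes holds fewer characters as the data moves down the
table.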

With 100% Unicode, data may go bust if stored in UTF-8 fields

Unicode defines a "correct" order of combining characters. I don't know
exactly what the order is, but if a letter has more than one combining
character in it, like

 ờ U+1EDD \N{LATIN SMALL LETTER O WITH HORN AND GRAVE}
 ȭ U+022D \N{LATIN SMALL LETTER O WITH TILDE AND MACRON}

inserting "LATIN SMALL LETTER O" "WITH GRAVE" "WITH HORN"
is allowed to return as "LATIN SMALL LETTER O" "WITH HORN" "WITH GRAVE"
or as "LATIN SMALL LETTER O WITH GRAVE" "WITH HORN" or
"LATIN SMALL LETTER O WITH HORN" "WITH GRAVE" or
"LATIN SMALL LETTER O WITH HORN AND GRAVE"

They all represent the same grapheme. From a user perspective when
dealing with Unicode, that is fine. For testing purposes it is
not :(

So, *if* you test with combining characters (sequences that are not
represented by a single codepoint) make sure the test data matches the
Unicode defined order
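
To make the equivalence concrete, here is a small sketch using Python's
standard unicodedata module (Python only for illustration). It builds
the five spellings of "o with horn and grave" listed above and checks
that they collapse to one sequence under NFD and to one codepoint under
NFC. The names variants and same_text are hypothetical. A test that
normalizes both the inserted and the fetched value before comparing
should not care which spelling the database returns.

```python
import unicodedata

O = "o"
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT, combining class 230
HORN = "\u031b"   # COMBINING HORN, combining class 216

# The five spellings from the text above; all are canonically equivalent.
variants = [
    O + GRAVE + HORN,   # o, grave, horn (as inserted)
    O + HORN + GRAVE,   # o, horn, grave
    "\u00f2" + HORN,    # o-with-grave, then horn
    "\u01a1" + GRAVE,   # o-with-horn, then grave
    "\u1edd",           # o-with-horn-and-grave, precomposed
]


def same_text(a, b, form="NFC"):
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical ordering sorts marks by combining class, so horn (216)
# ends up before grave (230) in the fully decomposed form.
nfd = {unicodedata.normalize("NFD", v) for v in variants}
nfc = {unicodedata.normalize("NFC", v) for v in variants}

print(len(set(variants)), "raw spellings")
print(len(nfd), "distinct NFD form,", len(nfc), "distinct NFC form")
```

Comparing raw strings would see five different values here, which is
exactly the testing problem described above.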

FYI This is why I still don't support *real* binary in perl6's Text::CSV

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
