On Thursday 14 September 2017 08:21:42 H.Merijn Brand wrote:
> On Wed, 13 Sep 2017 13:27:58 -0700, Darren Duncan
> <dar...@darrenduncan.net> wrote:
> 
> > On 2017-09-13 12:58 PM, Dan Book wrote:
> > > On Wed, Sep 13, 2017 at 3:53 AM, Peter Rabbitson wrote:
> > >
> > >     On 09/12/2017 07:12 PM, p...@cpan.org wrote:
> > >         And here is promised script:
> > >     <snip>
> > >
> > >     The script side-steps showcasing the treatment of BLOB/BYTEA columns, 
> > > which
> > >     was one of the main ( albeit not the only ) reason the userbase lost 
> > > data.
> > >
> > >     Please extend the script with a BLOB/BYTEA test.
> > >
> > > I'm not sure how to usefully make such a script, since correct insertion 
> > > of BLOB
> > > data (binding with the SQL_BLOB type or similar) would work correctly both
> > > before and after the fix.  
> > 
> > Perhaps the requirement of the extra tests is to ensure that BLOB/BYTEA 
> > data is 
> > NOT mangled during input or output, that on input any strings with a true 
> > utf8 
> > flag are rejected and that on output any strings have a false utf8 flag.  
> > Part 
> > of the idea is to give regression testing that changes regarding Unicode 
> > handling with text don't inadvertently break blob handling. -- Darren Duncan
> 
> BYTE/BLOB/TEXT tests require three types of data
> 
> • Pure ASCII
> • Correct UTF-8 (with complex combinations)

subtest: Correct UTF-8 TEXT with only code points in range U+00 .. U+7F (ASCII 
subset)
subtest: Correct UTF-8 TEXT with only code points in range U+00 .. U+FF (Latin1 
subset)

> • Pure binary
>   - random bytes ranging 0x00..0xFF
>   - Images (png, jpg, gif, tiff)
> 
> None of those is hard to generate.
> 
> 1. create two tables with a field of the type to check
> 2. insert the data in table 1
> 3. use SQL to copy the data to table 2
> 4. extract data from table 2
> 5. compare to original data
> 6. drop tables
> 7. goto 1
> 
> If MySQL support different ways to do this, test all ways (in the
> string, placeholders, bind_columns, bind_params, other ...)

MySQL server and its databases has some limitations, so reflect it:

* it does not provide information if placeholder is TEXT, VARCHAR, VARBINARY or 
BLOB
* placeholder's bind value does not have to point to column, it can be also SQL 
function
  --> for caller/user all placeholders are equivalent and caller itself
      needs to know how to treat bind variable and needs to specify if
      it is TEXT or BLOB

* VARBINARY is right padded with 0x00
  --> there is no difference between binary "\x01\x02\x00" and "\x01\x02"

And note that perl itself does not distinguish between "binary bytes"
and "unicode string". User itself needs to know what he should expect in
perl scalar and how to handle it.

And if you are thinking about utf8 flag of scalar (utf8::is_utf8()) to
use as way how to distinguish character and binary data, then do not try
to do it.

In (updated) documentation in utf8.pm module is:
https://metacpan.org/pod/release/WOLFSAGE/perl-5.27.3/lib/utf8.pm

  Don't use this flag as a marker to distinguish character and binary
  data, that should be decided for each variable when you write your
  code.

Reply via email to