Dear Perl gurus,
This is my first post. I'm using Perl with great joy, and I'd like to express
my gratitude for all you are doing to keep Perl stable and fun to use.
I would like to object to re-releasing this version and instead discuss how
to make 4.043 backwards compatible.
This change will with 100% certainty corrupt all BLOB data written to the
database whenever the developer has not read the release notes (and changed
their code accordingly) before applying the latest version of DBD::mysql.
Since sysadmins do not always read the release notes of each updated
package, the likelihood that this will happen is high.
I myself wasn't even shown the release notes, as DBD::mysql was a dependency
of an updated package that I applied.
The exposure of this change is large, as DBD::mysql is used by many
applications and user bases.
I believe deliberately introducing industry-wide database corruption is
something that will significantly harm people's confidence in using Perl.
I believe that not providing backwards compatibility is not in line with the
Perl policy that has been carefully put together by the community to maintain
the quality of Perl as it is today.
http://perldoc.perl.org/perlpolicy.html#BACKWARD-COMPATIBILITY-AND-DEPRECATION
I therefore believe the only solution is an upgrade that is backwards
compatible by default, where it is the user who decides when to start UTF8
encoding the input values of an SQL request.
If that is too time consuming or too difficult, it should be considered to
park the UTF8-encoding "fix" and release a version with the security fix
first.
I have the following objections against this release:
1. the upgrade will corrupt more records than it fixes (it does more harm
   than good)
2. the reason given for not providing backward compatibility ("because it
   was hard to implement") is not plausible given the level of unwanted side
   effects.
   This is especially so knowing that there is already a mechanism in place
   for the user to signal whether UTF8 encoding is wanted or not
   (mysql_enable_utf8/mysql_enable_utf8mb4).
3. it costs more resources to coordinate/discuss a "way forward" or options
   than to implement a solution that addresses backwards compatibility
4. it is unreasonable to ask for changes to existing source code, knowing
   that dependent modules may be proprietary or no longer actively maintained
   It can be argued that such modules should always be maintained, but that
   does not change the fact that a well-running Perl program becomes unusable
5. it does not inform the user that after upgrading, existing code will
   start writing corrupt BLOB records
6. it does not inform the user that a code review of all existing code is
   necessary, nor how that code needs to be changed and tested
7. it does not give the user the option to decide how the BLOBs should be
   stored/encoded (opt in)
8. it does not provide backwards compatibility
   By doing so it does not respect the Perl policy that has been carefully
   put together by the community to maintain the quality of Perl as it is
   today.
http://perldoc.perl.org/perlpolicy.html#BACKWARD-COMPATIBILITY-AND-DEPRECATION
9. it blocks users from using DBD::mysql upgrades as long as they have not
   rewritten their existing code
10. not all users of DBD::mysql can be warned beforehand about the side
   effects, as it is not known which private parties have code that uses
   DBD::mysql
12. I believe development will go faster when support for backwards
   compatibility is addressed
13. having to write 1 extra line for each SQL query value is a monk's job
   that will make the module less attractive to use
About forking to DBD::mariadb:
The primary reason to create such a module would be that the communication
protocol of MariaDB has become incompatible with MySQL.
Using this namespace to fix a bug in DBD::mysql does not meet that criterion,
causes confusion for developers, and unnecessarily pollutes the DBD
namespace.
---
For people who do not know the impact of the change that is pending to be
committed
(see the GitHub issue, which includes 3 reports from companies that suffered
data loss: https://github.com/perl5-dbi/DBD-mysql/issues/117 ):
Issue: some UTF8 characters are not properly displayed after retrieval
Cause: SQL query values are not UTF8 encoded when sent to the database, but
they are all decoded once retrieved.
Occurrence: Only records with string data that can only be written with UTF8.
It can be considered rare, as people have not reported this issue in 10 years
of usage.
Regional impact: Only affects countries whose characters need UTF8 encoding,
and only affects string values.
Steps to recover from it: Read string data unencoded and write it encoded.
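
To make the cause above concrete, here is a minimal sketch using the core
Encode module. It is only a simplified model of "not encoded when sent, but
decoded when retrieved", not DBD::mysql's actual code; the variable names and
the sample word are my own assumptions.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text (assumption): "naive" with U+00EF as its third character.
my $text = "na\x{EF}ve";

# Asymmetric path: the value goes out without UTF8 encoding (the character
# travels as the single byte 0xEF), but is UTF8 decoded on the way back.
my $sent_raw  = $text;
my $read_back = decode('UTF-8', $sent_raw);

# Symmetric path: encode on the way out, decode on the way back.
my $sent_encoded = encode('UTF-8', $text);
my $round_trip   = decode('UTF-8', $sent_encoded);

print "asymmetric path preserved the text: ",
      ($read_back eq $text ? "yes" : "no"), "\n";
print "symmetric path preserved the text:  ",
      ($round_trip eq $text ? "yes" : "no"), "\n";
```

Plain ASCII text is unaffected in both paths, which fits the observation that
only string data needing UTF8 is hit.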
Changes in the upgrade pending to be re-released:
SQL query values are UTF8 encoded when sent to the database and decoded when
retrieved (including BLOB fields).
BLOB fields will be excluded from encoding only if you specify their data
type.
Side effects of installing the upgrade:
- BLOB data will be written after UTF8 encoding and will therefore be corrupt
- no possibility to detect whether a BLOB field is corrupt or not, unless it
is known when the INSERT/UPDATE took place and when the upgrade was installed
- existing data will still display incorrectly
Occurrence: every INSERT/UPDATE statement will start writing corrupted BLOB
data
Regional impact: worldwide
Steps to recover from corrupted BLOBs? You cannot. Your binary blobs are
encoded as if they were UTF8 strings. Your binary data is unrecoverable (as
in "gone forever").
If you are a dentist, you have to ask your customers to come back for another
x-ray, as the photos already taken are gone.
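
As an illustration of what "encoded as if they were UTF8 strings" does to
binary data, here is a small sketch. It only models the effect with Perl's
built-in utf8::encode; it is not the DBD::mysql code path itself, and the
four sample bytes (the usual start of a JPEG file) are my own choice.

```perl
use strict;
use warnings;

# Stand-in binary data (assumption): the first four bytes of a typical JPEG.
my $blob = "\xFF\xD8\xFF\xE0";

# Model of a write path that treats the value as text and UTF8 encodes it.
my $stored = $blob;
utf8::encode($stored);   # every byte >= 0x80 becomes a two-byte sequence

printf "original: %d bytes (%s)\n", length $blob,   unpack('H*', $blob);
printf "stored:   %d bytes (%s)\n", length $stored, unpack('H*', $stored);
print  "stored data differs from the original\n" if $stored ne $blob;
```

Bytes below 0x80 pass through unchanged, so a file that happens to be pure
ASCII would not be altered, while typical image or archive data is.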
What is asked from the developer to prevent this from happening?
- do not miss reading the release notes before upgrading
- review all source code (including code in third-party modules that are
used) and specify the data type of each SQL parameter value
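before: $dbh->do('INSERT INTO test (BLOB1,BLOB2,BLOB3,BLOB4)
VALUES(?,?,?,?)',undef,$col1,$col2,$col3,$col4);
after: use DBI qw(:sql_types);  # exports the SQL_BLOB constant
my $sth = $dbh->prepare('INSERT INTO test (BLOB1,BLOB2,BLOB3,BLOB4)
VALUES(?,?,?,?)');
$sth->bind_param(1, $col1, SQL_BLOB);  # bind each value with its data type
$sth->bind_param(2, $col2, SQL_BLOB);
$sth->bind_param(3, $col3, SQL_BLOB);
...
$sth->execute();  # run the statement once all parameters are bound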
One more line for each SQL parameter value. This will be a time-consuming
monk's task, during which the user will ask why this is necessary when it
worked before.
- upgrade scripts need to be written to UTF8 encode existing string data
- retest all source code