For future reference, here is what I did to fix the encoding problem:

MariaDB [phpbugsdb]> select sdesc from bugdb where id=76553;
+-----------------------------------------------------------------+
| sdesc                                                           |
+-----------------------------------------------------------------+
| Ð˜Ð¼Ñ Ð¿ÐµÑ€ÐµÐ¼ÐµÐ½Ð½Ð¾Ð¹ может Ñ Ð¾Ð´ÐµÑ€Ð¶Ð°Ñ‚ÑŒ управлÑющие |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [phpbugsdb]> alter table bugdb drop index email;
Query OK, 76298 rows affected (0.85 sec)
Records: 76298  Duplicates: 0  Warnings: 0

MariaDB [phpbugsdb]> alter table bugdb modify sdesc varbinary(80) NOT NULL
DEFAULT '', modify ldesc binary NOT NULL, modify email varbinary(40) NOT
NULL DEFAULT '';
Query OK, 76298 rows affected, 65535 warnings (0.65 sec)
Records: 76298  Duplicates: 0  Warnings: 76091

MariaDB [phpbugsdb]> alter table bugdb modify sdesc varchar(80) CHARACTER
SET utf8mb4 NOT NULL DEFAULT '', modify ldesc text CHARACTER SET utf8mb4
NOT NULL, modify email varchar(40) CHARACTER SET utf8mb4 NOT NULL DEFAULT
'';
Query OK, 76298 rows affected, 127 warnings (0.57 sec)
Records: 76298  Duplicates: 0  Warnings: 127

MariaDB [phpbugsdb]> alter table bugdb add FULLTEXT INDEX `email`
(`email`,`sdesc`,`ldesc`);
Query OK, 76298 rows affected (1.56 sec)
Records: 76298  Duplicates: 0  Warnings: 0

MariaDB [phpbugsdb]> select sdesc from bugdb where id=76553;
+--------------------------------------------+
| sdesc                                      |
+--------------------------------------------+
| Имя переменной может содержать управляющие |
+--------------------------------------------+
1 row in set (0.00 sec)

The trick was to convert the columns to binary first. When I went straight
from latin1 to utf8, the server transcoded each latin1 character to its
utf8 equivalent, which just preserved the garbage. Declaring the columns
binary first drops the latin1 label without touching the stored bytes, so
the second alter reads those bytes as utf8 as-is, and that appears to have
worked. There were some warnings, which I assume come from invalid utf8
byte sequences somewhere in the data.
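
To illustrate why the detour through binary matters, here is a small
Python sketch. It is not what was run on the server. It assumes Python's
latin-1 codec as a stand-in for the server's latin1 character set (which
may map a few bytes differently), and the sample string and the
invalid_utf8 helper are made up for the illustration.

```python
# Stand-in for the stored column data: UTF-8 bytes sitting in a column
# that the server believes is latin1. (Sample text is hypothetical.)
original = "Имя переменной"
stored = original.encode("utf-8")

# latin1 -> utf8 directly: every stored byte is read as one latin1
# character, so each Cyrillic letter turns into two junk characters.
direct = stored.decode("latin-1")

# latin1 -> binary -> utf8: the bytes are left alone and only relabelled
# as UTF-8, which recovers the original text.
via_binary = stored.decode("utf-8")


def invalid_utf8(values):
    """Return the byte strings that are not well-formed UTF-8.

    Hypothetical helper: rows like these are one plausible source of
    warnings when bytes are relabelled as UTF-8.
    """
    bad = []
    for value in values:
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(value)
    return bad


print(direct == original)      # False
print(via_binary == original)  # True
print(invalid_utf8([stored, b"caf\xe9"]))  # [b'caf\xe9']
```

The last line shows a genuine latin1 byte (0xE9) failing the UTF-8 check,
which is the kind of row I would expect behind the warnings.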

-Rasmus
