Not a Unicode one that I know of. Converting to latin1 for the grouping
works for this particular use case, but I can't promise how it would
behave on your entire data set, which may hold any Unicode character, many
of which cannot be converted to latin1:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE test ( foo VARCHAR(3)) ENGINE=InnoDB COLLATE=utf8_bin;
Query OK, 0 rows affected (0.14 sec)

mysql> SELECT GROUP_CONCAT(foo) FROM test GROUP BY foo;
Empty set (0.00 sec)

mysql> INSERT INTO test VALUES ('Ete'),('été'),('ete');
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY foo;
+-------+-------------------+
| foo   | GROUP_CONCAT(foo) |
+-------+-------------------+
| Ete   | Ete               |
| ete   | ete               |
| été   | été               |
+-------+-------------------+
3 rows in set (0.00 sec)

mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY foo COLLATE utf8_general_ci;
+------+-------------------+
| foo  | GROUP_CONCAT(foo) |
+------+-------------------+
| Ete  | Ete,été,ete       |
+------+-------------------+
1 row in set (0.00 sec)

mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY CONVERT(foo USING latin1) COLLATE latin1_general_ci;
+-------+-------------------+
| foo   | GROUP_CONCAT(foo) |
+-------+-------------------+
| Ete   | Ete,ete           |
| été   | été               |
+-------+-------------------+
2 rows in set (0.00 sec)
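
To get a feel for how much of a real data set would not survive the latin1
conversion, a round-trip check along these lines might help. It is only a
sketch against the test table above: it assumes that characters with no
latin1 equivalent come back as '?' after the conversion, so the round-tripped
value no longer matches the original byte for byte.

```sql
-- Rows whose value changes when converted to latin1 and back to utf8.
-- The explicit COLLATE utf8_bin forces a byte-wise comparison, so a
-- substituted '?' always counts as a difference.
SELECT foo
FROM test
WHERE CONVERT(CONVERT(foo USING latin1) USING utf8) COLLATE utf8_bin <> foo;
```

If this returns no rows for your data, grouping on the latin1 conversion
should not silently merge unrelated values.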


If your entire dataset fits in latin1, creating the table as latin1 (with a
collation such as latin1_general_ci, as in the last query above) might be
the best solution overall, depending on the environment.
Another option is to keep utf8_bin as the collation but group by
LOWER(yourcolumnname) or, if that is not fast enough, denormalize
into an extra lowercase column.
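
A rough sketch of both variants against the test table above. The column
name foo_lower and the index name idx_foo_lower are made up for the
illustration, and keeping the extra column in sync on later writes is left to
the application (or a trigger), which is an assumption about your setup.

```sql
-- Variant 1: group on the lowercased value. Because the column uses
-- utf8_bin, 'ete' and 'été' still compare as different strings.
SELECT LOWER(foo) AS foo_lower, GROUP_CONCAT(foo) AS variants
FROM test
GROUP BY foo_lower;

-- Variant 2: denormalize into an indexed lowercase column
-- (hypothetical names foo_lower / idx_foo_lower).
ALTER TABLE test
  ADD COLUMN foo_lower VARCHAR(3) CHARACTER SET utf8 COLLATE utf8_bin,
  ADD INDEX idx_foo_lower (foo_lower);

-- Backfill the new column for existing rows.
UPDATE test SET foo_lower = LOWER(foo);

-- Group on the precomputed column instead of calling LOWER() per row.
SELECT foo_lower, GROUP_CONCAT(foo) AS variants
FROM test
GROUP BY foo_lower;
```

The second variant trades some storage and write-time work for a grouping
that can use an index.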


On Mon, Nov 24, 2014 at 11:36 PM, Martin Mueller <
martinmuel...@northwestern.edu> wrote:

> Is there a unicode setting on mysql that is case insensitive but
> diacritics sensitive? Given 'Ete', 'été',  'ete' a group by routine for
> such a setting would return two values: 'été',  'ete'.  I couldn't find
> it, but I may not have known where to look.
>
> Martin Mueller
>
> Professor emeritus of English and Classics
> Northwestern University