For reference, I thought I'd add some of the things that are needed
when converting an existing app/database to UTF-8 (utf8). Just
altering the table definitions and connection configuration does not
fix data that was already inserted while the charset was set wrong.

How to convert existing data was a bit hard to find out. Here is what
I found in a comment on a blog I can't remember now. It is a bit
tedious, but it works as long as you can shut down your application for
a few minutes.

ALTER TABLE my_table CHANGE my_field my_field TEXT CHARACTER SET latin1;
ALTER TABLE my_table CHANGE my_field my_field BLOB;
ALTER TABLE my_table CHANGE my_field my_field TEXT CHARACTER SET utf8;
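
Since the same three statements have to be repeated for every affected
column, they can be generated instead of typed by hand. Here is a minimal
sketch in Python. It only builds the SQL text and does not run it. The
function name conversion_statements and the users/username column are
made up for illustration, and the old charset is assumed to be latin1.
Note that CHANGE restates the whole column definition, so attributes such
as NOT NULL or DEFAULT would have to be added to the type strings.

```python
def conversion_statements(table, column, text_type="TEXT", binary_type="BLOB",
                          old_charset="latin1", new_charset="utf8"):
    """Return the three ALTER TABLE statements for one column.

    text_type / binary_type must match the real column, e.g.
    VARCHAR(255) pairs with VARBINARY(255) and TEXT pairs with BLOB.
    """
    prefix = f"ALTER TABLE `{table}` CHANGE `{column}` `{column}`"
    return [
        f"{prefix} {text_type} CHARACTER SET {old_charset};",
        f"{prefix} {binary_type};",
        f"{prefix} {text_type} CHARACTER SET {new_charset};",
    ]


# Hypothetical list of columns to convert; replace with your own schema.
columns = [
    ("my_table", "my_field", "TEXT", "BLOB"),
    ("users", "username", "VARCHAR(255)", "VARBINARY(255)"),
]

for table, column, text_type, binary_type in columns:
    for statement in conversion_statements(table, column, text_type, binary_type):
        print(statement)
```

Review the printed statements and try them on a copy of the database
before running them against live data.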

The trick apparently is to first change the charset of the field back
to what the database had when the data was inserted. Then you change
it to a blob (no charset for binary data) and finally to the desired
charset. The tedious part is doing this for many fields in many
tables... but it sure saved my backside.
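
To see why the detour through a blob matters, here is a small Python
sketch of what happens to the bytes. It is a simulation, not MySQL
itself. It assumes the application sent UTF-8 bytes into a column
labelled latin1, and it uses Python's latin-1 codec as a stand-in for
MySQL's latin1. The two function names are made up for illustration.

```python
def convert_directly(stored: bytes) -> str:
    """Simulate changing the column from latin1 straight to utf8.

    The server believes every stored byte is one latin1 character and
    transcodes each of them to utf8, so the original text ends up
    encoded twice.
    """
    as_latin1_text = stored.decode("latin-1")
    transcoded = as_latin1_text.encode("utf-8")
    return transcoded.decode("utf-8")


def convert_via_blob(stored: bytes) -> str:
    """Simulate latin1 -> BLOB -> utf8.

    The blob step drops the charset label without touching the bytes,
    and the last step only relabels those same bytes as utf8.
    """
    raw = bytes(stored)  # BLOB: same bytes, no charset attached
    return raw.decode("utf-8")


# The bytes that ended up in the column when "å" was inserted as UTF-8.
stored = "å".encode("utf-8")

print(repr(convert_directly(stored)))  # two characters where there was one
print(repr(convert_via_blob(stored)))  # the original character
```

The direct route keeps the mangling and makes it permanent, while the
blob route leaves the bytes alone and only changes how they are read.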

The only alternative I have found is to search for each messed-up
character and replace it. But then you must first find every
character that has been mangled by the charset change.
For example:
UPDATE my_table SET my_field = replace(my_field,'√•','å');
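
Instead of hunting for the mangled forms by eye, they can be computed:
encode the correct character as UTF-8 and decode the bytes with the
charset they were misread as. The sketch below is Python and only prints
the statements. Which charset did the misreading is an assumption you
have to check against your own data. The '√•' above happens to be what
Mac Roman makes of the UTF-8 bytes for å, whereas latin1 would give 'Ã¥'.
The function names and the list of characters are made up for
illustration.

```python
def mangled_form(char: str, wrong_codec: str = "mac_roman") -> str:
    """Return how `char` looks when its UTF-8 bytes are read as wrong_codec."""
    return char.encode("utf-8").decode(wrong_codec)


def replace_statement(table: str, column: str, char: str,
                      wrong_codec: str = "mac_roman") -> str:
    """Build one UPDATE ... REPLACE statement for a single character.

    No quote escaping is done, so this is only safe for characters
    whose mangled form contains no single quote or backslash.
    """
    bad = mangled_form(char, wrong_codec)
    return (f"UPDATE {table} SET {column} = "
            f"REPLACE({column}, '{bad}', '{char}');")


# Hypothetical set of characters known to occur in the data.
for char in "åäöÅÄÖ":
    print(replace_statement("my_table", "my_field", char))
```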


/Martin


On Jun 16, 9:53 am, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> Yeah, I got that the reason is linguistic in its origin. It is great
> when trying to search a mass of text. But when you try to do a
> matching search for an exact string it does complicate things a lot
> when you still think that = really means exactly equal.
>
> Doing WHERE username = 'myname' I (as a programmer) never ever want to
> match anything else but exactly that.
> Doing WHERE article LIKE '%cake%' I would not at all be this critical
> or surprised since it is a different kind of searching in my world.
>
> Also, I was under the mistaken impression that COLLATE was ONLY
> related to how to sort these special characters. This I have no
> problem with either btw. Previously, I had no idea that collation also
> affected simple matching searches.
>
> The equal sign has a special place in my heart. :)
> I guess the binary collation will be my preference for general data.
>
> Do you have any advice for a web-application with multiple languages?
> You can only take advantage of the linguistic advantages as long as
> the language in the data and the collation match. How would you cater
> for, say, a blog in both German and French? Set the database defaults to
> general or binary and then add COLLATE utf8_french_ci to the queries?
>
> thanks
> Martin
>
> On Jun 13, 4:28 pm, "Jonathan Snook" <[EMAIL PROTECTED]> wrote:
>
> > > I am a bit shocked that it is a "feature" when å is the same as a in
> > > MySQL. That sounds just plain wrong to me. If it had been so for
> > > utf8_some_special_ci, fine, but not for general (the default default)
> > > collations. To me that would be like PHP saying (1 == 1.2) is true
> > > because it is "close enough". :) Very strange but I guess they must
> > > have some very good reason for it.
>
> > It's not really the same thing and yes, there's a very good reason.
> > In most languages, diacritics are meant to alter the pronunciation of a
> > letter. In other words, e, é and è are the same "letter" but have
> > different pronunciations because of the accent marks. Therefore, when
> > a French person does a search for a word, they might simply type in
> > "ecole" but they fully expect école to show up. Another example, I
> > live in a city known as Orléans but has been known as Orleans (note
> > the lack of accent) for a number of years (they only recently added
> > the accent back in where it belongs). However, a search for Orleans
> > should bring up either result. Also, collations determine how content
> > is ordered when results are returned. Take Ecole A, ecole B, École C
> > and école D. How should that be ordered? The _ci indicates
> > case-insensitive so we get the order we expect (as I've listed). It'd
> > be pretty confusing to do a search and get ecole B, Ecole A, [the rest
> > of the latin character results], école D, École C.
>
> > I hope that explains it a little better.
