Re: New server where strings get utf8 encoded before being saved to MySQL. Any hints for me?

David C. Zentgraf Thu, 12 Jun 2008 17:51:28 -0700

You're mixing up a lot of terms and concepts here.
It's not about double-encoding, it's about how the data is interpreted. A
string 'ABCD' is not saved as 'ABCD', it's saved as a sequence of bits, say
1101001001011001 (made up purely for illustrative purposes, no
guarantee for accuracy). Read as latin1, "1101001001011001" might come
out as a run of garbage symbols, read as shift-jis it might mean "馬鹿", and
read as UTF8 it means 'ABCD'. Setting the right encoding basically means
telling your application and database how to *interpret* the data. And if
just one part of the application/db-connection/database chain handles the
data in a different encoding, it'll get screwed up.
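
To make that concrete, here is a minimal sketch in Python (not PHP, chosen only because it makes the bytes easy to see). It assumes the text is UTF-8 and that one link in the chain wrongly reads it as latin1; the variable names are just for illustration.

```python
# Illustration only: the same bytes read under two different charsets.
original = "hallå"

# What actually travels to the database is bytes, not characters.
raw = original.encode("utf-8")        # b'hall\xc3\xa5'

# One link in the chain assumes latin1 and interprets the bytes wrongly.
misread = raw.decode("latin-1")       # 'hallÃ¥'

# The bytes themselves were never damaged, so undoing the wrong
# interpretation gives the original text back.
repaired = misread.encode("latin-1").decode("utf-8")

print(repr(raw), misread, repaired)
```

The two bytes of "å" show up as the two characters "Ã¥" when read as latin1, which is the same garbage reported for the production server further down the thread.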


> selecting "halla" would find both rows with "halla" and
> "hallå".

That's the point of MySQL's collations: a collation defines which
characters compare as equal, and some of the UTF8 collations treat
accented and unaccented letters as the same. That's why there are a bunch
of different UTF8 collations. Please read the manual to figure out which
collation would serve you best:
http://dev.mysql.com/doc/refman/5.1/en/charset-general.html
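
As a rough picture of what an accent-insensitive collation does compared to a binary one, here is a Python sketch. It is only an approximation written for illustration, not MySQL's actual comparison algorithm, and the function names `fold`, `loose_equal` and `exact_equal` are made up.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive comparison key."""
    # Split letters like "å" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def loose_equal(a: str, b: str) -> bool:
    """Compare the way an accent-insensitive collation roughly would."""
    return fold(a) == fold(b)


def exact_equal(a: str, b: str) -> bool:
    """Compare byte for byte, the way a binary collation would."""
    return a.encode("utf-8") == b.encode("utf-8")


print(loose_equal("halla", "hallå"))  # True
print(exact_equal("halla", "hallå"))  # False
```

Under the loose comparison a search for "halla" matches both rows, under the exact one it matches only one. Which behaviour you get in MySQL depends on the collation of the column.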

> This caused a world of problems since user
> names were not unique anymore when searching the database.

Not sure if a username with non-ASCII characters is such a good idea, for
exactly these reasons. For uniqueness you should stick to the lowest
common denominator (basic latin letters), unless you really know what
you're doing.
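
If you go the lowest-common-denominator route, a check along these lines would do. This is a Python sketch under my own assumptions: allowing digits and underscores and the 3 to 30 length limit are arbitrary choices, and `is_safe_username` is a hypothetical helper, not something Cake provides.

```python
import re

# ASCII letters, digits and underscore only; the length limits are arbitrary.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,30}")


def is_safe_username(name: str) -> bool:
    """Return True only if the whole name consists of basic latin characters."""
    return USERNAME_RE.fullmatch(name) is not None


print(is_safe_username("halla"))  # True
print(is_safe_username("hallå"))  # False
```

With names restricted like this, "halla" and "hallå" can never both exist, whatever collation the column ends up with.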

> What I learned from that was that MySQL can ignore table-collation and
> table charsets. I had to alter my tables and modify all varchar fields
> to include charset and collation settings.

MySQL doesn't ignore collations, it does exactly what you tell it to  
do. And yes, it's always a good idea to explicitly specify charsets,  
otherwise some default will be used and cause unintended consequences.  
That's what you should have learned.
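
For what "explicitly specify" can look like, here is a small Python sketch that just builds the SQL text. The table and column names are hypothetical, and utf8_bin is only one possible choice of collation (it compares byte for byte, so it is also case-sensitive).

```python
def users_table_ddl(charset: str = "utf8", collation: str = "utf8_bin") -> str:
    """Build a CREATE TABLE statement that names charset and collation explicitly."""
    return (
        "CREATE TABLE users (\n"
        "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        f"  username VARCHAR(50) CHARACTER SET {charset} COLLATE {collation} NOT NULL,\n"
        "  UNIQUE KEY uniq_username (username)\n"
        f") DEFAULT CHARSET={charset} COLLATE={collation};"
    )


def connection_setup(charset: str = "utf8") -> str:
    """Statement telling the server which charset the client connection uses."""
    return f"SET NAMES {charset};"


print(connection_setup())
print(users_table_ddl())
```

The point is that the column, the table and the connection each state their charset, so no server default gets a chance to sneak in.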

Chrs,
Dav

> On Jun 12, 10:17 am, leo <[EMAIL PROTECTED]> wrote:
>> I'm inclined to agree. I had the same problem. All my MySql
>> installations default to Swedish, so need to be converted. Have a look
>> at this page:
>> http://ragrawal.wordpress.com/2008/02/15/dummys-guide-for-converting-...
>>
>> Be aware that MySql is 'utf8' whereas everything else is utf-8 or
>> UTF-8. I don't know how important the case is.
>>
>> On Jun 12, 9:06 am, "David C. Zentgraf" <[EMAIL PROTECTED]> wrote:
>>
>>> I'd say it's the other way around.
>>> Text gets stored in some non-UTF8 format in the database (latin-1
>>> most likely), so it comes back as gobbledygook. When you explicitly
>>> force the gobbledygook to be interpreted as UTF8, it's being morphed
>>> back into what it was meant to be.
>>
>>> Check your database settings and set all collations to UTF8.
>>
>>> On 12 Jun 2008, at 15:58, [EMAIL PROTECTED] wrote:
>>
>>>> Hi,
>>>> I have started putting a project onto a rented "production server"
>>>> running a pretty standard Ubuntu installation AFAIK.
>>>> I have been able to fix some initial problems with charsets but one
>>>> thing I just can't seem to figure out.
>>
>>>> On my dev system
>>>> $this->params['form']['hello_text'] // in MySQL = hallå
>>
>>>> On production system
>>>> $this->params['form']['hello_text'] // in MySQL = hallÃ¥
>>>> utf8_decode( $this->params['form']['hello_text'] ) // in MySQL = hallå
>>
>>>> Looks like the text gets encoded to utf8 before being put into the
>>>> database. Problem is that the text is already utf8 so I get these
>>>> nasty characters.
>>
>>>> Does anyone know what triggers Cake or PHP to encode the text
>>>> before sending it to MySQL?
>>
>>>> Any pointer in the right direction would be great. thanks.
>>
>>>> Martin