-- Pádraic Brady <[EMAIL PROTECTED]> wrote
(on Friday, 05 October 2007, 06:49 AM -0700):
> The email text you sent showed:
> <h1>La Ca�a de Az�car</h1>
> 
> Which means my client doesn't recognise what you pasted into your message as
> UTF-8.
> Did the output show the actual accented letters? Entities? Or the same marks
> as above?
> 
> Based on the headers being correct, it looks more like the original text not
> being properly encoded - as far as I know ZF does no string manipulation that
> would corrupt the encoding. 

Correct. If you're using Zend_View, you're using whatever encoding PHP
is using at the time. If you're unsure what that is, then make sure you
set the appropriate php.ini variables to ensure you get UTF-8. And, as
Paddy has noted previously, make sure you're sending the charset in your
HTTP headers.
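
As a rough sketch of what that could look like at the top of a bootstrap
script: I'm assuming here that default_charset is the php.ini setting that
matters (it is the one PHP uses for the charset on its default Content-Type
header), and I'm using plain header() only so the snippet runs on its own.
Treat it as an illustration, not as the required setup.

```php
<?php
// Assumption: default_charset is the relevant php.ini setting. It controls
// the charset PHP puts on its default Content-Type header.
ini_set('default_charset', 'UTF-8');

// Also send the charset explicitly, but only if output has not started yet.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Read the setting back so it is easy to confirm what PHP is using.
$charset = ini_get('default_charset');
```

In a ZF application you would normally set the header through the response
object rather than calling header() yourself.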

> I'd suggest taking the file you have with the string, re-saving it as
> UTF-8, and then deleting and re-typing the text (sounds really weird,
> but some editors/IDEs can be pretty bad at re-opening files in the
> wrong encoding without you noticing, esp. on Windows).

I'm guessing that the above is the issue. My own experience working with
UTF-8 has shown that the editing environment has a lot to do with
whether or not characters are mangled. Some rules of thumb:

  * Don't cut-and-paste from Word documents. Word likes to substitute
    typographic characters (curly quotes and the like), and pasted text
    doesn't always arrive as UTF-8.

  * Make sure your editor is UTF-8 capable, and that you have UTF-8 set
    as your character set. Yes, I know I work at Zend, but Zend Studio
    is remarkably good at this; I can't speak for other IDEs as I don't
    typically use an IDE.

  * If using a console-based editor, such as Vim, make sure that your
    console is speaking UTF-8. I had issues using PuTTY with Vim for a
    long time because I didn't realize I didn't have the encoding set
    properly.
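
If you suspect the editor, one way to check is to look at the bytes PHP
actually sees. The sketch below is only an illustration: looks_like_utf8()
is a hypothetical helper name, and the two strings are hand-written byte
sequences for the same text in UTF-8 and in ISO-8859-1.

```php
<?php
// Hypothetical helper: true when $bytes is a well-formed UTF-8 sequence.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function looks_like_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// "La Caña de Azúcar" as UTF-8 bytes (ñ = C3 B1, ú = C3 BA).
$utf8 = "La Ca\xC3\xB1a de Az\xC3\xBAcar";

// The same text as ISO-8859-1 bytes (ñ = F1, ú = FA).
$latin1 = "La Ca\xF1a de Az\xFAcar";

// Print a verdict and a hex dump for each variant.
foreach (array('utf8' => $utf8, 'latin1' => $latin1) as $label => $bytes) {
    $verdict = looks_like_utf8($bytes) ? 'ok' : 'bad';
    printf("%-6s %-3s %s\n", $label, $verdict, bin2hex($bytes));
}
```

If the string in your source file dumps like the second one, the file was
saved in a single-byte encoding, and no header will fix that.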


> ----- Original Message ----
> From: Roberto Bouza <[EMAIL PROTECTED]>
> To: Pádraic Brady <[EMAIL PROTECTED]>
> Sent: Friday, October 5, 2007 2:09:21 PM
> Subject: Re: [fw-general] UTF-8 and Views... weird chars
> 
> Padraic,
> 
> Here is the source:
> -------
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
> <head>
>     <meta http-equiv="content-type" content="text/html; charset=utf-8" />
>     <title>La Ca�a de Az�car</title>
> </head>
> <body>
> 
> <h1>La Ca�a de Az�car</h1>
> Encoding: UTF-8<br>
> La Caa de Azcar<br>
> </body>
> </html>
> --------
> 
> But what worries me is that in a plain PHP file the characters display
> just fine.
> 
> I'm using the mb_* functions just as a test. I was using a plain
> controller/view with escape(), but the characters never displayed
> correctly, so I was just trying things out.
> 
> I also checked with Live Headers. Here is the output (it looks fine to me,
> but maybe you can catch something).
> 
> -----
> GET / HTTP/1.1
> Host: www.artcubbies.com
> User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7
> Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
> Accept-Language: en-us,en;q=0.5
> Accept-Encoding: gzip,deflate
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive: 300
> Connection: keep-alive
> Cache-Control: max-age=0
> 
> HTTP/1.x 200 OK
> Date: Fri, 05 Oct 2007 13:08:18 GMT
> Server: Apache/2.2.4 (Fedora)
> X-Powered-By: PHP/5.2.4
> Content-Length: 428
> Connection: close
> Content-Type: text/html; charset=utf-8
> ------
> 
> Thanks for the tips with the Headers.
> 
> Thank you for your help.
> 
> 
> On 10/5/07, Pádraic Brady <[EMAIL PROTECTED]> wrote:
> 
>     Open up your source view for the bad output and check what it contains.
> 
>     If you want to display UTF-8, you need to maintain a UTF-8 encoding at all
>     times. I'm also not sure about all the mb_* contortions you're going
>     through. Why are you converting from UTF-8 to the IANA HTML-ENTITIES? Use
>     Zend_View's escape() method which will run it through htmlspecialchars()
>     and leave any multibyte characters intact.
> 
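
On the escape() advice quoted just above, here is a small stand-alone
sketch of the behaviour being described. It does not use Zend_View itself;
escape_utf8() is a hypothetical stand-in that calls htmlspecialchars()
with an explicit UTF-8 charset, which is my reading of what the view's
escape() amounts to when its encoding is set to UTF-8.

```php
<?php
// Hypothetical stand-in for the view's escape(): htmlspecialchars() with
// an explicit UTF-8 charset. Not Zend_View's actual code.
function escape_utf8($value)
{
    return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
}

// UTF-8 bytes for "La Caña de Azúcar" followed by a tag that needs escaping.
$title = "La Ca\xC3\xB1a de Az\xC3\xBAcar <b>";

// Only the markup characters change; the multibyte characters pass through.
$escaped = escape_utf8($title);
```

The accented characters come out byte-for-byte as they went in, so there is
no need to convert to HTML-ENTITIES first.
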
>     I would also avoid using header() directly - there's a response object
>     floating around your controller layer which can be used to handle headers.
>     You can access it from within a base controller or from the front
>     controller. This is one likely suspect. See:
>     http://framework.zend.com/manual/en/zend.controller.response.html#zend.controller.response.headers
> 
>     Do you have tools like Live Headers (Firefox extension) to view the actual
>     headers being sent to your browser? The question marks are a sure sign
>     that the header encoding and output encoding do not match, causing the
>     browser to substitute ? for characters it sees as malformed.
> 
>     Paddy
>      
>     Pádraic Brady
> 
>     http://blog.astrumfutura.com
>     http://www.patternsforphp.com
>     OpenID Europe Foundation Member-Subscriber
> 
> 
>     ----- Original Message ----
>     From: Roberto Bouza <[EMAIL PROTECTED]>
>     To: [EMAIL PROTECTED]
>     Sent: Friday, October 5, 2007 5:41:24 AM
>     Subject: [fw-general] UTF-8 and Views... weird chars
> 
>     Hello Everyone.
> 
>     I've been battling with this for a while. Now I need some help to try to
>     figure this out.
> 
>     I just want to show on a view accented characters or something like:
> 
>     $str = "La Caña de Azúcar";
> 
>     If I use PHP by itself it works fine. Ex:
> 
>     $str2 = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8');
>     echo $str2;
>     echo $str;
> 
>     In plain PHP that code prints "La Caña de Azúcar" twice. But if I run
>     the same code in a Zend Framework controller, like this:
> 
>     $this->view->title = $str;
>     $this->view->encoding = "Encoding: " . mb_detect_encoding($str);
>     $this->view->converted = mb_convert_encoding($str, 'HTML-ENTITIES',
>     mb_detect_encoding($str));
> 
>     I get:
> 
>     La Ca?a de Az?car
>     Encoding: UTF-8
>     La Caa de Azcar
> 
>     I've set up on the view:
> 
>     $this->setEncoding('UTF-8');
> 
>     I have set up the header:
> 
>     header('Content-Type: text/html; charset=utf-8');
> 
>     No luck. Any help would be greatly appreciated.
> 
>     Thank you.
> 

-- 
Matthew Weier O'Phinney
PHP Developer            | [EMAIL PROTECTED]
Zend - The PHP Company   | http://www.zend.com/
