On 5/19/05, Randy Kobes <[EMAIL PROTECTED]> wrote:
> On Wed, 18 May 2005, Jay Savage wrote:
> 
> > On 5/18/05, angie ahl <[EMAIL PROTECTED]> wrote:
> > > I can confirm that it's happening before the data's gone
> > > to the database or anything. I'm getting the params from
> > > CGI.pm and then decoding via decode("utf8", $v). The page
> > > the params came from is set as utf-8 in the http header
> > > and content type, and firefox is believing the page is
> > > utf-8.
> > >
> > > It looks as though the browser isn't sending the data as
> > > UTF-8 unless it contains text that has to be. As soon as
> > > I add a € or some other character that's utf-8 it comes
> > > through fine. Checking the params before it's decoded
> > > showed the £ as I expected to see it after it had been
> > > decoded, leading me to think the form hasn't been passed
> > > as utf-8. Any clues..... anyone?
> 
> > That sounds about right.  Most (English) browsers default
> > to Latin-1 even when they say they don't.  Make sure you
> > have "enctype" set in the opening form tag. If it still
> > doesn't work, you'll need to figure out (or ask the client)
> > what the encoding is, and translate it by manipulating the
> > layers and/or encodings. But the bottom line is: if you're
> > not putting utf-8 in at some point, you won't get utf-8
> > out.
> 
> For
>  http://perl.wtsbroadcast.com/about/Angies_second_test_page.html
> if (in Firefox on Win32) I set
>    View -> Character Encoding -> Western (Windows-1252)
> I get the £ displayed.
> 
> --
> best regards,
> randy kobes
> 

Just for the record, it was the browser passing the form params as
Latin-1 unless there was a character that couldn't be represented in
Latin-1. Then it would do as it was told and pass them as utf-8.
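
For anyone wondering why the same character shows up differently, here is a minimal sketch of the byte-level difference. The helper name hex_bytes is made up for illustration; encode() is the standard function from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper, not part of any module:
# it renders each byte of a string as two uppercase hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

my $pound = "\x{A3}";      # U+00A3 POUND SIGN
my $euro  = "\x{20AC}";    # U+20AC EURO SIGN

print hex_bytes(encode('iso-8859-1', $pound)), "\n";    # A3
print hex_bytes(encode('UTF-8',      $pound)), "\n";    # C2 A3
print hex_bytes(encode('UTF-8',      $euro)),  "\n";    # E2 82 AC
```

A lone A3 byte is not valid utf-8, which is why decoding the Latin-1 form of the £ as utf-8 goes wrong.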

In the end I had to use Encode::Guess to see if it was utf-8; if so,
decode as that, otherwise decode as iso-8859-1.

To make it a tiny bit more stable, and after a lot of trial and error
I ended up doing this.

1. Concat all the values that were passed in the form into one string.
2. Run Encode::Guess on that in order to give it enough data to have a
   fair crack at it.
3. If $decoder (the result of the guess) is set, use it to decode the
   values, otherwise use iso-8859-1.
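
Roughly, the steps above could be sketched like this. It is only an outline under assumptions: decode_form_values is a made-up helper name, the raw values are assumed to arrive as undecoded byte strings in a hash ref, and Encode::Guess is left with its default suspects (ascii and utf8).

```perl
use strict;
use warnings;
use Encode qw(decode);
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: takes a hash ref of raw (undecoded) form values
# and returns a hash ref of decoded character strings.
sub decode_form_values {
    my ($raw) = @_;

    # 1. Concatenate every value so the guesser has as much data as possible.
    my $sample = join ' ', grep { defined } values %$raw;

    # 2. guess_encoding() returns an encoding object on success
    #    and a plain error string when it cannot decide.
    my $decoder = guess_encoding($sample);

    # 3. Use the guessed encoding if there is one, else fall back to iso-8859-1.
    my $encoding = ref($decoder) ? $decoder->name : 'iso-8859-1';

    my %decoded;
    for my $key (keys %$raw) {
        my $value = $raw->{$key};
        $decoded{$key} = defined $value ? decode($encoding, $value) : undef;
    }
    return \%decoded;
}

# The same pound sign sent as utf-8 bytes and as a Latin-1 byte.
my $from_utf8   = decode_form_values({ price => "\xC2\xA3" . "5" });
my $from_latin1 = decode_form_values({ price => "\xA3" . "5" });
```

Both calls should end up with the same two-character string. The guess is made once for the whole form, so this assumes the browser used a single encoding for every field in a submission.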

Not very pretty I grant you, but it's the only thing that does actually
work, seeing as the browser won't pass values as utf-8 all the time. Or
maybe it's the OS that's entering the text as iso-8859-1.

HTH someone someday.
