On Mon, Mar 5, 2012 at 3:54 PM, Tim Brody <t...@ecs.soton.ac.uk> wrote:
> On Sat, 2012-03-03 at 15:25 +0700, Bill Moseley wrote:
> > I use HTTP::Request::Common to build an application/x-www-form-urlencoded
> > POST from a passed-in hash. The hash contains strings as values.
> >
> >     $req = POST '/foo', \%parameters;
> >
> <snip>
> >
> > The thing to notice here is how the encoding for $latin1 changed just
> > because of the addition into the hash of the $unicode string. Things thus
> > break when the server tries to decode the query parameters on the server
> > side if it assumes either latin1 or utf8 encoding.

I'm sorry, I made a mistake in my example. I meant:

    $req = POST '/foo', Content => \%parameters;

It's the request *body* that I'm talking about, not the query parameters.

I understand this is a known issue in URI:

    The escaping (percent encoding) of chars in the 128 .. 255 range
    passed to the URI constructor or when setting URI parts using the
    accessor methods depend on the state of the internal UTF8 flag (see
    utf8::is_utf8) of the string passed. *If the UTF8 flag is set the
    UTF-8 encoded version of the character is percent encoded. If the
    UTF8 flag isn't set the Latin-1 version (byte) of the character is
    percent encoded. This basically exposes the internal encoding of
    Perl strings.*

And because the same character string can be represented either without
(latin1) or with (utf8) the UTF8 flag, and Perl can upgrade character
strings from latin1 to utf8 without my knowing, I cannot be sure exactly
which percent encoding will be used.

It's not really a bug; rather, it's just not clear which percent encoding
will be used. That is, given the two hashes below passed to query_form,
where the second differs only by the added "unicode" key whose value has
the UTF8 flag set, the percent encoding of the "latin1" value changes:
    { ascii => $ascii, latin1 => $latin1 }
    { ascii => $ascii, latin1 => $latin1, unicode => $unicode }

Gisle,

If I were to override query_form, do you see any problems with either of
these approaches to make sure that the final percent encoding is always of
UTF-8 encoded octets?

One is to explicitly encode everything to utf8 first:

    my %encoded_params = map { uri_escape( encode_utf8($_) ) } %{$params};
    my $query = join '&', map { "$_=$encoded_params{$_}" } keys %encoded_params;

The other approach would be to utf8::upgrade each value (and key) in the
hash and let URI's query_form work as-is.

--
Bill Moseley
mose...@hank.org
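
For reference, the first approach can be sketched as a small standalone script. This is only a minimal sketch, not URI's own behaviour: the helper name utf8_query_string and the sample values are hypothetical, and keys are sorted purely to make the output predictable. It shows that once every key and value goes through encode_utf8 before uri_escape, the escaping of the "latin1" value no longer depends on what else is in the hash.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI::Escape qw(uri_escape);

# Hypothetical helper (not part of URI): build a form-encoded string whose
# percent escapes are always those of the UTF-8 octets of each key and value.
sub utf8_query_string {
    my ($params) = @_;
    return join '&', map {
        my ($key, $value) = ($_, $params->{$_});
        # Encode characters to UTF-8 octets first, then percent-escape them.
        join '=', map { uri_escape( encode_utf8($_) ) } $key, $value;
    } sort keys %$params;
}

# Assumed sample data: a string with a char in the 128..255 range and no
# UTF8 flag, and a string with a char above 255 (UTF8 flag set).
my $latin1  = "caf\x{e9}";
my $unicode = "\x{263a}";

my $without = utf8_query_string({ ascii => 'abc', latin1 => $latin1 });
my $with    = utf8_query_string({ ascii => 'abc', latin1 => $latin1,
                                  unicode => $unicode });

print "$without\n";   # ascii=abc&latin1=caf%C3%A9
print "$with\n";      # ascii=abc&latin1=caf%C3%A9&unicode=%E2%98%BA
```

Note that uri_escape escapes a space as %20 rather than the "+" that form encoding conventionally uses, so a real override of query_form would need to handle that separately.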