Re: [Mason] Feature request: adding "use utf8" as default to obj files.

Oliver Paukstadt Thu, 28 Jun 2012 14:28:11 -0700

On Tue, 2012-06-26 at 18:35 +0200, A Kobame wrote:

> 4.)
> Everything that comes from Plack (bytes) should be decoded into
> internal Unicode characters when entering Mason. On this point the
> Stack Overflow answer (stated as coming from you, Jon) is totally wrong.
> :)
> 
> Wrong, because the %params are a Hash::MultiValue and not a simple
> %params as in good old Mason1. Unfortunately I don't have any idea
> where the best place is to decode every element from the query.
> 
> Reading the Mason manual gives me the idea that this should be done
> somewhere in the "render", in the content wrapping chain. But maybe
> I'm totally wrong.
> 
> The basic idea in the next source fragment (from Stack Overflow) is
> partially OK.
> 
> around 'run' => sub {
>     my $orig = shift;
>     my $self = shift;
> 
>     my %params = @_;
>     while (my ($key, $value) = each(%params)) {
>         $value = decode_utf8($value);
>     }
>     $self->$orig(%params);
> }
> 
> But it is probably taken from a Mason1 solution. For Mason2 it needs
> to be rewritten for Plack's Hash::MultiValue. In its current form it
> is (IMO) useless and wrong (it decodes only blessed refs, not the
> keys & values).
> In short, the plugin should decode bytes to characters:
> http://example.com/index?ááá=ééé&other=úúú regardless of how they
> arrive (GET/POST).
> 
I had a closer look at this issue some weeks ago because I could not
pass UTF-8 encoded input fields to my app while porting from Mason1 to
Mason2.


As far as I remember, Plack::Request uses HTTP::Request, which is based
on HTTP::Message. Plack::Request drops the charset information provided
in the message and passes the values as a Hash::MultiValue containing
bytes. From my point of view the damage is done in Plack::Request, and
HTML::Mason would have to deal with the bytes without knowing about the
HTTP headers sent before, which is not possible.
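
A minimal sketch of the decoding step itself, under the loud assumption
that the bytes really are UTF-8 (which is exactly the information that
gets lost). The helper name decode_pairs is hypothetical and not part of
Plack or Mason. It works on a flat key/value list, which is the shape
Hash::MultiValue's flatten method returns, so repeated keys survive:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every key and value of a flat
# (key, value, key, value, ...) list from bytes to characters.
# The list is never collapsed into a plain hash, so duplicate
# keys are preserved.
sub decode_pairs {
    my ($charset, @pairs) = @_;
    return map {
        my $bytes = $_;    # copy, so the caller's data is untouched
        defined $bytes ? decode($charset, $bytes) : undef;
    } @pairs;
}

# Bytes as they would look after %-unescaping "?ááá=ééé&other=úúú",
# ASSUMING the client sent UTF-8.
my @bytes = (
    "\xC3\xA1\xC3\xA1\xC3\xA1", "\xC3\xA9\xC3\xA9\xC3\xA9",
    "other",                    "\xC3\xBA\xC3\xBA\xC3\xBA",
);
my @chars = decode_pairs('UTF-8', @bytes);
```

With a real request this would presumably be wrapped as
Hash::MultiValue->new(decode_pairs('UTF-8', $params->flatten)), where
$params stands for whatever Hash::MultiValue object the request hands
over.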

There are several options for the charset and how to interpret the
bytes. As far as I remember, HTML 4.0 recommends UTF-8 encoding for URLs
in %-notation. I am not sure whether this applies only to the path or to
the GET parameters as well. In POST requests the Content-Type has to
define the charset if it is not iso-8859-1, so several defaults are
around.
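
To illustrate the "several defaults" problem, here is a rough sketch of
picking the charset from a Content-Type header value and falling back to
a default when none is given. The function name
charset_from_content_type and the choice of fallback are assumptions for
illustration only, and the regex is deliberately simple rather than a
full header parser:

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset parameter of a Content-Type
# header value, or $default when the header is missing or carries no
# charset parameter.
sub charset_from_content_type {
    my ($content_type, $default) = @_;
    if (defined $content_type
        && $content_type =~ /;\s*charset\s*=\s*"?([^\s;"]+)"?/i) {
        return $1;
    }
    return $default;
}

# Explicit charset in the header wins.
my $explicit = charset_from_content_type(
    'application/x-www-form-urlencoded; charset=UTF-8', 'iso-8859-1');

# No charset parameter: the caller's default applies.
my $fallback = charset_from_content_type(
    'application/x-www-form-urlencoded', 'iso-8859-1');
```

The result could then be passed as the first argument to Encode's
decode, but whether iso-8859-1 or UTF-8 is the right default depends on
the client, which is the open question here.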

The first sections of this page seem to give a good overview of the problem:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

To cut and paste some nice Unicode characters you can use
http://www.decodeunicode.org/ ☺

Regards,
Oliver Paukstadt

-- 
Oliver Paukstadt <pst...@sourcentral.org>


