Re: [Mason] Character set confusion with flush_buffer

John Williams Wed, 06 Dec 2006 10:55:58 -0800

utf8 handling in perl is still fraught with peril, but I won't go into
all the reasons for that here.

What you are probably seeing is perl auto-upgrading your strings to utf8
when it concatenates a non-utf8 string with a utf8 string.  The non-utf8
string is treated as latin-1, so every character above 0x7F becomes two
bytes internally.  This concatenation happens in
HTML::Mason::Request::print, which appends each string to the output
buffer.
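
To make the upgrade visible, here is a minimal sketch outside of Mason.
The variable names and the helper internal_bytes are made up for the
illustration, and U+263A is only a stand-in for whatever flagged string
a module might hand back.

```perl
use strict;
use warnings;

# Latin-1 byte string: "ø" is the single byte 0xF8, UTF8 flag off.
my $latin1 = "\xF8";

# A character string that must carry the UTF8 flag (assumed stand-in
# for a string returned by a module that decodes its input).
my $flagged = "\x{263A}";

# Hypothetical helper: size of the string's internal representation.
sub internal_bytes {
    my ($s) = @_;
    use bytes;
    return length $s;
}

my $buffer = $latin1;
my @before = (utf8::is_utf8($buffer) ? 1 : 0, internal_bytes($buffer));

$buffer .= $flagged;    # concatenation upgrades the whole buffer
my @after = (utf8::is_utf8($buffer) ? 1 : 0, internal_bytes($buffer));

printf "before: flag=%d bytes=%d\n", @before;   # before: flag=0 bytes=1
printf "after:  flag=%d bytes=%d\n", @after;    # after:  flag=1 bytes=5
```

The "ø" that occupied one byte now occupies two, which is what shows up
as utf-8 if the output path writes the internal bytes unchanged.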

So if a module such as XML::RSS is returning strings with the utf8 flag
set, it will infect all your other strings with the utf8-ness.  Because
Mason concatenates all the strings, the first utf8 string that is output
causes the whole buffer to be upgraded.  A good test to confirm whether
this is the case would be to remove the flush_buffer call and see if the
first string also becomes utf8.  The first string was not upgraded before
because the buffer was flushed before a utf8 string was added to it.
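
The effect of the flush can be imitated with a toy buffer.  This is only
a sketch, not Mason's code: out, flush_sim and @sent are hypothetical
names, and it assumes the output path hands the client the string's
internal bytes, which is what the symptom suggests.

```perl
use strict;
use warnings;

my @sent;           # raw bytes handed to the client at each flush
my $buffer = '';

# Append every piece to the shared buffer, as a print method would.
sub out { $buffer .= $_ for @_ }

# Record the buffer's internal bytes, then empty it.
sub flush_sim {
    my $raw = $buffer;
    utf8::encode($raw) if utf8::is_utf8($raw);   # expose internal bytes
    push @sent, $raw;
    $buffer = '';
}

out("<h1>\xF8</h1>");               # latin-1 text, flushed on its own
flush_sim();
out("<h1>\xF8</h1>", "\x{263A}");   # same text sharing a buffer with a
flush_sim();                        # flagged string

print unpack('H*', $_), "\n" for @sent;
```

The first chunk keeps the single byte f8; in the second chunk the same
character comes out as c3 b8.  Dropping the first flush_sim call puts
both headings in one upgraded buffer, which is the test suggested above.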

Unfortunately I do not know of any elegant solution for this.  You could
patch your version of Mason like this to make sure the strings never get
upgraded, but that is terribly ugly and I would not do that myself unless
there were bugs in perl itself[1] which prevented me from using a better
method.

--- HTML-Mason-1.33/lib/HTML/Mason/Request.pm
+++ Request.pm
@@ -1168,7 +1168,10 @@
         );

     # use 'if defined' for maximum efficiency; grep creates a list.
+    use Encode;
+    Encode::_utf8_off($$bufref);
     for ( @_ ) {
+        Encode::_utf8_off($_);
         $$bufref .= $_ if defined;
     }

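What the patch does to the buffer can be tried in isolation.  A rough
sketch, with made-up variable names and U+263A again standing in for a
flagged string; it works on copies, whereas the patch clears the flag on
the caller's strings in place.

```perl
use strict;
use warnings;
use Encode ();

my $latin1  = "\xF8";       # byte string, flag off
my $flagged = "\x{263A}";   # character string, flag on

my $buffer = '';
for my $piece ($latin1, $flagged) {
    my $copy = $piece;
    Encode::_utf8_off($copy);   # keep the bytes, drop the flag
    $buffer .= $copy;
}

# prints: flag=0 bytes=f8e298ba
printf "flag=%d bytes=%s\n",
    utf8::is_utf8($buffer) ? 1 : 0, unpack('H*', $buffer);
```

The latin-1 byte survives untouched, but the flagged string goes out as
its utf-8 bytes, so the page ends up with both encodings in it.  That is
part of why I call it ugly.
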
~ John Williams

[1] such as <http://rt.perl.org/rt3//Public/Bug/Display.html?id=36248>

On Wed, 6 Dec 2006, Vegard Vesterheim wrote:

> I have experienced a puzzling behaviour with the use of
> flush_buffer. I use iso-latin-1 encoding on my pages, but I have a
> specific Mason page which immediately after a call to flush_buffer
> starts producing utf-8 encoded content.
>
> This is a snippet from the page which exhibits this behaviour
> ----- snip - snip -------------------------------------------------
> <h1>øæåØÆÅ</h1>
> % $m->flush_buffer;
> <h1>øæåØÆÅ</h1>
> ----- snip - snip -------------------------------------------------
> The first H1 content is correctly encoded as iso-latin-1, but the
> second is utf-8.
>
> This page is rather complex, and includes among other things some RSS
> processing (XML::RSS). I will try to produce a smaller test case which
> reproduces the problem, but until then I was wondering if anyone on
> this list could explain this behaviour.
>
>
