Konnichiwa,

On 8/12/06, Jonathan Rockway <[EMAIL PROTECTED]> wrote:
> The first unicode breakage I had was when I added Japanese-style dates
> as timestamps on the pages. (Japanese day-name character in
> parenthesis.) What's weird was, adding this to the page worked fine --
> but it broke OTHER unicode characters on the page (sourced from a file
> or file attribute). Adding "use utf8" to the top of my source file
> fixed my problems, on Linux anyway. (Never tried on OpenBSD.)

Sounds like a traditional "Unicode string + UTF-8 bytes = BOOM" problem.
To solve that, you should handle everything as Unicode strings (utf-8
flagged), or everything as utf-8 bytes (utf8::encode($str)). Mixing the
two breaks one or the other.

But that is sometimes hard, since some CPAN modules don't care about
Unicode strings and just return strings as utf-8 bytes.

> The next problem I noticed was that C::V::TT::ForceUTF8 broke TT's "uri"
> filter. According to the HTML validator, URIs can't be unicode, so you
> have to encode the URI to UTF-8. TT's URI filter was documented to do
> this, but it translated anything with the 8th bit set to nothing,

Yeah, Template::Stash::ForceUTF8 and Template::Provider::Encoding were
made just to fix that issue. Interesting to hear that the TT uri filter
gets borked by that. Do you have any working code that shows the breakage?

BTW we use Stash::ForceUTF8 and Provider::Encoding on our production
boxes and they work fine.

> Any way I can tell perl, "trust me, everything is already UTF-8... don't
> #^$ing touch it."?

encoding::warnings might help. I'm not sure whether it actually works for
this, but the documentation should be a great help at least.

http://search.cpan.org/~audreyt/encoding-warnings-0.10/

-- 
Tatsuhiko Miyagawa
_______________________________________________
List: [email protected]
Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
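
P.S. To make the "Unicode string + UTF-8 bytes" point above concrete, here is a minimal sketch. The variable names and the sample character are made up for illustration, and it uses the core Encode module rather than anything from the original poster's code, so treat it as an assumption about what the breakage probably looks like, not a reproduction of it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded bytes for U+65E5, as they might arrive from a file
# or from a module that returns raw bytes (hypothetical input).
my $bytes = "\xE6\x97\xA5";

# The same character as a Perl Unicode (character) string.
my $chars = "\x{65E5}";

# Mixing: on concatenation the byte string is treated as Latin-1,
# so each of its 3 bytes becomes a separate character (mojibake).
my $mixed = $chars . $bytes;
printf "mixed:     %d chars\n", length $mixed;      # 4, not 2

# Option 1: decode at the boundary, work in character strings.
my $all_chars = $chars . decode('UTF-8', $bytes);
printf "all chars: %d chars\n", length $all_chars;  # 2

# Option 2: encode at the boundary, work in bytes.
my $all_bytes = encode('UTF-8', $chars) . $bytes;
printf "all bytes: %d bytes\n", length $all_bytes;  # 6
```

Either option is fine as long as it is applied consistently. utf8::decode($str) and utf8::encode($str) do the same conversions in place, if you prefer not to load Encode.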
