Hello Jean-Christophe,

On 12.06.2013 at 16:44, Jean-Christophe Boggio wrote:
> Hello,
>
> Can someone help me understand what could cause this :
>
> warn "\$content : ".(utf8::is_utf8($content) ? "utf8" : "not utf8");
> warn "\$ticketdata[0]->[0] : ".(utf8::is_utf8($ticketdata[0]->[0]) ? "utf8" : "not utf8");
> warn "content4=$content";
> if ($ticketdata[0]->[0] ne $content) {
> warn "content5=$content";
> #
> warn "content6=$content stored=".$ticketdata[0]->[0];
> warn "content7=$content";
> }
> [...]
> I guess the problem comes from the fact that on the same line I have one
> utf-8 variable and one non-utf8 one.
>
> $content comes from $fdat{content} (not marked as utf8 while the page
> encoding is declared and recognized as utf-8).
>
> What can I do to force embperl to always set the utf-8 flag on $fdat{...} ?
>
> If you know a way of telling Apache/EmbPerl that no encoding other than UTF-8
> exist in the world, I'll take it. And it's not a problem if I'm incompatible
> with anything.

I think your guess is right: when a statement combines a utf8-flagged string with an unflagged one, Perl upgrades the unflagged string to utf8 as well, and it treats the bytes as ISO-8859-1 for that conversion. Since your bytes are really UTF-8, the string gets encoded a second time and is destroyed. The same thing happens when you use Freeze::Thaw or Data::Dumper, which is bad for serializing and storing something in a database :-(

Embperl decides for itself whether the %fdat parameters are utf8 or not. I don't know how it does so (maybe Gerald could say something about that), but we have had a lot of "funny" things in the past regarding this problem. Our website uses several encodings (neither UTF-8 nor ISO-8859-1), so we ran into this trouble. We implemented our own "thaw" method, which tries to thaw the data and, if that fails, converts the data to utf8 and thaws it again...

A solution for you could be: use "$content=decode('UTF-8',$content)" (decode comes from the Encode module) to flag your variable, or walk over %fdat and do the same for all entries which are not already utf8-flagged. After that, you should have UTF8-only variables and everything works as expected.
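The walk over %fdat mentioned above could look roughly like the following. This is only a minimal sketch: decode_fdat is a hypothetical helper name (it is not part of Embperl), the %fdat hash here is a plain stand-in for the one Embperl provides, and it assumes that every unflagged value really contains UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not an Embperl function): decode every plain
# scalar value of a form-data hash that is not yet utf8-flagged.
# Assumption: unflagged values hold raw UTF-8 bytes.
sub decode_fdat {
    my ($fdat) = @_;
    for my $key (keys %$fdat) {
        my $value = $fdat->{$key};
        next if !defined $value || ref $value;   # skip undef and references
        next if utf8::is_utf8($value);           # already character data
        $fdat->{$key} = decode('UTF-8', $value); # bytes -> characters
    }
    return $fdat;
}

# Stand-in for Embperl's %fdat: the raw UTF-8 bytes of "Nürnberg".
my %fdat = (content => "N\xC3\xBCrnberg");

# Stand-in for the stored value: the same word as a character string.
my $stored = "N\x{FC}rnberg";
utf8::upgrade($stored);

my $length_before = length $fdat{content};                # counts bytes: 9
my $equal_before  = $fdat{content} eq $stored ? 1 : 0;    # 0, strings differ

decode_fdat(\%fdat);

my $length_after = length $fdat{content};                 # counts characters: 8
my $equal_after  = $fdat{content} eq $stored ? 1 : 0;     # 1, strings match

print "length before=$length_before after=$length_after\n";
print "equal before=$equal_before after=$equal_after\n";
```

Calling something like this once at the top of the page, before any comparison with data from the database, should be enough. If some fields may arrive in another encoding, the unconditional decode('UTF-8', ...) is not safe and would need a check first.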

One little additional comment: using non-utf8-flagged variables with UTF-8 content (like your $content variable) breaks a lot of Perl stuff, because Perl then works on the individual bytes instead of on characters: lc, uc, cmp, le, gt, length, sort, ....

With best regards,
Dirk Melchers

/// IT/Software-Development
/// NUREG GmbH
/// Dorfäckerstraße 31 | 90427 Nürnberg | Germany
Tel. +49-911-32002-256 | Fax +49-911-32002-299
Mobil +49-172-9354670 | www.nureg.de
Nürnberg HRB 22653 | USt.ID DE 814 685 653
Geschäftsführer: Michael Schmidt, Stefan Boas