-=| André Warnier (tomcat/perl), 13.11.2019 19:12:10 +0100 |=-
> while (my $sz = $f->read(my $buffer, BUFF_LEN)) {
> ..
>
> and then I need to pass this data to another module for processing
> (Template::Toolkit).
> To make a long story short, Template::Toolkit misinterprets the data I'm
> sending to it, because this data /is/ actually UTF-8, but apparently not
> marked so internally by the $f->read(). So TT2 re-encodes it, leading to
> double UTF-8 encoding.
>
> My question is : can I - and how -, set the filehandle that corresponds to
> the $f->read(), to a UTF-8 layer ?
> I have tried
>
> line 155: binmode($f,'encoding:(UTF-8)');
>
> and that triggers an error :
> Not a GLOB reference at (my filter) line 155.\n
> )
>
> Or do I need to read the data 'as is', and separately do an
>
> $decoded_buffer = decode('UTF-8', $buffer);

There's a middle ground - partial decoding. See Encode(3)/FB_QUIET:

    If CHECK is set to "Encode::FB_QUIET", encoding and decoding
    immediately return the portion of the data that has been processed so
    far when an error occurs. The data argument is overwritten with
    everything after that point; that is, the unprocessed portion of the
    data. This is handy when you have to call "decode" repeatedly in the
    case where your source data may contain partial multi-byte character
    sequences, (that is, you are reading with a fixed-width buffer). Here's
    some sample code to do exactly that:

        my($buffer, $string) = ("", "");
        while (read($fh, $buffer, 256, length($buffer))) {
            $string .= decode($encoding, $buffer, Encode::FB_QUIET);
            # $buffer now contains the unprocessed partial character
        }

Looks exactly like your case.

-- 
Damyan
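
For anyone who wants to try the FB_QUIET approach outside of a filter, here is a minimal, self-contained sketch. It is only an illustration under a few assumptions: an in-memory filehandle stands in for the real input, the helper name decode_in_chunks and the sample string are made up for this example, and the chunk size of 3 bytes is chosen deliberately small so that multi-byte characters get split across reads.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the thread): decode a byte stream that is
# read in fixed-size chunks, carrying incomplete characters over to the
# next read.
sub decode_in_chunks {
    my ($fh, $encoding, $chunk_len) = @_;
    my ($buffer, $string) = ('', '');

    # The 4th argument to read() appends the new chunk after any bytes
    # left over from the previous iteration.
    while (read($fh, $buffer, $chunk_len, length($buffer))) {
        # FB_QUIET removes the successfully decoded bytes from $buffer
        # and leaves a trailing partial sequence in place.
        $string .= decode($encoding, $buffer, Encode::FB_QUIET);
    }

    # Bytes still in $buffer at EOF are truncated or malformed input.
    warn sprintf("%d undecodable byte(s) left over\n", length($buffer))
        if length($buffer);

    return $string;
}

# Sample data (an assumption for this demo): UTF-8 bytes containing
# 2-byte and 3-byte characters.
my $bytes = encode('UTF-8', "na\x{ef}ve caf\x{e9} \x{20ac}5");

open(my $fh, '<:raw', \$bytes) or die "cannot open in-memory handle: $!";
my $text = decode_in_chunks($fh, 'UTF-8', 3);
close($fh);
```

In the filter from the original question, $f->read(my $buffer, BUFF_LEN) fills a fresh buffer on every call, so the leftover bytes would presumably have to be kept in a separate variable and prepended to the next chunk before calling decode().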