Re: Output filters, data encoding

Damyan Ivanov Wed, 13 Nov 2019 10:37:33 -0800

-=| André Warnier (tomcat/perl), 13.11.2019 19:12:10 +0100 |=-
>       while (my $sz = $f->read(my $buffer, BUFF_LEN)) {
> ..
> 
> and then I need to pass this data to another module for processing 
> (Template::Toolkit).
> To make a long story short, Template::Toolkit misinterprets the data I'm
> sending to it, because this data /is/ actually UTF-8, but apparently not
> marked so internally by the $f->read(). So TT2 re-encodes it, leading to
> double UTF-8 encoding.
> 
> My question is : can I - and how -, set the filehandle that corresponds to
> the $f->read(), to a UTF-8 layer ?
> I have tried
> 
> line 155: binmode($f,'encoding:(UTF-8)');
> 
> and that triggers an error :
>  Not a GLOB reference at (my filter) line 155.\n
> )
> 
> Or do I need to read the data 'as is', and separately do an
> 
>  $decoded_buffer = decode('UTF-8', $buffer);


There's a middle ground: partial decoding. See the FB_QUIET entry in the Encode documentation (perldoc Encode):

       If CHECK is set to "Encode::FB_QUIET", encoding and decoding
       immediately return the portion of the data that has been processed so
       far when an error occurs. The data argument is overwritten with
       everything after that point; that is, the unprocessed portion of the
       data.  This is handy when you have to call "decode" repeatedly in the
       case where your source data may contain partial multi-byte character
       sequences, (that is, you are reading with a fixed-width buffer). Here's
       some sample code to do exactly that:

           my($buffer, $string) = ("", "");
           while (read($fh, $buffer, 256, length($buffer))) {
               $string .= decode($encoding, $buffer, Encode::FB_QUIET);
               # $buffer now contains the unprocessed partial character
           }

Looks exactly like your case.
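
To see the carry-over at work outside Apache, here is a minimal, self-contained sketch. The in-memory filehandle, the helper name decode_in_chunks and the deliberately tiny chunk size of 3 are all my own assumptions, chosen only so that multi-byte characters get split across reads. In a filter you would presumably append what $f->read() hands you to the pending bytes instead, and keep those pending bytes somewhere that survives between filter invocations (for example in $f->ctx).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode a byte stream read in fixed-size chunks,
# carrying any incomplete trailing multi-byte sequence over to the next read.
sub decode_in_chunks {
    my ($fh, $chunk_size) = @_;
    my ($pending, $text) = ('', '');

    # Each read appends after the bytes that are still undecoded.
    while (read($fh, $pending, $chunk_size, length $pending)) {
        # FB_QUIET stops at the first sequence it cannot decode and
        # leaves the unprocessed bytes in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET);
    }
    return ($text, $pending);
}

# Sample input: UTF-8 bytes containing two- and three-byte characters.
my $expected = "na\x{ef}ve caf\x{e9} \x{20ac}5";
my $bytes    = encode('UTF-8', $expected);

open(my $fh, '<:raw', \$bytes) or die "cannot open in-memory handle: $!";
my ($text, $leftover) = decode_in_chunks($fh, 3);
close $fh;
```

$text comes back as a character string, which is what TT2 wants to be given. If $leftover is not empty once the input is exhausted, the stream was truncated or contains bytes that are not valid UTF-8. Note that FB_QUIET also stops at such bytes in the middle of the stream, so a real filter needs to decide what to do when the pending buffer keeps growing.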


-- Damyan
