> -----Original Message-----
> From: Willi Mann [mailto:wi...@debian.org] 
> Sent: Sunday, January 01, 2017 08:27
> To: Jason Pyeron; 849...@bugs.debian.org
> Cc: 'Klaus Ethgen'; logwatch-de...@lists.sourceforge.net
> Subject: Re: Bug#849531: [Logwatch-devel] Bug#849531: 
> Possible security problem,new logwatch sends mails with charset UTF-8
> 
> Hi,
> 
> On 2017-01-01 at 00:20, Jason Pyeron wrote:
> > Not exactly a valid test; besides, it works for me. The issue is
> > internal 8-bit (e.g. Latin-1) data being written out as raw bytes
> > while instructing consumers that it is UTF-8.
> > 
> > $ cat utf8_test.pl
> > #!/usr/bin/perl
> > #
> > use strict;
> > use warnings;
> > use File::Slurp;
> > 
> > my $inputfile = $ARGV[0];
> > my $logfilecontent = read_file($inputfile);
> > binmode(STDOUT, ":utf8");
> > print $logfilecontent;
> > 
> > $ ./utf8_test.pl testlog.txt
> > übersät
> > 
> > $ ./utf8_test.pl testlog.txt | hexdump -C
> > 00000000  c3 bc 62 65 72 73 c3 a4  74 0a                    |..bers..t.|
> > 0000000a
> > 
> > $ hexdump.exe -C testlog.txt
> > 00000000  fc 62 65 72 73 e4 74 0a                           |.bers.t.|
> > 00000008
> 
> What are you trying to show with that? Your input is not in UTF-8.

That is the point. The OP complains about 8-bit, non-UTF-8 data (such as
Latin-1 bytes) being sent while labeled as UTF-8; that produces invalid
UTF-8 sequences.

Quoting https://en.wikipedia.org/wiki/UTF-8#Codepage_layout

> Invalid byte sequences
> 
> Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be
> prepared for:
> * the red invalid bytes in the above table [192, 193, 245-255]
> * an unexpected continuation byte
> * a leading byte not followed by enough continuation bytes (can happen in
>   simple string truncation, when a string is too long to fit when copying it)
> * an overlong encoding as described above
> * a sequence that decodes to an invalid code point as described below
> 
> Many earlier decoders would happily try to decode these. Carefully crafted
> invalid UTF-8 could make them either skip or create ASCII characters such
> as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security
> validations in high-profile products including Microsoft's IIS web
> server[14] and Apache's Tomcat servlet container.[15]
>
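To make the quoted list concrete for this thread, here is a minimal Perl sketch, assuming only the core Encode module; `is_valid_utf8` and `naive_decode_2byte` are hypothetical helper names, not part of logwatch. A strict UTF-8 decoder rejects the Latin-1 bytes of testlog.txt and the overlong sequence C0 AF, while a decoder that skips validation turns C0 AF into a plain slash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# 'UTF-8' (with hyphen) is Encode's strict UTF-8; FB_CROAK makes decode()
# die on the first malformed sequence, LEAVE_SRC keeps the input intact.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: a 2-byte "decoder" that performs no checks and just
# joins the payload bits (5 from the lead byte, 6 from the continuation byte).
sub naive_decode_2byte {
    my ($lead, $cont) = map { ord($_) } split //, $_[0];
    return chr((($lead & 0x1F) << 6) | ($cont & 0x3F));
}

my @samples = (
    [ 'testlog.txt (Latin-1)', "\xFCbers\xE4t\n" ],
    [ 'testlog (UTF-8)',       "\xC3\xBCbers\xC3\xA4t\n" ],
    [ 'overlong slash C0 AF',  "\xC0\xAF" ],
);

for my $s (@samples) {
    my ($name, $bytes) = @$s;
    printf "%-22s %s\n", $name,
        is_valid_utf8($bytes) ? 'valid UTF-8' : 'invalid UTF-8';
}

# The unchecked decoder turns the overlong C0 AF into an ASCII "/".
printf "naive decode of C0 AF: %s\n", naive_decode_2byte("\xC0\xAF");
```

The strict decoder is the safe choice; the naive one only illustrates the kind of validation bypass the quote describes.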



> 
> Just for the record, the output I posted originally:

We have differences: your testlog is already UTF-8, while my testlog.txt is
Latin-1 (ISO-8859-1):

> 
> % ./utf8_test.pl testlog
> übersät
> % ./utf8_test.pl testlog | hexdump -C
> 00000000  c3 83 c2 bc 62 65 72 73  c3 83 c2 a4 74 0a        |....bers....t.|
> 0000000e
> % hexdump -C testlog
> 00000000  c3 bc 62 65 72 73 c3 a4  74 0a                    |..bers..t.|
> 0000000a

$ hexdump.exe -C testlog.txt
00000000  fc 62 65 72 73 e4 74 0a                           |.bers.t.|
00000008
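Both sets of hexdumps can be reproduced without files. This is a sketch assuming only the core Encode module, with hypothetical helper names (`mimic_utf8_test`, `decode_then_encode`, `hex_bytes`). It models utf8_test.pl as "encode every input octet as a code point": read_file() returns raw octets and the `:utf8` layer on STDOUT encodes each one as U+0000..U+00FF. Latin-1 input therefore comes out as valid UTF-8, while input that is already UTF-8 gets encoded a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Models utf8_test.pl: raw octets printed through a :utf8 layer are each
# treated as a code point U+0000..U+00FF and encoded as UTF-8.
sub mimic_utf8_test {
    my ($octets) = @_;
    return encode('UTF-8', $octets);
}

# Decoding the input first (assumed here to be UTF-8) means characters,
# not octets, get encoded, so nothing is encoded twice.
sub decode_then_encode {
    my ($octets) = @_;
    return encode('UTF-8', decode('UTF-8', $octets));
}

# Hypothetical helper: hexdump-like rendering of a byte string.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $latin1 = "\xFCbers\xE4t\n";          # bytes of testlog.txt
my $utf8   = "\xC3\xBCbers\xC3\xA4t\n";  # bytes of testlog

# c3 bc 62 65 72 73 c3 a4 74 0a
print 'Latin-1 input, as script: ', hex_bytes(mimic_utf8_test($latin1)), "\n";
# c3 83 c2 bc 62 65 72 73 c3 83 c2 a4 74 0a
print 'UTF-8 input, as script:   ', hex_bytes(mimic_utf8_test($utf8)), "\n";
# c3 bc 62 65 72 73 c3 a4 74 0a
print 'UTF-8 input, decoded:     ', hex_bytes(decode_then_encode($utf8)), "\n";
```

For a Latin-1 file such as testlog.txt, decoding with 'ISO-8859-1' instead of 'UTF-8' gives the same single-encoded result.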

> 
> Bye
> Willi
> 
