> -----Original Message-----
> From: Willi Mann [mailto:wi...@debian.org]
> Sent: Sunday, January 01, 2017 08:27
> To: Jason Pyeron; 849...@bugs.debian.org
> Cc: 'Klaus Ethgen'; logwatch-de...@lists.sourceforge.net
> Subject: Re: Bug#849531: [Logwatch-devel] Bug#849531:
> Possible security problem,new logwatch sends mails with charset UTF-8
>
> Hi,
>
> On 2017-01-01 at 00:20, Jason Pyeron wrote:
> > Not exactly a valid test, besides it works for me. The issue is internal
> > ascii data being written as ascii but instructing consumers it is utf8.
> >
> > $ cat utf8_test.pl
> > #!/usr/bin/perl
> > #
> > use strict;
> > use File::Slurp;
> >
> > my $inputfile = @ARGV[0];
> > my $logfilecontent = read_file($inputfile);
> > binmode(STDOUT, ":utf8");
> > print $logfilecontent;
> >
> > $ ./utf8_test.pl testlog.txt
> > übersät
> >
> > $ ./utf8_test.pl testlog.txt | hexdump -C
> > 00000000 c3 bc 62 65 72 73 c3 a4 74 0a |..bers..t.|
> > 0000000a
> >
> > $ hexdump.exe -C testlog.txt
> > 00000000 fc 62 65 72 73 e4 74 0a |.bers.t.|
> > 00000008
>
> What do you want to say with that? Your input is not in UTF-8.

That is the point. The OP complains about ASCII being sent while labeled as UTF-8; as such, it creates invalid UTF-8 sequences.

Quoting https://en.wikipedia.org/wiki/UTF-8#Codepage_layout

> Invalid byte sequences
>
> Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be
> prepared for:
> * the red invalid bytes in the above table [192, 193, 245-255]
> * an unexpected continuation byte
> * a leading byte not followed by enough continuation bytes (can happen in
>   simple string truncation, when a string is too long to fit when copying it)
> * an overlong encoding as described above
> * a sequence that decodes to an invalid code point as described below
>
> Many earlier decoders would happily try to decode these. Carefully crafted
> invalid UTF-8 could make them either skip or create ASCII characters such as
> NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security
> validations in high-profile products including Microsoft's IIS web server[14]
> and Apache's Tomcat servlet container.[15]

> Just for the record, the output I posted originally:

We have differences:

> % ./utf8_test.pl testlog
> übersät
> % ./utf8_test.pl testlog | hexdump -C
> 00000000 c3 83 c2 bc 62 65 72 73 c3 83 c2 a4 74 0a |....bers....t.|
> 0000000e
> % hexdump -C testlog
> 00000000 c3 bc 62 65 72 73 c3 a4 74 0a |..bers..t.|
> 0000000a

$ hexdump.exe -C testlog.txt
00000000 fc 62 65 72 73 e4 74 0a |.bers.t.|
00000008

> Bye
> Willi
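
P.S. To make the difference between the two test files concrete, here is a minimal sketch using the byte sequences from the hexdumps above. It assumes the 8-bit file is ISO-8859-1 (what the thread loosely calls ASCII), and the helper names is_valid_utf8, through_utf8_layer and hex_of are made up for this illustration. It models what binmode(STDOUT, ":utf8") does to an undecoded byte string; it is not taken from logwatch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Byte strings copied from the hexdumps in this thread.
my $latin1 = "\xfc\x62\x65\x72\x73\xe4\x74\x0a";          # testlog.txt
my $utf8   = "\xc3\xbc\x62\x65\x72\x73\xc3\xa4\x74\x0a";  # testlog

# Returns 1 if the bytes are well-formed UTF-8, 0 otherwise.
# FB_CROAK makes decode() die on the first malformed sequence.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its argument
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Models printing an undecoded byte string through the :utf8 layer:
# every byte is taken as a Latin-1 character and encoded to UTF-8.
sub through_utf8_layer {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Space-separated hex, in the style of hexdump -C.
sub hex_of {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

printf "8-bit file valid as UTF-8: %d\n", is_valid_utf8($latin1);
printf "UTF-8 file valid as UTF-8: %d\n", is_valid_utf8($utf8);
printf "8-bit file via :utf8:      %s\n", hex_of(through_utf8_layer($latin1));
printf "UTF-8 file via :utf8:      %s\n", hex_of(through_utf8_layer($utf8));
```

The last two lines should reproduce the two piped hexdumps above: the 8-bit file comes out as c3 bc 62 65 72 73 c3 a4 74 0a, and the file that was already UTF-8 comes out double-encoded as c3 83 c2 bc 62 65 72 73 c3 83 c2 a4 74 0a. The first check shows the other half of the problem: the raw 8-bit bytes are not valid UTF-8 if sent unmodified under a UTF-8 label.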