Just an update that I was using the wrong prove with a custom compiled perl so a newer perl did fix the issue.
Also, Giovanni's hint about the issue led to the patch that was causing the issue which I proved with backporting to 3.4.1 Anyway, much more info at https://bz.apache.org/ SpamAssassin/show_bug.cgi?id=7591 as we are working the issue! -- Kevin A. McGrail VP Fundraising, Apache Software Foundation Chair Emeritus Apache SpamAssassin Project https://www.linkedin.com/in/kmcgrail - 703.798.0171 On Tue, Jun 26, 2018 at 3:49 AM, Kevin A. McGrail <[email protected]> wrote: > I have custom compiled stock perl 5.26.2 and the issue persists. I will > test backing out that code too. > > On Tue, Jun 26, 2018, 02:32 Giovanni Bechis <[email protected]> wrote: > >> On Mon, Jun 25, 2018 at 11:22:07PM +0200, Giovanni Bechis wrote: >> > On Mon, Jun 25, 2018 at 09:54:27AM -0400, Kevin A. McGrail wrote: >> > > Just to be clear, others concur that with 3 uris or less, it works, 4 >> or >> > > more it fails. It's inconsistent and exists in trunk as well. >> > > >> > > It's inconsistent depending on the platform as well. >> > > >> > > I am not sure if it is a Perl bug or an SA bug or something we are >> doing >> > > wrong but it is a blocker. >> > > >> > reverting r1823205 makes the regression test work on 3.4 and trunk for >> me. >> > tested on CentOS6 (perl 5.10.1), can anybody confirm ? >> > >> > Regards >> > Giovanni >> > >> Confirmed to work on Rhel 7 with perl 5.16. >> I think that some of the code in that simple patch triggers a bug in >> some of the patches that RH added to their perl. >> Cheers >> Giovanni >> >> > > -- >> > > Kevin A. McGrail >> > > VP Fundraising, Apache Software Foundation >> > > Chair Emeritus Apache SpamAssassin Project >> > > https://www.linkedin.com/in/kmcgrail - 703.798.0171 >> > > >> > > On Sat, Jun 23, 2018 at 10:15 PM, Bill Cole < >> > > [email protected]> wrote: >> > > >> > > > On 22 Jun 2018, at 14:29 (-0400), Kevin A. McGrail wrote: >> > > > >> > > > Hi All, >> > > >> >> > > >> 3.4 is not passing tests for me with the idn_dots.t and it appears >> to >> > > >> point >> > > >> to a problem in P:M:S::get_uri_list. I'm bleary from looking at >> this for >> > > >> three days. Can someone take a look at this? >> > > >> >> > > >> If you modify the t/idn_dots to print the uri list from the >> generated >> > > >> message in the test, it fails in 3.4 but passes in Trunk and in >> the 3.4.1 >> > > >> release. See below for output but basically there is a missing >> URI which >> > > >> is why the test correctly fails. >> > > >> >> > > > >> > > > I have made the test work by adding "use utf8" to the test script. >> This is >> > > > just avoiding the underlying subtle bug. >> > > > >> > > > The breakage is only seen (so far) on the RedHat perl 5.16.3 >> packaged for >> > > > EL7 and derivatives. I believe that 5.16.x was the last major >> release to >> > > > NOT work in UTF-8 by default without "use utf8" explicitly used. I >> have >> > > > replicated the incorrect parse with the spamassassin script and a >> message >> > > > with all-ascii URLs, so the problem is somewhere in the >> spectacularly >> > > > complicated RE that extracts URIs from the cooked text array inside >> > > > PerMsgStatus->get_uri_detail_list. Making matters worse, if I run >> either >> > > > t/idn_dots.t or spamassassin with the Perl debugger (-d) the parse >> works. >> > > > >> > > > Anyone who is still using an even older Perl could assist simply by >> > > > confirming that the 3.4 branch from SVN fails subtest 4 of >> t/idn_dots.t if >> > > > you remove or comment out the "use utf8" line I added to that file >> today. >> > > > >> > > > It would be interesting to see it the problem would be solved by >> adding >> > > > "use utf8" to every .pm that had a "use bytes" declaration before >> 2017. >> > > > This is a bit of a shotgun approach but simpler than hunting for the >> > > > specific issue. I'd try it myself, but that I'm basically on my last >> > > > stealable minute for the weekend already. >> > > > >> > > > >> > > > -- >> > > > Bill Cole >> > > > [email protected] or [email protected] >> > > > (AKA @grumpybozo and many *@billmail.scconsult.com addresses) >> > > > Currently Seeking Steadier Work: https://linkedin.com/in/billcole >> > > > >> >> > Index: lib/Mail/SpamAssassin/Conf/Parser.pm >> > =================================================================== >> > --- lib/Mail/SpamAssassin/Conf/Parser.pm (revisione 1834377) >> > +++ lib/Mail/SpamAssassin/Conf/Parser.pm (copia locale) >> > @@ -259,7 +259,6 @@ >> > while (defined ($line = shift @conf_lines)) { >> > local ($1); # bug 3838: prevent random taint flagging of $1 >> > >> > - if (index($line,'#') > -1) { >> > # bug 5545: used to support testing rules in the ruleqa system >> > if ($keepmetadata && $line =~ /^\#testrules/) { >> > $self->{file_scoped_attrs}->{testrules}++; >> > @@ -275,12 +274,8 @@ >> > >> > $line =~ s/(?<!\\)#.*$//; # remove comments >> > $line =~ s/\\#/#/g; # hash chars are escaped, so unescape them >> > - } >> > - >> > - if ($line =~ tr{ \t\r\n\f}{}) { >> > $line =~ s/^\s+//; # remove leading whitespace >> > $line =~ s/\s+$//; # remove tailing whitespace >> > - } >> > next unless($line); # skip empty lines >> > >> > # handle i18n >> > @@ -289,7 +284,7 @@ >> > my($key, $value) = split(/\s+/, $line, 2); >> > $key = lc $key; >> > # convert all dashes in setting name to underscores. >> > - $key =~ tr/-/_/; >> > + $key =~ s/-/_/g; >> > $value = '' unless defined($value); >> > >> > # # Do a better job untainting this info ... >> > @@ -339,26 +334,26 @@ >> > } >> > >> > # now handle the commands. >> > - elsif ($key eq 'include') { >> > + if ($key eq 'include') { >> > $value = $self->fix_path_relative_to_current_file($value); >> > my $text = $conf->{main}->read_cf($value, 'included file'); >> > unshift (@conf_lines, split (/\n/, $text)); >> > next; >> > } >> > >> > - elsif ($key eq 'ifplugin') { >> > + if ($key eq 'ifplugin') { >> > $self->handle_conditional ($key, "plugin ($value)", >> > \@if_stack, \$skip_parsing); >> > next; >> > } >> > >> > - elsif ($key eq 'if') { >> > + if ($key eq 'if') { >> > $self->handle_conditional ($key, $value, >> > \@if_stack, \$skip_parsing); >> > next; >> > } >> > >> > - elsif ($key eq 'else') { >> > + if ($key eq 'else') { >> > # TODO: if/else/else won't get flagged here :( >> > if (!@if_stack) { >> > $parse_error = "config: found else without matching >> conditional"; >> > @@ -370,7 +365,7 @@ >> > } >> > >> > # and the endif statement: >> > - elsif ($key eq 'endif') { >> > + if ($key eq 'endif') { >> > my $lastcond = pop @if_stack; >> > if (!defined $lastcond) { >> > $parse_error = "config: found endif without matching >> conditional"; >> > @@ -509,7 +504,7 @@ >> > my $conf = $self->{conf}; >> > >> > my $lexer = ARITH_EXPRESSION_LEXER; >> > - my @tokens = ($value =~ m/($lexer)/og); >> > + my @tokens = ($value =~ m/($lexer)/g); >> > >> > my $eval = ''; >> > my $bad = 0; >> > @@ -989,14 +984,14 @@ >> > >> > # Lex the rule into tokens using a rather simple RE method ... >> > my $lexer = ARITH_EXPRESSION_LEXER; >> > - my @tokens = ($rule =~ m/$lexer/og); >> > + my @tokens = ($rule =~ m/$lexer/g); >> > >> > # Go through each token in the meta rule >> > my $conf_tests = $conf->{tests}; >> > foreach my $token (@tokens) { >> > # has to be an alpha+numeric token >> > - next if $token =~ tr{A-Za-z0-9_}{}c || substr($token,0,1) =~ >> tr{A-Za-z_}{}c; # even faster >> > - >> > + # next if $token =~ /^(?:\W+|[+-]?\d+(?:\.\d+)?)$/; >> > + next if $token !~ /^[A-Za-z_][A-Za-z0-9_]*\z/s; # faster >> > # and has to be a rule name >> > next unless exists $conf_tests->{$token}; >> > >> > @@ -1302,7 +1297,7 @@ >> > >> > # Lex the rule into tokens using a rather simple RE method ... >> > my $lexer = ARITH_EXPRESSION_LEXER; >> > - my @tokens = ($rule =~ m/$lexer/og); >> > + my @tokens = ($rule =~ m/$lexer/g); >> > if (length($name) == 1) { >> > for (@tokens) { >> > print "$name $_\n " or die "Error writing token: $!"; >> > Index: lib/Mail/SpamAssassin/Util.pm >> > =================================================================== >> > --- lib/Mail/SpamAssassin/Util.pm (revisione 1834377) >> > +++ lib/Mail/SpamAssassin/Util.pm (copia locale) >> > @@ -280,7 +280,11 @@ >> > sub untaint_var { >> > # my $arg = $_[0]; # avoid copying unnecessarily >> > if (!ref $_[0]) { # optimized by-far-the-most-common case >> > - return defined $_[0] ? scalar each %{ { $_[0] => undef } } : >> undef; ## no critic (ProhibitExplicitReturnUndef) - See Bug 7120 - fast >> untaint (hash keys cannot be tainted) >> > + no re 'taint'; # override a "use re 'taint'" from outer scope >> > + return undef if !defined $_[0]; ## no critic >> (ProhibitExplicitReturnUndef) - See Bug 7120 >> > + local($1); # avoid Perl taint bug: tainted global $1 propagates >> taintedness >> > + $_[0] =~ /^(.*)\z/s; >> > + return $1; >> > } else { >> > my $r = ref $_[0]; >> > if ($r eq 'ARRAY') { >> > Index: lib/Mail/SpamAssassin/Plugin/Check.pm >> > =================================================================== >> > --- lib/Mail/SpamAssassin/Plugin/Check.pm (revisione 1834377) >> > +++ lib/Mail/SpamAssassin/Plugin/Check.pm (copia locale) >> > @@ -533,7 +533,7 @@ >> > >> > # Lex the rule into tokens using a rather simple RE method ... >> > my $lexer = ARITH_EXPRESSION_LEXER; >> > - my @tokens = ($rule =~ m/$lexer/og); >> > + my @tokens = ($rule =~ m/$lexer/g); >> > >> > # Set the rule blank to start >> > $meta{$rulename} = ""; >> > @@ -545,7 +545,8 @@ >> > foreach my $token (@tokens) { >> > >> > # Numbers can't be rule names >> > - if ($token =~ tr{A-Za-z0-9_}{}c || substr($token,0,1) =~ >> tr{A-Za-z_}{}c) { >> > + # if ($token =~ /^(?:\W+|[+-]?\d+(?:\.\d+)?)$/) { >> > + if ($token !~ /^[A-Za-z_][A-Za-z0-9_]*\z/s) { # faster >> > $meta{$rulename} .= "$token "; >> > } >> > else { # token is a rule name >> > @@ -611,7 +612,7 @@ >> > warn "no meta_dependencies defined for $metas[$i]"; >> > } >> > my $alldeps = join ' ', grep { >> > - index( ($tflags->{$_}||''),'net') > -1 && >> ($tflags->{$_}||'') =~ /\bnet\b/ >> > + ($tflags->{$_}||'') =~ /\bnet\b/ >> > } split (' ', $conf->{meta_dependencies}->{ $metas[$i] } >> ); >> > >> > if ($alldeps ne '') { >> >> >> >>
