Please note the large changeset and give it a try. I've been tweaking it all week, so it should be good for general use.
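For anyone updating plugin code, the main thing to check is how $pms->get() is called. Below is a rough standalone sketch of the new calling convention. Note that get_values() and the %parsed table are made up for illustration and are not SpamAssassin code; the return logic only loosely mirrors the new get() in the quoted diff (all values in list context, first value for legacy scalar :addr-style calls, '' or the caller's default when nothing is found).

```perl
use strict;
use warnings;

# Toy data: pretend these values were already parsed from a From header.
# %parsed and get_values() are hypothetical stand-ins for $pms->get().
my %parsed = (
  'From:addr' => [ 'alice@example.com', 'bob@example.org' ],
);

sub get_values {
  my ($request) = @_;
  my $found = $parsed{$request} || [];
  if (@$found) {
    # list context: all values; legacy scalar context: first value only
    return wantarray ? @$found : $found->[0];
  }
  # nothing found: use the caller's default if one was passed (even undef)
  if (@_ > 1) {
    return wantarray ? ($_[1]) : $_[1];
  }
  # no default: empty list in list context, legacy '' in scalar context
  return wantarray ? () : '';
}

my @all   = get_values('From:addr');        # list context, every address
my $first = get_values('From:addr');        # scalar context, legacy style
my $one   = (get_values('From:addr'))[0];   # list context, first element
my @none  = get_values('Reply-To:addr');    # empty list, not ('')

print "all: ", join(", ", @all), "\n";
print "first: $first\n";
print "none: ", scalar(@none), "\n";
```

The ($pms->get(...))[0] form is the pattern the diff itself uses wherever only a single value is wanted.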
On Fri, Apr 30, 2021 at 06:17:51PM -0000, [email protected] wrote:
> Author: hege
> Date: Fri Apr 30 18:17:51 2021
> New Revision: 1889337
>
> URL: http://svn.apache.org/viewvc?rev=1889337&view=rev
> Log:
> - Improved internal header address (From/To/Cc) parser, now also handles
>   multiple addresses. Optional support for external Email::Address::XS
>   parser, which can handle nested comments and other oddities.
>
> - Header :addr :name modifiers now returns all addresses. :first :last
>   select only first (topmost) or last header to process, when there are
>   multiple headers with the same name (:addr and :name may still return
>   multiple values from a single header).
>
> - API: $pms->get() can and should now be called in list context. Scalar
>   context continues to return multiple values newline separated, but this
>   should be considered deprecated.
>
>
> Added:
>     spamassassin/trunk/t/data/spam/freemail1
>     spamassassin/trunk/t/data/spam/freemail2
>     spamassassin/trunk/t/data/spam/freemail3
> Modified:
>     spamassassin/trunk/MANIFEST
>     spamassassin/trunk/UPGRADE
>     spamassassin/trunk/lib/Mail/SpamAssassin.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm
>     spamassassin/trunk/t/SATest.pm
>     spamassassin/trunk/t/data/Dumpheaders.pm
>     spamassassin/trunk/t/data/nice/unicode1
>     spamassassin/trunk/t/freemail.t
>     spamassassin/trunk/t/freemail_welcome_block.t
>     spamassassin/trunk/t/get_all_headers.t
>     spamassassin/trunk/t/get_headers.t (contents, props changed)
>     spamassassin/trunk/t/header_utf8.t (contents, props changed)
>
> Modified: spamassassin/trunk/MANIFEST
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/MANIFEST?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/MANIFEST (original)
> +++ spamassassin/trunk/MANIFEST Fri Apr 30 18:17:51 2021
> @@ -414,6 +414,9 @@ t/data/spam/esp/sendgrid_id.eml
>  t/data/spam/esp/sendgrid_id.txt
>  t/data/spam/extracttext/gtube_jpg.eml
>  t/data/spam/extracttext/gtube_pdf.eml
> +t/data/spam/freemail1
> +t/data/spam/freemail2
> +t/data/spam/freemail3
>  t/data/spam/gtube.eml
>  t/data/spam/gtubedcc.eml
>  t/data/spam/gtubedcc_crlf.eml
>
> Modified: spamassassin/trunk/UPGRADE
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/UPGRADE?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/UPGRADE (original)
> +++ spamassassin/trunk/UPGRADE Fri Apr 30 18:17:51 2021
> @@ -2,6 +2,19 @@
>  Note for Users Upgrading to SpamAssassin 4.0.0
>  ----------------------------------------------
>
> +- Improved internal header address (From/To/Cc) parser, now also handles
> +  multiple addresses. Optional support for external Email::Address::XS
> +  parser, which can handle nested comments and other oddities.
> +
> +- Header :addr :name modifiers now returns all addresses. :first :last
> +  select only first (topmost) or last header to process, when there are
> +  multiple headers with the same name (:addr and :name may still return
> +  multiple values from a single header).
> +
> +- API: $pms->get() can and should now be called in list context. Scalar
> +  context continues to return multiple values newline separated, but this
> +  should be considered deprecated.
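To make the :first / :last note above a bit more concrete, here is a small standalone sketch. select_addrs() is a hypothetical helper, not part of SpamAssassin, and the comma split is a deliberately naive stand-in for the real address parser. The point it illustrates is that :first and :last only pick which physical header line gets processed, so a single selected line can still yield several addresses.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin API): choose header lines the way
# the note describes :first / :last, then extract addresses from them.
sub select_addrs {
  my ($which, @header_lines) = @_;
  return () unless @header_lines;
  @header_lines = ($header_lines[0])  if $which eq 'first';  # topmost header
  @header_lines = ($header_lines[-1]) if $which eq 'last';   # last header
  my @addrs;
  foreach my $line (@header_lines) {
    # Naive comma split for illustration only; a real parser must handle
    # display names, comments and quoting.
    foreach my $addr (split /,/, $line) {
      $addr =~ s/^\s+//;
      $addr =~ s/\s+$//;
      push @addrs, $addr if length $addr;
    }
  }
  return @addrs;
}

# Two To: headers, the first one carrying two addresses
my @to = ('a@example.com, b@example.com', 'c@example.com');

my @all   = select_addrs('all',   @to);
my @first = select_addrs('first', @to);
my @last  = select_addrs('last',  @to);

print "all: ",   join(" ", @all),   "\n";
print "first: ", join(" ", @first), "\n";
print "last: ",  join(" ", @last),  "\n";
```

With the sample data, the 'first' selection still returns two addresses because the topmost header contains two.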
> +
>  - New ExtractText plugin that extracts text from documents or images and feed it
>    into SpamAssassin
>
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin.pm Fri Apr 30 18:17:51 2021
> @@ -1064,8 +1064,11 @@ sub add_all_addresses_to_blacklist {
>
>    my @addrlist;
>    my @hdrs = $mail_obj->get_header('From');
> -  if ($#hdrs >= 0) {
> -    push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
> +  foreach my $hdr (@hdrs) {
> +    my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
> +    foreach my $addr (@addrs) {
> +      push @addrlist, $addr->{address} if defined $addr->{address};
> +    }
>    }
>
>    foreach my $addr (@addrlist) {
> @@ -2244,8 +2247,12 @@ sub find_all_addrs_in_mail {
>               Errors-To Mail-Followup-To))
>    {
>      my @hdrs = $mail_obj->get_header($header);
> -    if ($#hdrs < 0) { next; }
> -    push (@addrlist, $self->find_all_addrs_in_line(join (" ", @hdrs)));
> +    foreach my $hdr (@hdrs) {
> +      my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
> +      foreach my $addr (@addrs) {
> +        push @addrlist, $addr->{address} if defined $addr->{address};
> +      }
> +    }
>    }
>
>    # find addrs in body, too
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm Fri Apr 30 18:17:51 2021
> @@ -5430,6 +5430,7 @@ sub feature_subjprefix { 1 } # add subje
>  sub feature_bayes_stopwords { 1 } # multi language stopwords in Bayes
>  sub feature_get_host { 1 } # $pms->get() :host :domain :ip :revip # was implemented together with AskDNS::has_tag_header # Bug 7734
>  sub feature_blocklist_welcomelist { 1 } # bz 7826
> +sub feature_header_address_parser { 1 } # improved header address parsing using Email::Address::XS, $pms->get() list context
>  sub has_tflags_nosubject { 1 } # tflags nosubject
>  sub has_tflags_nolog { 1 } # tflags nolog
>  sub perl_min_version_5010000 { return $] >= 5.010000 } # perl version check ("perl_version" not neatly backwards-compatible)
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm Fri Apr 30 18:17:51 2021
> @@ -62,7 +62,7 @@ use Mail::SpamAssassin::AsyncLoop;
>  use Mail::SpamAssassin::Conf;
>  use Mail::SpamAssassin::Util qw(untaint_var base64_encode idn_to_ascii
>                                  uri_list_canonicalize reverse_ip_address
> -                                is_fqdn_valid);
> +                                is_fqdn_valid parse_header_addresses);
>  use Mail::SpamAssassin::Timeout;
>  use Mail::SpamAssassin::Logger;
>
> @@ -1953,21 +1953,24 @@ sub extract_message_metadata {
>    # tags (explicitly required for DMARC, RFC 7489)
>    #
>    { local $1;
> -    my $addr = $self->get('EnvelopeFrom:addr', undef);
> +    my $host = ($self->get('EnvelopeFrom:first:addr:host'))[0];
>      # collect a FQDN, ignoring potential trailing WSP
> -    if (defined $addr && $addr =~ /\@([^@. \t]+\.[^@ \t]+?)[ \t]*\z/s) {
> -      my $d = idn_to_ascii($1);
> +    if (defined $host) {
> +      my $d = idn_to_ascii($host);
>        $self->set_tag('SENDERDOMAIN', $d);
>        $self->{msg}->put_metadata("X-SenderDomain", $d);
>        dbg("metadata: X-SenderDomain: %s", $d);
>      }
> -    # TODO: the get ':addr' only returns the first address; this should be
> -    # augmented to be able to return all addresses in a header field, multiple
> -    # addresses in a From header field are allowed according to RFC 5322
> -    $addr = $self->get('From:addr', undef);
> -    if (defined $addr && $addr =~ /\@([^@. \t]+\.[^@ \t]+?)[ \t]*\z/s) {
> -      my $d = idn_to_ascii($1);
> -      $self->set_tag('AUTHORDOMAIN', $d);
> +    my @from_doms;
> +    my %seen;
> +    foreach ($self->get('From:addr:host')) {
> +      next if $seen{$_}++;
> +      my $d = idn_to_ascii($_);
> +      push @from_doms, $d;
> +    }
> +    if (@from_doms) {
> +      $self->set_tag('AUTHORDOMAIN', @from_doms > 1 ? \@from_doms : $from_doms[0]);
> +      my $d = join(" ", @from_doms);
>        $self->{msg}->put_metadata("X-AuthorDomain", $d);
>        dbg("metadata: X-AuthorDomain: %s", $d);
>      }
> @@ -2031,25 +2034,32 @@ sub get_decoded_stripped_body_text_array
>
>  =item $status->get (header_name [, default_value])
>
> -Returns a message header, pseudo-header, real name or address.
> -C<header_name> is the name of a mail header, such as 'Subject', 'To',
> -etc. If C<default_value> is given, it will be used if the requested
> -C<header_name> does not exist.
> -
> -Appending C<:raw> to the header name will inhibit decoding of quoted-printable
> -or base-64 encoded strings.
> -
> -Appending a modifier C<:addr> to a header field name will cause everything
> -except the first email address to be removed from the header field. It is
> -mainly applicable to header fields 'From', 'Sender', 'To', 'Cc' along with
> -their 'Resent-*' counterparts, and the 'Return-Path'. For example, all of
> -the following will result in "example@foo":
> +Returns a message header, pseudo-header or a real name, email-address or
> +some other parsed value set by modifiers. C<header_name> is the name of a
> +mail header, such as 'Subject', 'To', etc.
> +
> +Should be called in list context since 4.0. Will return list of headers
> +content, or other values when modifiers used.
> +
> +If C<default_value> is given, it will be used if the requested
> +C<header_name> does not exist. This is mainly useful when called in scalar
> +context to set 'undef' instead of legacy '' return value when header does
> +not exist.
> +
> +Appending C<:raw> modifier to the header name will inhibit decoding of
> +quoted-printable or base-64 encoded strings.
> +
> +Appending C<:addr> modifier to the header name will return all
> +email-addresses found in the header. It is mainly applicable to header
> +fields 'From', 'Sender', 'To', 'Cc' along with their 'Resent-*'
> +counterparts, and the 'Return-Path'. For example, all of the following will
> +result in "example@foo" (and "example@bar"):
>
>  =over 4
>
>  =item example@foo
>
> -=item example@foo (Foo Blah)
> +=item example@foo (Foo Blah), <example@bar>
>
>  =item example@foo, example@bar
>
> @@ -2063,18 +2073,18 @@ the following will result in "example@fo
>
>  =back
>
> -Appending a modifier C<:name> to a header field name will cause everything
> -except the first display name to be removed from the header field. It is
> -mainly applicable to header fields containing a single mail address: 'From',
> -'Sender', along with their 'Resent-From' and 'Resent-Sender' counterparts.
> -For example, all of the following will result in "Foo Blah". One level of
> -single quotes is stripped too, as it is often seen.
> +Appending C<:name> modifier to the header name will return all "display
> +names" from the header field. As with C<:addr>, it is mainly applicable to
> +header fields 'From', 'Sender', 'To', 'Cc' along with their 'Resent-*'
> +counterparts, and the 'Return-Path'. For example, all of the following will
> +result in "Foo Blah" (and "Bar Baz"). One level of single quotes is
> +stripped too, as it is often seen.
>
>  =over 4
>
>  =item example@foo (Foo Blah)
>
> -=item example@foo (Foo Blah), example@bar
> +=item example@foo (Foo Blah), "Bar Baz" <example@bar>
>
>  =item display: example@foo (Foo Blah), example@bar ;
>
> @@ -2086,22 +2096,27 @@ single quotes is stripped too, as it is
>
>  =back
>
> -Appending a modifier C<:host> to a header field name will return the first
> -hostname-looking string that ends with a valid TLD. First it tries to find a
> -match after @ character (possible email), then from any part of the header.
> -Normal use of this would be for example 'From:addr:host' to return the
> -hostname portion of a From-address.
> -
> -Appending a modifier C<:domain> to a header field name implies C<:host>,
> -but will return only domain part of the hostname, as returned by
> -RegistryBoundaries::trim_domain.
> -
> -Appending a modifier C<:ip> to a header field name, will return the first
> -IPv4 or IPv6 address string found. Could be used for example as
> -'X-Originating-IP:ip'.
> -
> -Appending a modifier C<:revip> to a header field name implies C<:ip>,
> -but will return the found IP in reverse (usually for DNSBL usage).
> +Appending C<:host> to the header name will return the first hostname-looking
> +string that ends with a valid TLD. First it tries to find a match after @
> +character (possible email), then from any part of the header. Normal use of
> +this would be for example 'From:addr:host' to return the hostname portion of
> +a From-address.
> +
> +Appending C<:domain> to the header name implies C<:host>, but will return
> +only domain part of the hostname, as returned by
> +RegistryBoundaries::trim_domain().
> +
> +Appending C<:ip> to the header name, will return the first IPv4 or IPv6
> +address string found. Could be used for example as 'X-Originating-IP:ip'.
> +
> +Appending C<:revip> to the header name implies C<:ip>, but will return the
> +found IP in reverse (usually for DNSBL usage).
> +
> +Appending C<:first> modifier to the header name will return only the first
> +(topmost) header, in case there are multiple ones. Similarly C<:last> will
> +select the last one. These affect only the physical header line selection.
> +If selected header is parsed further with C<:addr> or similar, it may return
> +multiple results, if the selected header contains multiple addresses.
>
>  There are several special pseudo-headers that can be specified:
>
> @@ -2143,6 +2158,12 @@ the message has passed through
>  =item C<X-Spam-Relays-Trusted> is the generated metadata of trusted relays
>  the message has passed through
>
> +=item C<X-Spam-Relays-External> is the generated metadata of external relays
> +the message has passed through
> +
> +=item C<X-Spam-Relays-Internal> is the generated metadata of internal relays
> +the message has passed through
> +
>  =back
>
>  =cut
> @@ -2151,98 +2172,106 @@ the message has passed through
>  sub _get {
>    my ($self, $request) = @_;
>
> -  my $result;
> +  my @results;
>    my $getaddr = 0;
>    my $getname = 0;
>    my $getraw = 0;
> +  my $needraw = 0;
>    my $gethost = 0;
>    my $getdomain = 0;
>    my $getip = 0;
>    my $getrevip = 0;
> +  my $getfirst = 0;
> +  my $getlast = 0;
>
>    # special queries - process and strip modifiers
>    if (index($request,':') >= 0) { # triage
>      local $1;
>      while ($request =~ s/:([^:]*)//) {
>        if ($1 eq 'raw') { $getraw = 1 }
> -      elsif ($1 eq 'addr') { $getaddr = $getraw = 1 }
> -      elsif ($1 eq 'name') { $getname = 1 }
> +      elsif ($1 eq 'addr') { $getaddr = $needraw = 1 }
> +      elsif ($1 eq 'name') { $getname = $needraw = 1 }
>        elsif ($1 eq 'host') { $gethost = 1 }
>        elsif ($1 eq 'domain') { $gethost = $getdomain = 1 }
>        elsif ($1 eq 'ip') { $getip = 1 }
>        elsif ($1 eq 'revip') { $getip = $getrevip = 1 }
> +      elsif ($1 eq 'first') { $getfirst = 1 }
> +      elsif ($1 eq 'last') { $getlast = 1 }
>      }
>    }
>    my $request_lc = lc $request;
>
>    # ALL: entire pristine or semi-raw headers
>    if ($request eq 'ALL') {
> -    return ($getraw ? $self->{msg}->get_pristine_header()
> -                    : $self->{msg}->get_all_headers(0));
> +    if ($getraw) {
> +      @results = $self->{msg}->get_pristine_header() =~ /^([^ \t].*?\n)(?![ \t])/smgi;
> +    } else {
> +      @results = $self->{msg}->get_all_headers(0);
> +    }
> +    return \@results;
>    }
>    # ALL-TRUSTED: entire trusted raw headers
>    elsif ($request eq 'ALL-TRUSTED') {
>      # '+1' since we added the received header even though it's not considered
>      # trusted, so we know that those headers can be trusted too
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>        undef, $self->{last_trusted_relay_index}+1,
>        undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-INTERNAL: entire internal raw headers
>    elsif ($request eq 'ALL-INTERNAL') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>        undef, $self->{last_internal_relay_index}+1,
>        undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-UNTRUSTED: entire untrusted raw headers
>    elsif ($request eq 'ALL-UNTRUSTED') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>        $self->{last_trusted_relay_index}+1, undef,
>        undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-EXTERNAL: entire external raw headers
>    elsif ($request eq 'ALL-EXTERNAL') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>        $self->{last_internal_relay_index}+1, undef,
>        undef, undef, $getraw);
> +    return \@results;
>    }
>    # EnvelopeFrom: the SMTP MAIL FROM: address
>    elsif ($request_lc eq "\LEnvelopeFrom") {
> -    $result = $self->get_envelope_from();
> +    push @results, $self->get_envelope_from();
>    }
>    # untrusted relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Untrusted") {
> -    $result = $self->{relays_untrusted_str};
> +    push @results, $self->{relays_untrusted_str};
>    }
>    # trusted relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Trusted") {
> -    $result = $self->{relays_trusted_str};
> +    push @results, $self->{relays_trusted_str};
>    }
>    # external relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-External") {
> -    $result = $self->{relays_external_str};
> +    push @results, $self->{relays_external_str};
>    }
>    # internal relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Internal") {
> -    $result = $self->{relays_internal_str};
> +    push @results, $self->{relays_internal_str};
>    }
>    # ToCc: the combined recipients list
>    elsif ($request_lc eq "\LToCc") {
> -    $result = join("\n", $self->{msg}->get_header('To', $getraw));
> -    if ($result ne '') {
> -      chomp $result;
> -      $result .= ", " if $result =~ /\S/;
> -    }
> -    $result .= join("\n", $self->{msg}->get_header('Cc', $getraw));
> -    $result = undef if $result eq '';
> +    push @results, $self->{msg}->get_header('To', $getraw);
> +    push @results, $self->{msg}->get_header('Cc', $getraw);
>    }
>    # MESSAGEID: handle lists which move the real message-id to another
>    # header for resending.
>    elsif ($request eq 'MESSAGEID') {
> -    $result = join("\n", grep { defined($_) && $_ ne '' }
> +    push @results, grep { defined($_) && $_ ne '' } (
>             $self->{msg}->get_header('X-Message-Id', $getraw),
>             $self->{msg}->get_header('Resent-Message-Id', $getraw),
>             $self->{msg}->get_header('X-Original-Message-ID', $getraw),
> @@ -2250,115 +2279,126 @@ sub _get {
>    }
>    # a conventional header
>    else {
> -    my @results = $getraw ? $self->{msg}->raw_header($request)
> -                          : $self->{msg}->get_header($request);
> -    # dbg("message: get(%s)%s = %s",
> -    # $request, $getraw?'raw':'', join(", ",@results));
> -    if (@results) {
> -      $result = join('', @results);
> -    } else { # metadata
> -      $result = $self->{msg}->get_metadata($request);
> -    }
> -  }
> -
> -  # special queries
> -  if (defined $result && ($getaddr || $getname)) {
> -    local $1;
> -    $result =~ s/^[^:]+:(.*);\s*$/$1/gs; # 'undisclosed-recipients: ;'
> -    $result =~ s/\s+/ /g; # reduce whitespace
> -    $result =~ s/^\s+//; # leading whitespace
> -    $result =~ s/\s+$//; # trailing whitespace
> -
> -    if ($getaddr) {
> -      # Get the email address out of the header
> -      # All of these should result in "jm@foo":
> -      # jm@foo
> -      # jm@foo (Foo Blah)
> -      # jm@foo, jm@bar
> -      # display: jm@foo (Foo Blah), jm@bar ;
> -      # Foo Blah <jm@foo>
> -      # "Foo Blah" <jm@foo>
> -      # "'Foo Blah'" <jm@foo>
> -      #
> -      # strip out the (comments)
> -      $result =~ s/\s*\(.*?\)//g;
> -      # strip out the "quoted text", unless it's the only thing in the string
> -      if ($result !~ /^".*"$/) {
> -        $result =~ s/(?<!<)"[^"]*"(?!\@)//g; #" emacs
> -      }
> -      # Foo Blah <jm@xxx> or <jm@xxx>
> -      local $1;
> -      $result =~ s/^[^"<]*?<(.*?)>.*$/$1/;
> -      # multiple addresses on one line? remove all but first
> -      $result =~ s/,.*$//;
> -    }
> -    elsif ($getname) {
> -      # Get the display name out of the header
> -      # All of these should result in "Foo Blah":
> -      #
> -      # jm@foo (Foo Blah)
> -      # (Foo Blah) jm@foo
> -      # jm@foo (Foo Blah), jm@bar
> -      # display: jm@foo (Foo Blah), jm@bar ;
> -      # Foo Blah <jm@foo>
> -      # "Foo Blah" <jm@foo>
> -      # "'Foo Blah'" <jm@foo>
> -      #
> -      local $1;
> -      # does not handle mailbox-list or address-list or quotes well, to be improved
> -      if ($result =~ /^ \s* " (.*?) (?<!\\)" \s* < [^<>]* >/sx ||
> -          $result =~ /^ \s* (.*?) \s* < [^<>]* >/sx) {
> -        $result = $1; # display-name, RFC 5322
> -        # name-addr = [display-name] angle-addr
> -        # display-name = phrase
> -        # phrase = 1*word / obs-phrase
> -        # word = atom / quoted-string
> -        # obs-phrase = word *(word / "." / CFWS)
> -        $result =~ s{ " ( (?: [^"\\] | \\. )* ) " }
> -                    { my $s=$1; $s=~s{\\(.)}{$1}gs; $s }gsxe;
> -        $result =~ s/\\"/"/gs;
> -      } elsif ($result =~ /^ [^(,]*? \( (.*?) \) /sx) { # legacy form
> -        # nested comments are not handled, to be improved
> -        $result = $1;
> -      } else { # no display name
> -        $result = '';
> +    my @res = $getraw||$needraw ? $self->{msg}->raw_header($request)
> +                                : $self->{msg}->get_header($request);
> +    if (!@res) {
> +      if (defined(my $m = $self->{msg}->get_metadata($request))) {
> +        push @res, $m;
> +      }
> +    }
> +    push @results, @res if @res;
> +  }
> +
> +  # Nothing found to process further, bail out quick
> +  if (!@results) {
> +    return \@results;
> +  }
> +
> +  # Continue processing only first (topmost) or last header
> +  if ($getfirst) {
> +    @results = ($results[0]);
> +  } elsif ($getlast) {
> +    @results = ($results[-1]);
> +  }
> +
> +  # special addr/name
> +  if ($getaddr || $getname) {
> +    my @res;
> +    foreach my $line (@results) {
> +      next unless defined $line;
> +      # Note: parse_header_addresses always called with raw undecoded value
> +      # Skip invalid addresses here
> +      my @addrs = parse_header_addresses($line);
> +      if (@addrs) {
> +        if ($getaddr) {
> +          foreach my $addr (@addrs) {
> +            push @res, $addr->{address} if defined $addr->{address};
> +          }
> +        }
> +        elsif ($getname) {
> +          foreach my $addr (@addrs) {
> +            next unless defined $addr->{phrase};
> +            if ($getraw) {
> +              # phrase=name, could also be username or comment unless name found
> +              push @res, $addr->{phrase};
> +            } else {
> +              # If :raw was not specifically asked, decode mimewords
> +              # TODO: silly call to Node module, should probably be in Util
> +              my $decoded = Mail::SpamAssassin::Message::Node::_decode_header(
> +                $addr->{phrase}, "PMS:get:$request");
> +              # Normalize whitespace, unless it's all white-space
> +              if ($decoded =~ /\S/) {
> +                $decoded =~ s/\s+/ /gs;
> +                $decoded =~ s/^\s+//;
> +                $decoded =~ s/\s+$//;
> +                $decoded =~ s/^'(.*?)'$/$1/; # remove single quotes
> +              }
> +              push @res, $decoded if defined $decoded;
> +            }
> +          }
> +        }
>        }
> -      $result =~ s/^ \s* ' \s* (.*?) \s* ' \s* \z/$1/sx;
>      }
> +    @results = @res;
>    }
>
>    # special host/domain
> -  if (defined $result && ($gethost || $getdomain || $getip)) {
> -    my $host;
> +  if (@results && ($gethost || $getdomain || $getip)) {
> +    my @res;
>      if ($gethost) {
> +      # TODO: IDN matching needs honing
>        my $tldsRE = $self->{main}->{registryboundaries}->{valid_tlds_re};
> -      my $hostRE = qr/(?<![._-])\b([a-z\d][a-z\d._-]{0,251}\.${tldsRE})\b(?![._-])/i;
> -      # try grabbing email/msgid domain first, because user part might look like
> -      # a valid host..
> -      if ($result =~ /.*\@${hostRE}/i && is_fqdn_valid($1)) {
> -        $host = $1;
> -      } else {
> -        # otherwise try hard to find a valid host
> -        while ($result =~ /${hostRE}/ig) {
> -          if (is_fqdn_valid($1)) {
> +      #my $hostRE = qr/(?<![._-])\b([a-z\d][a-z\d._-]{0,251}\.${tldsRE})\b(?![._-])/i;
> +      my $hostRE = qr/(?<![._-])(\S{1,251}\.${tldsRE})(?![._-])/i;
> +      foreach my $line (@results) {
> +        next unless defined $line;
> +        my $host;
> +        if ($getaddr) {
> +          # If :addr already preparsed the line, just grab domain liberally
> +          if ($line =~ /.*\@(\S+)/) {
>              $host = $1;
> -            last;
>            }
>          }
> -      }
> -      if ($host && $getdomain) {
> -        $host = $self->{main}->{registryboundaries}->trim_domain($host, 1);
> +        else {
> +          # try grabbing email/msgid domain first, because user part might look like
> +          # a valid host..
> +          if ($line =~ /.*\@${hostRE}/i) {
> +            if (is_fqdn_valid(idn_to_ascii($1), 1)) {
> +              $host = $1;
> +            }
> +          }
> +          # otherwise try hard to find a valid host
> +          if (!$host) {
> +            while ($line =~ /${hostRE}/ig) {
> +              if (is_fqdn_valid(idn_to_ascii($1), 1)) {
> +                $host = $1;
> +                last;
> +              }
> +            }
> +          }
> +        }
> +        if ($host) {
> +          if ($getdomain) {
> +            $host = $self->{main}->{registryboundaries}->trim_domain($host, 1);
> +          }
> +          push @res, $host;
> +        }
>        }
>      } else {
>        my $ipRE = qr/(?<!\.)\b(${IP_ADDRESS})\b(?!\.)/;
> -      if ($result =~ $ipRE) {
> -        $host = $getrevip ? reverse_ip_address($1) : $1;
> +      foreach my $line (@results) {
> +        next unless defined $line;
> +        my $host;
> +        if ($line =~ $ipRE) {
> +          $host = $getrevip ? reverse_ip_address($1) : $1;
> +        }
> +        push @res, $host if defined $host;
>        }
>      }
> -    $result = $host;
> +    @results = @res;
>    }
>
> -  return $result;
> +  return \@results;
>  }
>
>  # optimized for speed
> @@ -2367,7 +2407,7 @@ sub _get {
>  # $_[2] is defval
>  sub get {
>    my $cache = $_[0]->{get_cache};
> -  my $found;
> +  my $found = [];
>    if (exists $cache->{$_[1]}) {
>      # return cache entry if it is known
>      # (measured hit/attempts rate on a production mailer is about 47%)
> @@ -2375,13 +2415,34 @@ sub get {
>    } else {
>      # fill in a cache entry
>      $found = _get(@_);
> +    # filter out undefined
> +    @$found = grep { defined } @$found;
>      $cache->{$_[1]} = $found;
>    }
>    # if the requested header wasn't found, we should return a default value
>    # as specified by the caller: if defval argument is present it represents
>    # a default value even if undef; if defval argument is absent a default
>    # value is an empty string for upwards compatibility
> -  return (defined $found ? $found : @_ > 2 ? $_[2] : '');
> +  if (@$found) {
> +    # new list context usage in 4.0, return all values always
> +    if (wantarray) {
> +      return @$found;
> +    }
> +    # legacy scalar context expected only single return value for some
> +    # queries, without a newline
> +    if ($_[1] =~ /:(?:addr|name|host|domain|ip|revip)\b/ ||
> +        $_[1] eq 'EnvelopeFrom') {
> +      my $res = $found->[0];
> +      $res =~ s/\n\z$//;
> +      return $res;
> +    } else {
> +      return join('', @$found);
> +    }
> +  } elsif (@_ > 2) {
> +    return wantarray ? ($_[2]) : $_[2];
> +  } else {
> +    return wantarray ? () : '';
> +  }
>  }
>
>  ###########################################################################
> @@ -2698,15 +2759,16 @@ sub _process_dkim_uri_list {
>
>    # Look for the domain in DK/DKIM headers
>    if ($self->{conf}->{parse_dkim_uris}) {
> -    my $dk = join(" ", grep {defined} ( $self->get('DomainKey-Signature',undef ),
> -                                        $self->get('DKIM-Signature',undef) ));
> -    while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
> -      my $d = $1;
> -      $d =~ s/\s+//g;
> -      # prefix with domainkeys: so it doesn't merge with identical keys
> -      $self->add_uri_detail_list("domainkeys:$d",
> -        {'domainkeys'=>1, 'nocanon'=>1, 'noclean'=>1},
> -        'domainkeys', 1);
> +    foreach my $dk ( $self->get('DomainKey-Signature'),
> +                     $self->get('DKIM-Signature') ) {
> +      while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
> +        my $d = $1;
> +        $d =~ s/\s+//g;
> +        # prefix with domainkeys: so it doesn't merge with identical keys
> +        $self->add_uri_detail_list("domainkeys:$d",
> +          {'domainkeys'=>1, 'nocanon'=>1, 'noclean'=>1},
> +          'domainkeys', 1);
> +      }
>      }
>    }
>  }
> @@ -3123,8 +3185,8 @@ sub get_envelope_from {
>    # Assume that because they have configured it, their MTA will always add it.
>    # This will prevent us falling through and picking up inappropriate headers.
>    if (defined $self->{conf}->{envelope_sender_header}) {
> -    # make sure we get the most recent copy - there can be only one EnvelopeSender.
> -    $envf = $self->get($self->{conf}->{envelope_sender_header}.":addr",undef);
> +    # get the most recent (topmost) copy - there can be only one EnvelopeSender.
> +    $envf = ($self->get($self->{conf}->{envelope_sender_header}.":first:addr"))[0];
>      # ok if it contains an "@" sign, or is "" (ie. "<>" without the < and >)
>      if (defined $envf && (index($envf, '@') > 0 || $envf eq '')) {
>        dbg("message: using envelope_sender_header '%s' as EnvelopeFrom: '%s'",
> @@ -3177,17 +3239,19 @@ sub get_envelope_from {
>    # lines, we cannot trust any Envelope-From headers, since they're likely to
>    # be incorrect fetchmail guesses.
>
> -  if (index($self->get("X-Sender"), '@') != -1) {
> -    my $rcvd = join(' ', $self->get("Received"));
> -    if (index($rcvd, '(fetchmail') != -1) {
> -      dbg("message: X-Sender and fetchmail signatures found, cannot trust envelope-from");
> -      $self->{envelopefrom} = undef;
> -      return;
> +  my $x_sender = ($self->get("X-Sender:first:addr"))[0];
> +  if (defined $x_sender && index($x_sender, '@') != -1) {
> +    foreach ($self->get("Received")) {
> +      if (index($_, '(fetchmail') != -1) {
> +        dbg("message: X-Sender and fetchmail signatures found, cannot trust envelope-from");
> +        $self->{envelopefrom} = undef;
> +        return;
> +      }
>      }
>    }
>
>    # procmailrc notes this (we now recommend adding it to Received instead)
> -  if (defined($envf = $self->get("X-Envelope-From:addr",undef))) {
> +  if (defined($envf = ($self->get("X-Envelope-From:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from. check
>      if ($self->get("ALL") =~ /^Received:.*?^X-Envelope-From:/smi) {
> @@ -3202,7 +3266,7 @@ sub get_envelope_from {
>    }
>
>    # qmail, new-inject(1)
> -  if (defined($envf = $self->get("Envelope-Sender:addr",undef))) {
> +  if (defined($envf = ($self->get("Envelope-Sender:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from. check
>      if ($self->get("ALL") =~ /^Received:.*?^Envelope-Sender:/smi) {
> @@ -3221,7 +3285,7 @@ sub get_envelope_from {
>    # data. This use of return-path is required; mail systems MUST support
>    # it. The return-path line preserves the information in the <reverse-
>    # path> from the MAIL command.
> -  if (defined($envf = $self->get("Return-Path:addr",undef))) {
> +  if (defined($envf = ($self->get("Return-Path:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from. check
>      if ($self->get("ALL") =~ /^Received:.*?^Return-Path:/smi) {
> @@ -3261,7 +3325,7 @@ sub get_all_hdrs_in_rcvd_index_range {
>    $include_end_rcvd = 1 unless defined $include_end_rcvd;
>
>    my $cur_rcvd_index = -1; # none found yet
> -  my $result = '';
> +  my @results;
>
>    my @hdrs;
>    if ($getraw) {
> @@ -3280,14 +3344,20 @@ sub get_all_hdrs_in_rcvd_index_range {
>      }
>      if ((!defined $start_rcvd || $start_rcvd <= $cur_rcvd_index) &&
>          (!defined $end_rcvd || $cur_rcvd_index < $end_rcvd)) {
> -      $result .= $hdr;
> +      push @results, $hdr;
>      }
>      elsif (defined $end_rcvd && $cur_rcvd_index == $end_rcvd) {
> -      $result .= $hdr;
> +      push @results, $hdr;
>        last;
>      }
>    }
> -  return ($result eq '' ? undef : $result);
> +
> +  if (wantarray) {
> +    return @results;
> +  } else {
> +    my $result = join('', @results);
> +    return ($result eq '' ? undef : $result);
> +  }
>  }
>
>  ###########################################################################
> @@ -3377,9 +3447,9 @@ sub all_from_addrs {
>    my @addrs;
>
>    # Resent- headers take priority, if present. see bug 672
> -  my $resent = $self->get('Resent-From',undef);
> -  if (defined $resent && $resent =~ /\S/) {
> -    @addrs = $self->{main}->find_all_addrs_in_line ($resent);
> +  my @resent = $self->get('Resent-From:first:addr');
> +  if (@resent) {
> +    @addrs = @resent;
>    }
>    else {
>      # bug 2292: Used to use find_all_addrs_in_line() with the same
> @@ -3387,17 +3457,18 @@ sub all_from_addrs {
>      # FNs for things like welcomelist_from (previously whitelist_from).
>      # Since all of these are From
>      # headers, there should only be 1 address in each anyway (not exactly
> -    # true, RFC 2822 allows multiple addresses in a From header field),
> -    # so use the :addr code...
> +    # true, RFC 2822 allows multiple addresses in a From header field)
> +    # *** since 4.0 all addresses are returned from Header correctly ***
>      # bug 3366: some addresses come in as 'foo@bar...', which is invalid.
>      # so deal with the multiple periods.
> +    # TODO: 4.0 need :first:addr here ? Why check so many headers ?
>      ## no critic
>      @addrs = map { tr/././s; $_ } grep { $_ ne '' }
> -      ($self->get('From:addr'), # std
> -       $self->get('Envelope-Sender:addr'), # qmail: new-inject(1)
> -       $self->get('Resent-Sender:addr'), # procmailrc manpage
> -       $self->get('X-Envelope-From:addr'), # procmailrc manpage
> -       $self->get('EnvelopeFrom:addr')); # SMTP envelope
> +      ($self->get('From:addr'), # std
> +       $self->get('Envelope-Sender:addr'), # qmail: new-inject(1)
> +       $self->get('Resent-Sender:addr'), # procmailrc manpage
> +       $self->get('X-Envelope-From:addr'), # procmailrc manpage
> +       $self->get('EnvelopeFrom:addr')); # SMTP envelope
>      # http://www.cs.tut.fi/~jkorpela/headers.html is useful here
>    }
>
> @@ -3455,47 +3526,52 @@ sub all_to_addrs {
>    my @addrs;
>
>    # Resent- headers take priority, if present. see bug 672
> -  my $resent = join('', $self->get('Resent-To'), $self->get('Resent-Cc'));
> -  if ($resent =~ /\S/) {
> -    @addrs = $self->{main}->find_all_addrs_in_line($resent);
> +  my @resent = ( $self->get('Resent-To:first:addr'),
> +                 $self->get('Resent-Cc:first:addr') );
> +  if (@resent) {
> +    @addrs = @resent;
>    } else {
>      # OK, a fetchmail trick: try to find the recipient address from
>      # the most recent 3 Received lines. This is required for sendmail,
>      # since it does not add a helpful header like exim, qmail
>      # or Postfix do.
>      #
> -    my $rcvd = $self->get('Received');
> -    $rcvd =~ s/\n[ \t]+/ /gs;
> -    $rcvd =~ s/\n+/\n/gs;
> -
> -    my @rcvdlines = split(/\n/, $rcvd, 4); pop @rcvdlines; # forget last one
> +    my @rcvd = ($self->get('Received'))[0 .. 2];
>      my @rcvdaddrs;
> -    foreach my $line (@rcvdlines) {
> -      if ($line =~ / for (\S+\@\S+);/) { push (@rcvdaddrs, $1); }
> +    foreach my $line (@rcvd) {
> +      next unless defined $line;
> +      if ($line =~ / for <?(\S+\@(\S+?))>?;/) {
> +        if (is_fqdn_valid(idn_to_ascii($2), 1)) {
> +          push @rcvdaddrs, $1;
> +        }
> +      }
>      }
>
> -    @addrs = $self->{main}->find_all_addrs_in_line (
> -      join('',
> -        join(" ", @rcvdaddrs)."\n",
> -        $self->get('To'), # std
> -        $self->get('Apparently-To'), # sendmail, from envelope
> -        $self->get('Delivered-To'), # Postfix, poss qmail
> -        $self->get('Envelope-Recipients'), # qmail: new-inject(1)
> -        $self->get('Apparently-Resent-To'), # procmailrc manpage
> -        $self->get('X-Envelope-To'), # procmailrc manpage
> -        $self->get('Envelope-To'), # exim
> -        $self->get('X-Delivered-To'), # procmail quick start
> -        $self->get('X-Original-To'), # procmail quick start
> -        $self->get('X-Rcpt-To'), # procmail quick start
> -        $self->get('X-Real-To'), # procmail quick start
> -        $self->get('Cc'))); # std
> +    # TODO: 4.0 use :first:addr ? Why so many headers ?
> + @addrs = ( > + @rcvdaddrs, > + $self->get('To:addr'), # std > + $self->get('Apparently-To:addr'), # sendmail, from envelope > + $self->get('Delivered-To:addr'), # Postfix, poss qmail > + $self->get('Envelope-Recipients:addr'), # qmail: new-inject(1) > + $self->get('Apparently-Resent-To:addr'), # procmailrc manpage > + $self->get('X-Envelope-To:addr'), # procmailrc manpage > + $self->get('Envelope-To:addr'), # exim > + $self->get('X-Delivered-To:addr'), # procmail quick start > + $self->get('X-Original-To:addr'), # procmail quick start > + $self->get('X-Rcpt-To:addr'), # procmail quick start > + $self->get('X-Real-To:addr'), # procmail quick start > + $self->get('Cc:addr')); # std > # those are taken from various sources; thanks to Nancy McGough, who > # noted some in <http://www.ii.com/internet/robots/procmail/qs/#envelope> > } > > - dbg("eval: all '*To' addrs: " . join(" ", @addrs)); > - $self->{all_to_addrs} = \@addrs; > - return @addrs; > + my %seen; > + my @result = grep { !$seen{$_}++ } @addrs; > + > + dbg("eval: all '*To' addrs: " . 
join(" ", @result)); > + $self->{all_to_addrs} = \@result; > + return @result; > > # http://www.cs.tut.fi/~jkorpela/headers.html is useful here, also > # http://www.exim.org/pipermail/exim-users/Week-of-Mon-20001009/021672.html > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm Fri Apr 30 > 18:17:51 2021 > @@ -1561,10 +1561,12 @@ sub _pre_chew_addr_header { > my ($self, $val) = @_; > local ($_); > > - my @addrs = $self->{main}->find_all_addrs_in_line ($val); > + my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($val); > my @toks; > - foreach (@addrs) { > - push (@toks, $self->_tokenize_mail_addrs ($_)); > + foreach my $addr (@addrs) { > + if (defined $addr->{address}) { > + push @toks, $self->_tokenize_mail_addrs($addr->{address}); > + } > } > return join (' ', @toks); > } > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm Fri Apr 30 > 18:17:51 2021 > @@ -464,13 +464,13 @@ sub check_freemail_header { > $re = $rec; > } > > - my @emails = map (lc, $pms->{main}->find_all_addrs_in_line > ($pms->get($header))); > + my @emails = map (lc, $pms->get("$header:addr")); > > if (!scalar (@emails)) { > dbg("header $header not found from mail"); > return 0; > } > - dbg("addresses from header $header: 
".join(';',@emails)); > + dbg("addresses from header $header: ".join(', ', @emails)); > > foreach my $email (@emails) { > if ($self->_is_freemail($email, $pms)) { > @@ -592,24 +592,33 @@ sub check_freemail_replyto { > > # Skip mailing-list etc looking requests, mostly FPs from them > if ($pms->{main}->{conf}->{freemail_skip_bulk_envfrom}) { > - my $envfrom = lc($pms->get("EnvelopeFrom")); > - if ($envfrom =~ $skip_replyto_envfrom) { > + my $envfrom = ($pms->get("EnvelopeFrom"))[0]; > + if (defined $envfrom && $envfrom =~ $skip_replyto_envfrom) { > dbg("envelope sender looks bulk, skipping check: $envfrom"); > return 0; > } > } > > - my $from = lc($pms->get("From:addr")); > - my $replyto = lc($pms->get("Reply-To:addr")); > - my $from_is_fm = $self->_is_freemail($from, $pms); > - my $replyto_is_fm = $self->_is_freemail($replyto, $pms); > + my @from_addrs = map (lc, $pms->get("From:addr")); > + dbg("From address: ".join(", ", @from_addrs)) if @from_addrs; > > - dbg("From address: $from") if $from ne ''; > - dbg("Reply-To address: $replyto") if $replyto ne ''; > + my @replyto_addrs = map (lc, $pms->get("Reply-To:addr")); > + dbg("Reply-To address: ".join(", ", @replyto_addrs)) if @replyto_addrs; > > - if ($from_is_fm and $replyto_is_fm and ($from ne $replyto)) { > + my $from_is_fm = grep { $self->_is_freemail($_, $pms) } @from_addrs; > + my $replyto_is_fm = grep { $self->_is_freemail($_, $pms) } > @replyto_addrs; > + > + my $from_not_in_replyto = 1; > + foreach my $from (@from_addrs) { > + next unless grep { $_ eq $from } @replyto_addrs; > + $from_not_in_replyto = 0; > + } > + > + if ($from_is_fm and $replyto_is_fm and $from_not_in_replyto) { > dbg("HIT! 
From and Reply-To are different freemails"); > - $self->_got_hit($pms, "$from, $replyto", "From and Reply-To are > different freemails"); > + my $from = join(",", @from_addrs); > + my $replyto = join(",", @replyto_addrs); > + $self->_got_hit($pms, "$from -> $replyto", "From and Reply-To are > different freemails"); > return 0; > } > > @@ -620,7 +629,7 @@ sub check_freemail_replyto { > } > } > elsif ($what eq 'reply') { > - if ($replyto ne '' and !$replyto_is_fm) { > + if (@replyto_addrs and !$replyto_is_fm) { > dbg("Reply-To defined and is not freemail, skipping check"); > return 0; > } > @@ -629,19 +638,21 @@ sub check_freemail_replyto { > return 0; > } > } > - my $reply = $replyto_is_fm ? $replyto : $from; > > return 0 unless $self->_parse_body($pms); > - > + > # Compare body to headers > if (scalar keys %{$pms->{freemail_cache}{body}}) { > - my $check = $what eq 'replyto' ? $replyto : $reply; > - dbg("comparing $check to body freemails"); > - foreach my $email (keys %{$pms->{freemail_cache}{body}}) { > - if ($email ne $check) { > - dbg("HIT! $check and $email are different freemails"); > - $self->_got_hit($pms, "$check, $email", "Different freemails > in reply header and body"); > - return 0; > + my $reply_addrs = $what eq 'replyto' ? \@replyto_addrs : > + $replyto_is_fm ? \@replyto_addrs : > \@from_addrs; > + dbg("comparing to body freemails: ".join(", ", @$reply_addrs)); > + foreach my $body_email (keys %{$pms->{freemail_cache}{body}}) { > + foreach my $reply_email (@$reply_addrs) { > + if ($body_email ne $reply_email) { > + dbg("HIT! 
$reply_email (Reply) and $body_email (Body) > are different freemails"); > + $self->_got_hit($pms, "$reply_email, $body_email", > "Different freemails in reply header and body"); > + return 0; > + } > } > } > } > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm Fri Apr 30 > 18:17:51 2021 > @@ -89,7 +89,7 @@ sub check_for_fake_aol_relay_in_rcvd { > local ($_); > > $_ = $pms->get('Received'); > - s/\s/ /gs; > + s/\s+/ /gs; > > # this is the hostname format used by AOL for their relays. Spammers love > # forging it. Don't make it more specific to match aol.com only, though -- > @@ -125,16 +125,13 @@ sub check_for_faraway_charset_in_headers > return 0 if grep { $_ eq "all" } @locales; > > for my $h (qw(From Subject)) { > - my @hdrs = $pms->get("$h:raw"); # ??? get() returns a scalar ??? 
> - if ($#hdrs >= 0) { > - $hdr = join(" ", @hdrs); > - } else { > - $hdr = ''; > - } > - while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) { > - Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales) > - or return 1; > - } > + my @hdrs = $pms->get("$h:raw"); > + foreach my $hdr (@hdrs) { > + while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) { > + Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales) > + or return 1; > + } > + } > } > 0; > } > @@ -145,35 +142,35 @@ sub check_for_unique_subject_id { > $_ = lc $pms->get('Subject'); > > my $id = 0; > - if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/ > - || /\s{10,}(?:\S\s)?(\S+)$/ > - || /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/ > - || /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/ > - || /\s{3,}[-:\#]([a-z0-9]{5,})$/ > - || /[\s._]{3,}([^0\s._]\d{3,})$/ > - || /[\s._]{3,}\[(\S+)\]$/ > + if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/m > + || /\s{10,}(?:\S\s)?(\S+)$/m > + || /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/m > + || /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/m > + || /\s{3,}[-:\#]([a-z0-9]{5,})$/m > + || /[\s._]{3,}([^0\s._]\d{3,})$/m > + || /[\s._]{3,}\[(\S+)\]$/m > > # (7217vPhZ0-478TLdy5829qicU9-0@26) and similar > - || /\(([-\w]{7,}\@\d+)\)$/ > + || /\(([-\w]{7,}\@\d+)\)$/m > > # Seven or more digits at the end of a subject is almost certainly a > id > - || /\b(\d{7,})\s*$/ > + || /\b(\d{7,})\s*$/m > > # stuff at end of line after "!" or "?" 
is usually an id > - || /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/ > + || /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/m > > # 9095IPZK7-095wsvp8715rJgY8-286-28 and similar > # excluding 'Re:', etc and the first word > - || /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/ > + || /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/m > > # #30D7 and similar > - || /\s#\s*([a-f0-9]{4,})\s*$/ > + || /\s#\s*([a-f0-9]{4,})\s*$/m > ) > { > $id = $1; > # exempt online purchases > if ($id =~ /\d{5,}/ > - && /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/) > + && /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/m) > { > $id = 0; > } > @@ -270,7 +267,7 @@ sub check_illegal_chars { > > $header .= ":raw" unless $header =~ /:raw$/; > my $str = $pms->get($header); > - return 0 if !defined $str || $str eq ''; > + return 0 if !defined $str || $str !~ /\S/; > > if ($str =~ tr/\x00-\x7F//c && is_valid_utf_8($str)) { > # is non-ASCII and is valid UTF-8 > @@ -304,12 +301,12 @@ sub gated_through_received_hdr_remover { > my ($self, $pms) = @_; > > my $txt = $pms->get("Mailing-List",undef); > - if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/) { > + if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/m) { > my $dlto = $pms->get("Delivered-To"); > my $rcvd = $pms->get("Received"); > > # ensure we have other indicative headers too > - if ($dlto =~ /^mailing list \S+\@\S+/ && > + if ($dlto =~ /^mailing list \S+\@\S+/m && > $rcvd =~ /qmail \d+ invoked (?:from network|by .{3,20})\); \d+ ... 
> \d+/) > { > return 1; > @@ -647,10 +644,9 @@ sub _check_recipients { > my @inputs; > > # ToCc: pseudo-header works best, but sometimes Bcc: is better > - for ('ToCc', 'Bcc') { > - my $to = $pms->get($_); # get recipients > - $to =~ s/\(.*?\)//g; # strip out the (comments) > - push(@inputs, ($to =~ m/([\w.=-]+\@\w+(?:[\w.-]+\.)+\w+)/g)); > + for ('ToCc:addr', 'Bcc:addr') { > + my @to = $pms->get($_); # get recipients > + push @inputs, @to; > last if scalar(@inputs) >= TOCC_SIMILAR_COUNT; > } > > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm Fri Apr 30 > 18:17:51 2021 > @@ -381,7 +381,7 @@ sub _check_spf { > $scanner->{checked_for_received_spf_header} = 1; > dbg("spf: checking to see if the message has a Received-SPF header that > we can use"); > > - my @internal_hdrs = split("\n", $scanner->get('ALL-INTERNAL')); > + my @internal_hdrs = $scanner->get('ALL-INTERNAL'); > unless ($scanner->{conf}->{use_newest_received_spf_header}) { > # look for the LAST (earliest in time) header, it'll be the most > accurate > @internal_hdrs = reverse(@internal_hdrs); > @@ -728,7 +728,7 @@ sub _get_sender { > # from the Return-Path, X-Envelope-From, or whatever header. > # it's better to get it from Received though, as that is updated > # hop-by-hop. 
> - my $sender = $scanner->get("EnvelopeFrom:addr"); > + my $sender = ($scanner->get("EnvelopeFrom:addr"))[0]; > if (defined $sender) { > dbg("spf: found EnvelopeFrom '$sender' from header"); > $scanner->{spf_sender} = lc $sender; > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm Fri Apr 30 18:17:51 2021 > @@ -49,6 +49,7 @@ require 5.008001; # needs utf8::is_utf8 > > use Mail::SpamAssassin::Logger; > > +use version 0.77; > use Exporter (); > > our @ISA = qw(Exporter); > @@ -60,7 +61,7 @@ our @EXPORT_OK = qw(&local_tz &base64_de > &secure_tmpdir &uri_list_canonicalize &get_my_locales > &parse_rfc822_date &idn_to_ascii &is_valid_utf_8 > &get_user_groups &compile_regexp &qr_to_string > - &is_fqdn_valid); > + &is_fqdn_valid &parse_header_addresses); > > our $AM_TAINTED; > > @@ -2334,6 +2335,330 @@ sub get_tag_value_for_score { > > ########################################################################### > > +# RFC 5322 (+IDN?) parsing of addresses and names from To/From/Cc.. 
headers > +# > +# Return array of hashes, containing at minimum name,address,user,host > +# > +# Override parser with SA_HEADER_ADDRESS_PARSER environment variable > + > +our $header_address_parser; > +our $email_address_xs; > +our $email_address_xs_fix_address; > +BEGIN { > + # SA_HEADER_ADDRESS_PARSER=1 only use internal parser > + # SA_HEADER_ADDRESS_PARSER=2 only use Email::Address::XS > + # By default internal is preferred, will defer for some cases > + $header_address_parser = untaint_var($ENV{'SA_HEADER_ADDRESS_PARSER'}); > + if ((!defined $header_address_parser || $header_address_parser eq '2') && > + eval 'use Email::Address::XS; 1;') { > + $email_address_xs = 1; > + if (version->parse(Email::Address::XS->VERSION) < version->parse(1.02)) { > + $email_address_xs_fix_address = 1; > + } > + } > +} > + > +# Helper for internal parser > +our $header_address_mailre = qr/ > + # user > + (?: > + # quoted localpart > + " (?:|(?:[^"\\]++|\\.)*+) " | > + # or un-quoted localpart > + [^\@\s\<\>\(\)\[\]\,\:\;]+ > + ) > + # domain > + \@ (?: [^\"\s\<\>\(\)\[\]\,\:\;]+ | \[ [\d:.]+ \] ) > +/ix; > + > +# Very relaxed internal parser > +# Only handles non-nested comments in some places > +our $header_address_re = qr/^ > + \s* > + (?: > + # optional phrase, quoted or non-quoted > + (?: > + ( (?: " (?:|(?:[^"\\]++|\\.)*+) " | [^",;<]++ )+ ) > + \s* > + )? > + # and enclosed email (or empty) > + # ... allow whitespace in localpart > + < \s* ( [^>\@]* \S+ | \s* ) \s* > > + # some output duplicate enclosures.. > + (?: \s* < \s* (?: (?: " (?:|(?:[^"\\]++|\\.)*+) " )? \S+ | \s* ) \s* > )* > + | > + # or standalone email or phrase > + (?: > + ( $header_address_mailre ) | > + ( (?: " (?:|(?:[^"\\]++|\\.)*+) " | [^",;<]++ )+ ) > + ) > + ) > + # possible comment after (no nested support here) > + (?: \s* \( ( (?:|(?:[^()\\]++|\\.)*+) ) \) )? 
> + # Followed by comma (semi-colon sometimes) or finish > + \s* (?: [,;] | \z ) > +/ix; > + > +# > +# Main public function > +# expected input is header contents without Header: itself > +# > +sub parse_header_addresses { > + my ($str) = @_; > + > + return if !defined $str || $str !~ /\S/; > + > + my @results; > + > + # Internal parser > + if (!$header_address_parser || $header_address_parser eq '1') { > + @results = _parse_header_addresses($str); > + } > + > + # Email::Address::XS > + if ($email_address_xs) { > + if (!$header_address_parser || $header_address_parser eq '2') { > + # Only consulted if no internal results, or there doesn't > + # seem to have enough results, or possible nested comments ( ( > + my $maybe_nested = scalar($str =~ /\(/) >= 2; > + if (!@results || $maybe_nested || @results < scalar($str =~ tr/,//)+1) > { > + my @results_xs = _parse_header_addresses_xs($str); > + # If we have more results than internal, use it, or nested > + if (@results_xs > @results || $maybe_nested) { > + return @results_xs; > + } > + } > + } > + } > + > + return @results; > +} > + > +# Check some basic parsing mistakes > +sub _valid_parsed_address { > + return 0 if !defined $_[0]; > + return 0 if index($_[0], '""@') == 0; > + return 0 if scalar($_[0] =~ tr/"//) == 1; > + return 1; > +} > + > +# > +# v0.1, improved internal parser, no support for comments in strange > +# places or nested comments, but handled a large corpus atleast 99% the > +# same as Email::Address::XS and in some cases even better (retains some > +# more name/addr info, even when not fully valid). 
> +# > +sub _parse_header_addresses { > + local $_ = shift; > + local ($1, $2, $3, $4, $5); > + > + # Clear trailing whitespace > + s/\s+\z//s; > + > + # Strip away all escaped blackslashes, simplifies processing a lot > + s/\\\\//g; > + > + # Reduce group address > + s/^[^"()<>]+:\s*(.*?)\s*(?:;.*)?/$1/gs; > + > + # Skip empty > + return unless /\S/; > + > + my @results; > + while (s/$header_address_re//igs) { > + my $phrase = defined $1 ? $1 : > + defined $4 ? $4 : undef; > + my $address = defined $2 ? $2 : > + defined $3 ? $3 : undef; > + my $comment = defined $5 ? $5 : undef; > + > + my ($user, $host, $invalid); > + > + # Check relaxed <> capture > + if (defined $2) { > + # Remove comments (no nested support here) > + $address =~ s/\((?:|(?:[^()\\]++|\\.)*+)\)//gs; > + # Validate as somewhat email looking > + if ($address !~ /^$header_address_mailre$/) { > + $address = undef; > + } > + } > + > + # Validate some other address oddities > + if (!_valid_parsed_address($address)) { > + $address = undef; > + } > + > + if (defined $phrase) { > + my $newphrase; > + # Parse phrase as quoted and unquoted parts > + while ($phrase =~ /(?:"(|(?:[^"\\]++|\\.)*+)"|([^"]++))/igs) { > + my $qs = $1; > + my $nqs = $2; > + if (defined $qs) { > + # Unescape things inside quoted string > + $qs =~ s/\\(?!\\)//g; > + $qs =~ s/\\\\/\\/g; > + #$qs =~ s/\\//g; > + $newphrase .= $qs; > + } else { > + # Remove comments (no nested support here) > + $nqs =~ s/\((?:|(?:[^()\\]++|\\.)*+)\)//gs; > + $newphrase .= $nqs; > + } > + } > + $phrase = $newphrase; > + > + # If we only have phrase which looks email, swap when valid > + # Check all in one if, either swap or don't > + if (!defined $address && > + $phrase =~ /^$header_address_mailre$/i && > + _valid_parsed_address($phrase) && > + $phrase =~ /^[^\@]*\@([^\@]*)/ && > + is_fqdn_valid(idn_to_ascii($1), 1)) { > + $address = $phrase; > + $phrase = undef; > + } else { > + # Remove redundant phrase==email? 
> + if (defined $address && $phrase eq $address) { > + $phrase = undef; > + } elsif ($phrase eq '') { > + $phrase = undef; > + } > + } > + } > + > + # Copy comment to phrase if not defined > + if (!defined $phrase && defined $comment) { > + $phrase = $comment; > + } > + > + if (defined $address) { > + # Unescape quoted localpart > + #if ($address =~ /^"(.*?)"\@(.*)/) { > + # $user = $1; > + # $host = $2; > + # $user =~ s/\\//g; > + # $user =~ s/\s+//gs; > + # $address = "$user\@$host"; > + #} > + # Strip sometimes seen quotes > + #$address =~ s/^'(.*?)'$/$1/; > + $address =~ s/^(([^\@]*)\@([^\@]*)).*/$1/; > + ($user, $host) = ($2, $3); > + } > + > + $invalid = !defined $host || !is_fqdn_valid(idn_to_ascii($host), 1); > + push @results, { > + 'phrase' => $phrase, > + 'user' => $user, > + 'host' => $host, > + 'address' => $address, > + 'comment' => $comment, > + 'invalid' => $invalid > + }; > + } > + > + # Was something left unparsed? > + if (index($_, '@') != -1) { > + # Last ditch effort, examples: > + # =?UTF-8?Q?"Foobar"_<[email protected]>?= > + # =?utf-8?Q?"Foobar"?=<[email protected]> > + while (/<($header_address_mailre)>/igs) { > + my $address = $1; > + next if !_valid_parsed_address($address); > + $address =~ s/^(([^\@]*)\@([^\@]*)).*/$1/; > + my ($user, $host) = ($2, $3); > + my $invalid = !is_fqdn_valid(idn_to_ascii($host), 1); > + push @results, { > + 'phrase' => undef, > + 'user' => $user, > + 'host' => $host, > + 'address' => $address, > + 'comment' => undef, > + 'invalid' => $invalid > + }; > + } > + } > + > + return if !@results; > + return @results; > +} > + > +sub _parse_header_addresses_xs { > + my ($str) = @_; > + > + # Strip away all escaped blackslashes, simplifies processing a lot > + $str =~ s/\\\\//g; > + > + my @results; > + my @addrs = Email::Address::XS->parse($str); > + > + local ($1, $2); > + foreach my $addr (@addrs) { > + my $name = $addr->name; > + my $address = $addr->address; > + my $user = $addr->user; > + my $host = $addr->host; 
> + my $phrase = $addr->phrase; > + my $comment = $addr->comment; > + my $invalid; > + > + # Workaround Bug 5201 for Email::Address::XS > + # From: "[email protected]" > + # If everything else is missing but phrase looks like > + # an email, let's assume it is (hostname verifies) > + if (!defined $address && !defined $user && > + !defined $comment && defined $phrase && > + _valid_parsed_address($phrase) && > + $phrase =~ /^([^\s\@]+)\@([^\s\@]+)$/ && > + is_fqdn_valid(idn_to_ascii($2), 1)) > + { > + $user = $1; > + $host = $2; > + $address = $phrase; > + $name = $user; > + $invalid = 0; > + $phrase = undef; > + } > + else { > + $invalid = !$addr->is_valid; > + } > + > + # Version <1.02 borks address if both user+host are UTF-8 > + if ($email_address_xs_fix_address) { > + if (defined $user && defined $host) { > + # <"Another User"@foo> loses quotes in user, add back > + if (index($user, ' ') != -1 && > + index($user, '"') == -1) { > + $user = '"'.$user.'"'; > + } > + $address = $user.'@'.$host; > + } > + } > + > + # Copy comment to phrase if not defined > + if (!defined $phrase && defined $comment) { > + $phrase = $comment; > + } > + > + # Use input as name if nothing found > + if (!defined $phrase && !defined $address) { > + $phrase = $str; > + } > + > + push @results, { > + 'phrase' => $phrase, > + 'user' => $user, > + 'host' => $host, > + 'address' => $address, > + 'comment' => $comment, > + 'invalid' => $invalid > + }; > + } > + > + return @results; > +} > > 1; > > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm (original) > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm Fri Apr > 30 18:17:51 2021 > @@ -302,6 +302,13 @@ 
our @OPTIONAL_MODULES = ( > desc => 'IO::String emulates file interface for in-core strings. > It is used by the optional OLEVBMacro Plugin.', > }, > +{ > + module => 'Email::Address::XS', > + version => 0, > + desc => 'Email::Address::XS is used to parse email addresses from header > + fields like To/From/cc, per RFC 5322. If installed, it may additionally > + be used by internal parser to process complex lists.', > +}, > ); > > our @BINARIES = (); > > Modified: spamassassin/trunk/t/SATest.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/t/SATest.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/t/SATest.pm (original) > +++ spamassassin/trunk/t/SATest.pm Fri Apr 30 18:17:51 2021 > @@ -68,6 +68,7 @@ BEGIN { > # Fix INC to point to built SA > if (-e 't/test_dir') { unshift(@INC, 'blib/lib'); } > elsif (-e 'test_dir') { unshift(@INC, '../blib/lib'); } > + else { die "FATAL: not in or below test directory?\n"; } > } > > # Set up for testing. 
Exports (as global vars): > > Modified: spamassassin/trunk/t/data/Dumpheaders.pm > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/t/data/Dumpheaders.pm?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/t/data/Dumpheaders.pm (original) > +++ spamassassin/trunk/t/data/Dumpheaders.pm Fri Apr 30 18:17:51 2021 > @@ -16,29 +16,81 @@ sub check_end { > my ($self, $opts) = @_; > > local $_; > - $_ = $opts->{permsgstatus}->get("ALL:raw"); > - s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > > # ignore the M:SpamAssassin:compile() test message > - return if /I need to make this message body somewhat long so TextCat > preloads/; > - print STDOUT "text-all-raw: $_\n"; > + return if $self->{linting}; > + #return if /I need to make this message body somewhat long so TextCat > preloads/; > + > + ## pre-4.0 scalar context calls > + > + $_ = $opts->{permsgstatus}->get("ALL:raw"); > + s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > + print STDOUT "scalar-text-all-raw: $_"."[END]\n"; > > $_ = $opts->{permsgstatus}->get("ALL"); > s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > - print STDOUT "text-all-noraw: $_\n"; > + print STDOUT "scalar-text-all-noraw: $_"."[END]\n"; > > $_ = $opts->{permsgstatus}->get("From:raw"); > s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > - print STDOUT "text-from-raw: $_\n"; > + print STDOUT "scalar-text-from-raw: $_"."[END]\n"; > > $_ = $opts->{permsgstatus}->get("From"); > s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > - print STDOUT "text-from-noraw: $_\n"; > + print STDOUT "scalar-text-from-noraw: $_"."[END]\n"; > > $_ = $opts->{permsgstatus}->get("From:addr"); > s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; > - print STDOUT "text-from-addr: $_\n"; > + print STDOUT "scalar-text-from-addr: $_"."[END]\n"; > + > + ## 4.0 list context tests > + > + my @l; > + my $s; > + > + @l = $opts->{permsgstatus}->get("ALL:raw"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > 
+ print STDOUT "list-text-all-raw: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("ALL"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-all-noraw: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("From:raw"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-from-raw: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("From"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-from-noraw: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("From:addr"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-from-addr: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("From:first:addr"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-from-first-addr: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("From:last:addr"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-from-last-addr: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("MESSAGEID:host"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-msgid-host: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("MESSAGEID:domain"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-msgid-domain: ".join("[LIST]", @l)."[END]\n"; > + > + @l = $opts->{permsgstatus}->get("Received:ip"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-received-ip: ".join("[LIST]", @l)."[END]\n"; > > + @l = $opts->{permsgstatus}->get("Received:revip"); > + foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; } > + print STDOUT "list-text-received-revip: ".join("[LIST]", @l)."[END]\n"; > } > > 1; > > Modified: 
spamassassin/trunk/t/data/nice/unicode1 > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/t/data/nice/unicode1?rev=1889337&r1=1889336&r2=1889337&view=diff > ============================================================================== > --- spamassassin/trunk/t/data/nice/unicode1 (original) > +++ spamassassin/trunk/t/data/nice/unicode1 Fri Apr 30 18:17:51 2021 > @@ -6,7 +6,7 @@ Received: from mail-ig0-x248.esempio-uni > by Sörensen.example.com (Postfix) with UTF8SMTPS > for <Dörte@Sörensen.example.com>; Thu, 8 Oct 2015 07:45:14 +0200 (CEST) > From: =?ISO-8859-1?Q?Maril=F9?= Gioffré ♥ > <Marilù.Gioffré@esempio-università.it> > -To: =?iso-8859-1*sv?Q?D=F6rte_=C5._S=F6rensen,_Jr.?= > +To: =?iso-8859-1*sv?Q?D=F6rte_=C5._S=F6rensen=2C_Jr.?= > <Dörte@Sörensen.example.com> > Cc: θσερ@εχαμπλε.ψομ > Subject: =?iso-8859-2*sl?Q?Doma=e8e?= > > Added: spamassassin/trunk/t/data/spam/freemail1 > URL: > http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail1?rev=1889337&view=auto > ============================================================================== > --- spamassassin/trunk/t/data/spam/freemail1 (added) > +++ spamassassin/trunk/t/data/spam/freemail1 Fri Apr 30 18:17:51 2021 > @@ -0,0 +1,15 @@ > +Return-Path: <[email protected]> > +Received: from google-public-dns-a.google.com > (google-public-dns-a.google.com [8.8.8.8]) > + by in.example.com (Postfix) with ESMTPS > + for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST) > +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5 > + for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT) > +From: <[email protected]> > +To: [email protected] > +Reply-To: "Spammer" <[email protected]> > +Subject: Freemail test > +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT) > +MIME-Version: 1.0 > +Message-Id: <[email protected]> > + > +Freemail test > > Added: spamassassin/trunk/t/data/spam/freemail2 > URL: > 
http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail2?rev=1889337&view=auto
> ==============================================================================
> --- spamassassin/trunk/t/data/spam/freemail2 (added)
> +++ spamassassin/trunk/t/data/spam/freemail2 Fri Apr 30 18:17:51 2021
> @@ -0,0 +1,15 @@
> +Return-Path: <[email protected]>
> +Received: from google-public-dns-a.google.com
> (google-public-dns-a.google.com [8.8.8.8])
> + by in.example.com (Postfix) with ESMTPS
> + for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST)
> +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5
> + for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT)
> +From: <[email protected]>
> +To: [email protected]
> +Reply-To: [email protected], "Spammer" <[email protected]>
> +Subject: Freemail test
> +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT)
> +MIME-Version: 1.0
> +Message-Id: <[email protected]>
> +
> +Freemail test with multiple Reply-To's
>
> Added: spamassassin/trunk/t/data/spam/freemail3
> URL:
> http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail3?rev=1889337&view=auto
> ==============================================================================
> --- spamassassin/trunk/t/data/spam/freemail3 (added)
> +++ spamassassin/trunk/t/data/spam/freemail3 Fri Apr 30 18:17:51 2021
> @@ -0,0 +1,15 @@
> +Return-Path: <[email protected]>
> +Received: from google-public-dns-a.google.com
> (google-public-dns-a.google.com [8.8.8.8])
> + by in.example.com (Postfix) with ESMTPS
> + for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST)
> +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5
> + for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT)
> +From: <[email protected]>
> +To: [email protected]
> +Subject: Freemail test
> +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT)
> +MIME-Version: 1.0
> +Message-Id: <[email protected]>
> +
> +Freemail test with body email
> [email protected]
>
> Modified: spamassassin/trunk/t/freemail.t
> URL:
> http://svn.apache.org/viewvc/spamassassin/trunk/t/freemail.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/freemail.t (original)
> +++ spamassassin/trunk/t/freemail.t Fri Apr 30 18:17:51 2021
> @@ -5,19 +5,46 @@ use SATest; sa_t_init("freemail");
>
> use Test::More;
>
> -plan tests => 4;
> +plan tests => 23;
>
> # ---------------------------------------------------------------------------
>
> +# Global
> tstprefs ("
> freemail_domains gmail.com
> +");
> +
> +## Standard + whitelist should not hit
> +
> +tstlocalrules (q{
> freemail_import_whitelist_auth 0
> - whitelist_auth test\@gmail.com
> + whitelist_auth [email protected]
> header FREEMAIL_FROM eval:check_freemail_from()
> -");
> + score FREEMAIL_FROM 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_ENVFROM_END_DIGIT
> eval:check_freemail_header('EnvelopeFrom', '\d@')
> + score FREEMAIL_ENVFROM_END_DIGIT 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
>
> %patterns = (
> - q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +);
> +%anti_patterns = (
> + # No Reply-To or body
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> );
>
> ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> @@ -28,16 +55,85 @@ clear_pattern_counters();
>
> %patterns = ();
> %anti_patterns = (
> - q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> );
>
> -tstprefs ("
> - freemail_domains gmail.com
> +tstlocalrules (q{
> freemail_import_whitelist_auth 1
> - whitelist_auth test\@gmail.com
> + whitelist_auth [email protected]
> header FREEMAIL_FROM eval:check_freemail_from()
> -");
> + score FREEMAIL_FROM 3.3
> +});
>
> ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> ok_all_patterns();
>
> +## From and Reply-To different
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_FROM eval:check_freemail_from()
> + score FREEMAIL_FROM 3.3
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_ENVFROM_END_DIGIT
> eval:check_freemail_header('EnvelopeFrom', '\d@')
> + score FREEMAIL_ENVFROM_END_DIGIT 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail1", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## Multiple Reply-To values, no email on body
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail2", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## No Reply-To, another freemail in body
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail3", \&patterns_run_cb);
> +ok_all_patterns();
> +
>
> Modified: spamassassin/trunk/t/freemail_welcome_block.t
> URL:
> http://svn.apache.org/viewvc/spamassassin/trunk/t/freemail_welcome_block.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/freemail_welcome_block.t (original)
> +++ spamassassin/trunk/t/freemail_welcome_block.t Fri Apr 30 18:17:51 2021
> @@ -1,23 +1,50 @@
> #!/usr/bin/perl -T
>
> use lib '.'; use lib 't';
> -use SATest; sa_t_init("freemail_welcome_block");
> +use SATest; sa_t_init("freemail");
>
> use Test::More;
>
> -plan tests => 4;
> +plan tests => 23;
>
> # ---------------------------------------------------------------------------
>
> +# Global
> tstprefs ("
> freemail_domains gmail.com
> +");
> +
> +## Standard + welcomelist should not hit
> +
> +tstlocalrules (q{
> freemail_import_welcomelist_auth 0
> - welcomelist_auth test\@gmail.com
> + welcomelist_auth [email protected]
> header FREEMAIL_FROM eval:check_freemail_from()
> -");
> + score FREEMAIL_FROM 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_ENVFROM_END_DIGIT
> eval:check_freemail_header('EnvelopeFrom', '\d@')
> + score FREEMAIL_ENVFROM_END_DIGIT 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
>
> %patterns = (
> - q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +);
> +%anti_patterns = (
> + # No Reply-To or body
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> );
>
> ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> @@ -28,16 +55,85 @@ clear_pattern_counters();
>
> %patterns = ();
> %anti_patterns = (
> - q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> );
>
> -tstlocalrules ("
> - freemail_domains gmail.com
> +tstlocalrules (q{
> freemail_import_welcomelist_auth 1
> - welcomelist_auth test\@gmail.com
> + welcomelist_auth [email protected]
> header FREEMAIL_FROM eval:check_freemail_from()
> -");
> + score FREEMAIL_FROM 3.3
> +});
>
> ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> ok_all_patterns();
>
> +## From and Reply-To different
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_FROM eval:check_freemail_from()
> + score FREEMAIL_FROM 3.3
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_ENVFROM_END_DIGIT
> eval:check_freemail_header('EnvelopeFrom', '\d@')
> + score FREEMAIL_ENVFROM_END_DIGIT 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail1", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## Multiple Reply-To values, no email on body
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> + q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> + q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> + score FREEMAIL_REPLYTO 3.3
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> + header FREEMAIL_REPLYTO_END_DIGIT eval:check_freemail_header('Reply-To',
> '\d@')
> + score FREEMAIL_REPLYTO_END_DIGIT 3.3
> + header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> + score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail2", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## No Reply-To, another freemail in body
> +
> +%patterns = (
> + q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> + header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> + score FREEMAIL_REPLYXX 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail3", \&patterns_run_cb);
> +ok_all_patterns();
> +
>
> Modified: spamassassin/trunk/t/get_all_headers.t
> URL:
> http://svn.apache.org/viewvc/spamassassin/trunk/t/get_all_headers.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/get_all_headers.t (original)
> +++ spamassassin/trunk/t/get_all_headers.t Fri Apr 30 18:17:51 2021
> @@ -2,14 +2,34 @@
>
> use lib '.'; use lib 't';
> use SATest; sa_t_init("get_all_headers");
> -use Test::More tests => 5;
> +use Test::More;
> +
> +use constant HAS_EMAIL_ADDRESS_XS => eval { require Email::Address::XS; };
> +
> +$tests = 19;
> +$tests += 19 if (HAS_EMAIL_ADDRESS_XS);
> +plan tests => $tests;
>
> # ---------------------------------------------------------------------------
>
> %patterns = (
> - q{ MIME-Version: 1.0 } => 'no-extra-space',
> - q{/text-all-raw: Received: from yahoo\.com\[\\\\n\]
> \(PPPa33-ResaleLosAngelesMetroB2-2R7452\.dialinx\.net \[4\.48\.136\.190\]\)
> by\[\\\\n\] www\.goabroad\.com\.cn \(8\.9\.3/8\.9\.3\) with SMTP id
> TAA96146; Thu,\[\\\\n\] 30 Aug 2001 19:06:45 \+0800 \(CST\)
> \(envelope-from\[\\\\n\] pertand\@email\.mondolink\.com\)\[\\\\n\]From
> :<tst1\@example\.com>\[\\\\n\]X-Mailer: Mozilla 4\.04 \[en\]C-bls40 \(Win95;
> U\)\[\\\\n\]To: jenny33436\@netscape\.net\[\\\\n\]Subject:
> via\.gra\[\\\\n\]From:\[\\\\t\] <tst2\@example\.com>\[\\\\n\]DATE: Fri, 7
> Dec 2001 07:01:03\[\\\\n\]MIME-Version: 1\.0\[\\\\n\]Message-Id:
> <20011206235802\.4FD6F1143D6\@mail\.netnoteinc\.com>\[\\\\n\]Sender:
> travelincentives\@aol\.com\[\\\\n\]Content-Type: text/plain;
> charset="us-ascii"\[\\\\n\]/} => 'full-headers-raw',
> - q{/text-all-noraw: Received: from yahoo\\.com
> \\(PPPa33-ResaleLosAngelesMetroB2-2R7452\\.dialinx\\.net
> \\[4\\.48\\.136\\.190\\]\\) by www\\.goabroad\\.com\\.cn
> \\(8\\.9\\.3/8\\.9\\.3\\) with SMTP id TAA96146; Thu, 30 Aug 2001 19:06:45
> \\+0800 \\(CST\\) \\(envelope-from
> pertand\\@email\\.mondolink\\.com\\)\[\\\\n\]From:
> <tst1\\@example\\.com>\[\\\\n\]X-Mailer: Mozilla 4\\.04 \\[en\\]C-bls40
> \\(Win95; U\\)\[\\\\n\]To: jenny33436\\@netscape\\.net\[\\\\n\]Subject:
> via\\.gra\[\\\\n\]From: <tst2\\@example\\.com>\[\\\\n\]DATE: Fri, 7 Dec 2001
> 07:01:03\[\\\\n\]MIME-Version: 1\\.0\[\\\\n\]Message-Id:
> <20011206235802\\.4FD6F1143D6\\@mail\\.netnoteinc\\.com>\[\\\\n\]Sender:
> travelincentives\\@aol\\.com\[\\\\n\]Content-Type: text/plain;
> charset="us-ascii"\[\\\\n\]/} => 'full-headers-noraw',
> + q{'MIME-Version: 1.0'} => 'no-extra-space',
> + q{'scalar-text-all-raw: Received: from yahoo.com[\n]
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by[\n]
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu,[\n] 30 Aug
> 2001 19:06:45 +0800 (CST) (envelope-from[\n]
> [email protected])[\n]From :<[email protected]>[\n]X-Mailer:
> Mozilla 4.04 [en]C-bls40 (Win95; U)[\n]To:
> [email protected][\n]Subject: via.gra[\n]From:[\t]
> <[email protected]>[\n]DATE: Fri, 7 Dec 2001 07:01:03[\n]MIME-Version:
> 1.0[\n]Message-Id:
> <[email protected]>[\n]Sender:
> [email protected][\n]Content-Type: text/plain;
> charset="us-ascii"[\n][END]'} => 'scalar-text-all-raw',
> + q{'scalar-text-all-noraw: Received: from yahoo.com
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu, 30 Aug 2001
> 19:06:45 +0800 (CST) (envelope-from [email protected])[\n]From:
> <[email protected]>[\n]X-Mailer: Mozilla 4.04 [en]C-bls40 (Win95; U)[\n]To:
> [email protected][\n]Subject: via.gra[\n]From:
> <[email protected]>[\n]DATE: Fri, 7 Dec 2001 07:01:03[\n]MIME-Version:
> 1.0[\n]Message-Id:
> <[email protected]>[\n]Sender:
> [email protected][\n]Content-Type: text/plain;
> charset="us-ascii"[\n][END]'} => 'scalar-text-all-noraw',
> + q{'scalar-text-from-raw: <[email protected]>[\n][\t]
> <[email protected]>[\n][END]'} => 'scalar-text-from-raw',
> + q{'scalar-text-from-noraw:
> <[email protected]>[\n]<[email protected]>[\n][END]'} =>
> 'scalar-text-from-noraw',
> + q{'scalar-text-from-addr: [email protected][END]'} =>
> 'scalar-text-from-addr',
> + q{'list-text-all-raw: Received: from yahoo.com[\n]
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by[\n]
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu,[\n] 30 Aug
> 2001 19:06:45 +0800 (CST) (envelope-from[\n]
> [email protected])[\n][LIST]From
> :<[email protected]>[\n][LIST]X-Mailer: Mozilla 4.04 [en]C-bls40 (Win95;
> U)[\n][LIST]To: [email protected][\n][LIST]Subject:
> via.gra[\n][LIST]From:[\t] <[email protected]>[\n][LIST]DATE: Fri, 7 Dec 2001
> 07:01:03[\n][LIST]MIME-Version: 1.0[\n][LIST]Message-Id:
> <[email protected]>[\n][LIST]Sender:
> [email protected][\n][LIST]Content-Type: text/plain;
> charset="us-ascii"[\n][END]'} => 'list-text-all-raw',
> + q{'list-text-all-noraw: Received: from yahoo.com
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu, 30 Aug 2001
> 19:06:45 +0800 (CST) (envelope-from
> [email protected])[\n][LIST]From:
> <[email protected]>[\n][LIST]X-Mailer: Mozilla 4.04 [en]C-bls40 (Win95;
> U)[\n][LIST]To: [email protected][\n][LIST]Subject:
> via.gra[\n][LIST]From: <[email protected]>[\n][LIST]DATE: Fri, 7 Dec 2001
> 07:01:03[\n][LIST]MIME-Version: 1.0[\n][LIST]Message-Id:
> <[email protected]>[\n][LIST]Sender:
> [email protected][\n][LIST]Content-Type: text/plain;
> charset="us-ascii"[\n][END]'} => 'list-text-all-noraw',
> + q{'list-text-from-raw: <[email protected]>[\n][LIST][\t]
> <[email protected]>[\n][END]'} => 'list-text-from-raw',
> + q{'list-text-from-noraw:
> <[email protected]>[\n][LIST]<[email protected]>[\n][END]'} =>
> 'list-text-from-noraw',
> + q{'list-text-from-addr: [email protected][LIST][email protected][END]'} =>
> 'list-text-from-addr',
> + q{'list-text-from-first-addr: [email protected][END]'} =>
> 'list-text-from-first-addr',
> + q{'list-text-from-last-addr: [email protected][END]'} =>
> 'list-text-from-last-addr',
> + q{'list-text-msgid-host: mail.netnoteinc.com[END]'} =>
> 'list-text-msgid-host',
> + q{'list-text-msgid-domain: netnoteinc.com[END]'} =>
> 'list-text-msgid-domain',
> + q{'list-text-received-ip: 4.48.136.190[END]'} => 'list-text-received-ip',
> + q{'list-text-received-revip: 190.136.48.4[END]'} =>
> 'list-text-received-revip',
> );
>
> %anti_patterns = (
> @@ -20,6 +40,15 @@ tstprefs ("
> loadplugin Dumpheaders ../../../data/Dumpheaders.pm
> ");
>
> +# Internal parser
> +$ENV{'SA_HEADER_ADDRESS_PARSER'} = 1;
> ok (sarun ("-L -t < data/spam/008", \&patterns_run_cb));
> ok_all_patterns();
>
> +if (HAS_EMAIL_ADDRESS_XS) {
> + # Email::Address::XS
> + $ENV{'SA_HEADER_ADDRESS_PARSER'} = 2;
> + ok (sarun ("-L -t < data/spam/008", \&patterns_run_cb));
> + ok_all_patterns();
> +} else { warn "Not running Email::Address::XS tests, module missing\n"; }
> +
>
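For anyone adapting their own tests to this, the optional-parser pattern in the get_all_headers.t hunk above boils down to roughly the following. This is only a minimal standalone sketch, not code from the commit: planned_tests() and parsers_to_run() are made-up helper names, and the loop just prints instead of calling sarun(). The selector values (1 = internal parser, 2 = Email::Address::XS) and the count of 19 are taken from the hunk itself.

```perl
use strict;
use warnings;

# True only when the optional module loads; eval traps the failure otherwise.
use constant HAS_EMAIL_ADDRESS_XS =>
  eval { require Email::Address::XS; 1 } ? 1 : 0;

# Hypothetical helper: the same checks run once per available parser,
# so the plan doubles when the XS parser can be exercised too.
sub planned_tests {
  my ($per_parser, $has_xs) = @_;
  return $has_xs ? 2 * $per_parser : $per_parser;
}

# Hypothetical helper: parser selectors to exercise, as in the hunk above
# (1 = internal parser, 2 = Email::Address::XS).
sub parsers_to_run {
  my ($has_xs) = @_;
  return $has_xs ? (1, 2) : (1);
}

for my $parser (parsers_to_run(HAS_EMAIL_ADDRESS_XS)) {
  # local restores the previous environment value after each iteration
  local $ENV{'SA_HEADER_ADDRESS_PARSER'} = $parser;
  print "would run header checks with SA_HEADER_ADDRESS_PARSER=$parser\n";
}

print "planned tests: ", planned_tests(19, HAS_EMAIL_ADDRESS_XS), "\n";
```

The point of computing the plan up front is that Test::More then reports a count mismatch if one of the two runs silently does not happen.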
