Please note the large changeset below and give it a try.  I've been tweaking
it all week, so it should be good for general use.
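
For plugin authors, the calling convention described in the log below can be sketched roughly as follows.  This is only an illustration, not the committed code: get_sketch() is a hypothetical stand-in that takes the already-found values as an array ref, so that it runs without SpamAssassin installed.  It mimics the list/scalar return rules of the new $pms->get() as they appear in the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $pms->get(); NOT the SpamAssassin implementation.
# $found is an array ref of values that a header lookup would have produced.
sub get_sketch {
    my ($found, $request, @rest) = @_;
    my $have_default = @rest > 0;    # a default may be passed, even as undef

    if (@$found) {
        # New style: list context returns every value
        return @$found if wantarray;

        # Legacy scalar context: parsed modifiers return only the first value
        if ($request =~ /:(?:addr|name|host|domain|ip|revip)\b/) {
            (my $res = $found->[0]) =~ s/\n\z//;
            return $res;
        }
        # Legacy scalar context: plain headers are joined into one string
        return join('', @$found);
    }

    # Nothing found: use the default if one was given, else ''/empty list
    return wantarray ? ($rest[0]) : $rest[0] if $have_default;
    return wantarray ? () : '';
}

my @addrs = get_sketch(['example@foo', 'example@bar'], 'From:addr');
my $first = get_sketch(['example@foo', 'example@bar'], 'From:addr');
print "list:   @addrs\n";    # list:   example@foo example@bar
print "scalar: $first\n";    # scalar: example@foo
```

Code that wants exactly one value can still use list context and index it, as the diff itself does with ($self->get('X-Sender:first:addr'))[0].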


On Fri, Apr 30, 2021 at 06:17:51PM -0000, [email protected] wrote:
> Author: hege
> Date: Fri Apr 30 18:17:51 2021
> New Revision: 1889337
> 
> URL: http://svn.apache.org/viewvc?rev=1889337&view=rev
> Log:
> - Improved internal header address (From/To/Cc) parser, now also handles
>   multiple addresses.  Optional support for external Email::Address::XS
>   parser, which can handle nested comments and other oddities.
> 
> - Header :addr :name modifiers now returns all addresses.  :first :last
>   select only first (topmost) or last header to process, when there are
>   multiple headers with the same name (:addr and :name may still return
>   multiple values from a single header).
> 
> - API: $pms->get() can and should now be called in list context.  Scalar
>   context continues to return multiple values newline separated, but this
>   should be considered deprecated.
> 
> 
> Added:
>     spamassassin/trunk/t/data/spam/freemail1
>     spamassassin/trunk/t/data/spam/freemail2
>     spamassassin/trunk/t/data/spam/freemail3
> Modified:
>     spamassassin/trunk/MANIFEST
>     spamassassin/trunk/UPGRADE
>     spamassassin/trunk/lib/Mail/SpamAssassin.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm
>     spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm
>     spamassassin/trunk/t/SATest.pm
>     spamassassin/trunk/t/data/Dumpheaders.pm
>     spamassassin/trunk/t/data/nice/unicode1
>     spamassassin/trunk/t/freemail.t
>     spamassassin/trunk/t/freemail_welcome_block.t
>     spamassassin/trunk/t/get_all_headers.t
>     spamassassin/trunk/t/get_headers.t   (contents, props changed)
>     spamassassin/trunk/t/header_utf8.t   (contents, props changed)
> 
> Modified: spamassassin/trunk/MANIFEST
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/MANIFEST?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/MANIFEST (original)
> +++ spamassassin/trunk/MANIFEST Fri Apr 30 18:17:51 2021
> @@ -414,6 +414,9 @@ t/data/spam/esp/sendgrid_id.eml
>  t/data/spam/esp/sendgrid_id.txt
>  t/data/spam/extracttext/gtube_jpg.eml
>  t/data/spam/extracttext/gtube_pdf.eml
> +t/data/spam/freemail1
> +t/data/spam/freemail2
> +t/data/spam/freemail3
>  t/data/spam/gtube.eml
>  t/data/spam/gtubedcc.eml
>  t/data/spam/gtubedcc_crlf.eml
> 
> Modified: spamassassin/trunk/UPGRADE
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/UPGRADE?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/UPGRADE (original)
> +++ spamassassin/trunk/UPGRADE Fri Apr 30 18:17:51 2021
> @@ -2,6 +2,19 @@
>  Note for Users Upgrading to SpamAssassin 4.0.0
>  ----------------------------------------------
>  
> +- Improved internal header address (From/To/Cc) parser, now also handles
> +  multiple addresses.  Optional support for external Email::Address::XS
> +  parser, which can handle nested comments and other oddities.
> +
> +- Header :addr :name modifiers now returns all addresses.  :first :last
> +  select only first (topmost) or last header to process, when there are
> +  multiple headers with the same name (:addr and :name may still return
> +  multiple values from a single header).
> +
> +- API: $pms->get() can and should now be called in list context.  Scalar
> +  context continues to return multiple values newline separated, but this
> +  should be considered deprecated.
> +
>  - New ExtractText plugin that extracts text from documents or images and feed it
>    into SpamAssassin
>  
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin.pm Fri Apr 30 18:17:51 2021
> @@ -1064,8 +1064,11 @@ sub add_all_addresses_to_blacklist {
>  
>    my @addrlist;
>    my @hdrs = $mail_obj->get_header('From');
> -  if ($#hdrs >= 0) {
> -    push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
> +  foreach my $hdr (@hdrs) {
> +    my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
> +    foreach my $addr (@addrs) {
> +      push @addrlist, $addr->{address} if defined $addr->{address};
> +    }
>    }
>  
>    foreach my $addr (@addrlist) {
> @@ -2244,8 +2247,12 @@ sub find_all_addrs_in_mail {
>                               Errors-To Mail-Followup-To))
>    {
>      my @hdrs = $mail_obj->get_header($header);
> -    if ($#hdrs < 0) { next; }
> -    push (@addrlist, $self->find_all_addrs_in_line(join (" ", @hdrs)));
> +    foreach my $hdr (@hdrs) {
> +      my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
> +      foreach my $addr (@addrs) {
> +        push @addrlist, $addr->{address} if defined $addr->{address};
> +      }
> +    }
>    }
>  
>    # find addrs in body, too
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Conf.pm Fri Apr 30 18:17:51 2021
> @@ -5430,6 +5430,7 @@ sub feature_subjprefix { 1 } # add subje
>  sub feature_bayes_stopwords { 1 } # multi language stopwords in Bayes
>  sub feature_get_host { 1 } # $pms->get() :host :domain :ip :revip # was implemented together with AskDNS::has_tag_header # Bug 7734
>  sub feature_blocklist_welcomelist { 1 } # bz 7826
> +sub feature_header_address_parser { 1 } # improved header address parsing using Email::Address::XS, $pms->get() list context
>  sub has_tflags_nosubject { 1 } # tflags nosubject
>  sub has_tflags_nolog { 1 } # tflags nolog
>  sub perl_min_version_5010000 { return $] >= 5.010000 }  # perl version check ("perl_version" not neatly backwards-compatible)
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm Fri Apr 30 18:17:51 2021
> @@ -62,7 +62,7 @@ use Mail::SpamAssassin::AsyncLoop;
>  use Mail::SpamAssassin::Conf;
>  use Mail::SpamAssassin::Util qw(untaint_var base64_encode idn_to_ascii
>                                  uri_list_canonicalize reverse_ip_address
> -                                is_fqdn_valid);
> +                                is_fqdn_valid parse_header_addresses);
>  use Mail::SpamAssassin::Timeout;
>  use Mail::SpamAssassin::Logger;
>  
> @@ -1953,21 +1953,24 @@ sub extract_message_metadata {
>    # tags (explicitly required for DMARC, RFC 7489)
>    #
>    { local $1;
> -    my $addr = $self->get('EnvelopeFrom:addr', undef);
> +    my $host = ($self->get('EnvelopeFrom:first:addr:host'))[0];
>      # collect a FQDN, ignoring potential trailing WSP
> -    if (defined $addr && $addr =~ /\@([^@. \t]+\.[^@ \t]+?)[ \t]*\z/s) {
> -      my $d = idn_to_ascii($1);
> +    if (defined $host) {
> +      my $d = idn_to_ascii($host);
>        $self->set_tag('SENDERDOMAIN', $d);
>        $self->{msg}->put_metadata("X-SenderDomain", $d);
>        dbg("metadata: X-SenderDomain: %s", $d);
>      }
> -    # TODO: the get ':addr' only returns the first address; this should be
> -    # augmented to be able to return all addresses in a header field, multiple
> -    # addresses in a From header field are allowed according to RFC 5322
> -    $addr = $self->get('From:addr', undef);
> -    if (defined $addr && $addr =~ /\@([^@. \t]+\.[^@ \t]+?)[ \t]*\z/s) {
> -      my $d = idn_to_ascii($1);
> -      $self->set_tag('AUTHORDOMAIN', $d);
> +    my @from_doms;
> +    my %seen;
> +    foreach ($self->get('From:addr:host')) {
> +      next if $seen{$_}++;
> +      my $d = idn_to_ascii($_);
> +      push @from_doms, $d;
> +    }
> +    if (@from_doms) {
> +      $self->set_tag('AUTHORDOMAIN', @from_doms > 1 ? \@from_doms : $from_doms[0]);
> +      my $d = join(" ", @from_doms);
>        $self->{msg}->put_metadata("X-AuthorDomain", $d);
>        dbg("metadata: X-AuthorDomain: %s", $d);
>      }
> @@ -2031,25 +2034,32 @@ sub get_decoded_stripped_body_text_array
>  
>  =item $status->get (header_name [, default_value])
>  
> -Returns a message header, pseudo-header, real name or address.
> -C<header_name> is the name of a mail header, such as 'Subject', 'To',
> -etc.  If C<default_value> is given, it will be used if the requested
> -C<header_name> does not exist.
> -
> -Appending C<:raw> to the header name will inhibit decoding of quoted-printable
> -or base-64 encoded strings.
> -
> -Appending a modifier C<:addr> to a header field name will cause everything
> -except the first email address to be removed from the header field.  It is
> -mainly applicable to header fields 'From', 'Sender', 'To', 'Cc' along with
> -their 'Resent-*' counterparts, and the 'Return-Path'. For example, all of
> -the following will result in "example@foo":
> +Returns a message header, pseudo-header or a real name, email-address or
> +some other parsed value set by modifiers.  C<header_name> is the name of a
> +mail header, such as 'Subject', 'To', etc.
> +
> +Should be called in list context since 4.0.  Will return list of headers
> +content, or other values when modifiers used.
> +
> +If C<default_value> is given, it will be used if the requested
> +C<header_name> does not exist.  This is mainly useful when called in scalar
> +context to set 'undef' instead of legacy '' return value when header does
> +not exist.
> +
> +Appending C<:raw> modifier to the header name will inhibit decoding of
> +quoted-printable or base-64 encoded strings.
> +
> +Appending C<:addr> modifier to the header name will return all
> +email-addresses found in the header.  It is mainly applicable to header
> +fields 'From', 'Sender', 'To', 'Cc' along with their 'Resent-*'
> +counterparts, and the 'Return-Path'.  For example, all of the following will
> +result in "example@foo" (and "example@bar"):
>  
>  =over 4
>  
>  =item example@foo
>  
> -=item example@foo (Foo Blah)
> +=item example@foo (Foo Blah), <example@bar>
>  
>  =item example@foo, example@bar
>  
> @@ -2063,18 +2073,18 @@ the following will result in "example@fo
>  
>  =back
>  
> -Appending a modifier C<:name> to a header field name will cause everything
> -except the first display name to be removed from the header field. It is
> -mainly applicable to header fields containing a single mail address: 'From',
> -'Sender', along with their 'Resent-From' and 'Resent-Sender' counterparts.
> -For example, all of the following will result in "Foo Blah". One level of
> -single quotes is stripped too, as it is often seen.
> +Appending C<:name> modifier to the header name will return all "display
> +names" from the header field.  As with C<:addr>, it is mainly applicable to
> +header fields 'From', 'Sender', 'To', 'Cc' along with their 'Resent-*'
> +counterparts, and the 'Return-Path'.  For example, all of the following will
> +result in "Foo Blah" (and "Bar Baz").  One level of single quotes is
> +stripped too, as it is often seen.
>  
>  =over 4
>  
>  =item example@foo (Foo Blah)
>  
> -=item example@foo (Foo Blah), example@bar
> +=item example@foo (Foo Blah), "Bar Baz" <example@bar>
>  
>  =item display: example@foo (Foo Blah), example@bar ;
>  
> @@ -2086,22 +2096,27 @@ single quotes is stripped too, as it is
>  
>  =back
>  
> -Appending a modifier C<:host> to a header field name will return the first
> -hostname-looking string that ends with a valid TLD. First it tries to find a
> -match after @ character (possible email), then from any part of the header.
> -Normal use of this would be for example 'From:addr:host' to return the
> -hostname portion of a From-address.
> -
> -Appending a modifier C<:domain> to a header field name implies C<:host>,
> -but will return only domain part of the hostname, as returned by
> -RegistryBoundaries::trim_domain.
> -
> -Appending a modifier C<:ip> to a header field name, will return the first
> -IPv4 or IPv6 address string found. Could be used for example as
> -'X-Originating-IP:ip'.
> -
> -Appending a modifier C<:revip> to a header field name implies C<:ip>,
> -but will return the found IP in reverse (usually for DNSBL usage).
> +Appending C<:host> to the header name will return the first hostname-looking
> +string that ends with a valid TLD.  First it tries to find a match after @
> +character (possible email), then from any part of the header.  Normal use of
> +this would be for example 'From:addr:host' to return the hostname portion of
> +a From-address.
> +
> +Appending C<:domain> to the header name implies C<:host>, but will return
> +only domain part of the hostname, as returned by
> +RegistryBoundaries::trim_domain().
> +
> +Appending C<:ip> to the header name, will return the first IPv4 or IPv6
> +address string found.  Could be used for example as 'X-Originating-IP:ip'.
> +
> +Appending C<:revip> to the header name implies C<:ip>, but will return the
> +found IP in reverse (usually for DNSBL usage).
> +
> +Appending C<:first> modifier to the header name will return only the first
> +(topmost) header, in case there are multiple ones.  Similarly C<:last> will
> +select the last one.  These affect only the physical header line selection. 
> +If selected header is parsed further with C<:addr> or similar, it may return
> +multiple results, if the selected header contains multiple addresses.
>  
>  There are several special pseudo-headers that can be specified:
>  
> @@ -2143,6 +2158,12 @@ the message has passed through
>  =item C<X-Spam-Relays-Trusted> is the generated metadata of trusted relays
>  the message has passed through
>  
> +=item C<X-Spam-Relays-External> is the generated metadata of external relays
> +the message has passed through
> +
> +=item C<X-Spam-Relays-Internal> is the generated metadata of internal relays
> +the message has passed through
> +
>  =back
>  
>  =cut
> @@ -2151,98 +2172,106 @@ the message has passed through
>  sub _get {
>    my ($self, $request) = @_;
>  
> -  my $result;
> +  my @results;
>    my $getaddr = 0;
>    my $getname = 0;
>    my $getraw = 0;
> +  my $needraw = 0;
>    my $gethost = 0;
>    my $getdomain = 0;
>    my $getip = 0;
>    my $getrevip = 0;
> +  my $getfirst = 0;
> +  my $getlast = 0;
>  
>    # special queries - process and strip modifiers
>    if (index($request,':') >= 0) {  # triage
>      local $1;
>      while ($request =~ s/:([^:]*)//) {
>        if    ($1 eq 'raw')    { $getraw  = 1 }
> -      elsif ($1 eq 'addr')   { $getaddr = $getraw = 1 }
> -      elsif ($1 eq 'name')   { $getname = 1 }
> +      elsif ($1 eq 'addr')   { $getaddr = $needraw = 1 }
> +      elsif ($1 eq 'name')   { $getname = $needraw = 1 }
>        elsif ($1 eq 'host')   { $gethost = 1 }
>        elsif ($1 eq 'domain') { $gethost = $getdomain = 1 }
>        elsif ($1 eq 'ip')     { $getip = 1 }
>        elsif ($1 eq 'revip')  { $getip = $getrevip = 1 }
> +      elsif ($1 eq 'first')  { $getfirst = 1 }
> +      elsif ($1 eq 'last')   { $getlast = 1 }
>      }
>    }
>    my $request_lc = lc $request;
>  
>    # ALL: entire pristine or semi-raw headers
>    if ($request eq 'ALL') {
> -    return ($getraw ? $self->{msg}->get_pristine_header()
> -                    : $self->{msg}->get_all_headers(0));
> +    if ($getraw) {
> +      @results = $self->{msg}->get_pristine_header() =~ /^([^ \t].*?\n)(?![ \t])/smgi;
> +    } else {
> +      @results = $self->{msg}->get_all_headers(0);
> +    }
> +    return \@results;
>    }
>    # ALL-TRUSTED: entire trusted raw headers
>    elsif ($request eq 'ALL-TRUSTED') {
>      # '+1' since we added the received header even though it's not considered
>      # trusted, so we know that those headers can be trusted too
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>                       undef, $self->{last_trusted_relay_index}+1,
>                       undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-INTERNAL: entire internal raw headers
>    elsif ($request eq 'ALL-INTERNAL') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>                       undef, $self->{last_internal_relay_index}+1,
>                       undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-UNTRUSTED: entire untrusted raw headers
>    elsif ($request eq 'ALL-UNTRUSTED') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>                       $self->{last_trusted_relay_index}+1, undef,
>                       undef, undef, $getraw);
> +    return \@results;
>    }
>    # ALL-EXTERNAL: entire external raw headers
>    elsif ($request eq 'ALL-EXTERNAL') {
>      # '+1' for the same reason as in ALL-TRUSTED above
> -    return $self->get_all_hdrs_in_rcvd_index_range(
> +    @results = $self->get_all_hdrs_in_rcvd_index_range(
>                       $self->{last_internal_relay_index}+1, undef,
>                       undef, undef, $getraw);
> +    return \@results;
>    }
>    # EnvelopeFrom: the SMTP MAIL FROM: address
>    elsif ($request_lc eq "\LEnvelopeFrom") {
> -    $result = $self->get_envelope_from();
> +    push @results, $self->get_envelope_from();
>    }
>    # untrusted relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Untrusted") {
> -    $result = $self->{relays_untrusted_str};
> +    push @results, $self->{relays_untrusted_str};
>    }
>    # trusted relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Trusted") {
> -    $result = $self->{relays_trusted_str};
> +    push @results, $self->{relays_trusted_str};
>    }
>    # external relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-External") {
> -    $result = $self->{relays_external_str};
> +    push @results, $self->{relays_external_str};
>    }
>    # internal relays list, as string
>    elsif ($request_lc eq "\LX-Spam-Relays-Internal") {
> -    $result = $self->{relays_internal_str};
> +    push @results, $self->{relays_internal_str};
>    }
>    # ToCc: the combined recipients list
>    elsif ($request_lc eq "\LToCc") {
> -    $result = join("\n", $self->{msg}->get_header('To', $getraw));
> -    if ($result ne '') {
> -      chomp $result;
> -      $result .= ", " if $result =~ /\S/;
> -    }
> -    $result .= join("\n", $self->{msg}->get_header('Cc', $getraw));
> -    $result = undef if $result eq '';
> +    push @results, $self->{msg}->get_header('To', $getraw);
> +    push @results, $self->{msg}->get_header('Cc', $getraw);
>    }
>    # MESSAGEID: handle lists which move the real message-id to another
>    # header for resending.
>    elsif ($request eq 'MESSAGEID') {
> -    $result = join("\n", grep { defined($_) && $_ ne '' }
> +    push @results, grep { defined($_) && $_ ne '' } (
>                  $self->{msg}->get_header('X-Message-Id', $getraw),
>                  $self->{msg}->get_header('Resent-Message-Id', $getraw),
>                  $self->{msg}->get_header('X-Original-Message-ID', $getraw),
> @@ -2250,115 +2279,126 @@ sub _get {
>    }
>    # a conventional header
>    else {
> -    my @results = $getraw ? $self->{msg}->raw_header($request)
> -                          : $self->{msg}->get_header($request);
> -  # dbg("message: get(%s)%s = %s",
> -  #     $request, $getraw?'raw':'', join(", ",@results));
> -    if (@results) {
> -      $result = join('', @results);
> -    } else {  # metadata
> -      $result = $self->{msg}->get_metadata($request);
> -    }
> -  }
> -
> -  # special queries
> -  if (defined $result && ($getaddr || $getname)) {
> -    local $1;
> -    $result =~ s/^[^:]+:(.*);\s*$/$1/gs;     # 'undisclosed-recipients: ;'
> -    $result =~ s/\s+/ /g;                    # reduce whitespace
> -    $result =~ s/^\s+//;                     # leading whitespace
> -    $result =~ s/\s+$//;                     # trailing whitespace
> -
> -    if ($getaddr) {
> -      # Get the email address out of the header
> -      # All of these should result in "jm@foo":
> -      # jm@foo
> -      # jm@foo (Foo Blah)
> -      # jm@foo, jm@bar
> -      # display: jm@foo (Foo Blah), jm@bar ;
> -      # Foo Blah <jm@foo>
> -      # "Foo Blah" <jm@foo>
> -      # "'Foo Blah'" <jm@foo>
> -      #
> -      # strip out the (comments)
> -      $result =~ s/\s*\(.*?\)//g;
> -      # strip out the "quoted text", unless it's the only thing in the string
> -      if ($result !~ /^".*"$/) {
> -        $result =~ s/(?<!<)"[^"]*"(?!\@)//g;   #" emacs
> -      }
> -      # Foo Blah <jm@xxx> or <jm@xxx>
> -      local $1;
> -      $result =~ s/^[^"<]*?<(.*?)>.*$/$1/;
> -      # multiple addresses on one line? remove all but first
> -      $result =~ s/,.*$//;
> -    }
> -    elsif ($getname) {
> -      # Get the display name out of the header
> -      # All of these should result in "Foo Blah":
> -      #
> -      # jm@foo (Foo Blah)
> -      # (Foo Blah) jm@foo
> -      # jm@foo (Foo Blah), jm@bar
> -      # display: jm@foo (Foo Blah), jm@bar ;
> -      # Foo Blah <jm@foo>
> -      # "Foo Blah" <jm@foo>
> -      # "'Foo Blah'" <jm@foo>
> -      #
> -      local $1;
> -      # does not handle mailbox-list or address-list or quotes well, to be improved
> -      if ($result =~ /^ \s* " (.*?) (?<!\\)" \s* < [^<>]* >/sx ||
> -          $result =~ /^ \s* (.*?) \s* < [^<>]* >/sx) {
> -        $result = $1;  # display-name, RFC 5322
> -        # name-addr    = [display-name] angle-addr
> -        # display-name = phrase
> -        # phrase       = 1*word / obs-phrase
> -        # word         = atom / quoted-string
> -        # obs-phrase   = word *(word / "." / CFWS)
> -        $result =~ s{ " ( (?: [^"\\] | \\. )* ) " }
> -                { my $s=$1; $s=~s{\\(.)}{$1}gs; $s }gsxe;
> -        $result =~ s/\\"/"/gs;
> -      } elsif ($result =~ /^ [^(,]*? \( (.*?) \) /sx) {  # legacy form
> -        # nested comments are not handled, to be improved
> -        $result = $1;
> -      } else {  # no display name
> -        $result = '';
> +    my @res = $getraw||$needraw ? $self->{msg}->raw_header($request)
> +                                : $self->{msg}->get_header($request);
> +    if (!@res) {
> +      if (defined(my $m = $self->{msg}->get_metadata($request))) {
> +        push @res, $m;
> +      }
> +    }
> +    push @results, @res if @res;
> +  }
> +
> +  # Nothing found to process further, bail out quick
> +  if (!@results) {
> +    return \@results;
> +  }
> +
> +  # Continue processing only first (topmost) or last header
> +  if ($getfirst) {
> +    @results = ($results[0]);
> +  } elsif ($getlast) {
> +    @results = ($results[-1]);
> +  }
> +
> +  # special addr/name
> +  if ($getaddr || $getname) {
> +    my @res;
> +    foreach my $line (@results) {
> +      next unless defined $line;
> +      # Note: parse_header_addresses always called with raw undecoded value
> +      # Skip invalid addresses here
> +      my @addrs = parse_header_addresses($line);
> +      if (@addrs) {
> +        if ($getaddr) {
> +          foreach my $addr (@addrs) {
> +            push @res, $addr->{address} if defined $addr->{address};
> +          }
> +        }
> +        elsif ($getname) {
> +          foreach my $addr (@addrs) {
> +            next unless defined $addr->{phrase};
> +            if ($getraw) {
> +              # phrase=name, could also be username or comment unless name found
> +              push @res, $addr->{phrase};
> +            } else {
> +              # If :raw was not specifically asked, decode mimewords
> +              # TODO: silly call to Node module, should probably be in Util
> +              my $decoded = Mail::SpamAssassin::Message::Node::_decode_header(
> +                              $addr->{phrase}, "PMS:get:$request");
> +              # Normalize whitespace, unless it's all white-space
> +              if ($decoded =~ /\S/) {
> +                $decoded =~ s/\s+/ /gs;
> +                $decoded =~ s/^\s+//;
> +                $decoded =~ s/\s+$//;
> +                $decoded =~ s/^'(.*?)'$/$1/; # remove single quotes
> +              }
> +              push @res, $decoded if defined $decoded;
> +            }
> +          }
> +        }
>        }
> -      $result =~ s/^ \s* ' \s* (.*?) \s* ' \s* \z/$1/sx;
>      }
> +    @results = @res;
>    }
>  
>    # special host/domain
> -  if (defined $result && ($gethost || $getdomain || $getip)) {
> -    my $host;
> +  if (@results && ($gethost || $getdomain || $getip)) {
> +    my @res;
>      if ($gethost) {
> +      # TODO: IDN matching needs honing
>        my $tldsRE = $self->{main}->{registryboundaries}->{valid_tlds_re};
> -      my $hostRE = qr/(?<![._-])\b([a-z\d][a-z\d._-]{0,251}\.${tldsRE})\b(?![._-])/i;
> -      # try grabbing email/msgid domain first, because user part might look like
> -      # a valid host..
> -      if ($result =~ /.*\@${hostRE}/i && is_fqdn_valid($1)) {
> -        $host = $1;
> -      } else {
> -        # otherwise try hard to find a valid host
> -        while ($result =~ /${hostRE}/ig) {
> -          if (is_fqdn_valid($1)) {
> +      #my $hostRE = qr/(?<![._-])\b([a-z\d][a-z\d._-]{0,251}\.${tldsRE})\b(?![._-])/i;
> +      my $hostRE = qr/(?<![._-])(\S{1,251}\.${tldsRE})(?![._-])/i;
> +      foreach my $line (@results) {
> +        next unless defined $line;
> +        my $host;
> +        if ($getaddr) {
> +          # If :addr already preparsed the line, just grab domain liberally
> +          if ($line =~ /.*\@(\S+)/) {
>              $host = $1;
> -            last;
>            }
>          }
> -      }
> -      if ($host && $getdomain) {
> -        $host = $self->{main}->{registryboundaries}->trim_domain($host, 1);
> +        else {
> +          # try grabbing email/msgid domain first, because user part might look like
> +          # a valid host..
> +          if ($line =~ /.*\@${hostRE}/i) {
> +            if (is_fqdn_valid(idn_to_ascii($1), 1)) {
> +              $host = $1;
> +            }
> +          }
> +          # otherwise try hard to find a valid host
> +          if (!$host) {
> +            while ($line =~ /${hostRE}/ig) {
> +              if (is_fqdn_valid(idn_to_ascii($1), 1)) {
> +                $host = $1;
> +                last;
> +              }
> +            }
> +          }
> +        }
> +        if ($host) {
> +          if ($getdomain) {
> +            $host = $self->{main}->{registryboundaries}->trim_domain($host, 1);
> +          }
> +          push @res, $host;
> +        }
>        }
>      } else {
>        my $ipRE = qr/(?<!\.)\b(${IP_ADDRESS})\b(?!\.)/;
> -      if ($result =~ $ipRE) {
> -        $host = $getrevip ? reverse_ip_address($1) : $1;
> +      foreach my $line (@results) {
> +        next unless defined $line;
> +        my $host;
> +        if ($line =~ $ipRE) {
> +          $host = $getrevip ? reverse_ip_address($1) : $1;
> +        }
> +        push @res, $host  if defined $host;
>        }
>      }
> -    $result = $host;
> +    @results = @res;
>    }
>  
> -  return $result;
> +  return \@results;
>  }
>  
>  # optimized for speed
> @@ -2367,7 +2407,7 @@ sub _get {
>  # $_[2] is defval
>  sub get {
>    my $cache = $_[0]->{get_cache};
> -  my $found;
> +  my $found = [];
>    if (exists $cache->{$_[1]}) {
>      # return cache entry if it is known
>      # (measured hit/attempts rate on a production mailer is about 47%)
> @@ -2375,13 +2415,34 @@ sub get {
>    } else {
>      # fill in a cache entry
>      $found = _get(@_);
> +    # filter out undefined
> +    @$found = grep { defined } @$found;
>      $cache->{$_[1]} = $found;
>    }
>    # if the requested header wasn't found, we should return a default value
>    # as specified by the caller: if defval argument is present it represents
>    # a default value even if undef; if defval argument is absent a default
>    # value is an empty string for upwards compatibility
> -  return (defined $found ? $found : @_ > 2 ? $_[2] : '');
> +  if (@$found) {
> +    # new list context usage in 4.0, return all values always
> +    if (wantarray) {
> +      return @$found;
> +    }
> +    # legacy scalar context expected only single return value for some
> +    # queries, without a newline
> +    if ($_[1] =~ /:(?:addr|name|host|domain|ip|revip)\b/ ||
> +        $_[1] eq 'EnvelopeFrom') {
> +      my $res = $found->[0];
> +      $res =~ s/\n\z$//;
> +      return $res;
> +    } else {
> +      return join('', @$found);
> +    }
> +  } elsif (@_ > 2) {
> +    return wantarray ? ($_[2]) : $_[2];
> +  } else {
> +    return wantarray ? () : '';
> +  }
>  }
>  
>  ###########################################################################
> @@ -2698,15 +2759,16 @@ sub _process_dkim_uri_list {
>  
>    # Look for the domain in DK/DKIM headers
>    if ($self->{conf}->{parse_dkim_uris}) {
> -    my $dk = join(" ", grep {defined} ( $self->get('DomainKey-Signature',undef ),
> -                                        $self->get('DKIM-Signature',undef) ));
> -    while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
> -      my $d = $1;
> -      $d =~ s/\s+//g;
> -      # prefix with domainkeys: so it doesn't merge with identical keys
> -      $self->add_uri_detail_list("domainkeys:$d",
> -        {'domainkeys'=>1, 'nocanon'=>1, 'noclean'=>1},
> -        'domainkeys', 1);
> +    foreach my $dk ( $self->get('DomainKey-Signature'),
> +                     $self->get('DKIM-Signature') ) {
> +      while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
> +        my $d = $1;
> +        $d =~ s/\s+//g;
> +        # prefix with domainkeys: so it doesn't merge with identical keys
> +        $self->add_uri_detail_list("domainkeys:$d",
> +          {'domainkeys'=>1, 'nocanon'=>1, 'noclean'=>1},
> +          'domainkeys', 1);
> +      }
>      }
>    }
>  }
> @@ -3123,8 +3185,8 @@ sub get_envelope_from {
>    # Assume that because they have configured it, their MTA will always add it.
>    # This will prevent us falling through and picking up inappropriate headers.
>    if (defined $self->{conf}->{envelope_sender_header}) {
> -    # make sure we get the most recent copy - there can be only one EnvelopeSender.
> -    $envf = $self->get($self->{conf}->{envelope_sender_header}.":addr",undef);
> +    # get the most recent (topmost) copy - there can be only one EnvelopeSender.
> +    $envf = ($self->get($self->{conf}->{envelope_sender_header}.":first:addr"))[0];
>      # ok if it contains an "@" sign, or is "" (ie. "<>" without the < and >)
>      if (defined $envf && (index($envf, '@') > 0 || $envf eq '')) {
>        dbg("message: using envelope_sender_header '%s' as EnvelopeFrom: '%s'",
> @@ -3177,17 +3239,19 @@ sub get_envelope_from {
>    # lines, we cannot trust any Envelope-From headers, since they're likely to
>    # be incorrect fetchmail guesses.
>  
> -  if (index($self->get("X-Sender"), '@') != -1) {
> -    my $rcvd = join(' ', $self->get("Received"));
> -    if (index($rcvd, '(fetchmail') != -1) {
> -      dbg("message: X-Sender and fetchmail signatures found, cannot trust envelope-from");
> -      $self->{envelopefrom} = undef;
> -      return;
> +  my $x_sender = ($self->get("X-Sender:first:addr"))[0];
> +  if (defined $x_sender && index($x_sender, '@') != -1) {
> +    foreach ($self->get("Received")) {
> +      if (index($_, '(fetchmail') != -1) {
> +        dbg("message: X-Sender and fetchmail signatures found, cannot trust envelope-from");
> +        $self->{envelopefrom} = undef;
> +        return;
> +      }
>      }
>    }
>  
>    # procmailrc notes this (we now recommend adding it to Received instead)
> -  if (defined($envf = $self->get("X-Envelope-From:addr",undef))) {
> +  if (defined($envf = ($self->get("X-Envelope-From:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from.  check
>      if ($self->get("ALL") =~ /^Received:.*?^X-Envelope-From:/smi) {
> @@ -3202,7 +3266,7 @@ sub get_envelope_from {
>    }
>  
>    # qmail, new-inject(1)
> -  if (defined($envf = $self->get("Envelope-Sender:addr",undef))) {
> +  if (defined($envf = ($self->get("Envelope-Sender:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from.  check
>      if ($self->get("ALL") =~ /^Received:.*?^Envelope-Sender:/smi) {
> @@ -3221,7 +3285,7 @@ sub get_envelope_from {
>    #   data.  This use of return-path is required; mail systems MUST support
>    #   it.  The return-path line preserves the information in the <reverse-
>    #   path> from the MAIL command.
> -  if (defined($envf = $self->get("Return-Path:addr",undef))) {
> +  if (defined($envf = ($self->get("Return-Path:first:addr"))[0])) {
>      # heuristic: this could have been relayed via a list which then used
>      # a *new* Envelope-from.  check
>      if ($self->get("ALL") =~ /^Received:.*?^Return-Path:/smi) {
> @@ -3261,7 +3325,7 @@ sub get_all_hdrs_in_rcvd_index_range {
>    $include_end_rcvd = 1 unless defined $include_end_rcvd;
>  
>    my $cur_rcvd_index = -1;  # none found yet
> -  my $result = '';
> +  my @results;
>  
>    my @hdrs;
>    if ($getraw) {
> @@ -3280,14 +3344,20 @@ sub get_all_hdrs_in_rcvd_index_range {
>      }
>      if ((!defined $start_rcvd || $start_rcvd <= $cur_rcvd_index) &&
>       (!defined $end_rcvd || $cur_rcvd_index < $end_rcvd)) {
> -      $result .= $hdr;
> +      push @results, $hdr;
>      }
>      elsif (defined $end_rcvd && $cur_rcvd_index == $end_rcvd) {
> -      $result .= $hdr;
> +      push @results, $hdr;
>        last;
>      }
>    }
> -  return ($result eq '' ? undef : $result);
> +
> +  if (wantarray) {
> +    return @results;
> +  } else {
> +    my $result = join('', @results);
> +    return ($result eq '' ? undef : $result);
> +  }
>  }
>  
>  ###########################################################################
> @@ -3377,9 +3447,9 @@ sub all_from_addrs {
>    my @addrs;
>  
>    # Resent- headers take priority, if present. see bug 672
> -  my $resent = $self->get('Resent-From',undef);
> -  if (defined $resent && $resent =~ /\S/) {
> -    @addrs = $self->{main}->find_all_addrs_in_line ($resent);
> +  my @resent = $self->get('Resent-From:first:addr');
> +  if (@resent) {
> +    @addrs = @resent;
>    }
>    else {
>      # bug 2292: Used to use find_all_addrs_in_line() with the same
> @@ -3387,17 +3457,18 @@ sub all_from_addrs {
>      # FNs for things like welcomelist_from (previously whitelist_from).  
>      # Since all of these are From
>      # headers, there should only be 1 address in each anyway (not exactly
> -    # true, RFC 2822 allows multiple addresses in a From header field),
> -    # so use the :addr code...
> +    # true, RFC 2822 allows multiple addresses in a From header field)
> +    # *** since 4.0 all addresses are returned from Header correctly ***
>      # bug 3366: some addresses come in as 'foo@bar...', which is invalid.
>      # so deal with the multiple periods.
> +    # TODO: 4.0 need :first:addr here ? Why check so many headers ?
>      ## no critic
>      @addrs = map { tr/././s; $_ } grep { $_ ne '' }
> -        ($self->get('From:addr'),            # std
> -         $self->get('Envelope-Sender:addr'), # qmail: new-inject(1)
> -         $self->get('Resent-Sender:addr'),   # procmailrc manpage
> -         $self->get('X-Envelope-From:addr'), # procmailrc manpage
> -         $self->get('EnvelopeFrom:addr'));   # SMTP envelope
> +      ($self->get('From:addr'),            # std
> +       $self->get('Envelope-Sender:addr'), # qmail: new-inject(1)
> +       $self->get('Resent-Sender:addr'),   # procmailrc manpage
> +       $self->get('X-Envelope-From:addr'), # procmailrc manpage
> +       $self->get('EnvelopeFrom:addr'));   # SMTP envelope
>      # http://www.cs.tut.fi/~jkorpela/headers.html is useful here
>    }
>  
> @@ -3455,47 +3526,52 @@ sub all_to_addrs {
>    my @addrs;
>  
>    # Resent- headers take priority, if present. see bug 672
> -  my $resent = join('', $self->get('Resent-To'), $self->get('Resent-Cc'));
> -  if ($resent =~ /\S/) {
> -    @addrs = $self->{main}->find_all_addrs_in_line($resent);
> +  my @resent = ( $self->get('Resent-To:first:addr'),
> +                 $self->get('Resent-Cc:first:addr') );
> +  if (@resent) {
> +    @addrs = @resent;
>    } else {
>      # OK, a fetchmail trick: try to find the recipient address from
>      # the most recent 3 Received lines.  This is required for sendmail,
>      # since it does not add a helpful header like exim, qmail
>      # or Postfix do.
>      #
> -    my $rcvd = $self->get('Received');
> -    $rcvd =~ s/\n[ \t]+/ /gs;
> -    $rcvd =~ s/\n+/\n/gs;
> -
> -    my @rcvdlines = split(/\n/, $rcvd, 4); pop @rcvdlines; # forget last one
> +    my @rcvd = ($self->get('Received'))[0 .. 2];
>      my @rcvdaddrs;
> -    foreach my $line (@rcvdlines) {
> -      if ($line =~ / for (\S+\@\S+);/) { push (@rcvdaddrs, $1); }
> +    foreach my $line (@rcvd) {
> +      next unless defined $line;
> +      if ($line =~ / for <?(\S+\@(\S+?))>?;/) {
> +        if (is_fqdn_valid(idn_to_ascii($2), 1)) {
> +          push @rcvdaddrs, $1;
> +        }
> +      }
>      }
>  
> -    @addrs = $self->{main}->find_all_addrs_in_line (
> -       join('',
> -      join(" ", @rcvdaddrs)."\n",
> -         $self->get('To'),                   # std
> -      $self->get('Apparently-To'),           # sendmail, from envelope
> -      $self->get('Delivered-To'),            # Postfix, poss qmail
> -      $self->get('Envelope-Recipients'),     # qmail: new-inject(1)
> -      $self->get('Apparently-Resent-To'),    # procmailrc manpage
> -      $self->get('X-Envelope-To'),           # procmailrc manpage
> -      $self->get('Envelope-To'),             # exim
> -      $self->get('X-Delivered-To'),          # procmail quick start
> -      $self->get('X-Original-To'),           # procmail quick start
> -      $self->get('X-Rcpt-To'),               # procmail quick start
> -      $self->get('X-Real-To'),               # procmail quick start
> -      $self->get('Cc')));                    # std
> +    # TODO: 4.0 use :first:addr ? Why so many headers ?
> +    @addrs = (
> +      @rcvdaddrs,
> +      $self->get('To:addr'),                   # std
> +      $self->get('Apparently-To:addr'),        # sendmail, from envelope
> +      $self->get('Delivered-To:addr'),         # Postfix, poss qmail
> +      $self->get('Envelope-Recipients:addr'),  # qmail: new-inject(1)
> +      $self->get('Apparently-Resent-To:addr'), # procmailrc manpage
> +      $self->get('X-Envelope-To:addr'),        # procmailrc manpage
> +      $self->get('Envelope-To:addr'),          # exim
> +      $self->get('X-Delivered-To:addr'),       # procmail quick start
> +      $self->get('X-Original-To:addr'),        # procmail quick start
> +      $self->get('X-Rcpt-To:addr'),            # procmail quick start
> +      $self->get('X-Real-To:addr'),            # procmail quick start
> +      $self->get('Cc:addr'));                  # std
>      # those are taken from various sources; thanks to Nancy McGough, who
>      # noted some in <http://www.ii.com/internet/robots/procmail/qs/#envelope>
>    }
>  
> -  dbg("eval: all '*To' addrs: " . join(" ", @addrs));
> -  $self->{all_to_addrs} = \@addrs;
> -  return @addrs;
> +  my %seen;
> +  my @result = grep { !$seen{$_}++ } @addrs;
> +
> +  dbg("eval: all '*To' addrs: " . join(" ", @result));
> +  $self->{all_to_addrs} = \@result;
> +  return @result;
>  
>  # http://www.cs.tut.fi/~jkorpela/headers.html is useful here, also
>  # http://www.exim.org/pipermail/exim-users/Week-of-Mon-20001009/021672.html
> 
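The list/scalar split that get_all_hdrs_in_rcvd_index_range() now does with wantarray, the ($pms->get(...))[0] idiom and the %seen de-duplication can be tried in isolation. This is a minimal sketch only: collect_hdrs() is a made-up stand-in that takes its values as arguments, it is not SpamAssassin's get(), and the addresses are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a context-aware getter (NOT SpamAssassin's get()).
sub collect_hdrs {
  my @results = @_;
  return @results if wantarray;        # list context: one element per value
  my $result = join('', @results);     # scalar context: one joined string
  return ($result eq '' ? undef : $result);
}

my @list   = collect_hdrs('a@example.com', 'b@example.com');
my $scalar = collect_hdrs('a@example.com', 'b@example.com');

# A list slice forces list context and picks the first value
my $first = (collect_hdrs('a@example.com', 'b@example.com'))[0];
# Slicing an empty list gives undef, hence the "defined" checks in callers
my $none  = (collect_hdrs())[0];

# Order-preserving de-duplication, same idiom as in all_to_addrs()
my %seen;
my @unique = grep { !$seen{$_}++ } qw(x@example.com y@example.com x@example.com);
```

The slice form is why the callers in the hunks above test "defined" on the result before using it.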
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm Fri Apr 30 18:17:51 2021
> @@ -1561,10 +1561,12 @@ sub _pre_chew_addr_header {
>    my ($self, $val) = @_;
>    local ($_);
>  
> -  my @addrs = $self->{main}->find_all_addrs_in_line ($val);
> +  my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($val);
>    my @toks;
> -  foreach (@addrs) {
> -    push (@toks, $self->_tokenize_mail_addrs ($_));
> +  foreach my $addr (@addrs) {
> +    if (defined $addr->{address}) {
> +      push @toks, $self->_tokenize_mail_addrs($addr->{address});
> +    }
>    }
>    return join (' ', @toks);
>  }
> 
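For callers switching from find_all_addrs_in_line() to parse_header_addresses(), the main difference is that each result is a hash rather than a plain string. A minimal sketch of consuming such records follows; the two records are hand-written samples using the key names from this commit, not real parser output.

```perl
use strict;
use warnings;

# Hand-written sample records (assumed shape: phrase, user, host, address,
# comment, invalid); real output comes from parse_header_addresses().
my @parsed = (
  { phrase => 'Alice', user => 'alice', host => 'example.com',
    address => 'alice@example.com', comment => undef, invalid => 0 },
  { phrase => 'no address here', user => undef, host => undef,
    address => undef, comment => undef, invalid => 1 },
);

# Keep only entries that carry an address, as _pre_chew_addr_header now does
my @addresses = map { $_->{address} } grep { defined $_->{address} } @parsed;
```

Entries without an address are still returned by the parser, so the defined check is needed on the caller's side.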
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FreeMail.pm Fri Apr 30 18:17:51 2021
> @@ -464,13 +464,13 @@ sub check_freemail_header {
>          $re = $rec;
>      }
>  
> -    my @emails = map (lc, $pms->{main}->find_all_addrs_in_line ($pms->get($header)));
> +    my @emails = map (lc, $pms->get("$header:addr"));
>  
>      if (!scalar (@emails)) {
>           dbg("header $header not found from mail");
>           return 0;
>      }
> -    dbg("addresses from header $header: ".join(';',@emails));
> +    dbg("addresses from header $header: ".join(', ', @emails));
>  
>      foreach my $email (@emails) {    
>          if ($self->_is_freemail($email, $pms)) {
> @@ -592,24 +592,33 @@ sub check_freemail_replyto {
>  
>      # Skip mailing-list etc looking requests, mostly FPs from them
>      if ($pms->{main}->{conf}->{freemail_skip_bulk_envfrom}) {
> -        my $envfrom = lc($pms->get("EnvelopeFrom"));
> -        if ($envfrom =~ $skip_replyto_envfrom) {
> +        my $envfrom = ($pms->get("EnvelopeFrom"))[0];
> +        if (defined $envfrom && $envfrom =~ $skip_replyto_envfrom) {
>              dbg("envelope sender looks bulk, skipping check: $envfrom");
>              return 0;
>          }
>      }
>  
> -    my $from = lc($pms->get("From:addr"));
> -    my $replyto = lc($pms->get("Reply-To:addr"));
> -    my $from_is_fm = $self->_is_freemail($from, $pms);
> -    my $replyto_is_fm = $self->_is_freemail($replyto, $pms);
> +    my @from_addrs = map (lc, $pms->get("From:addr"));
> +    dbg("From address: ".join(", ", @from_addrs)) if @from_addrs;
>  
> -    dbg("From address: $from") if $from ne '';
> -    dbg("Reply-To address: $replyto") if $replyto ne '';
> +    my @replyto_addrs = map (lc, $pms->get("Reply-To:addr"));
> +    dbg("Reply-To address: ".join(", ", @replyto_addrs)) if @replyto_addrs;
>  
> -    if ($from_is_fm and $replyto_is_fm and ($from ne $replyto)) {
> +    my $from_is_fm = grep { $self->_is_freemail($_, $pms) } @from_addrs;
> +    my $replyto_is_fm = grep { $self->_is_freemail($_, $pms) } @replyto_addrs;
> +
> +    my $from_not_in_replyto = 1;
> +    foreach my $from (@from_addrs) {
> +        next unless grep { $_ eq $from } @replyto_addrs;
> +        $from_not_in_replyto = 0;
> +    }
> +
> +    if ($from_is_fm and $replyto_is_fm and $from_not_in_replyto) {
>          dbg("HIT! From and Reply-To are different freemails");
> -        $self->_got_hit($pms, "$from, $replyto", "From and Reply-To are different freemails");
> +        my $from = join(",", @from_addrs);
> +        my $replyto = join(",", @replyto_addrs);
> +        $self->_got_hit($pms, "$from -> $replyto", "From and Reply-To are different freemails");
>          return 0;
>      }
>  
> @@ -620,7 +629,7 @@ sub check_freemail_replyto {
>          }
>      }
>      elsif ($what eq 'reply') {
> -        if ($replyto ne '' and !$replyto_is_fm) {
> +        if (@replyto_addrs and !$replyto_is_fm) {
>              dbg("Reply-To defined and is not freemail, skipping check");
>              return 0;
>          }
> @@ -629,19 +638,21 @@ sub check_freemail_replyto {
>              return 0;
>          }
>      }
> -    my $reply = $replyto_is_fm ? $replyto : $from;
>  
>      return 0 unless $self->_parse_body($pms);
> -    
> +
>      # Compare body to headers
>      if (scalar keys %{$pms->{freemail_cache}{body}}) {
> -        my $check = $what eq 'replyto' ? $replyto : $reply;
> -        dbg("comparing $check to body freemails");
> -        foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
> -            if ($email ne $check) {
> -                dbg("HIT! $check and $email are different freemails");
> -                $self->_got_hit($pms, "$check, $email", "Different freemails in reply header and body");
> -                return 0;
> +        my $reply_addrs = $what eq 'replyto' ? \@replyto_addrs :
> +                              $replyto_is_fm ? \@replyto_addrs : \@from_addrs;
> +        dbg("comparing to body freemails: ".join(", ", @$reply_addrs));
> +        foreach my $body_email (keys %{$pms->{freemail_cache}{body}}) {
> +            foreach my $reply_email (@$reply_addrs) {
> +                if ($body_email ne $reply_email) {
> +                    dbg("HIT! $reply_email (Reply) and $body_email (Body) are different freemails");
> +                    $self->_got_hit($pms, "$reply_email, $body_email", "Different freemails in reply header and body");
> +                    return 0;
> +                }
>              }
>          }
>      }
> 
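With multiple addresses per header, "From differs from Reply-To" becomes "no From address appears in Reply-To". A standalone sketch of that test is below. The function name is made up for illustration, and it uses a hash lookup where the commit uses a nested grep; the result should be the same.

```perl
use strict;
use warnings;

# Returns 1 when no From address also appears in Reply-To, else 0.
# Hypothetical helper; mirrors the $from_not_in_replyto loop in the hunk above.
sub from_not_in_replyto {
  my ($from_addrs, $replyto_addrs) = @_;
  my %replyto = map { $_ => 1 } @$replyto_addrs;
  my $shared = grep { $replyto{$_} } @$from_addrs;  # count of shared addresses
  return $shared ? 0 : 1;
}

my $different = from_not_in_replyto(['a@example.com'], ['b@example.com']);
my $overlap   = from_not_in_replyto(['a@example.com', 'b@example.com'],
                                    ['b@example.com']);
my $no_from   = from_not_in_replyto([], ['b@example.com']);
```

Note that an empty From list counts as "not in Reply-To", same as the loop in the commit, so the freemail checks on both lists still decide whether the rule hits.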
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm Fri Apr 30 18:17:51 2021
> @@ -89,7 +89,7 @@ sub check_for_fake_aol_relay_in_rcvd {
>    local ($_);
>  
>    $_ = $pms->get('Received');
> -  s/\s/ /gs;
> +  s/\s+/ /gs;
>  
>    # this is the hostname format used by AOL for their relays. Spammers love 
>    # forging it.  Don't make it more specific to match aol.com only, though --
> @@ -125,16 +125,13 @@ sub check_for_faraway_charset_in_headers
>    return 0 if grep { $_ eq "all" } @locales;
>  
>    for my $h (qw(From Subject)) {
> -    my @hdrs = $pms->get("$h:raw");  # ??? get() returns a scalar ???
> -    if ($#hdrs >= 0) {
> -      $hdr = join(" ", @hdrs);
> -    } else {
> -      $hdr = '';
> -    }
> -    while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) {
> -      Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales)
> -       or return 1;
> -    }
> +    my @hdrs = $pms->get("$h:raw");
> +    foreach my $hdr (@hdrs) {
> +      while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) {
> +        Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales)
> +          or return 1;
> +      }
> +    }     
>    }
>    0;
>  }
> @@ -145,35 +142,35 @@ sub check_for_unique_subject_id {
>    $_ = lc $pms->get('Subject');
>  
>    my $id = 0;
> -  if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/
> -     || /\s{10,}(?:\S\s)?(\S+)$/
> -     || /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/
> -     || /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/
> -     || /\s{3,}[-:\#]([a-z0-9]{5,})$/
> -     || /[\s._]{3,}([^0\s._]\d{3,})$/
> -     || /[\s._]{3,}\[(\S+)\]$/
> +  if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/m
> +     || /\s{10,}(?:\S\s)?(\S+)$/m
> +     || /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/m
> +     || /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/m
> +     || /\s{3,}[-:\#]([a-z0-9]{5,})$/m
> +     || /[\s._]{3,}([^0\s._]\d{3,})$/m
> +     || /[\s._]{3,}\[(\S+)\]$/m
>  
>          # (7217vPhZ0-478TLdy5829qicU9-0@26) and similar
> -        || /\(([-\w]{7,}\@\d+)\)$/
> +        || /\(([-\w]{7,}\@\d+)\)$/m
>  
>          # Seven or more digits at the end of a subject is almost certainly a id
> -        || /\b(\d{7,})\s*$/
> +        || /\b(\d{7,})\s*$/m
>  
>          # stuff at end of line after "!" or "?" is usually an id
> -        || /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/
> +        || /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/m
>  
>          # 9095IPZK7-095wsvp8715rJgY8-286-28 and similar
>       # excluding 'Re:', etc and the first word
> -        || /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/
> +        || /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/m
>  
>          # #30D7 and similar
> -        || /\s#\s*([a-f0-9]{4,})\s*$/
> +        || /\s#\s*([a-f0-9]{4,})\s*$/m
>       )
>    {
>      $id = $1;
>      # exempt online purchases
>      if ($id =~ /\d{5,}/
> -     && /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/)
> +     && /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/m)
>      {
>        $id = 0;
>      }
> @@ -270,7 +267,7 @@ sub check_illegal_chars {
>  
>    $header .= ":raw" unless $header =~ /:raw$/;
>    my $str = $pms->get($header);
> -  return 0 if !defined $str || $str eq '';
> +  return 0 if !defined $str || $str !~ /\S/;
>  
>    if ($str =~ tr/\x00-\x7F//c && is_valid_utf_8($str)) {
>      # is non-ASCII and is valid UTF-8
> @@ -304,12 +301,12 @@ sub gated_through_received_hdr_remover {
>    my ($self, $pms) = @_;
>  
>    my $txt = $pms->get("Mailing-List",undef);
> -  if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/) {
> +  if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/m) {
>      my $dlto = $pms->get("Delivered-To");
>      my $rcvd = $pms->get("Received");
>  
>      # ensure we have other indicative headers too
> -    if ($dlto =~ /^mailing list \S+\@\S+/ &&
> +    if ($dlto =~ /^mailing list \S+\@\S+/m &&
>          $rcvd =~ /qmail \d+ invoked (?:from network|by .{3,20})\); \d+ ... \d+/)
>      {
>        return 1;
> @@ -647,10 +644,9 @@ sub _check_recipients {
>    my @inputs;
>  
>    # ToCc: pseudo-header works best, but sometimes Bcc: is better
> -  for ('ToCc', 'Bcc') {
> -    my $to = $pms->get($_);  # get recipients
> -    $to =~ s/\(.*?\)//g;     # strip out the (comments)
> -    push(@inputs, ($to =~ m/([\w.=-]+\@\w+(?:[\w.-]+\.)+\w+)/g));
> +  for ('ToCc:addr', 'Bcc:addr') {
> +    my @to = $pms->get($_);  # get recipients
> +    push @inputs, @to;
>      last if scalar(@inputs) >= TOCC_SIMILAR_COUNT;
>    }
>  
> 
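The /m additions in HeaderEval.pm matter when a scalar-context get() returns several values separated by newlines: without /m, "$" only matches at the very end of the string (or before a final newline). A minimal sketch, assuming two made-up Subject values joined that way:

```perl
use strict;
use warnings;

# Two Subject values, newline separated (sample data, not from a real message)
my $subjects = "cheap meds 1234567\nhello there\n";

# Without /m the id at the end of the first line is not seen
my $without_m = ($subjects =~ /\b(\d{7,})\s*$/)  ? 1 : 0;   # 0
# With /m "$" also matches before each embedded newline
my $with_m    = ($subjects =~ /\b(\d{7,})\s*$/m) ? 1 : 0;   # 1
```

Code calling get() in list context and looping over the values would not need /m at all.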
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/SPF.pm Fri Apr 30 18:17:51 2021
> @@ -381,7 +381,7 @@ sub _check_spf {
>      $scanner->{checked_for_received_spf_header} = 1;
>      dbg("spf: checking to see if the message has a Received-SPF header that we can use");
>  
> -    my @internal_hdrs = split("\n", $scanner->get('ALL-INTERNAL'));
> +    my @internal_hdrs = $scanner->get('ALL-INTERNAL');
>      unless ($scanner->{conf}->{use_newest_received_spf_header}) {
>        # look for the LAST (earliest in time) header, it'll be the most accurate
>        @internal_hdrs = reverse(@internal_hdrs);
> @@ -728,7 +728,7 @@ sub _get_sender {
>        # from the Return-Path, X-Envelope-From, or whatever header.
>        # it's better to get it from Received though, as that is updated
>        # hop-by-hop.
> -      my $sender = $scanner->get("EnvelopeFrom:addr");
> +      my $sender = ($scanner->get("EnvelopeFrom:addr"))[0];
>        if (defined $sender) {
>          dbg("spf: found EnvelopeFrom '$sender' from header");
>          $scanner->{spf_sender} = lc $sender;
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Util.pm Fri Apr 30 18:17:51 2021
> @@ -49,6 +49,7 @@ require 5.008001;  # needs utf8::is_utf8
>  
>  use Mail::SpamAssassin::Logger;
>  
> +use version 0.77;
>  use Exporter ();
>  
>  our @ISA = qw(Exporter);
> @@ -60,7 +61,7 @@ our @EXPORT_OK = qw(&local_tz &base64_de
>                    &secure_tmpdir &uri_list_canonicalize &get_my_locales
>                    &parse_rfc822_date &idn_to_ascii &is_valid_utf_8
>                    &get_user_groups &compile_regexp &qr_to_string
> -                  &is_fqdn_valid);
> +                  &is_fqdn_valid &parse_header_addresses);
>  
>  our $AM_TAINTED;
>  
> @@ -2334,6 +2335,330 @@ sub get_tag_value_for_score {
>  
>  ###########################################################################
>  
> +# RFC 5322 (+IDN?) parsing of addresses and names from To/From/Cc.. headers
> +#
> +# Return array of hashes, containing at minimum name,address,user,host
> +#
> +# Override parser with SA_HEADER_ADDRESS_PARSER environment variable
> +
> +our $header_address_parser;
> +our $email_address_xs;
> +our $email_address_xs_fix_address;
> +BEGIN {
> +  # SA_HEADER_ADDRESS_PARSER=1 only use internal parser
> +  # SA_HEADER_ADDRESS_PARSER=2 only use Email::Address::XS
> +  # By default internal is preferred, will defer for some cases
> +  $header_address_parser = untaint_var($ENV{'SA_HEADER_ADDRESS_PARSER'});
> +  if ((!defined $header_address_parser || $header_address_parser eq '2') &&
> +       eval 'use Email::Address::XS; 1;') {
> +    $email_address_xs = 1;
> +    if (version->parse(Email::Address::XS->VERSION) < version->parse(1.02)) {
> +      $email_address_xs_fix_address = 1;
> +    }
> +  }
> +}
> +
> +# Helper for internal parser
> +our $header_address_mailre = qr/
> +  # user
> +  (?:
> +    # quoted localpart
> +    " (?:|(?:[^"\\]++|\\.)*+) " |
> +    # or un-quoted localpart
> +    [^\@\s\<\>\(\)\[\]\,\:\;]+
> +  )
> +  # domain
> +  \@ (?: [^\"\s\<\>\(\)\[\]\,\:\;]+ | \[ [\d:.]+ \] )
> +/ix;
> +
> +# Very relaxed internal parser
> +# Only handles non-nested comments in some places
> +our $header_address_re = qr/^
> +  \s*
> +  (?:
> +    # optional phrase, quoted or non-quoted
> +    (?:
> +      ( (?: " (?:|(?:[^"\\]++|\\.)*+) " | [^",;<]++ )+ )
> +      \s*
> +    )?
> +    # and enclosed email (or empty)
> +    # ... allow whitespace in localpart
> +    < \s* ( [^>\@]* \S+ | \s* ) \s* >
> +    # some output duplicate enclosures..
> +    (?: \s* < \s* (?: (?: " (?:|(?:[^"\\]++|\\.)*+) " )? \S+ | \s* ) \s* > )*
> +  |
> +    # or standalone email or phrase
> +    (?:
> +      ( $header_address_mailre ) |
> +      ( (?: " (?:|(?:[^"\\]++|\\.)*+) " | [^",;<]++ )+ )
> +    )
> +  )
> +  # possible comment after (no nested support here)
> +  (?: \s* \( ( (?:|(?:[^()\\]++|\\.)*+) ) \) )?
> +  # Followed by comma (semi-colon sometimes) or finish
> +  \s* (?: [,;] | \z )
> +/ix;
> +
> +#
> +# Main public function
> +# expected input is header contents without Header: itself
> +#
> +sub parse_header_addresses {
> +  my ($str) = @_;
> +
> +  return if !defined $str || $str !~ /\S/;
> +
> +  my @results;
> +
> +  # Internal parser
> +  if (!$header_address_parser || $header_address_parser eq '1') {
> +    @results = _parse_header_addresses($str);
> +  }
> +
> +  # Email::Address::XS
> +  if ($email_address_xs) {
> +    if (!$header_address_parser || $header_address_parser eq '2') {
> +      # Only consulted if no internal results, or there doesn't
> +      # seem to have enough results, or possible nested comments ( (
> +      my $maybe_nested = scalar($str =~ tr/(//) >= 2;
> +      if (!@results || $maybe_nested || @results < scalar($str =~ tr/,//)+1) {
> +        my @results_xs = _parse_header_addresses_xs($str);
> +        # If we have more results than internal, use it, or nested
> +        if (@results_xs > @results || $maybe_nested) {
> +          return @results_xs;
> +        }
> +      }
> +    }
> +  }
> +
> +  return @results;
> +}
> +
> +# Check some basic parsing mistakes
> +sub _valid_parsed_address {
> +  return 0 if !defined $_[0];
> +  return 0 if index($_[0], '""@') == 0;
> +  return 0 if scalar($_[0] =~ tr/"//) == 1;
> +  return 1;
> +}
> +
> +#
> +# v0.1, improved internal parser, no support for comments in strange
> +# places or nested comments, but handled a large corpus at least 99% the
> +# same as Email::Address::XS and in some cases even better (retains some
> +# more name/addr info, even when not fully valid).
> +#
> +sub _parse_header_addresses {
> +  local $_ = shift;
> +  local ($1, $2, $3, $4, $5);
> +
> +  # Clear trailing whitespace
> +  s/\s+\z//s;
> +
> +  # Strip away all escaped backslashes, simplifies processing a lot
> +  s/\\\\//g;
> +
> +  # Reduce group address
> +  s/^[^"()<>]+:\s*(.*?)\s*(?:;.*)?/$1/gs;
> +
> +  # Skip empty
> +  return unless /\S/;
> +
> +  my @results;
> +  while (s/$header_address_re//igs) {
> +    my $phrase = defined $1 ? $1 :
> +                 defined $4 ? $4 : undef;
> +    my $address = defined $2 ? $2 :
> +                defined $3 ? $3 : undef;
> +    my $comment = defined $5 ? $5 : undef;
> +
> +    my ($user, $host, $invalid);
> +
> +    # Check relaxed <> capture
> +    if (defined $2) {
> +      # Remove comments (no nested support here)
> +      $address =~ s/\((?:|(?:[^()\\]++|\\.)*+)\)//gs;
> +      # Validate as somewhat email looking
> +      if ($address !~ /^$header_address_mailre$/) {
> +        $address = undef;
> +      }
> +    }
> +
> +    # Validate some other address oddities
> +    if (!_valid_parsed_address($address)) {
> +      $address = undef;
> +    }
> +
> +    if (defined $phrase) {
> +      my $newphrase;
> +      # Parse phrase as quoted and unquoted parts
> +      while ($phrase =~ /(?:"(|(?:[^"\\]++|\\.)*+)"|([^"]++))/igs) {
> +        my $qs = $1;
> +        my $nqs = $2;
> +        if (defined $qs) {
> +          # Unescape things inside quoted string
> +          $qs =~ s/\\(?!\\)//g;
> +          $qs =~ s/\\\\/\\/g;
> +          #$qs =~ s/\\//g;
> +          $newphrase .= $qs;
> +        } else {
> +          # Remove comments (no nested support here)
> +          $nqs =~ s/\((?:|(?:[^()\\]++|\\.)*+)\)//gs;
> +          $newphrase .= $nqs;
> +        }
> +      }
> +      $phrase = $newphrase;
> +
> +      # If we only have phrase which looks email, swap when valid
> +      # Check all in one if, either swap or don't
> +      if (!defined $address &&
> +          $phrase =~ /^$header_address_mailre$/i &&
> +          _valid_parsed_address($phrase) &&
> +          $phrase =~ /^[^\@]*\@([^\@]*)/ &&
> +          is_fqdn_valid(idn_to_ascii($1), 1)) {
> +        $address = $phrase;
> +        $phrase = undef;
> +      } else {
> +        # Remove redundant phrase==email?
> +        if (defined $address && $phrase eq $address) {
> +          $phrase = undef;
> +        } elsif ($phrase eq '') {
> +          $phrase = undef;
> +        }
> +      }
> +    }
> +
> +    # Copy comment to phrase if not defined
> +    if (!defined $phrase && defined $comment) {
> +      $phrase = $comment;
> +    }
> +
> +    if (defined $address) {
> +      # Unescape quoted localpart
> +      #if ($address =~ /^"(.*?)"\@(.*)/) {
> +      #  $user = $1;
> +      #  $host = $2;
> +      #  $user =~ s/\\//g;
> +      #  $user =~ s/\s+//gs;
> +      #  $address = "$user\@$host";
> +      #}
> +      # Strip sometimes seen quotes
> +      #$address =~ s/^'(.*?)'$/$1/;
> +      $address =~ s/^(([^\@]*)\@([^\@]*)).*/$1/;
> +      ($user, $host) = ($2, $3);
> +    }
> +
> +    $invalid = !defined $host || !is_fqdn_valid(idn_to_ascii($host), 1);
> +    push @results, {
> +      'phrase' => $phrase,
> +      'user' => $user,
> +      'host' => $host,
> +      'address' => $address,
> +      'comment' => $comment,
> +      'invalid' => $invalid
> +    };
> +  }
> +
> +  # Was something left unparsed?
> +  if (index($_, '@') != -1) {
> +    # Last ditch effort, examples:
> +    # =?UTF-8?Q?"Foobar"_<[email protected]>?=
> +    # =?utf-8?Q?"Foobar"?=<[email protected]>
> +    while (/<($header_address_mailre)>/igs) {
> +      my $address = $1;
> +      next if !_valid_parsed_address($address);
> +      $address =~ s/^(([^\@]*)\@([^\@]*)).*/$1/;
> +      my ($user, $host) = ($2, $3);
> +      my $invalid = !is_fqdn_valid(idn_to_ascii($host), 1);
> +      push @results, {
> +        'phrase' => undef,
> +        'user' => $user,
> +        'host' => $host,
> +        'address' => $address,
> +        'comment' => undef,
> +        'invalid' => $invalid
> +      };
> +    }
> +  }
> +
> +  return if !@results;
> +  return @results;
> +}
> +
> +sub _parse_header_addresses_xs {
> +  my ($str) = @_;
> +
> +  # Strip away all escaped backslashes, simplifies processing a lot
> +  $str =~ s/\\\\//g;
> +
> +  my @results;
> +  my @addrs = Email::Address::XS->parse($str);
> +
> +  local ($1, $2);
> +  foreach my $addr (@addrs) {
> +    my $name = $addr->name;
> +    my $address = $addr->address;
> +    my $user = $addr->user;
> +    my $host = $addr->host;
> +    my $phrase = $addr->phrase;
> +    my $comment = $addr->comment;
> +    my $invalid;
> +
> +    # Workaround Bug 5201 for Email::Address::XS
> +    # From: "[email protected]"
> +    # If everything else is missing but phrase looks like
> +    # an email, let's assume it is (hostname verifies)
> +    if (!defined $address && !defined $user &&
> +        !defined $comment && defined $phrase &&
> +        _valid_parsed_address($phrase) &&
> +        $phrase =~ /^([^\s\@]+)\@([^\s\@]+)$/ &&
> +        is_fqdn_valid(idn_to_ascii($2), 1))
> +    {
> +      $user = $1;
> +      $host = $2;
> +      $address = $phrase;
> +      $name = $user;
> +      $invalid = 0;
> +      $phrase = undef;
> +    }
> +    else {
> +      $invalid = !$addr->is_valid;
> +    }
> +
> +    # Version <1.02 borks address if both user+host are UTF-8
> +    if ($email_address_xs_fix_address) {
> +      if (defined $user && defined $host) {
> +        # <"Another User"@foo> loses quotes in user, add back
> +        if (index($user, ' ') != -1 &&
> +            index($user, '"') == -1) {
> +          $user = '"'.$user.'"';
> +        }
> +        $address = $user.'@'.$host;
> +      }
> +    }
> +
> +    # Copy comment to phrase if not defined
> +    if (!defined $phrase && defined $comment) {
> +      $phrase = $comment;
> +    }
> +
> +    # Use input as name if nothing found
> +    if (!defined $phrase && !defined $address) {
> +      $phrase = $str;
> +    }
> +
> +    push @results, {
> +      'phrase' => $phrase,
> +      'user' => $user,
> +      'host' => $host,
> +      'address' => $address,
> +      'comment' => $comment,
> +      'invalid' => $invalid
> +    };
> +  }
> +
> +  return @results;
> +}
>  
>  1;
>  
> 
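The Bug 5201 workaround above can be read in isolation: when the XS parser returns only a phrase and that phrase looks like user@host, the phrase is promoted to the address. Below is a minimal sketch of that one step. The helper name promote_phrase_to_address is hypothetical and not part of SpamAssassin, and the sketch leaves out the _valid_parsed_address() and is_fqdn_valid() checks that the committed code also applies.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin API): promote a display phrase that
# looks like user@host to the address fields, as the workaround does.
sub promote_phrase_to_address {
  my ($rec) = @_;   # hashref with phrase/user/host/address/comment keys

  # Only act when the phrase is the sole piece of information we have.
  return $rec if defined $rec->{address} || defined $rec->{user}
              || defined $rec->{comment} || !defined $rec->{phrase};

  # Same pattern as in the commit: exactly one "@", no whitespace.
  if ($rec->{phrase} =~ /^([^\s\@]+)\@([^\s\@]+)$/) {
    my ($user, $host) = ($1, $2);
    return {
      phrase  => undef,
      user    => $user,
      host    => $host,
      address => "$user\@$host",
      comment => undef,
      invalid => 0,
    };
  }
  return $rec;
}

my $fixed = promote_phrase_to_address({ phrase => 'user@example.com' });
print "$fixed->{address}\n";
```

A record that already has an address, user, or comment is returned unchanged, which mirrors the guard conditions in the committed if statement.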
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Util/DependencyInfo.pm Fri Apr 30 18:17:51 2021
> @@ -302,6 +302,13 @@ our @OPTIONAL_MODULES = (
>    desc => 'IO::String emulates file interface for in-core strings.
>    It is used by the optional OLEVBMacro Plugin.',
>  },
> +{
> +  module => 'Email::Address::XS',
> +  version => 0,
> +  desc => 'Email::Address::XS is used to parse email addresses from header
> +  fields like To/From/cc, per RFC 5322. If installed, it may additionally
> +  be used by internal parser to process complex lists.',
> +},
>  );
>  
>  our @BINARIES = ();
> 
> Modified: spamassassin/trunk/t/SATest.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/SATest.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/SATest.pm (original)
> +++ spamassassin/trunk/t/SATest.pm Fri Apr 30 18:17:51 2021
> @@ -68,6 +68,7 @@ BEGIN {
>    # Fix INC to point to built SA
>    if (-e 't/test_dir') { unshift(@INC, 'blib/lib'); }
>    elsif (-e 'test_dir') { unshift(@INC, '../blib/lib'); }
> +  else { die "FATAL: not in or below test directory?\n"; }
>  }
>  
>  # Set up for testing. Exports (as global vars):
> 
> Modified: spamassassin/trunk/t/data/Dumpheaders.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/data/Dumpheaders.pm?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/data/Dumpheaders.pm (original)
> +++ spamassassin/trunk/t/data/Dumpheaders.pm Fri Apr 30 18:17:51 2021
> @@ -16,29 +16,81 @@ sub check_end {
>    my ($self, $opts) = @_;
>  
>    local $_;
> -  $_ = $opts->{permsgstatus}->get("ALL:raw");
> -  s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
>  
>    # ignore the M:SpamAssassin:compile() test message
> -  return if /I need to make this message body somewhat long so TextCat preloads/;
> -  print STDOUT "text-all-raw: $_\n";
> +  return if $self->{linting};
> +  #return if /I need to make this message body somewhat long so TextCat preloads/;
> +
> +  ## pre-4.0 scalar context calls
> +
> +  $_ = $opts->{permsgstatus}->get("ALL:raw");
> +  s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
> +  print STDOUT "scalar-text-all-raw: $_"."[END]\n";
>  
>    $_ = $opts->{permsgstatus}->get("ALL");
>    s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
> -  print STDOUT "text-all-noraw: $_\n";
> +  print STDOUT "scalar-text-all-noraw: $_"."[END]\n";
>  
>    $_ = $opts->{permsgstatus}->get("From:raw");
>    s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
> -  print STDOUT "text-from-raw: $_\n";
> +  print STDOUT "scalar-text-from-raw: $_"."[END]\n";
>  
>    $_ = $opts->{permsgstatus}->get("From");
>    s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
> -  print STDOUT "text-from-noraw: $_\n";
> +  print STDOUT "scalar-text-from-noraw: $_"."[END]\n";
>  
>    $_ = $opts->{permsgstatus}->get("From:addr");
>    s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs;
> -  print STDOUT "text-from-addr: $_\n";
> +  print STDOUT "scalar-text-from-addr: $_"."[END]\n";
> +
> +  ## 4.0 list context tests
> +
> +  my @l;
> +  my $s;
> +
> +  @l = $opts->{permsgstatus}->get("ALL:raw");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-all-raw: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("ALL");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-all-noraw: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("From:raw");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-from-raw: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("From");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-from-noraw: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("From:addr");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-from-addr: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("From:first:addr");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-from-first-addr: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("From:last:addr");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-from-last-addr: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("MESSAGEID:host");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-msgid-host: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("MESSAGEID:domain");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-msgid-domain: ".join("[LIST]", @l)."[END]\n";
> +
> +  @l = $opts->{permsgstatus}->get("Received:ip");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-received-ip: ".join("[LIST]", @l)."[END]\n";
>  
> +  @l = $opts->{permsgstatus}->get("Received:revip");
> +  foreach (@l) { s/\n/[\\n]/gs; s/\t/[\\t]/gs; s/\n+//gs; }
> +  print STDOUT "list-text-received-revip: ".join("[LIST]", @l)."[END]\n";
>  }
>  
>  1;
> 
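The scalar-versus-list split exercised by the Dumpheaders tests above relies on Perl's context sensitivity. Here is a minimal sketch of the idiom using wantarray. The get_values function is a hypothetical stand-in, not the real $pms->get(), and the newline join in scalar context is an assumption taken from the commit log; the real method may treat some modifiers differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a context-sensitive getter; this is not the
# actual $pms->get() implementation.
sub get_values {
  my @values = ('tst1@example.com', 'tst2@example.com');
  # List context: one element per value.
  # Scalar context: values joined with newlines (assumed legacy behaviour).
  return wantarray ? @values : join("\n", @values);
}

my @list   = get_values();   # list context
my $scalar = get_values();   # scalar context

print "list count: ", scalar(@list), "\n";
print "list-text: ", join("[LIST]", @list), "[END]\n";

(my $shown = $scalar) =~ s/\n/[\\n]/g;   # make the newline visible
print "scalar-text: $shown", "[END]\n";
```

Callers that iterate over the list form do not have to split on newlines themselves, which appears to be the motivation for preferring list context.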
> Modified: spamassassin/trunk/t/data/nice/unicode1
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/data/nice/unicode1?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/data/nice/unicode1 (original)
> +++ spamassassin/trunk/t/data/nice/unicode1 Fri Apr 30 18:17:51 2021
> @@ -6,7 +6,7 @@ Received: from mail-ig0-x248.esempio-uni
>    by Sörensen.example.com (Postfix) with UTF8SMTPS
>    for <Dörte@Sörensen.example.com>; Thu,  8 Oct 2015 07:45:14 +0200 (CEST)
>  From: =?ISO-8859-1?Q?Maril=F9?= Gioffré ♥ <Marilù.Gioffré@esempio-università.it>
> -To: =?iso-8859-1*sv?Q?D=F6rte_=C5._S=F6rensen,_Jr.?=
> +To: =?iso-8859-1*sv?Q?D=F6rte_=C5._S=F6rensen=2C_Jr.?=
>    <Dörte@Sörensen.example.com>
>  Cc: Î??σερ@εχαÎ??πλε.ψοÎ??
>  Subject: =?iso-8859-2*sl?Q?Doma=e8e?=
> 
> Added: spamassassin/trunk/t/data/spam/freemail1
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail1?rev=1889337&view=auto
> ==============================================================================
> --- spamassassin/trunk/t/data/spam/freemail1 (added)
> +++ spamassassin/trunk/t/data/spam/freemail1 Fri Apr 30 18:17:51 2021
> @@ -0,0 +1,15 @@
> +Return-Path: <[email protected]>
> +Received: from google-public-dns-a.google.com (google-public-dns-a.google.com [8.8.8.8])
> +     by in.example.com (Postfix) with ESMTPS
> +     for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST)
> +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5
> +        for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT)
> +From: <[email protected]>
> +To: [email protected]
> +Reply-To: "Spammer" <[email protected]>
> +Subject: Freemail test
> +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT)
> +MIME-Version: 1.0
> +Message-Id: <[email protected]>
> +
> +Freemail test
> 
> Added: spamassassin/trunk/t/data/spam/freemail2
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail2?rev=1889337&view=auto
> ==============================================================================
> --- spamassassin/trunk/t/data/spam/freemail2 (added)
> +++ spamassassin/trunk/t/data/spam/freemail2 Fri Apr 30 18:17:51 2021
> @@ -0,0 +1,15 @@
> +Return-Path: <[email protected]>
> +Received: from google-public-dns-a.google.com (google-public-dns-a.google.com [8.8.8.8])
> +     by in.example.com (Postfix) with ESMTPS
> +     for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST)
> +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5
> +        for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT)
> +From: <[email protected]>
> +To: [email protected]
> +Reply-To: [email protected], "Spammer" <[email protected]>
> +Subject: Freemail test
> +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT)
> +MIME-Version: 1.0
> +Message-Id: <[email protected]>
> +
> +Freemail test with multiple Reply-To's
> 
> Added: spamassassin/trunk/t/data/spam/freemail3
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/data/spam/freemail3?rev=1889337&view=auto
> ==============================================================================
> --- spamassassin/trunk/t/data/spam/freemail3 (added)
> +++ spamassassin/trunk/t/data/spam/freemail3 Fri Apr 30 18:17:51 2021
> @@ -0,0 +1,15 @@
> +Return-Path: <[email protected]>
> +Received: from google-public-dns-a.google.com (google-public-dns-a.google.com [8.8.8.8])
> +     by in.example.com (Postfix) with ESMTPS
> +     for <[email protected]>; Wed, 18 Jul 2018 21:12:22 +0200 (CEST)
> +Received: by google-public-dns-a.google.com with SMTP id f21-v6so3811271wmc.5
> +        for <[email protected]>; Wed, 18 Jul 2018 12:12:22 -0700 (PDT)
> +From: <[email protected]>
> +To: [email protected]
> +Subject: Freemail test
> +Date: Wed, 18 Jul 2018 12:12:00 -0700 (PDT)
> +MIME-Version: 1.0
> +Message-Id: <[email protected]>
> +
> +Freemail test with body email
> +[email protected]
> 
> Modified: spamassassin/trunk/t/freemail.t
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/freemail.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/freemail.t (original)
> +++ spamassassin/trunk/t/freemail.t Fri Apr 30 18:17:51 2021
> @@ -5,19 +5,46 @@ use SATest; sa_t_init("freemail");
>  
>  use Test::More;
>  
> -plan tests => 4;
> +plan tests => 23;
>  
>  # ---------------------------------------------------------------------------
>  
> +# Global
>  tstprefs ("
>    freemail_domains gmail.com
> +");
> +
> +## Standard + whitelist should not hit
> +
> +tstlocalrules (q{
>    freemail_import_whitelist_auth 0
> -  whitelist_auth test\@gmail.com
> +  whitelist_auth [email protected]
>    header FREEMAIL_FROM eval:check_freemail_from()
> -");
> +  score FREEMAIL_FROM 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_ENVFROM_END_DIGIT  eval:check_freemail_header('EnvelopeFrom', '\d@')
> +  score FREEMAIL_ENVFROM_END_DIGIT 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
>  
>  %patterns = (
> -  q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +);
> +%anti_patterns = (
> +  # No Reply-To or body
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
>  );
>  
>  ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> @@ -28,16 +55,85 @@ clear_pattern_counters();
>  
>  %patterns = ();
>  %anti_patterns = (
> -  q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
>  );
>  
> -tstprefs ("
> -  freemail_domains gmail.com
> +tstlocalrules (q{
>    freemail_import_whitelist_auth 1
> -  whitelist_auth test\@gmail.com
> +  whitelist_auth [email protected]
>    header FREEMAIL_FROM eval:check_freemail_from()
> -");
> +  score FREEMAIL_FROM 3.3
> +});
>  
>  ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
>  ok_all_patterns();
>  
> +## From and Reply-To different
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_FROM eval:check_freemail_from()
> +  score FREEMAIL_FROM 3.3
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_ENVFROM_END_DIGIT  eval:check_freemail_header('EnvelopeFrom', '\d@')
> +  score FREEMAIL_ENVFROM_END_DIGIT 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail1", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## Multiple Reply-To values, no email on body
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail2", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## No Reply-To, another freemail in body
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail3", \&patterns_run_cb);
> +ok_all_patterns();
> +
> 
> Modified: spamassassin/trunk/t/freemail_welcome_block.t
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/freemail_welcome_block.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/freemail_welcome_block.t (original)
> +++ spamassassin/trunk/t/freemail_welcome_block.t Fri Apr 30 18:17:51 2021
> @@ -1,23 +1,50 @@
>  #!/usr/bin/perl -T
>  
>  use lib '.'; use lib 't';
> -use SATest; sa_t_init("freemail_welcome_block");
> +use SATest; sa_t_init("freemail");
>  
>  use Test::More;
>  
> -plan tests => 4;
> +plan tests => 23;
>  
>  # ---------------------------------------------------------------------------
>  
> +# Global
>  tstprefs ("
>    freemail_domains gmail.com
> +");
> +
> +## Standard + welcomelist should not hit
> +
> +tstlocalrules (q{
>    freemail_import_welcomelist_auth 0
> -  welcomelist_auth test\@gmail.com
> +  welcomelist_auth [email protected]
>    header FREEMAIL_FROM eval:check_freemail_from()
> -");
> +  score FREEMAIL_FROM 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_ENVFROM_END_DIGIT  eval:check_freemail_header('EnvelopeFrom', '\d@')
> +  score FREEMAIL_ENVFROM_END_DIGIT 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
>  
>  %patterns = (
> -  q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +);
> +%anti_patterns = (
> +  # No Reply-To or body
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
>  );
>  
>  ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
> @@ -28,16 +55,85 @@ clear_pattern_counters();
>  
>  %patterns = ();
>  %anti_patterns = (
> -  q{ FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
>  );
>  
> -tstlocalrules ("
> -  freemail_domains gmail.com
> +tstlocalrules (q{
>    freemail_import_welcomelist_auth 1
> -  welcomelist_auth test\@gmail.com
> +  welcomelist_auth [email protected]
>    header FREEMAIL_FROM eval:check_freemail_from()
> -");
> +  score FREEMAIL_FROM 3.3
> +});
>  
>  ok sarun ("-L -t < data/spam/relayUS.eml", \&patterns_run_cb);
>  ok_all_patterns();
>  
> +## From and Reply-To different
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_FROM }, 'FREEMAIL_FROM',
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_ENVFROM_END_DIGIT }, 'FREEMAIL_ENVFROM_END_DIGIT',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_FROM eval:check_freemail_from()
> +  score FREEMAIL_FROM 3.3
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_ENVFROM_END_DIGIT  eval:check_freemail_header('EnvelopeFrom', '\d@')
> +  score FREEMAIL_ENVFROM_END_DIGIT 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail1", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## Multiple Reply-To values, no email on body
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_REPLYTO }, 'FREEMAIL_REPLYTO',
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +  q{ 3.3 FREEMAIL_REPLYTO_END_DIGIT }, 'FREEMAIL_REPLYTO_END_DIGIT',
> +  q{ 3.3 FREEMAIL_HDR_REPLYTO }, 'FREEMAIL_HDR_REPLYTO',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_REPLYTO eval:check_freemail_replyto('replyto')
> +  score FREEMAIL_REPLYTO 3.3
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +  header FREEMAIL_REPLYTO_END_DIGIT  eval:check_freemail_header('Reply-To', '\d@')
> +  score FREEMAIL_REPLYTO_END_DIGIT 3.3
> +  header FREEMAIL_HDR_REPLYTO eval:check_freemail_header('Reply-To')
> +  score FREEMAIL_HDR_REPLYTO 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail2", \&patterns_run_cb);
> +ok_all_patterns();
> +
> +## No Reply-To, another freemail in body
> +
> +%patterns = (
> +  q{ 3.3 FREEMAIL_REPLYXX }, 'FREEMAIL_REPLYXX',
> +);
> +%anti_patterns = ();
> +
> +tstlocalrules (q{
> +  header FREEMAIL_REPLYXX eval:check_freemail_replyto('reply')
> +  score FREEMAIL_REPLYXX 3.3
> +});
> +
> +ok sarun ("-L -t < data/spam/freemail3", \&patterns_run_cb);
> +ok_all_patterns();
> +
> 
> Modified: spamassassin/trunk/t/get_all_headers.t
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/t/get_all_headers.t?rev=1889337&r1=1889336&r2=1889337&view=diff
> ==============================================================================
> --- spamassassin/trunk/t/get_all_headers.t (original)
> +++ spamassassin/trunk/t/get_all_headers.t Fri Apr 30 18:17:51 2021
> @@ -2,14 +2,34 @@
>  
>  use lib '.'; use lib 't';
>  use SATest; sa_t_init("get_all_headers");
> -use Test::More tests => 5;
> +use Test::More;
> +
> +use constant HAS_EMAIL_ADDRESS_XS => eval { require Email::Address::XS; };
> +
> +$tests = 19;
> +$tests += 19 if (HAS_EMAIL_ADDRESS_XS);
> +plan tests => $tests;
>  
>  # ---------------------------------------------------------------------------
>  
>  %patterns = (
> -  q{ MIME-Version: 1.0 } => 'no-extra-space',
> -  q{/text-all-raw: Received: from yahoo\.com\[\\\\n\]    
> \(PPPa33-ResaleLosAngelesMetroB2-2R7452\.dialinx\.net \[4\.48\.136\.190\]\) 
> by\[\\\\n\]    www\.goabroad\.com\.cn \(8\.9\.3/8\.9\.3\) with SMTP id 
> TAA96146; Thu,\[\\\\n\]    30 Aug 2001 19:06:45 \+0800 \(CST\) 
> \(envelope-from\[\\\\n\]    pertand\@email\.mondolink\.com\)\[\\\\n\]From  
> :<tst1\@example\.com>\[\\\\n\]X-Mailer: Mozilla 4\.04 \[en\]C-bls40  \(Win95; 
> U\)\[\\\\n\]To: jenny33436\@netscape\.net\[\\\\n\]Subject: 
> via\.gra\[\\\\n\]From:\[\\\\t\]  <tst2\@example\.com>\[\\\\n\]DATE: Fri, 7 
> Dec 2001 07:01:03\[\\\\n\]MIME-Version: 1\.0\[\\\\n\]Message-Id: 
> <20011206235802\.4FD6F1143D6\@mail\.netnoteinc\.com>\[\\\\n\]Sender: 
> travelincentives\@aol\.com\[\\\\n\]Content-Type: text/plain; 
> charset="us-ascii"\[\\\\n\]/} => 'full-headers-raw',
> -  q{/text-all-noraw: Received: from yahoo\\.com 
> \\(PPPa33-ResaleLosAngelesMetroB2-2R7452\\.dialinx\\.net 
> \\[4\\.48\\.136\\.190\\]\\) by www\\.goabroad\\.com\\.cn 
> \\(8\\.9\\.3/8\\.9\\.3\\) with SMTP id TAA96146; Thu, 30 Aug 2001 19:06:45 
> \\+0800 \\(CST\\) \\(envelope-from 
> pertand\\@email\\.mondolink\\.com\\)\[\\\\n\]From: 
> <tst1\\@example\\.com>\[\\\\n\]X-Mailer: Mozilla 4\\.04 \\[en\\]C-bls40  
> \\(Win95; U\\)\[\\\\n\]To: jenny33436\\@netscape\\.net\[\\\\n\]Subject: 
> via\\.gra\[\\\\n\]From: <tst2\\@example\\.com>\[\\\\n\]DATE: Fri, 7 Dec 2001 
> 07:01:03\[\\\\n\]MIME-Version: 1\\.0\[\\\\n\]Message-Id: 
> <20011206235802\\.4FD6F1143D6\\@mail\\.netnoteinc\\.com>\[\\\\n\]Sender: 
> travelincentives\\@aol\\.com\[\\\\n\]Content-Type: text/plain; 
> charset="us-ascii"\[\\\\n\]/} => 'full-headers-noraw',
> +  q{'MIME-Version: 1.0'} => 'no-extra-space',
> +  q{'scalar-text-all-raw: Received: from yahoo.com[\n]    
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by[\n]    
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu,[\n]    30 Aug 
> 2001 19:06:45 +0800 (CST) (envelope-from[\n]    
> [email protected])[\n]From  :<[email protected]>[\n]X-Mailer: 
> Mozilla 4.04 [en]C-bls40  (Win95; U)[\n]To: 
> [email protected][\n]Subject: via.gra[\n]From:[\t]  
> <[email protected]>[\n]DATE: Fri, 7 Dec 2001 07:01:03[\n]MIME-Version: 
> 1.0[\n]Message-Id: 
> <[email protected]>[\n]Sender: 
> [email protected][\n]Content-Type: text/plain; 
> charset="us-ascii"[\n][END]'} => 'scalar-text-all-raw',
> +  q{'scalar-text-all-noraw: Received: from yahoo.com 
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by 
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu, 30 Aug 2001 
> 19:06:45 +0800 (CST) (envelope-from [email protected])[\n]From: 
> <[email protected]>[\n]X-Mailer: Mozilla 4.04 [en]C-bls40  (Win95; U)[\n]To: 
> [email protected][\n]Subject: via.gra[\n]From: 
> <[email protected]>[\n]DATE: Fri, 7 Dec 2001 07:01:03[\n]MIME-Version: 
> 1.0[\n]Message-Id: 
> <[email protected]>[\n]Sender: 
> [email protected][\n]Content-Type: text/plain; 
> charset="us-ascii"[\n][END]'} => 'scalar-text-all-noraw',
> +  q{'scalar-text-from-raw: <[email protected]>[\n][\t]  
> <[email protected]>[\n][END]'} => 'scalar-text-from-raw',
> +  q{'scalar-text-from-noraw: 
> <[email protected]>[\n]<[email protected]>[\n][END]'} => 
> 'scalar-text-from-noraw',
> +  q{'scalar-text-from-addr: [email protected][END]'} => 
> 'scalar-text-from-addr',
> +  q{'list-text-all-raw: Received: from yahoo.com[\n]    
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by[\n]    
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu,[\n]    30 Aug 
> 2001 19:06:45 +0800 (CST) (envelope-from[\n]    
> [email protected])[\n][LIST]From  
> :<[email protected]>[\n][LIST]X-Mailer: Mozilla 4.04 [en]C-bls40  (Win95; 
> U)[\n][LIST]To: [email protected][\n][LIST]Subject: 
> via.gra[\n][LIST]From:[\t]  <[email protected]>[\n][LIST]DATE: Fri, 7 Dec 2001 
> 07:01:03[\n][LIST]MIME-Version: 1.0[\n][LIST]Message-Id: 
> <[email protected]>[\n][LIST]Sender: 
> [email protected][\n][LIST]Content-Type: text/plain; 
> charset="us-ascii"[\n][END]'} => 'list-text-all-raw',
> +  q{'list-text-all-noraw: Received: from yahoo.com 
> (PPPa33-ResaleLosAngelesMetroB2-2R7452.dialinx.net [4.48.136.190]) by 
> www.goabroad.com.cn (8.9.3/8.9.3) with SMTP id TAA96146; Thu, 30 Aug 2001 
> 19:06:45 +0800 (CST) (envelope-from 
> [email protected])[\n][LIST]From: 
> <[email protected]>[\n][LIST]X-Mailer: Mozilla 4.04 [en]C-bls40  (Win95; 
> U)[\n][LIST]To: [email protected][\n][LIST]Subject: 
> via.gra[\n][LIST]From: <[email protected]>[\n][LIST]DATE: Fri, 7 Dec 2001 
> 07:01:03[\n][LIST]MIME-Version: 1.0[\n][LIST]Message-Id: 
> <[email protected]>[\n][LIST]Sender: 
> [email protected][\n][LIST]Content-Type: text/plain; 
> charset="us-ascii"[\n][END]'} => 'list-text-all-noraw',
> +  q{'list-text-from-raw: <[email protected]>[\n][LIST][\t]  
> <[email protected]>[\n][END]'} => 'list-text-from-raw',
> +  q{'list-text-from-noraw: 
> <[email protected]>[\n][LIST]<[email protected]>[\n][END]'} => 
> 'list-text-from-noraw',
> +  q{'list-text-from-addr: [email protected][LIST][email protected][END]'} => 
> 'list-text-from-addr',
> +  q{'list-text-from-first-addr: [email protected][END]'} => 
> 'list-text-from-first-addr',
> +  q{'list-text-from-last-addr: [email protected][END]'} => 
> 'list-text-from-last-addr',
> +  q{'list-text-msgid-host: mail.netnoteinc.com[END]'} => 
> 'list-text-msgid-host',
> +  q{'list-text-msgid-domain: netnoteinc.com[END]'} => 
> 'list-text-msgid-domain',
> +  q{'list-text-received-ip: 4.48.136.190[END]'} => 'list-text-received-ip',
> +  q{'list-text-received-revip: 190.136.48.4[END]'} => 
> 'list-text-received-revip',
>  );
>  
>  %anti_patterns = (
> @@ -20,6 +40,15 @@ tstprefs ("
>    loadplugin Dumpheaders ../../../data/Dumpheaders.pm
>  ");
>  
> +# Internal parser
> +$ENV{'SA_HEADER_ADDRESS_PARSER'} = 1;
>  ok (sarun ("-L -t < data/spam/008", \&patterns_run_cb));
>  ok_all_patterns();
>  
> +if (HAS_EMAIL_ADDRESS_XS) {
> +  # Email::Address::XS
> +  $ENV{'SA_HEADER_ADDRESS_PARSER'} = 2;
> +  ok (sarun ("-L -t < data/spam/008", \&patterns_run_cb));
> +  ok_all_patterns();
> +} else { warn "Not running Email::Address::XS tests, module missing\n"; }
> +
> 
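The test plan above grows only when the optional module is installed. A minimal sketch of that detection pattern follows. The have_module helper is hypothetical and not part of SATest.pm; the test counts are the ones from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub have_module {
  my ($module) = @_;
  (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
  return eval { require $file; 1 } ? 1 : 0;
}

# Base tests always run; the XS parser pass is added only when available.
my $tests = 19;
$tests += 19 if have_module('Email::Address::XS');

print "planned tests: $tests\n";
```

Appending "; 1" inside the eval makes the result independent of whatever value the module file itself returns, so the check depends only on whether loading succeeded.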
