https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8267

            Bug ID: 8267
           Summary: ExtractText.pm
           Product: Spamassassin
           Version: 4.0.1
          Hardware: PC
                OS: Windows 10
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: dev@spamassassin.apache.org
          Reporter: j...@disroot.org
  Target Milestone: Undefined

Hiya!

It looks like there's a bug here:

--- a/lib/Mail/SpamAssassin/Plugin/ExtractText.pm       2024-03-29 02:00:00.000000000 +0000
+++ b/lib/Mail/SpamAssassin/Plugin/ExtractText.pm       2024-07-06 21:56:00.788596023 +0100
@@ -601,7 +601,7 @@ sub _extract {
       push @{$coll->{flags}}, 'ActionURI';
       dbg("extracttext: ActionURI: $1");
       push @{$coll->{text}}, $text;
-      push @{$coll->{uris}}, $2;
+      push @{$coll->{uris}}, $1;
     } elsif($text =~ /QR-Code\:([^\s]*)/) {
       # zbarimg(1) prefixes the url with "QR-Code:" string
       my $qrurl = $1;

Note that the first group in the regex is non-capturing (it starts with "?:"):

    if ($text =~ /<a(?:\s+[^>]+)?\s+href="([^">]*)"/) {

So the only capture is $1, which holds the href value. $2 is undef.
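
A quick way to check this outside SpamAssassin is the small sketch below. It
reuses the regex from the plugin; the sample $text and the variable names
$first and $second are made up for illustration and are not taken from real
extractor output.

```perl
use strict;
use warnings;

# Hypothetical sample input, for illustration only.
my $text = '<a class="btn" href="http://example.com/action">click</a>';

my ($first, $second);
if ($text =~ /<a(?:\s+[^>]+)?\s+href="([^">]*)"/) {
    # (?:...) groups without capturing, so the href value lands in $1
    # and there is no second capture group to fill $2.
    ($first, $second) = ($1, $2);
}

print "\$1 = ", (defined $first  ? $first  : 'undef'), "\n";
print "\$2 = ", (defined $second ? $second : 'undef'), "\n";
```

With that input, $first should hold the URL and $second should stay undefined,
which is why pushing $2 onto the uris list adds nothing useful.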

A side note: You say "This module (ExtractText.pm) uses external tools to
extract text from message parts, and then sets the text as the rendered part.
**External tool must output plain text**, not HTML or other non-textual
result."

Though, this code is parsing an HTML tag for an href attribute...

Cheers,
jps
