Kyle Meyer <[email protected]> wrote:
> I've been testing out obfuscate=true a bit (which won't be a surprise to
> Eric, given a private email that was sent to both of us).  One issue I
> noticed is that it breaks archive links.  I've posted an example at
> <https://yhetil.org/obf/20201204120929.GA22736@dcvr/>:
> 
>   Reported-by: Kyle Meyer <kyle@kyleam•com>
>   Link: https://public-inbox.org/meta/87360nlc44.fsf@kyleam•com/

Oops, I think the following fixes it, but I'm not sure if there's a
better way to accomplish the same thing...

I worry that the new leading (\S*?) in the regexp makes it susceptible
to performance problems with malicious inputs.  I can't remember
whether something like this triggers a pathological case, or whether
I'm confusing it with another quirk that does (or with quirks of
another RE engine).
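A rough way to check is a throwaway timing probe like the one below.
It is only a sketch and not part of public-inbox: the names %pat and
probe() are made up for illustration, and the input shape ('a@.'
repeated, so every '@' is immediately followed by '.' and no domain
can ever match) is my guess at a bad case, not a known exploit.  It
times a single failed match of the old and new patterns.  If the new
pattern's time roughly quadruples each time the length doubles while
the old one stays roughly linear, that would point to the leading
(\S*?) rescanning the rest of each non-whitespace run from every start
position.

```perl
#!/usr/bin/perl
# Hypothetical probe (not part of public-inbox or its test suite):
# time the old and new obfuscate_addrs() patterns against one long
# run of non-whitespace containing '@' and '.' but no valid address.
use strict;
use warnings;
use Time::HiRes qw(time);

# Patterns copied from the patch; the capture groups don't matter here.
my %pat = (
	old => qr/(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))/,
	new => qr/(\S*?)(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))/,
);

# Returns (elapsed seconds, 1 if matched else 0) for one match attempt.
sub probe {
	my ($re, $reps) = @_;
	my $s = 'a@.' x $reps; # '@' always followed by '.', so no domain
	my $t0 = time();
	my $hit = $s =~ $re ? 1 : 0;
	return (time() - $t0, $hit);
}

# Double the input length each round and compare growth rates.
for my $reps (250, 500, 1000) {
	for my $name (qw(old new)) {
		my ($secs, $hit) = probe($pat{$name}, $reps);
		printf "%s %6d chars: %.4fs match=%d\n",
			$name, 3 * $reps, $secs, $hit;
	}
}
```

Absolute timings will vary by machine and perl version; only the
growth between rounds is interesting.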

------------8<--------
Subject: [WIP] www: do not perform address obfuscation on URLs

---
 lib/PublicInbox/Hval.pm | 14 ++++++++------
 t/hval.t                |  4 ++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index d20f70ae..6f1a046c 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -82,15 +82,17 @@ sub obfuscate_addrs ($$;$) {
        my $repl = $_[2] // '&#8226;';
        my $re = $ibx->{-no_obfuscate_re}; # regex of domains
        my $addrs = $ibx->{-no_obfuscate}; # { $address => 1 }
-       $_[1] =~ s/(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))/
-               my ($addr, $user, $domain) = ($1, $2, $3);
-               if ($addrs->{$addr} || ((defined $re && $domain =~ $re))) {
-                       $addr;
+       $_[1] =~ s#(\S*?)(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))#
+               my ($beg, $addr, $user, $domain) = ($1, $2, $3, $4);
+               if (index($beg, '://') > 0) {
+                       $beg.$addr;
+               } elsif ($addrs->{$addr} || ((defined $re && $domain =~ $re))) {
+                       $beg.$addr;
                } else {
                        $domain =~ s!([^\.]+)\.!$1$repl!;
-                       $user . '@' . $domain
+                       $beg . $user . '@' . $domain
                }
-               /sge;
+               #sge;
 }
 
 # like format_sanitized_subject in git.git pretty.c with '%f' format string
diff --git a/t/hval.t b/t/hval.t
index 9d0dab7a..5afc2052 100644
--- a/t/hval.t
+++ b/t/hval.t
@@ -47,6 +47,10 @@ EOF
 
 is($html, $exp, 'only obfuscated relevant addresses');
 
+$exp = 'https://example.net/[email protected]';
+PublicInbox::Hval::obfuscate_addrs($ibx, my $res = $exp);
+is($res, $exp, 'does not obfuscate URL with Message-ID');
+
 is(PublicInbox::Hval::to_filename('foo bar  '), 'foo-bar',
        'to_filename has no trailing -');
 
--
unsubscribe: one-click, see List-Unsubscribe header
archive: https://public-inbox.org/meta/