On 4/4/25 1:57 AM, Kent Oyer wrote:
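The problem is the word boundary (\b). There's no word boundary next to a 
mathematical symbol: neither the symbol nor the space or punctuation before it 
counts as a word character, so \b never matches there. Try this one instead: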

This works with all my spam samples, thanks.
 Giovanni

replace_tag N1      (?:1|l|\xf0\x9d\x9f\x8f)
replace_tag DIGIT (?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])
replace_rules OB_PHONE_S
body OB_PHONE_S /(?<!\d)(?:<N1>[^a-zA-Z0-9]*)?<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{4}(?!\d)/

This should detect US phone numbers in various formats:
1 (800) 555-1212
1-800-555-1212
800.555.1212
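As a rough check of the formats listed above, the sketch below expands the two replace_tag values by plain string substitution and runs the resulting pattern with Python's `re` in bytes mode, on each format both in ASCII and with MATHEMATICAL BOLD digits. The helpers `expand_tags` and `to_bold` are hypothetical illustrations, not SpamAssassin code, and the sketch assumes the rule sees the body as raw UTF-8 bytes, which is what the `\xf0\x9d\x9f...` escapes in the tags imply.

```python
import re

# Tag values copied from the replace_tag lines above. expand_tags() is a
# hypothetical stand-in for SpamAssassin's own replace_rules processing.
TAGS = {
    b"N1": rb"(?:1|l|\xf0\x9d\x9f\x8f)",
    b"DIGIT": rb"(?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])",
}
OB_PHONE_S_TEMPLATE = (
    rb"(?<!\d)(?:<N1>[^a-zA-Z0-9]*)?<DIGIT>{3}[^a-zA-Z0-9]+"
    rb"<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{4}(?!\d)"
)

def expand_tags(template: bytes, tags: dict) -> re.Pattern:
    """Substitute each <TAG> with its value, then compile as a bytes regex."""
    for name, value in tags.items():
        template = template.replace(b"<" + name + b">", value)
    return re.compile(template)

def to_bold(s: str) -> bytes:
    """Map ASCII digits to MATHEMATICAL BOLD digits (U+1D7CE..U+1D7D7), as UTF-8."""
    return "".join(chr(0x1D7CE + int(c)) if c.isdigit() else c for c in s).encode("utf-8")

OB_PHONE_S = expand_tags(OB_PHONE_S_TEMPLATE, TAGS)

samples = ["1 (800) 555-1212", "1-800-555-1212", "800.555.1212"]
for s in samples:
    # Columns: sample, ASCII match, bold-digit match; each line ends with: True True
    print(s, bool(OB_PHONE_S.search(s.encode())), bool(OB_PHONE_S.search(to_bold(s))))
```

Because the separators are `[^a-zA-Z0-9]+` (one or more), an unseparated run such as 8005551212 is not matched, which keeps order numbers and addresses out but also misses numbers written without separators.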

Caveats:
1. This will also fire on non-obfuscated phone numbers.
2. This will not fire on obfuscated phone numbers that use any look-alike 
symbols other than the MATHEMATICAL BOLD digits (or the ASCII O and l).
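Caveat 2 can be seen by feeding the same pattern other digit styles. In the sketch below, `DIGIT_WIDE` is a hypothetical widened tag, not taken from the thread or from the attached list: it adds every digit style in U+1D7CE..U+1D7FF (bold, double-struck, sans-serif, sans-serif bold, monospace; all encode as `\xf0\x9d\x9f` plus one byte in `\x8e-\xbf`) and the FULLWIDTH digits U+FF10..U+FF19 (`\xef\xbc\x90`..`\xef\xbc\x99`). It has only been exercised with Python's `re` in bytes mode, not as a SpamAssassin rule, and the optional N1 prefix is left out to keep it short.

```python
import re

# DIGIT as posted: ASCII look-alikes plus MATHEMATICAL BOLD digits only.
DIGIT_BOLD = rb"(?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])"
# Hypothetical wider tag (not from the thread): every digit style in
# U+1D7CE..U+1D7FF plus FULLWIDTH DIGIT ZERO..NINE (U+FF10..U+FF19).
DIGIT_WIDE = rb"(?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\xbf]|\xef\xbc[\x90-\x99])"

def phone_rule(digit: bytes) -> re.Pattern:
    """Build the OB_PHONE_S shape (without the optional N1 prefix) for a DIGIT tag."""
    sep = rb"[^a-zA-Z0-9]+"
    return re.compile(rb"(?<!\d)" + digit + rb"{3}" + sep + digit + rb"{3}"
                      + sep + digit + rb"{4}(?!\d)")

def styled(s: str, zero: int) -> bytes:
    """Replace ASCII digits with code points counted from `zero`, as UTF-8."""
    return "".join(chr(zero + int(c)) if c.isdigit() else c for c in s).encode("utf-8")

number = "800-555-1212"
double_struck = styled(number, 0x1D7D8)  # MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
fullwidth = styled(number, 0xFF10)       # FULLWIDTH DIGIT ZERO

for label, text in [("double-struck", double_struck), ("fullwidth", fullwidth)]:
    # Prints: <label> False True  (posted tag misses, widened tag matches)
    print(label, bool(phone_rule(DIGIT_BOLD).search(text)),
          bool(phone_rule(DIGIT_WIDE).search(text)))
```

Widening the tag makes caveat 2 smaller but does nothing for caveat 1: plain ASCII numbers still match.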

See the attached file for a more complete list of homoglyphs.

Thanks
Kent


On Thu, Apr 3, 2025 at 02:43 AM, giova...@paclan.it wrote:

    
    On 4/3/25 8:04 AM, Loren Wilton wrote:

        
        Well, this is very strange and I don't know what is going on. I almost suspect some sort of bug in the regex processor in SA.
        replace_tag N1      (?:1|l|\xf0\x9d\x9f\x8f)
        replace_tag DIGIT (?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])
        replace_rules OB_PHONE_TEST4 OB_PHONE_TEST5 OB_PHONE_TEST6
        body OB_PHONE_TEST4 /\b(?:\+?\s?<N1>\s?)?\(?<DIGIT>{3}\)?[-\s]{0,3}<DIGIT>{3}[-\s]{0,3}\b/
        body OB_PHONE_TEST5 /\b(?:\+?\s?<N1>\s?)?\(?<DIGIT>{3}\)?[\s-]{0,3}<DIGIT>{3}[\s-]{0,3}<DIGIT>{4}/
        body OB_PHONE_TEST6 /\(?<DIGIT>{3}\)?[\s-]{0,3}<DIGIT>{3}[\s-]{0,3}<DIGIT>{4}/
        Rules 4 and 6 match. Rule 5, which is the complete match, does not. I have no idea why.
        I was getting the same results using your rule form before I simplified things a bit. Partial, overlapping matches work; a complete match does not. The complete match DOES work if the phone number is in ASCII, but not if any digit is Unicode.
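This observation fits Kent's diagnosis at the top of the thread. The sketch below reproduces it with Python's `re` in bytes mode, assuming the rule sees undecoded UTF-8 (an assumption, not a statement about SpamAssassin's internals): in a bytes pattern only `[A-Za-z0-9_]` are word characters, so there is no `\b` between a space and the 0xF0 lead byte of a MATHEMATICAL BOLD digit, while a `(?<!\d)` lookbehind, as used in OB_PHONE_S, still matches. `first_match` is a hypothetical helper.

```python
import re

# MATHEMATICAL BOLD DIGIT EIGHT (U+1D7D6) as raw UTF-8: b"\xf0\x9d\x9f\x96"
BOLD_EIGHT = "\U0001D7D6".encode("utf-8")

def first_match(pattern: bytes, text: bytes):
    """Return the matched bytes, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

text = b"call " + BOLD_EIGHT

# Neither the space nor the 0xF0 lead byte is a word byte, so no \b exists
# between them and the bold digit is never reached.
print(first_match(rb"\b\xf0\x9d\x9f[\x8e-\x97]", text))     # None

# A negative lookbehind for an ASCII digit needs no word character.
print(first_match(rb"(?<!\d)\xf0\x9d\x9f[\x8e-\x97]", text))  # b'\xf0\x9d\x9f\x96'

# With an ASCII digit the boundary exists, which is why the same rule
# still matched ASCII phone numbers.
print(first_match(rb"\b[0-9]", b"call 8"))                  # b'8'
```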



    Actually, OB_PHONE_TEST4 matches "INV-854113" and OB_PHONE_TEST6 matches 
"andr9202822840@caosusaoviet[.]vn"; the regexps don't seem to work at all.
    Giovanni
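These false positives can be reproduced the same way. The sketch below expands the tags by string substitution (a hypothetical stand-in for replace_tag, under the same raw-UTF-8-bytes assumption) and runs OB_PHONE_TEST4 and OB_PHONE_TEST6 alongside Kent's OB_PHONE_S on the two strings quoted above. TEST4 only needs six digits, and TEST6 allows zero separators with no boundary check, so a ten-digit run inside an email address qualifies; OB_PHONE_S requires at least one non-alphanumeric separator between groups and no adjacent ASCII digits.

```python
import re

# Tag values from the replace_tag lines in this thread.
N1 = rb"(?:1|l|\xf0\x9d\x9f\x8f)"
DIGIT = rb"(?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])"

def rule(template: bytes) -> re.Pattern:
    """Expand <N1>/<DIGIT> by plain substitution, then compile as a bytes regex."""
    return re.compile(template.replace(b"<N1>", N1).replace(b"<DIGIT>", DIGIT))

RULES = {
    "OB_PHONE_TEST4": rule(rb"\b(?:\+?\s?<N1>\s?)?\(?<DIGIT>{3}\)?[-\s]{0,3}<DIGIT>{3}[-\s]{0,3}\b"),
    "OB_PHONE_TEST6": rule(rb"\(?<DIGIT>{3}\)?[\s-]{0,3}<DIGIT>{3}[\s-]{0,3}<DIGIT>{4}"),
    "OB_PHONE_S": rule(rb"(?<!\d)(?:<N1>[^a-zA-Z0-9]*)?<DIGIT>{3}[^a-zA-Z0-9]+"
                       rb"<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{4}(?!\d)"),
}

for sample in [b"INV-854113", b"andr9202822840@caosusaoviet[.]vn"]:
    hits = [name for name, rx in RULES.items() if rx.search(sample)]
    print(sample.decode(), hits)
# INV-854113 ['OB_PHONE_TEST4']
# andr9202822840@caosusaoviet[.]vn ['OB_PHONE_TEST6']
```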

