On 4/4/25 1:57 AM, Kent Oyer wrote:
The problem is the word boundary (\b). There's no word boundary next to a mathematical symbol. Try this one instead:
this works with all my spamples, thanks. Giovanni
replace_tag N1 (?:1|l|\xf0\x9d\x9f\x8f)
replace_tag DIGIT (?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])
replace_rules OB_PHONE_S

body OB_PHONE_S /(?<!\d)(?:<N1>[^a-zA-Z0-9]*)?<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{3}[^a-zA-Z0-9]+<DIGIT>{4}(?!\d)/

This should detect US phone numbers in various formats:

1 (800) 555-1212
1-800-555-1212
800.555.1212

Caveats:

1. This will fire on non-obfuscated phone numbers also
2. This will not fire on obfuscated phone numbers if they use any symbols other than MATHEMATICAL BOLD

See attached for a more complete list of homoglyphs

Thanks
Kent

On Thu, Apr 3, 2025 at 02:43 AM, giova...@paclan.it wrote:

On 4/3/25 8:04 AM, Loren Wilton wrote:

Well, this is very strange and I don't know what is going on. I almost suspect some sort of bug in the regex processor in SA.

replace_tag N1 (?:1|l|\xf0\x9d\x9f\x8f)
replace_tag DIGIT (?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])
replace_rules OB_PHONE_TEST4 OB_PHONE_TEST5 OB_PHONE_TEST6

body OB_PHONE_TEST4 /\b(?:\+?\s?<N1>\s?)?\(?<DIGIT>{3}\)?[-\s]{0,3}<DIGIT>{3}[-\s]{0,3}\b/
body OB_PHONE_TEST5 /\b(?:\+?\s?<N1>\s?)?\(?<DIGIT>{3}\)?[\s-]{0,3}<DIGIT>{3}[\s-]{0,3}<DIGIT>{4}/
body OB_PHONE_TEST6 /\(?<DIGIT>{3}\)?[\s-]{0,3}<DIGIT>{3}[\s-]{0,3}<DIGIT>{4}/

Rules 4 and 6 match. Rule 5, which is the complete match, does not. I have no idea why. I was getting the same results using your rule form before I simplified things a bit. Partial overlapping matches work, a complete match does not. The complete match DOES work if the phone number is in ASCII. But not if any digit is unicode.
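The word-boundary explanation can be checked outside SpamAssassin. The following is only a rough sketch in Python, not SA's Perl engine: it assumes the body is matched as raw UTF-8 bytes (which the \xf0\x9d\x9f escapes in the rules suggest) and that bytes above 0x7f are not word characters, which is how Python treats bytes patterns. The sample strings are made up for illustration.

```python
import re

# MATHEMATICAL BOLD DIGIT ONE (U+1D7CF) as UTF-8: b"\xf0\x9d\x9f\x8f"
BOLD_ONE = "\U0001D7CF".encode("utf-8")

# Same leading shape as OB_PHONE_TEST5: \b followed by some form of "1".
with_boundary = re.compile(rb"\b(?:1|l|\xf0\x9d\x9f\x8f)")
# Same leading shape as OB_PHONE_S: a lookbehind instead of \b.
without_boundary = re.compile(rb"(?<!\d)(?:1|l|\xf0\x9d\x9f\x8f)")

ascii_text = b"phone 1-800"
bold_text = b"phone " + BOLD_ONE + b"-800"

# Space followed by ASCII "1": non-word then word, so \b holds.
ascii_hit = with_boundary.search(ascii_text) is not None

# Space followed by byte 0xf0: both are non-word bytes, so \b fails there.
bold_hit = with_boundary.search(bold_text) is not None

# The lookbehind only asks "no ASCII digit before me", so it still matches.
bold_hit_no_boundary = without_boundary.search(bold_text) is not None

print(ascii_hit, bold_hit, bold_hit_no_boundary)
```

If SA's regex engine treats the bytes the same way, this would explain why a rule anchored with \b matches the ASCII number but not the one that begins with a bold digit.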
Actually OB_PHONE_TEST4 matches on "INV-854113" and OB_PHONE_TEST6 matches on "andr9202822840@caosusaoviet[.]vn"; the regexps don't seem to work at all.
Giovanni
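For anyone who wants to try the OB_PHONE_S rule against these strings without a full SA setup, here is a hedged Python approximation. It expands the replace_tag definitions by hand and matches on UTF-8 bytes; both are assumptions about how SA applies the rule, and Perl and Python regex semantics may differ in corner cases. The helpers `hits` and `bold` are hypothetical names used only for this sketch.

```python
import re

# Hand-expanded replace_tag definitions (SA's ReplaceTags plugin does this itself).
N1 = rb"(?:1|l|\xf0\x9d\x9f\x8f)"
DIGIT = rb"(?:[0-9Ol]|\xf0\x9d\x9f[\x8e-\x97])"
SEP = rb"[^a-zA-Z0-9]"

OB_PHONE_S = re.compile(
    rb"(?<!\d)(?:" + N1 + SEP + rb"*)?"
    + DIGIT + rb"{3}" + SEP + rb"+"
    + DIGIT + rb"{3}" + SEP + rb"+"
    + DIGIT + rb"{4}(?!\d)"
)

def hits(text: str) -> bool:
    """Return True if the approximated rule matches the UTF-8 bytes of text."""
    return OB_PHONE_S.search(text.encode("utf-8")) is not None

def bold(text: str) -> str:
    """Replace ASCII digits with MATHEMATICAL BOLD digits (U+1D7CE..U+1D7D7)."""
    return "".join(
        chr(0x1D7CE + int(c)) if c in "0123456789" else c for c in text
    )

samples = [
    "1 (800) 555-1212",
    "1-800-555-1212",
    "800.555.1212",
    bold("1-800-555-1212"),
    "INV-854113",
    "andr9202822840@caosusaoviet[.]vn",
]
results = {s: hits(s) for s in samples}
for sample, matched in results.items():
    print(matched, sample)
```

In this approximation the three phone formats and the bold variant match, while the invoice number and the email address do not, because the rule requires a non-alphanumeric separator between the digit groups.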