Hi Giovanni.

I've been looking at this and haven't yet figured out what is wrong, since each of the parts matches on its own. But I've been wondering about some of the pattern, since it seems to match a lot of things that wouldn't be even vaguely correctly formatted US phone numbers. Are you also looking for phone numbers in other countries?

Some comments:
1. The country code is matched as 1 or 2 digits. For the US it is always a "1", and for other countries I believe it can be 1, 2 or 3 digits.
2. The area code is matched with optional parens around it, which is correct.
3. The 3-digit exchange part of the number is also matched with optional parens, and they would never appear here.
4. The line number part is matched as 4 to 7 digits. It would always be 4. It could be considered 4 or 7, but you already have the separate 3-digit exchange match and a separator that can be nonexistent. It seems that this should be a strict 4-digit match for a US number.
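
To illustrate points 3 and 4, here is a small Python sketch rather than a SpamAssassin rule. It assumes an ASCII-only approximation of the structure of OB_PHONE_S: the <N0>..<N9> tags are replaced by plain digit classes, so the look-alike characters are dropped, and the capture group is omitted. The names LOOSE and samples are made up for the example.

```python
import re

# ASCII-only approximation of the structure of OB_PHONE_S (assumption:
# each <N0>..<N9> tag is reduced to a plain digit, no look-alikes).
LOOSE = re.compile(
    r"\b\+?\s?(?:[1-9]{1,2})?[\s-]{0,2}"  # optional 1-2 digit country code
    r"\(?\d{3}\)?[\s-]{0,2}"              # area code, optional parens
    r"\(?\d{3}\)?[\s-]{0,3}"              # exchange, also with optional parens
    r"\d{4,7}\b"                          # line number, 4 to 7 digits
)

samples = [
    "555(555)1234567",  # parens around the exchange, 7-digit line number
    "99 555 555 1234",  # two-digit "country code" that is not 1
    "555-555-1234",     # an ordinary US layout
]

# Collect every sample the loose pattern accepts.
loose_hits = [s for s in samples if LOOSE.search(s)]
print(loose_hits)
```

All three samples are accepted under these assumptions, including the two that are not plausible US numbers.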

I don't have any sort of collection of spammer phone numbers, and haven't paid much attention to them, so I don't know what variants are common in the wild. I'll keep working on this, and will simplify it down a little to match the common valid US formats, along with some slop in the separators. This will probably then not match common international numbers.
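
As a rough idea of what such a simplified pattern might look like, here is a Python sketch, not the finished SpamAssassin rule. It makes several assumptions: plain ASCII digits only (no replace_tags look-alikes), a dot allowed as a separator in addition to whitespace and hyphen, and digit lookarounds in place of \b. US_PHONE and find_us_phone are hypothetical names for the example.

```python
import re

# Stricter US-only sketch: country code is "1" or absent, parens are
# allowed only around the area code, and the line number is exactly 4 digits.
US_PHONE = re.compile(
    r"(?<!\d)"               # must not start in the middle of a digit run
    r"(?:\+?1[\s.-]{0,2})?"  # optional country code, always 1 for the US
    r"(?:\(\d{3}\)|\d{3})"   # area code, with balanced parens or none
    r"[\s.-]{0,2}"           # some slop in the separator
    r"\d{3}"                 # exchange, never in parens
    r"[\s.-]{0,2}"
    r"\d{4}"                 # line number, strictly 4 digits
    r"(?!\d)"                # must not be followed by another digit
)

def find_us_phone(text):
    """Return the first substring that looks like a US number, or None."""
    m = US_PHONE.search(text)
    return m.group(0) if m else None

print(find_us_phone("Call +1 (212) 555-0123 now"))
print(find_us_phone("212.555.0123"))
print(find_us_phone("555(555)1234567"))
```

Because this is a search rather than a whole-string match, a valid-looking US number embedded in a longer string of separated digit groups would still be found, which is probably what a body rule wants anyway.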

       Loren

----- Original Message -----
Sent: Wednesday, April 02, 2025 10:01 AM
Subject: Phone rule vs replace tags


Hi,
I am trying to write a rule that should match a (US) phone number using replace_tags.
The email I am using for tests is https://dpaste.org/v3b9A/raw
The (simplified) WIP rule is:

replace_tag N0      (?:0|O|\xf0\x9d\x9f\x8e)
replace_tag N1      (?:1|l|\xf0\x9d\x9f\x8f)
replace_tag N2      (?:2|\xf0\x9d\x9f\x90)
replace_tag N3      (?:3|\xf0\x9d\x9f\x91)
replace_tag N4      (?:4|\xf0\x9d\x9f\x92)
replace_tag N5      (?:5|\xf0\x9d\x9f\x93)
replace_tag N6      (?:6|\xf0\x9d\x9f\x94)
replace_tag N7      (?:7|\xf0\x9d\x9f\x95)
replace_tag N8      (?:8|\xf0\x9d\x9f\x96)
replace_tag N9      (?:9|\xf0\x9d\x9f\x97)
replace_rules       OB_PHONE_S
body OB_PHONE_S /\b(?:\+)?(?:\s)?(?:(?:<N1>|<N2>|<N3>|<N4>|<N5>|<N6>|<N7>|<N8>|<N9>){1,2})?(?:\s|\-){0,2}(?:\()?((?:(?:<N0>|<N1>|<N2>|<N3>|<N4>|<N5>|<N6>|<N7>|<N8>|<N9>){3})(?:\))?(?:\s|\-){0,2}(?:\()?(?:(?:<N0>|<N1>|<N2>|<N3>|<N4>|<N5>|<N6>|<N7>|<N8>|<N9>){3})(?:\))?(?:\s|\-){0,3}(?:(?:<N0>|<N1>|<N2>|<N3>|<N4>|<N5>|<N6>|<N7>|<N8>|<N9>){4,7}))\b/

Any hints about why it doesn't hit?
 Thanks
  Giovanni

