https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Adam Katz <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #1 from Adam Katz <[email protected]> ---
> Do they provide their TLD list in machine readable format?

How about:

    $ # scrape the two-letter/gTLD names out of IANA's root zone database page
    $ wget -qO- https://www.iana.org/domains/root/db \
      | perl -ne 'if (m"/root/db/([^.]+)\.html") { print "$1\n" }' \
      > tld.txt

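After that, you can run (GNU sed is needed for the \n in the replacement):

    $ # pull our TLD block out of the module, one TLD per line, then
    $ # print every IANA entry that we don't already know about
    $ sed '/^  ac ad/,/^  zm zw/!d; s/^  //; s/  */\n/g' \
        lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm \
      | grep -vwFf- tld.txt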

This currently shows that we're missing:

    bl bq bv cw eh gb mf post sj ss sx um
    (plus all the punycode IDNs, unless we track them elsewhere)

(I also ran the opposite.  We don't have any TLDs that aren't on IANA's list.)
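
A minimal sketch of that reverse check, assuming the sed output above has been
saved to a file. The names known.txt and not_in are made up for illustration,
and it uses whole-line matching (-x) rather than the -w used above:

```shell
# Sketch only: not_in, known.txt and tld.txt are illustrative names.
# Both files are assumed to hold one TLD per line.

# Print every line of file $1 that does not appear as a whole line in file $2.
not_in() {
    # grep exits 1 when it selects no lines; treat that as success,
    # but still let real errors (exit status 2) through.
    grep -vxFf "$2" "$1" || [ "$?" -eq 1 ]
}

# Hypothetical usage:
#   not_in known.txt tld.txt    # TLDs we carry that IANA does not list
#   not_in tld.txt known.txt    # TLDs IANA lists that we are missing
```

An empty result from the first call corresponds to "we don't have any TLDs
that aren't on IANA's list".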

We'll have to ship these via util_rb_tld in sa-update as well as adding them
to RegistrarBoundaries.pm, so users don't have to wait for SA 3.4.0 to get them.
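
As a rough sketch, the missing list could be folded into a config line like
this. It assumes util_rb_tld takes a space-separated list of TLDs, and the
names missing.txt and rb_tld_line are made up for illustration; which rules
file the line should land in is not decided here:

```shell
# Sketch only: rb_tld_line and missing.txt are illustrative names.
# missing.txt is assumed to hold one missing TLD per line.

# Fold the list in file $1 into a single "util_rb_tld ..." line.
rb_tld_line() {
    # paste -s joins all lines of the file, separated by a single space
    printf 'util_rb_tld %s\n' "$(paste -s -d ' ' "$1")"
}

# Hypothetical usage:
#   rb_tld_line missing.txt
```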

While on the TLD topic, I see we don't yet include Mozilla's effective TLD list,
https://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
(for 2tld and 3tld).  I haven't vetted it to see whether it's worthwhile, but
when I did some research a while back, it looked ideal.
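
A rough sketch of how 2tld and 3tld candidates could be pulled out of a local
copy of that file, assuming its usual layout (one rule per line, // comments,
*. wildcard rules and ! exception rules). The names psl_by_labels and
effective_tld_names.dat (as a local path) are made up for illustration, and
wildcard and exception rules are simply skipped, which a real import would
need to handle properly:

```shell
# Sketch only: psl_by_labels is an illustrative name, not an existing tool.

# Print the rules in file $2 that consist of exactly $1 dot-separated labels.
psl_by_labels() {
    awk -v n="$1" '
        NF == 0   { next }   # skip blank lines
        /^\/\//   { next }   # skip // comment lines
        /^[*!]/   { next }   # skip wildcard and exception rules
        # only the first whitespace-delimited field is the rule itself
        { if (split($1, parts, /\./) == n) print $1 }
    ' "$2"
}

# Hypothetical usage:
#   psl_by_labels 2 effective_tld_names.dat    # candidates for 2tld
#   psl_by_labels 3 effective_tld_names.dat    # candidates for 3tld
```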
