[dmarc-ietf] Most common problems with DMARC records

Steven M Jones Thu, 08 Sep 2016 03:28:23 -0700

In June/July I slogged through about six years of captured email
authentication records, and wrote a blog post about what I found
regarding DMARC policy records.
(https://dmarc.org/2016/07/common-problems-with-dmarc-records/)


A colleague suggested that a write-up with more numbers would be
considered useful for this working group. So I did a lot more slogging,
and here's the result. There are undoubtedly shortcomings in the
methodology; feel free to point them out if you enjoy that sort of
thing. No telling when I could incorporate any suggestions, and I am not
able to share the underlying data - sorry. But I will receive any
suggestions gladly and save them for later use.


tl;dr
=====

Two percent of all recognizable DMARC records published were invalid.

The three top problems with DMARC records observed over six years were:

1: Missing "p=" tag
2: The "DMARC" in "v=DMARC1" wasn't capitalized
3: Extraneous escape/quoting characters in record

There's a "Summary" at the bottom if you want it.




Introduction
============

Farsight Security has made over six years of DNS data related to email
authentication protocols available to DMARC.org. We recently reviewed
DMARC records across this period, which seems like it might provide
useful input for the IETF DMARC working group.

The Farsight data is captured by network probes proximate to a number
of nameservers. While this represents a subset of all Internet
nameservice traffic, it nevertheless provides a unique opportunity to
extract trends in the use of email authentication over time.



Records in the Dataset
======================

In all, 41 million records were made available, captured in the period
from June 2010 through June 2016 (minus three weeks in December
2015). This included over 37 million TXT records and over 1.5 million
CNAME records. Almost 924,000 records dealt with labels beginning with
"_dmarc."



Non-DMARC Records at DMARC Labels
=================================

Out of almost 924,000 records in the dataset at DMARC labels
(beginning "_dmarc"), over 435,000 were not even TXT records. Here is
a breakdown of all the resource record types appearing at these
labels:

Resource Record Types at DMARC Labels

A records                         3,487
AAAA records                         10
NS records                        9,053
NSEC records                      3,793
SOA records                     182,959
TXT records                     488,656


Many of the TXT records at DMARC labels were in fact records for a
different protocol such as SPF, Sender-ID, or DKIM.

Other Protocols at DMARC TXT Record Labels

SPF records                     268,731
Sender-ID records                28,172
DKIM records                        176
                                -------
                                297,079

Bear in mind that TXT records are multivalued, and it is common to
publish records for multiple protocols at the same label - SPF and
Sender-ID are meant to operate this way, and SPF and DMARC are often
deployed this way. Unfortunately these records are often
(mis-)configured as wildcards for the entire domain, and get returned
for any TXT record request under that domain. Fortunately non-DMARC
records should have no effect when deployed at a "_dmarc" label.

But there are other recognizable schemes or protocols in use that
show up here, such as domain- or site-ownership verification.

Site Verification Strings at DMARC Labels

"bio=<hexstring>"               20,094
Google site verification         1,920
"v=msv1 t=<hexstring>"             246
Globalsign domain verification      82
                                ------
                                22,342

Again, this is probably due to the unnecessary creation of wildcard
records for these functions.



Values For DMARC TXT Records
============================

Examination of more than 488,000 TXT records at labels beginning with
"_dmarc" yielded the following breakdown of recognized email
authentication protocols:

Valid DMARC records     184,805
SPF records             268,731
Sender-ID records        28,172
DKIM records                176
Other values             36,687

Many of these labels include both SPF and DMARC records, and Sender-ID
records are almost always seen alongside SPF records, so the above
breakdown does not match a simple count of TXT records with "_dmarc"
labels in the dataset.

Most of these record types were identified by string matching the
first several characters of the value. In all cases this was done
after any extraneous leading quotation marks or backslashes (the
escape character for many nameservers) were removed.

Classifying most records involved looking for matches of "v=spf1",
"spf2.0/", "v=DKIM1", et cetera at the beginning of the value. For
evaluating DMARC records, the payload was validated using the
Mail::DMARC module written by Matt Simerson.
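
The prefix matching described above might look roughly like the
following Python sketch. It is not the code used for this analysis. The
function names and category labels are made up for illustration, the
case-insensitive comparison is an assumption, and the Sender-ID prefix
is written as "spf2.0/", the form defined in RFC 4406. Anything labeled
"dmarc-candidate" would still need real validation, which the analysis
did with Mail::DMARC.

```python
def strip_leading_junk(value: str) -> str:
    """Remove extraneous leading quotation marks and backslashes."""
    return value.lstrip('"\\')


def classify(value: str) -> str:
    """Guess the protocol of a TXT value from its first few characters.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    lowered = strip_leading_junk(value).lower()  # assumption: ignore case
    if lowered.startswith("v=spf1"):
        return "spf"
    if lowered.startswith("spf2.0/"):
        return "sender-id"
    if lowered.startswith("v=dkim1"):
        return "dkim"
    if lowered.startswith("v=dmarc1"):
        # Only a candidate; the payload must still be validated.
        return "dmarc-candidate"
    return "other"


print(classify('\\"v=spf1 -all'))      # spf
print(classify("v=DMARC1; p=none"))    # dmarc-candidate
```

Matching on a lower-cased copy keeps records such as "v=dmarc1" in the
DMARC bucket so that a later validation step can report them as invalid
rather than losing them among the "Other values".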

While more rigorous validity checks for the non-DMARC record types
could be performed, it did not seem worthwhile for labels in the
"_dmarc" namespace.



Policies of Valid DMARC Records
===============================

We will present two views of the distribution of policies expressed in
valid DMARC records. One is a count of all policies expressed in the
dataset over the entire period. The other is a count based on the last
policy published for any given domain during the entire period. So if
example.com published one DMARC policy in 2014 and a different policy
in 2015, both would appear in the first breakdown below, but only the
2015 policy would be part of the second breakdown.

Note that some domains allow the subdomain policy ("sp=") to be set
via inheritance, while others set it explicitly to the same value
despite the inheritance mechanism. We will keep the two counts
separate as a reflection of domain owner behavior.
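
To make the difference between the two views concrete, here is a small
Python sketch. The tuple layout (time, domain, p, sp) and the function
names are assumptions made for illustration, and it assumes each
observation is one distinct published record. It is not the processing
actually applied to the Farsight data.

```python
from collections import Counter


def policy_key(p, sp=None):
    """Keep explicit 'sp=' separate from inherited subdomain policy."""
    return f"p={p}" if sp is None else f"p={p}, sp={sp}"


def tally(observations):
    """Return (all policies ever seen, last policy seen per domain)."""
    all_seen = Counter()
    latest = {}  # domain -> (time, policy key)
    for when, domain, p, sp in observations:
        key = policy_key(p, sp)
        all_seen[key] += 1
        if domain not in latest or when >= latest[domain][0]:
            latest[domain] = (when, key)
    last_seen = Counter(key for _, key in latest.values())
    return all_seen, last_seen


observations = [
    (2014, "example.com", "none", None),
    (2015, "example.com", "reject", None),
    (2015, "example.org", "none", "none"),
]
all_seen, last_seen = tally(observations)
print(all_seen)   # both example.com policies are counted
print(last_seen)  # only the 2015 policy for example.com is counted
```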

All valid DMARC policy records ever observed in dataset:

p=none                          121,333
p=none, sp=none                  19,152
p=none, sp=quarantine               126
p=none, sp=reject                   509

p=quarantine                      9,980
p=quarantine, sp=none             1,272
p=quarantine, sp=quarantine       1,750
p=quarantine, sp=reject           1,754

p=reject                         23,885
p=reject, sp=none                 2,580
p=reject, sp=quarantine              39
p=reject, sp=reject               2,296
                                -------
                                184,676

(Note that sums may not match other sections due to manual correction
of the classification of DMARC records after automatic processing.)


Last valid DMARC policy observed for any given label:

p=none                           72,597
p=none, sp=none                  12,174
p=none, sp=quarantine                58
p=none, sp=reject                   241

p=quarantine                      5,082
p=quarantine, sp=none               584
p=quarantine, sp=quarantine         873
p=quarantine, sp=reject           1,205

p=reject                         14,134
p=reject, sp=none                 1,640
p=reject, sp=quarantine              17
p=reject, sp=reject               1,052
                                -------
                                109,657


Domain owners and other potential implementors often ask how many
domains are using the different policies. For a simplified answer,
here is a percentage breakdown of the policy expressed at a given
label, ignoring subdomain policies. This will only reflect the last
policy observed for any given domain/label over the past six years.

Last valid DMARC policy observed for any given label:

p=none                  85,070          77.6%
p=quarantine             7,744           7.1%
p=reject                16,843          15.4%
                       -------
                       109,657
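
For anyone who wants to check the arithmetic, the simplified figures are
the "last observed" counts collapsed by "p=" value, ignoring "sp=". A
short Python sketch using the numbers from the tables above (the
variable names are mine):

```python
# Last valid policy observed per label, grouped by "p=" value.
# Each list holds: no sp, sp=none, sp=quarantine, sp=reject.
last_observed = {
    "none":       [72_597, 12_174, 58, 241],
    "quarantine": [5_082, 584, 873, 1_205],
    "reject":     [14_134, 1_640, 17, 1_052],
}

totals = {p: sum(counts) for p, counts in last_observed.items()}
grand_total = sum(totals.values())

for p, n in totals.items():
    print(f"p={p:<12} {n:>8,} {100 * n / grand_total:>6.1f}%")
print(f"{'total':<14} {grand_total:>8,}")
```

The three percentages add up to 100.1% because each is rounded to one
decimal place independently.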

Please note that these figures reflect a mix of records still
published as of July 2016 (52,000+) and those no longer published
(57,000+). It would be far preferable to provide a breakdown of just
the records still published, but the raw policy records for those
labels are not available as of this writing. Those figures will be
captured and produced at the next quarterly update.

There are several questions around those DMARC records no longer being
published, and they are the subject of a separate analysis.



Random Strings in DMARC Labels
==============================

There are a number of non-standard strings that appear in the invalid
DMARC records in the dataset.

Records dynamically generated           872
kasserver.com                           673
Please contact your registr             152
This domain may be available for purch  113
UPDATE-                                  20
This domain's zone has been disabled     16
nodigispam                               10
*                                         6
Text                                      5
ERROR                                     1
dont do this                              1


There are other strings that appear very infrequently, but are a
little more personal.

Computer Reseller News/Russian Edition    7
Dare Contract Service NZ                  1
Entwickler, Admin, Techie ...com/jobs.php 5
Koninklijke Nederlandsche Voetbalbond     1
PC Magazine/Russian Edition               1
Red Cactus Design Limited                 1
Shannon Development Authority             2

A small number of other strings appeared because of operator error or
confusion in managing the domain's DNS data. The following strings
typify these cases:

   “For” “text” “record” “put:” “v=DMARC1; p=none; sp=none;
   rua=mailto:[email protected]; ruf=mailto:[email protected];
   rf=afrf; pct=100; ri=14400″

   “_dmarc.clementine.XXXXXXXX.fr descriptive text v=DMARC1;
   p=reject;”



Malformed DMARC Records
=======================

Of the more than 36,000 TXT records at DMARC labels that were not
valid DMARC records, only about 4,100 feature the string "DMARC" or
"v=dmarc". An exhaustive classification hasn't been compiled, but
here are the malformations that have been isolated so far.

(no "p=" tag/value)  1,698      Missing "p=" tag/value at policy label
                                (e.g. not ..._report._dmarc...)
v=dmarc1               810      The string "DMARC" must be upper case
DMARC1; p=...           62      Missing "v="
\\226\\128\\156v=DMARC1 40      Incorrect quoting/escaping
_dmarc v=DMARC1         19      Extraneous text
v=DMARC;                15      Missing the "1" in "DMARC1"
TXT \\\"v=DMARC1         7      Extraneous text and quoting
v=DMARC1/;               6      Incorrect escape character
v=DMARC1:                4      Use of ":" instead of ";"
v=DMARC1<SP>p=           4      Using "<SP>" at all
v%253DDMARC1%253B        3      Two levels of character escaping
v=DMARC1; \\226\\128     3      Incorrect quoting/escaping
=DMARC1                  2      Missing the "v" in "v=" tag/value pair
v-DMARC1                 1      Missing the "=" in "v=" tag/value pair
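
A record publisher could catch the most frequent of these faults before
deployment with a very small check. The Python sketch below is only
that, a sketch: the function names and messages are invented, it expects
the TXT value as a resolver would return it (without zone-file quoting),
and it does not enforce tag ordering or any other rule of the
specification. Note also that a missing "p=" tag is only a fault at a
policy label, not at a "..._report._dmarc..." label.

```python
def parse_tags(record: str) -> dict:
    """Split 'name=value; name=value' into a dict (illustrative only)."""
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def diagnose(raw: str) -> list:
    """Return a list of likely problems found in a DMARC TXT value."""
    problems = []
    if '"' in raw or "\\" in raw:
        problems.append("extraneous quoting/escaping")
    cleaned = raw.replace('"', "").replace("\\", "").strip()
    tags = parse_tags(cleaned)
    version = tags.get("v")
    if version is None:
        problems.append('missing "v=" tag')
    elif version != "DMARC1":
        if version.upper() == "DMARC1":
            problems.append('"DMARC" not upper case')
        else:
            problems.append("unrecognized version value")
    if "p" not in tags:
        problems.append('missing "p=" tag')
    return problems


print(diagnose("v=DMARC1; p=none"))   # []
print(diagnose("v=dmarc1; p=none"))   # ['"DMARC" not upper case']
```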

In addition, a number of records appear with widely varying numbers
and arrangements of escape or quoting characters. It is not at all
uncommon to see constructs like the following:

\"\\\"v=DMARC1;       618       Numerous escapes/quotes
\"v=DMARC1;\" \"p     187       Escapes/quotes within the record
\"v=DMARC1;\" \"p=\"    4       Escapes and missing "p=" value
\"\" \"v=DMARC1\"       2       Escapes, quotes, and spaces...
\" \\\"v=DMARC1;        1       ...



Incorrect Policy Values
=======================

The use of incorrect policy values in the "p=" tag/value pair, or of a
pre-publication tag name such as "policy" instead of "p", may be of
particular interest to the working group when refining the protocol
specification. For this reason they appear below instead of above with
the general malformed DMARC records.

p=monitor              125      Incorrect "p=" value
policy=none             52      Incorrect/outdated tag
p=monitor; sp=monitor   41      Incorrect "p=" and "sp=" value
p=nonee                 20      Incorrect "p=" value
p=;                     16      Missing value of "p=" tag/value pair
p=quarantaine           11      Incorrect "p=" value
p=quarentine             7      Incorrect "p=" value
p=quarintine             7      Incorrect "p=" value
p=non;                   3      Incorrect "p=" value
p=blocked                1      Incorrect "p=" value
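
Several of these are plain misspellings that a publishing tool could
flag. As a rough illustration only, the Python sketch below checks a
"p=" value against the three defined policies and, for near misses,
suggests the closest one using the standard difflib module. The helper
name, the 0.6 similarity cutoff, and the case-insensitive comparison are
all assumptions of this sketch, not anything taken from the
specification or from the analysis.

```python
import difflib

VALID_POLICIES = ("none", "quarantine", "reject")


def check_policy(value: str):
    """Return (is_valid, suggestion) for a "p=" or "sp=" value."""
    candidate = value.strip().lower()  # assumption: ignore case
    if candidate in VALID_POLICIES:
        return True, None
    close = difflib.get_close_matches(
        candidate, VALID_POLICIES, n=1, cutoff=0.6
    )
    return False, (close[0] if close else None)


print(check_policy("quarantaine"))  # (False, 'quarantine')
print(check_policy("monitor"))      # (False, None)
```

Values like "monitor" or "blocked" get no suggestion because they are
not misspellings of a defined policy. They look more like a different
idea of what the tag means, which may be the more interesting case for
the working group.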



Summary
=======

The majority of non-DMARC DNS TXT records appearing at "_dmarc" labels
in the six-year Farsight dataset is most likely due to the widespread
use of wildcard SPF records. Given that, how many records from the
dataset should actually be considered as malformed or problematic
DMARC records?

Taking the records described in the last two sections, "Malformed
DMARC Records" and "Incorrect Policy Values," we can be fairly
confident we're considering records intended to be DMARC deployments.
That gives us 3,769 invalid DMARC records, recognizing there may be
some overlap of records with more than one type of flaw.

Since these are all the invalid records captured, they should be
compared to all the valid DMARC records captured (184,676). This
appears to yield an error rate of a little over 2% for all DMARC records
in our dataset.
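
For those who want to reproduce the figure, here is the arithmetic as a
short Python sketch, using the per-row counts from the "Malformed DMARC
Records" and "Incorrect Policy Values" tables. The variable names are
mine, and the overlap between fault types mentioned above is ignored.

```python
# Row counts from "Malformed DMARC Records" (first table).
malformed = [1698, 810, 62, 40, 19, 15, 7, 6, 4, 4, 3, 3, 2, 1]
# Row counts from the escape/quoting constructs table.
quoting = [618, 187, 4, 2, 1]
# Row counts from "Incorrect Policy Values".
bad_policy = [125, 52, 41, 20, 16, 11, 7, 7, 3, 1]

invalid = sum(malformed) + sum(quoting) + sum(bad_policy)
valid = 184_676

print(invalid)                                     # 3769
print(f"{100 * invalid / valid:.2f}%")             # 2.04%
print(f"{100 * invalid / (valid + invalid):.2f}%") # 2.00%
```

The first percentage compares invalid records to valid ones; the second
expresses them as a share of all recognizable DMARC records. Either way
the result is about two percent.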

What are the most common faults observed? The most common fault in a
published DMARC record appears to be omission of the required "p="
policy tag. The second most common fault appears to be not ensuring
the string "DMARC" in the "v=" tag is in fact upper case. And the
third most common fault appears to involve using too many escape and
quoting characters in the nameserver data, resulting in extraneous
characters in the published record.

It's known that many mail receiver implementations will strip out some
quoting and escaping characters in DMARC records they retrieve, though
the details are not known. They may only strip the outermost pair,
only operate between the beginning and "v=" at the start of the
record, etc. While a given receiver's validation methods might produce
different totals, this issue probably still ranks a solid third as a
class of problem.




-- 
Steven M Jones
DMARC.org

e: [email protected], [email protected]

_______________________________________________
dmarc mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dmarc
