In June/July I slogged through about six years of captured email
authentication records, and wrote a blog post about what I found
regarding DMARC policy records:
https://dmarc.org/2016/07/common-problems-with-dmarc-records/
A colleague suggested that a write-up with more numbers would be
useful for this working group. So I did a lot more slogging, and
here's the result. There are undoubtedly shortcomings in the
methodology; feel free to point them out if you enjoy that sort of
thing. I can't say when I could incorporate any suggestions, and I am
not able to share the underlying data - sorry. But I will receive any
suggestions gladly and save them for later use.
tl;dr
=====
Two percent of all recognizable DMARC records published were invalid.
The three top problems with DMARC records observed over six years were:
1: Missing "p=" tag
2: The "DMARC" in "v=DMARC1" wasn't capitalized
3: Extraneous escape/quoting characters in record
There's a "Summary" at the bottom if you want it.
Introduction
============
Farsight Security has made over six years of DNS data related to email
authentication protocols available to DMARC.org. We recently reviewed
DMARC records across this period, which seems like it might provide
useful input for the IETF DMARC working group.
The Farsight data is captured by network probes proximate to a number
of nameservers. While this represents a subset of all Internet
nameservice traffic, it nevertheless provides a unique opportunity to
extract trends in the use of email authentication over time.
Records in the Dataset
======================
In all, 41 million records were made available, captured in the period
from June 2010 through June 2016 (minus three weeks in December
2015). This included over 37 million TXT records and over 1.5 million
CNAME records. Almost 924,000 records dealt with labels beginning with
"_dmarc."
Non-DMARC Records at DMARC Labels
=================================
Out of almost 924,000 records in the dataset at DMARC labels
(beginning "_dmarc"), over 435,000 were not even TXT records. Here is
a breakdown of all the resource record types appearing at these
labels:
Resource Record Types at DMARC Labels
  A records       3,487
  AAAA records       10
  NS records      9,053
  NSEC records    3,793
  SOA records   182,959
  TXT records   488,656
Many of the TXT records at DMARC labels were in fact records for a
different protocol such as SPF, Sender-ID, or DKIM.
Other Protocols at DMARC TXT Record Labels
  SPF records         268,731
  Sender-ID records    28,172
  DKIM records            176
                      -------
                      297,079
Bear in mind that TXT records are multivalued, and it is common to
publish records for multiple protocols at the same label - SPF and
Sender-ID are meant to operate this way, and SPF and DMARC are often
deployed this way. Unfortunately these records are often
(mis-)configured as wildcards for the entire domain, and so get
returned for any TXT record request under that domain, including
requests for the "_dmarc" label. Fortunately, non-DMARC records should
have no effect when they appear at a "_dmarc" label.
But there are other recognizable schemes or protocols in use that
show up here, such as domain- or site-ownership verification.
Site Verification Strings at DMARC Labels
  "bio=<hexstring>"                 20,094
  Google site verification           1,920
  "v=msv1 t=<hexstring>"               246
  Globalsign domain verification        82
                                    ------
                                    22,342
Again, this is probably due to the unnecessary creation of wildcard
records for these functions.
Values For DMARC TXT Records
============================
Examination of more than 488,000 TXT records at labels beginning with
"_dmarc" yielded the following breakdown of recognized email
authentication protocols:
  Valid DMARC records   184,805
  SPF records           268,731
  Sender-ID records      28,172
  DKIM records              176
  Other values           36,687
Many of these labels include both SPF and DMARC records, and Sender-ID
records are almost always seen alongside SPF records, so the above
breakdown does not match a simple count of TXT records with "_dmarc"
labels in the dataset.
Most of these record types were identified by string matching the
first several characters of the value. In all cases this was done
after any extraneous leading quotation marks or backslashes (the
escape character for many nameservers) were removed.
Classifying most records involved looking for matches of "v=spf1",
"spf2.0/", "v=DKIM1", et cetera at the beginning of the value. For
evaluating DMARC records, the payload was validated using the
Mail::DMARC module written by Matt Simerson.
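A rough Python sketch of that kind of prefix matching follows. It is
not the code used for this analysis. The function names are
hypothetical, stripping leading spaces along with quotes and
backslashes is an added assumption, and a "v=DMARC1" match only marks
a candidate that would still need real validation (done here with
Mail::DMARC).

```python
# Prefix -> protocol name. Matching is case-sensitive, so "v=dmarc1"
# is deliberately NOT treated as a DMARC candidate.
PREFIXES = (
    ("v=spf1", "SPF"),
    ("spf2.0/", "Sender-ID"),
    ("v=DKIM1", "DKIM"),
    ("v=DMARC1", "DMARC candidate"),  # still needs full validation
)


def strip_leading_noise(value):
    """Drop leading quotation marks and backslashes.

    Leading spaces are dropped too; that part is an assumption.
    """
    return value.lstrip('"\\ ')


def classify(value):
    """Return a protocol name based on the start of a TXT value."""
    cleaned = strip_leading_noise(value)
    for prefix, name in PREFIXES:
        if cleaned.startswith(prefix):
            return name
    return "other"


print(classify('v=spf1 include:example.net -all'))
print(classify(r'\"\\\"v=DMARC1; p=none'))
print(classify('v=dmarc1; p=none'))
```

Keeping the match case-sensitive means the lower-case "v=dmarc1"
records fall into "other", where they can be picked up later as
malformed DMARC attempts.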
While more rigorous validity checks for the non-DMARC record types
could be performed, it did not seem worthwhile for labels in the
"_dmarc" namespace.
Policies of Valid DMARC Records
===============================
We will present two views of the distribution of policies expressed in
valid DMARC records. One is a count of all policies expressed in the
dataset over the entire period. The other is a count based on the last
policy published for any given domain during the entire period. So if
example.com published one DMARC policy in 2014 and a different policy
in 2015, both would appear in the first breakdown below, but only the
2015 policy would be part of the second breakdown.
Note that some domains allow the subdomain policy ("sp=") to be set
via inheritance, while others set it explicitly to the same value
despite the inheritance mechanism. We will keep the two counts
separate as a reflection of domain owner behavior.
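To make the difference between the two views concrete, here is a small
Python sketch. The (timestamp, label, p, sp) tuple layout and the
function names are hypothetical, and counting each distinct
(label, policy) pair once in the "all policies" view is an assumption
about how repeat observations are handled.

```python
from collections import Counter


def policy_key(p, sp=None):
    """Name a policy as in the tables; an explicit sp= stays separate."""
    return f"p={p}" if sp is None else f"p={p}, sp={sp}"


def count_policies(observations):
    """Return (all_counts, last_counts) for (ts, label, p, sp) tuples.

    all_counts:  every distinct policy ever seen at each label
                 (assumption: repeats of the same policy count once)
    last_counts: only the most recent policy seen at each label
    """
    distinct = set()
    last = {}
    for ts, label, p, sp in sorted(observations, key=lambda o: o[0]):
        key = policy_key(p, sp)
        distinct.add((label, key))
        last[label] = key  # later observations overwrite earlier ones
    all_counts = Counter(key for _label, key in distinct)
    last_counts = Counter(last.values())
    return all_counts, last_counts


observations = [
    (2014, "_dmarc.example.com", "none", None),
    (2015, "_dmarc.example.com", "reject", None),
    (2015, "_dmarc.example.org", "none", "none"),
]
all_counts, last_counts = count_policies(observations)
print(dict(all_counts))
print(dict(last_counts))
```

With this input, example.com contributes both "p=none" and "p=reject"
to the first view but only "p=reject" to the second.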
All valid DMARC policy records ever observed in dataset:
  p=none                       121,333
  p=none, sp=none               19,152
  p=none, sp=quarantine            126
  p=none, sp=reject                509
  p=quarantine                   9,980
  p=quarantine, sp=none          1,272
  p=quarantine, sp=quarantine    1,750
  p=quarantine, sp=reject        1,754
  p=reject                      23,885
  p=reject, sp=none              2,580
  p=reject, sp=quarantine           39
  p=reject, sp=reject            2,296
                               -------
                               184,676
(Note that sums may not match other sections due to manual correction
of the classification of DMARC records after automatic processing.)
Last valid DMARC policy observed for any given label:
  p=none                        72,597
  p=none, sp=none               12,174
  p=none, sp=quarantine             58
  p=none, sp=reject                241
  p=quarantine                   5,082
  p=quarantine, sp=none            584
  p=quarantine, sp=quarantine      873
  p=quarantine, sp=reject        1,205
  p=reject                      14,134
  p=reject, sp=none              1,640
  p=reject, sp=quarantine           17
  p=reject, sp=reject            1,052
                               -------
                               109,657
Domain owners and other potential implementors often ask how many
domains are using the different policies. For a simplified answer,
here is a percentage breakdown of the policy expressed at a given
label, ignoring subdomain policies. This will only reflect the last
policy observed for any given domain/label over the past six years.
Last valid DMARC policy observed for any given label:
  p=none         85,070   77.6%
  p=quarantine    7,744    7.1%
  p=reject       16,843   15.4%
                -------
                109,657
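For anyone who wants to check the arithmetic, the simplified figures
can be reproduced from the detailed "last policy" counts above by
summing over the "sp=" variants. The Python below is only a sketch;
the dictionary layout and the function name are illustrative choices.

```python
from collections import Counter

# Counts copied from the detailed "last valid DMARC policy" table,
# keyed by (p, sp); sp is None where no explicit "sp=" was published.
LAST_POLICY = {
    ("none", None): 72597,
    ("none", "none"): 12174,
    ("none", "quarantine"): 58,
    ("none", "reject"): 241,
    ("quarantine", None): 5082,
    ("quarantine", "none"): 584,
    ("quarantine", "quarantine"): 873,
    ("quarantine", "reject"): 1205,
    ("reject", None): 14134,
    ("reject", "none"): 1640,
    ("reject", "quarantine"): 17,
    ("reject", "reject"): 1052,
}


def collapse_by_p(counts):
    """Sum the counts over sp= variants, leaving one total per p= value."""
    by_p = Counter()
    for (p, _sp), n in counts.items():
        by_p[p] += n
    return by_p


by_p = collapse_by_p(LAST_POLICY)
total = sum(by_p.values())
for p, n in by_p.items():
    print(f"p={p:<12}{n:>7,}{100 * n / total:>7.1f}%")
print(f"total: {total:,}")
```

Because each percentage is rounded separately, the three rounded
values add up to 100.1% rather than exactly 100%.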
Please note that these figures reflect a mix of records still
published as of July 2016 (52,000+) and those no longer published
(57,000+). It would be far preferable to provide a breakdown of just
the records still published, but the raw policy records for those
labels are not available as of this writing. Those figures will be
captured and produced at the next quarterly update.
There are several questions around those DMARC records no longer being
published, and they are the subject of a separate analysis.
Random Strings in DMARC Labels
==============================
There are a number of non-standard strings that appear in the invalid
DMARC records in the dataset.
  Records dynamically generated             872
  kasserver.com                             673
  Please contact your registr               152
  This domain may be available for purch    113
  UPDATE-                                    20
  This domain's zone has been disabled       16
  nodigispam                                 10
  *                                           6
  Text                                        5
  ERROR                                       1
  dont do this                                1
There are other strings that appear very infrequently, but are a
little more personal.
  Computer Reseller News/Russian Edition       7
  Dare Contract Service NZ                     1
  Entwickler, Admin, Techie ...com/jobs.php    5
  Koninklijke Nederlandsche Voetbalbond        1
  PC Magazine/Russian Edition                  1
  Red Cactus Design Limited                    1
  Shannon Development Authority                2
A small number of other strings appeared because of operator error or
confusion in managing the domain's DNS data. The following strings
typify these cases:
“For” “text” “record” “put:” “v=DMARC1; p=none; sp=none;
rua=mailto:[email protected]; ruf=mailto:[email protected];
rf=afrf; pct=100; ri=14400″
“_dmarc.clementine.XXXXXXXX.fr descriptive text v=DMARC1;
p=reject;”
Malformed DMARC Records
=======================
Of the more than 36,000 TXT records at DMARC labels that were not
valid DMARC records, only about 4,100 feature the string "DMARC" or
"v=dmarc". An exhaustive classification hasn't been compiled, but
here are the malformations that have been isolated so far.
  (no "p=" tag/value)      1,698  Missing "p=" tag/value at policy label
                                  (e.g. not ..._report._dmarc...)
  v=dmarc1                   810  The string "DMARC" must be upper case
  DMARC1; p=...               62  Missing "v="
  \\226\\128\\156v=DMARC1     40  Incorrect quoting/escaping
  _dmarc v=DMARC1             19  Extraneous text
  v=DMARC;                    15  Missing the "1" in "DMARC1"
  TXT \\\"v=DMARC1             7  Extraneous text and quoting
  v=DMARC1/;                   6  Incorrect escape character
  v=DMARC1:                    4  Use of ":" instead of ";"
  v=DMARC1<SP>p=               4  Using "<SP>" at all
  v%253DDMARC1%253B            3  Two levels of character escaping
  v=DMARC1; \\226\\128         3  Incorrect quoting/escaping
  =DMARC1                      2  Missing the "v" in "v=" tag/value pair
  v-DMARC1                     1  Missing the "=" in "v=" tag/value pair
In addition, a number of records appear with widely varying numbers
and arrangements of escape or quoting characters. It is not at all
uncommon to see constructs like the following:
  \"\\\"v=DMARC1;        618  Numerous escapes/quotes
  \"v=DMARC1;\" \"p      187  Escapes/quotes within the record
  \"v=DMARC1;\" \"p=\"     4  Escapes and missing "p=" value
  \"\" \"v=DMARC1\"        2  Escapes, quotes, and spaces...
  \" \\\"v=DMARC1;         1  ...
Incorrect Policy Values
=======================
The use of incorrect policy values in the "p=" tag/value pair, or of a
pre-publication tag name such as "policy" instead of "p", may be of
particular interest to the working group when refining the protocol
specification. For this reason they appear below instead of above with
the general malformed DMARC records.
  p=monitor              125  Incorrect "p=" value
  policy=none             52  Incorrect/outdated tag
  p=monitor; sp=monitor   41  Incorrect "p=" and "sp=" value
  p=nonee                 20  Incorrect "p=" value
  p=;                     16  Missing value of "p=" tag/value pair
  p=quarantaine           11  Incorrect "p=" value
  p=quarentine             7  Incorrect "p=" value
  p=quarintine             7  Incorrect "p=" value
  p=non;                   3  Incorrect "p=" value
  p=blocked                1  Incorrect "p=" value
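As an illustration of how these cases could be flagged mechanically,
here is a minimal Python sketch. It is not how Mail::DMARC works
internally; the function names are hypothetical, the tag parsing is
deliberately naive, and comparing values case-insensitively is an
assumption. The three accepted values (none, quarantine, reject) are
the ones defined for "p=" and "sp=" in RFC 7489.

```python
VALID_POLICIES = {"none", "quarantine", "reject"}


def parse_tags(record):
    """Split a record into {tag: value}; a naive parser for illustration."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            name, _, value = part.partition("=")
            tags[name.strip()] = value.strip()
    return tags


def policy_problems(record):
    """Return a list of human-readable problems with the policy tags."""
    tags = parse_tags(record)
    problems = []
    if "p" not in tags:
        if "policy" in tags:
            problems.append('outdated tag "policy", expected "p"')
        else:
            problems.append('missing "p=" tag')
    for tag in ("p", "sp"):
        # Case-insensitive comparison is an assumption of this sketch.
        if tag in tags and tags[tag].lower() not in VALID_POLICIES:
            problems.append(f'invalid {tag}= value "{tags[tag]}"')
    return problems


for rec in ("v=DMARC1; p=reject",
            "v=DMARC1; p=monitor; sp=monitor",
            "v=DMARC1; policy=none",
            "v=DMARC1; p=;"):
    print(rec, "->", policy_problems(rec))
```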
Summary
=======
The majority of non-DMARC DNS TXT records appearing at "_dmarc" labels
in the six-year Farsight dataset are most likely due to the widespread
use of wildcard SPF records. Given that, how many records from the
dataset should actually be considered malformed or problematic
DMARC records?
Taking the records described in the last two sections, "Malformed
DMARC Records" and "Incorrect Policy Values," we can be fairly
confident we're considering records intended to be DMARC deployments.
That gives us 3,769 invalid DMARC records, recognizing there may be
some overlap of records with more than one type of flaw.
Since these are all the invalid records captured, they should be
compared to all the valid DMARC records captured (184,676). This
appears to yield an error rate of a little over 2% for all DMARC
records in our dataset.
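The arithmetic behind those two figures can be checked with a few
lines of Python. The per-category counts are copied from the tables in
the "Malformed DMARC Records" and "Incorrect Policy Values" sections;
the variable names are just for illustration. Both the invalid-to-valid
ratio and the invalid share of all DMARC attempts are shown, since
either could be meant by "error rate".

```python
# Counts from the first "Malformed DMARC Records" table.
malformed = [1698, 810, 62, 40, 19, 15, 7, 6, 4, 4, 3, 3, 2, 1]
# Counts from the escape/quoting table in the same section.
escaped = [618, 187, 4, 2, 1]
# Counts from the "Incorrect Policy Values" table.
bad_policy = [125, 52, 41, 20, 16, 11, 7, 7, 3, 1]

invalid = sum(malformed) + sum(escaped) + sum(bad_policy)
valid = 184676

ratio_to_valid = 100 * invalid / valid
share_of_all = 100 * invalid / (valid + invalid)

print(f"invalid records:       {invalid:,}")
print(f"invalid / valid:       {ratio_to_valid:.2f}%")
print(f"invalid / (all DMARC): {share_of_all:.2f}%")
```

Either way of computing it lands at roughly two percent.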
What are the most common faults observed? The most common fault in a
published DMARC record appears to be omission of the required "p="
policy tag. The second most common fault appears to be not ensuring
the string "DMARC" in the "v=" tag is in fact upper case. And the
third most common fault appears to involve using too many escape and
quoting characters in the nameserver data, resulting in extraneous
characters in the published record.
It's known that many mail receiver implementations will strip out some
quoting and escaping characters in DMARC records they retrieve, though
the details are not known. They may strip only the outermost pair, or
only operate between the beginning of the record and the leading "v=",
etc. While a given receiver's validation methods might produce
different totals, this issue probably still ranks a solid third as a
class of problem.
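To show why different receivers could arrive at different totals, here
is a Python sketch of the two stripping strategies mentioned above.
Both functions are hypothetical and do not describe any particular
receiver's behavior; the "looks like DMARC" test is just a prefix
check, not full validation.

```python
import re


def strip_outer_pair(value):
    """Strategy 1: remove one matching pair of surrounding quotes."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def strip_before_version(value):
    """Strategy 2: drop quotes, backslashes and spaces before "v="."""
    return re.sub(r'^["\\ ]+(?=v=)', "", value)


def looks_like_dmarc(value):
    """Prefix check only; a real receiver would validate the record."""
    return value.startswith("v=DMARC1")


simple = '"v=DMARC1; p=none"'
heavy = r'\"\\\"v=DMARC1; p=none'

for name, rec in (("simple", simple), ("heavy", heavy)):
    print(name,
          looks_like_dmarc(strip_outer_pair(rec)),
          looks_like_dmarc(strip_before_version(rec)))
```

The singly quoted record passes under either strategy, while the
heavily escaped one is only recognized by the second. A receiver using
the first strategy would count it as invalid.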
--
Steven M Jones
DMARC.org
e: [email protected], [email protected]