Re: [dmarc-ietf] Does Aggregate Reporting meet "Internet Scale" test?

Mark Alley Sun, 11 Dec 2022 17:13:02 -0800

Success reports and message count can give empirical evidence of a domain's mail flow health in most situations as far as implementation and monitoring goes, but there are circumstances that can cause data correlation from this information to be less useful.

Anecdotal example - one domain I've worked with in the past does not have very high mail-flow, probably ~1-10k legitimate emails a month to recipients whose email systems of choice participate in feedback reporting. Due to this domain's industry, it was highly coveted by spammers spoofing the domain, to the order of 10-100x their monthly legitimate mail volume in terms of reporting.

In this case, traditional data metrics (such as the aggregate % of emails that pass/fail DMARC) become diluted by the volume of failure reports from illegitimate sources, rendering this type of information much less helpful to the domain owner.
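
To put rough numbers on that dilution, here is a minimal sketch. The volumes are assumptions picked from the ranges in the anecdote above (10k legitimate messages, spoofing at 10x and 100x), and it assumes all legitimate mail passes DMARC and all spoofed mail fails:

```python
def aggregate_pass_rate(legit_passing: int, spoofed_failing: int) -> float:
    """Fraction of all reported messages that passed DMARC."""
    total = legit_passing + spoofed_failing
    return legit_passing / total if total else 0.0


legit = 10_000  # assumed legitimate monthly volume, all passing DMARC

for multiplier in (10, 100):
    spoofed = legit * multiplier  # assumed spoofed volume, all failing DMARC
    rate = aggregate_pass_rate(legit, spoofed)
    print(f"{multiplier}x spoofing -> aggregate pass rate {rate:.1%}")
```

Even with a perfectly configured sender, the headline pass rate lands somewhere around 1-9% under these assumptions, which says more about the spoofers than about the domain's own mail.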

This raises another question. Hypothetically speaking, let's assume an alternate implementation of the DMARC RFC requires a feedback reporter to send reports to a feedback endpoint only in the event of SPF/DKIM alignment/authentication failure, instead of both success and failure reports. With the lack of evidence (success reports) to gauge DMARC pass rates, would this inherent assumption that a domain starts at "100% DMARC compliance" be more useful in terms of data analysis by aggregators using this data? (i.e., it is assumed a domain is already compliant, and failure reports are measured against this assumed compliance metric.)

Addressing unaligned signatures, a domain owner could use these to infer the context of a particular mail-flow's handling or origin/purpose, but this is based on the assumption that said signature selector name(s) or domain(s) are relevant to the owner's (perhaps previously unknown) uses.

An example being Microsoft 365's default DKIM signing domain (<tenant-name>.onmicrosoft.com) for tenants with DKIM unconfigured. Another example might be a domain owner wanting to determine if their mail flow is being signed correctly or not from particular mail infrastructure. But the absence of reporting on unaligned signatures could also be misleading; an owner would not know if the message wasn't signed at all, or the signature(s) just didn't align for some reason.




On 12/11/2022 2:21 PM, Douglas Foster wrote:

I would not want to use randomization or percentages to discard actionable data.


1)  When to send reports.

An actionable result is one which says "this server sent a message without a verifiable and aligned DKIM signature". This applies because:
- Any message can be subject to forwarding, so any attempt to move toward "reject" implies a need to put DKIM signatures on every message.
- SPF results are overridden if DKIM is verified and aligned, because a perfectly formed message can be SPF FAIL if forwarded without MailFrom rewrite, or SPF Not Aligned if forwarded with MailFrom rewrite.
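
As a rough illustration of the distinction drawn above, the sketch below separates "DMARC passed" from "actionable" in the sense proposed here. The `AuthResult` type and both function names are hypothetical, not taken from any DMARC implementation:

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    """Hypothetical per-message summary of authentication results."""
    dkim_pass_aligned: bool  # at least one verified, aligned DKIM signature
    spf_pass_aligned: bool   # SPF passed and the MailFrom domain is aligned


def dmarc_pass(result: AuthResult) -> bool:
    # DMARC passes when either mechanism yields an aligned pass.
    return result.dkim_pass_aligned or result.spf_pass_aligned


def is_actionable(result: AuthResult) -> bool:
    # Under the proposal above, only a missing aligned DKIM pass is
    # actionable; the SPF outcome does not enter into it.
    return not result.dkim_pass_aligned


# Forwarded without MailFrom rewrite: SPF fails, DKIM survives.
forwarded = AuthResult(dkim_pass_aligned=True, spf_pass_aligned=False)
# Sent directly from an unsigned server: SPF carries the DMARC pass.
unsigned_direct = AuthResult(dkim_pass_aligned=False, spf_pass_aligned=True)

for name, result in (("forwarded", forwarded), ("unsigned direct", unsigned_direct)):
    print(name, "dmarc_pass:", dmarc_pass(result), "actionable:", is_actionable(result))
```

The second case is the interesting one: the message passes DMARC today, but it is still flagged because it would fail as soon as it is forwarded.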

A report which has only DKIM PASS results can be called a "success report". It does not provide actionable data, and is therefore unnecessary. However, a system which never receives a report is at risk of undetected configuration errors, so it becomes necessary to send occasional success reports to protect against this risk. We could accomplish this with a SHOULD rule to send a success report, on a weekly basis, to X% of domains that had only success results. The success reports will also allow the domain owner to identify and correct SPF policy errors, if he has them.
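
One possible reading of that SHOULD rule, sketched under assumptions: the reporter hashes the domain together with the ISO week and sends a success report when the result falls below X. The function name, the hashing scheme, and the week-based rotation are illustrative choices, not something proposed in this thread:

```python
import hashlib
from datetime import date


def send_success_report(domain: str, day: date, percent: int) -> bool:
    """Decide whether a success-only domain gets a report this week.

    The choice is stable within an ISO week and rotates between weeks,
    so over time every domain should receive an occasional report.
    """
    iso_year, iso_week, _ = day.isocalendar()
    key = f"{domain.lower()}|{iso_year}-W{iso_week:02d}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return bucket < percent


# Example: which of these (made-up) domains get a report at X = 10%?
today = date(2022, 12, 11)
for domain in ("example.com", "example.net", "example.org"):
    print(domain, send_success_report(domain, today, percent=10))
```

Hashing rather than drawing a random number keeps the decision reproducible, so parallel reporting servers would agree on which domains are selected without sharing state.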


2) What to include in reports

I have one reporting source that always reports a message count of 1, without regard to the number of messages that I sent and he received. This helped me realize that there is no need to report quantity. A correctly configured server will apply a correct signature on every message. Whether the problem is uniform or random, all that the domain owner needs to know is that a particular server is not signing correctly.

And as I have said before, collecting every signature adds unnecessary complication to the reporting process, while adding no value to the domain owner. All that needs to be reported is one aligned signature, because the domain owner's server only needs to apply one aligned signature.

These changes would reduce the reporting overhead, especially for smaller organizations where the effort is not noise level. They would also reduce the risk of unwanted data leakage.

But I am willing to be convinced. Can someone explain how success reports, message counts, or unaligned signatures serve a domain owner purpose which is relevant to DMARC?


Doug

On Thu, Dec 8, 2022 at 7:56 AM Mark Alley <[email protected]> wrote:


    Adding clarification since I forgot to specify - this would be
    per-sender per-source, not a set percentage of all mail received
    from a source; that obviously would not work as intended.

    On 12/8/2022 6:52 AM, Mark Alley wrote:


    This may have been thought of before, so forgive the potentially
    duplicate idea, I was musing earlier about feedback reporting
    based on a percent of the overall mail per-source. I'm thinking
    of something similar in concept to the pct= tag for published policy.

    This would reduce the overhead required to report from particular
    sources... But as I'm typing this idea out, this seems less than
    feasible due to the other considerations that come to mind. If a
    receiver designed to report on only 10% of mail received from a
    source was sent 100 emails from said source, and 80 of those
    emails were forwards, the feedback would be overwhelmingly biased
    towards forwarding data, and the sender would miss out on reports
    about direct mail, and therefore on fully compliant (and arguably
    more useful) reports.

    Evolving on this thought, if a receiver reported subset
    percentages of all different types of compliant/non-compliant
    email per-source (SPF fails/DKIM passes, SPF passes/DKIM fails...
    etc, etc.) this might provide the data needed while still keeping
    the reporting volume manageable for less internet-scale receivers.

    Though, it goes without saying, this type of reporting would be
    woefully inadequate in terms of data availability, and only gives
    an idea of traffic types seen, not inclusive of all-encompassing
    volumetric data that could be derived normally from feedback
    reporters that process all emails.
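
A minimal sketch of the per-category idea above, using the 100-email scenario (80 forwards, 20 direct). The category labels, the `stratified_sample` helper, and the "at least one message per category" rule are assumptions made for illustration:

```python
import random
from collections import Counter, defaultdict


def stratified_sample(messages, percent, rng):
    """Sample `percent`% of messages from each result category.

    Every category that was seen contributes at least one message,
    so a rare result type cannot vanish from the report entirely.
    """
    by_result = defaultdict(list)
    for msg in messages:
        by_result[msg["result"]].append(msg)
    sample = []
    for group in by_result.values():
        k = max(1, (len(group) * percent + 99) // 100)  # ceiling division
        sample.extend(rng.sample(group, k))
    return sample


# Assumed labels: forwards show SPF fail / DKIM pass, direct mail passes both.
messages = (
    [{"id": i, "result": "spf_fail/dkim_pass"} for i in range(80)]
    + [{"id": i, "result": "spf_pass/dkim_pass"} for i in range(80, 100)]
)

rng = random.Random(0)
uniform = rng.sample(messages, 10)            # plain 10% sample
stratified = stratified_sample(messages, 10, rng)

print("uniform:   ", Counter(m["result"] for m in uniform))
print("stratified:", Counter(m["result"] for m in stratified))
```

A plain 10% sample may, by chance, contain few or no direct messages, while the per-category sample always includes some of each type seen. As noted above, neither variant recovers the full volumetric picture.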


    On 12/8/2022 12:58 AM, Douglas Foster wrote:


    1) DMARC was a successful 2-company experiment, which was turned
    into a widely implemented informational RFC.  We are now
    writing the standards-track version of that concept.  We hope
    that Standards Track will provide the basis for significantly
    increased adoption.  This seems the appropriate time to ask
    whether the design can be optimized for efficiency. If you were
    designing from scratch, would this reporting design be the
    result?   What alternatives have we considered and ruled out?

    2) The burden of reporting is not experienced equally by all
    report senders.   If I send a batch of messages from 1 source
    domain to:
    - 10 target domains at Google, I will get 1 report, because
    Google consolidates across target domains.
    - 10 target domains at Yahoo, I will get 10 reports, because
    Yahoo chooses to disaggregate by target domain.
    - 10 target domains to Ironport clients, I will get 20 or 30
    reports.    These are client-specific appliances, many clients
    have multiple appliances configured in parallel for load
    balancing, and each appliance produces its own report.

    Google presumably can dedicate servers to the reporting
    function, while the Ironport servers seem to generate reports in
    parallel with message processing.   Altogether, I conclude that
    Google can absorb an increase in workload much more easily than
    an appliance.

    3) The burden of reporting is not shared equally at present.
    Substantially all of my reporting comes from the three sources
    just stated:  Google, Yahoo, and Ironport appliances.  Since
    these organizations have not been actively participating,
    perhaps you are right and they are happy with the present
    design.   On the other hand, perhaps someone with connections
    should ask them whether they want to see optimizations.

    4) As DMARC participation grows, the growth curve is not really
    linear.  Currently, 40% of my mailstream is covered by DMARC
    reporting because more than 30% of my outbound mail goes to
    Google servers.   Altogether, the number of reporting domains,
    from all sources, is somewhere around 40.  To move reporting
    from 40% of messages to 40% of domains, the volume of reports
    will grow by orders of magnitude.

    5) Which then raises the question of, "Who do we expect to do
    reporting?"    Several participants in this group have expressed
    the conviction that everyone who benefits from DMARC should also
    contribute to DMARC by doing reporting.    This seems fair, but
    it is probably not necessary.  Reporting from Google alone is
    probably sufficient for domain owners to know whether or not
    their servers are properly configured.    But as long as we want
    everyone to participate, we cannot assume that everyone will
    have Google's resources to contribute to the reporting task.

    All of which says to me that we should be looking to optimize
    the reporting function to minimize the cost of participation.

    Doug Foster


    On Tue, Dec 6, 2022 at 10:15 PM Seth Blank <[email protected]>
    wrote:

        I'm super unclear what you're talking about.

        https://dmarc.org/2022/03/dmarc-policies-up-84-for-2021/

        Aggregate reporting is used by the largest volume senders on
        earth, and the vast majority of mail received by mailbox
        providers comes with a dmarc record and reporting address
        attached.

        This is umpteen billions of messages a day that get
        aggregated into reports.

        What are you getting at? That seems pretty internet scale to
        me...

        Seth

        On Mon, Dec 5, 2022 at 2:01 PM Douglas Foster
        <[email protected]> wrote:

            I began wondering if Aggregate Reporting works only
            because DMARC has been embraced by a small portion of
            domain owners.

            1) Is Aggregate Reporting a significant portion of all
            mail?  In some cases, Yes.

            My organization's data:
            Inbound volume is 11 times greater than my outbound volume.
            Inbound mail has 1 new domain for every 5 messages

            Net result:   If I were to do reporting, and reporting
            became requested for most or all domains, my outbound
            mail volume would triple, because my outbound report
            volume would be twice as large as my outbound business
            mail volume.
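
The arithmetic behind that estimate can be sketched as follows, assuming one report message is sent per new inbound domain (the function name and that one-report-per-domain assumption are illustrative):

```python
def outbound_multiplier(inbound_per_outbound: float, messages_per_new_domain: float) -> float:
    """Total outbound volume (business mail plus reports) relative to business mail alone."""
    # Reports sent per outbound business message, at one report per new domain.
    reports_per_outbound = inbound_per_outbound / messages_per_new_domain
    return 1 + reports_per_outbound


# Figures from the message: inbound is 11x outbound, 1 new domain per 5 inbound messages.
multiplier = outbound_multiplier(inbound_per_outbound=11, messages_per_new_domain=5)
print(f"reports per business message: {multiplier - 1:.1f}")
print(f"total outbound volume:        {multiplier:.1f}x")
```

With 11 / 5 = 2.2 reports per business message, report volume is roughly twice the business mail volume and total outbound volume roughly triples, matching the estimate above.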

            2) Is Aggregate Reporting efficient?  Restating previous
            concerns:

            "All Signature" reporting means:
            We keep evaluating even after successful authentication
            has been established,
            so that we can capture and store data of little actual
            value,
            even though it causes reduced aggregation and longer
            reports.

            "No Problems found, No changes found" reporting means:
            We send redundant reports day after day.

            "All Requesters" reporting means:
            We send reports even to domain owners that were blocked
            because of domain reputation.

            A good place to start would be to extend the reporting
            interval for no-problem-found reports.

            Doug Foster


            _______________________________________________
            dmarc mailing list
            [email protected]
            https://www.ietf.org/mailman/listinfo/dmarc


