Martin,

  1.  I’ll see if we can get this cleaned up (I’ll create a proper ticket for 
tracking this)
  2.  I’d welcome other input here on the original idea behind this option.  I 
would imagine modern systems would be able to deal with rather large XML files, 
though MTAs routinely set limits under 50 MB for accepting messages.
  3.  I don’t think it suggests we should all send at the same time (unless I’m 
reading a different section). It suggests that the report producer should 
create reports on the same UTC boundaries.  For example, we do abide by the day 
boundary, but our reports are generated a few hours after 00:00 UTC (and 
delivered upon completion).  If you’d like, I can put a clarifying note into 
the document.

--
Alex Brotman
Sr. Engineer, Anti-Abuse & Messaging Policy
Comcast

From: dmarc <dmarc-boun...@ietf.org> On Behalf Of Martin Kealey
Sent: Friday, May 7, 2021 1:24 AM
To: dmarc@ietf.org
Subject: [dmarc-ietf] nits in draft-ietf-dmarc-aggregate-reporting-02

I'm not quite sure how I'm supposed to submit nitpickery like this, so if 
there's a better forum please let me know.

1. Filename & content-type

Section 2.6.1 among other things says that the name for the MIME part 
containing the report MUST end with ".xml" or ".xml.gz", yet the example given 
ends with neither of those (it ends with just ".gz").

The main use for this is as a unique report identifier; its use as a filename 
is entirely secondary and only relevant to manual processing by a human, so 
MUST seems quite excessive.

It seems like there are separate drivers for each part of the filename suffix, 
and perhaps they should be two independent SHOULD requirements. If we want to 
facilitate its use as a filename, perhaps we should just say that the filename 
SHOULD be universally unique and MUST NOT contain "/" or start with ".".
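As a rough illustration of the split suggested above, here is a minimal Python 
sketch that treats the safety rules (no "/", no leading ".") as hard failures 
and the suffix as a separate, advisory check. The function names and the sample 
filename are hypothetical and not taken from the draft.

```python
def filename_violations(name: str) -> list[str]:
    """Return hard (MUST NOT) violations for a report filename."""
    problems = []
    if not name:
        problems.append("empty name")
    if "/" in name:
        problems.append("contains '/'")
    if name.startswith("."):
        problems.append("starts with '.'")
    return problems


def has_conventional_suffix(name: str) -> bool:
    """Advisory (SHOULD) check: does the name end in .xml or .xml.gz?"""
    return name.endswith((".xml", ".xml.gz"))


# Illustrative filename only; it ends in plain ".gz", like the draft's example.
sample = "receiver.example!sender.example!1620259200!1620345600.gz"
print(filename_violations(sample))      # no hard violations
print(has_conventional_suffix(sample))  # but the suffix check fails
```

With the two checks separated, a receiver could reject only on the first and 
merely log the second.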

It seems strange to vary the content-type based solely on what amounts to a 
transport optimization, namely gzip; this smells of working around deficiencies 
in other standards. (From the perspective of an application using email as a 
transport, it would seem to make more sense to allow 
"content-transfer-encoding" to be a chain such as "base64+gzip", or 
alternatively, for "content-type" to accept the addition of a "gzip/" prefix, 
forming "gzip/text/xml". However, I digress, as that's a discussion for an 
entirely different standards track.)

According to RFC 7303 
§9.2<https://tools.ietf.org/html/rfc7303#section-9.2>,
 the "text/xml" content-type is merely an alias for "application/xml". Other 
standards such as related documents by 
W3C<https://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-04.html#textxml>
 go further in actively declaring it deprecated.

It seems to me that RFC 7303 
§4.2<https://tools.ietf.org/html/rfc7303#section-4.2>
 and RFC 6838 
§4.2.5<https://tools.ietf.org/html/rfc6838#section-4.2.5>
 taken together indicate that registration of a content type such as 
application/dmarc-feedback+xml would be appropriate.

2. Size limit

I'm concerned that specifying the maximum report size after compression is 
possibly focussing on the wrong costs, and distorts the conceptual model:

  1.  It implies that the compressed file is the relevant artefact being 
transported, which leads to the weirdness with filenames and content-types 
mentioned above.
  2.  The size of the report is trivial compared with the size of the messages 
it's reporting on, both in terms of storage and bandwidth, and gzip 
decompression is very cheap, so compression makes negligible difference to 
those costs.
  3.  The cost of processing the received report to incorporate it into the 
bulk reporting correlates more closely with its "uncompressed" size. In 
particular, the memory footprint of the receiver process is likely to be 
correlated with this limit, especially if its first step is to build an 
in-memory DOM from the XML. (I would be surprised if any real report-accepting 
system didn't work this way.)
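To make the third point concrete, here is a minimal Python sketch of a receiver 
that enforces a limit on the uncompressed size while decompressing, rather than 
on the size of the gzip payload. The names `read_bounded` and `ReportTooLarge` 
and the limits used are assumptions for illustration; the draft does not define 
them.

```python
import gzip
import io


class ReportTooLarge(Exception):
    """Raised when the decompressed report exceeds the configured limit."""


def read_bounded(compressed: bytes, max_uncompressed: int,
                 chunk_size: int = 65536) -> bytes:
    """Decompress gzip data, giving up once the output passes the limit.

    Decompression stops within one chunk of the limit, so a small payload
    that expands enormously never has to be held in memory in full.
    """
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            out += chunk
            if len(out) > max_uncompressed:
                raise ReportTooLarge(
                    f"report exceeds {max_uncompressed} uncompressed bytes")
    return bytes(out)


# A highly repetitive body compresses very well, which is exactly the case
# where a limit on the compressed size says little about processing cost.
body = b"<feedback>" + b"<record/>" * 100_000 + b"</feedback>"
packed = gzip.compress(body)
print(len(packed), len(body))
```

A receiver that builds an in-memory DOM would apply a check like this before 
handing the bytes to the XML parser.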

3. Scheduling

Concern about processing load also brings me to section 2.4.2, which 
essentially directs everyone to send their reports simultaneously. Since the 
receiver needs to be able to handle reports with any reporting period, it seems 
likely that having most but not all reports arriving at the same time would be 
the worst outcome, needing both (a) complex coding to cope with asynchronous 
reporting, and (b) having to cope with high load spikes (or suffer delays with 
the reports spooled for batch processing).

It also imposes a load spike on the report generators, to generate all their 
reports at once (or spool and delay), but at least they can derive some benefit 
from not having overlapping reporting periods.

Although in the scheme of things this isn't a huge load compared with the actual 
processing of email, it seems like it would be preferable to allow the receiver 
to specify their preference in this regard, at least to choose between "UTC 
synchronized" and "randomized". Or for this document to specify "randomized" as 
the default.

-Martin
_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc
