Re: [dmarc-ietf] Formal specification, URI

Alessandro Vesely Tue, 17 Mar 2015 08:29:46 -0700

On Mon 16/Mar/2015 20:22:31 +0100 Murray S. Kucherawy wrote: 
> On Mon, Mar 16, 2015 at 3:51 AM, Alessandro Vesely <[email protected]> wrote:
> 
>>> Section 2.2 of RFC3986 lists semi-colon as a reserved character that has to
>>> be percent-encoded in these URLs.  We don't need to repeat it here, I think.
>>
>> If the spec is going to be read by ignorants like me, it's better to repeat
>> than to omit.  RFC3986 has a very wide scope, and uses phrases like "may
>> (or may not) be defined as delimiters".  It says:
>>
>>    If data for a URI component would conflict with a reserved
>>    character's purpose as a delimiter, then the conflicting data must be
>>    percent-encoded before the URI is formed.
> 
> Right, which is why things like semi-colon don't need to be
> percent-encoded; they're already special characters in the context of a URL.


So are comma and exclamation.  What puzzles me is that the DMARC spec treats
them differently while RFC3986 does not.  Comma and semicolon seem to behave
the same; e.g.:

http://www.tana.it/comma,comma.txt
http://www.tana.it/comma%2ccomma.txt
http://www.tana.it/comma%25%32%63comma.txt
http://www.tana.it/semicolon;semicolon.txt
http://www.tana.it/semicolon%3bsemicolon.txt
http://www.tana.it/semicolon%25%33%62semicolon.txt
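
To make the three spellings of each URL concrete, here is a minimal Python
sketch (standard library only; the helper name decode_levels is made up for
illustration).  It shows what one and two rounds of percent-decoding yield for
each path.  It says nothing about how a given server maps the result to a file.

```python
from urllib.parse import unquote

def decode_levels(path):
    """Return the path after one and after two rounds of percent-decoding."""
    once = unquote(path)
    twice = unquote(once)
    return once, twice

paths = (
    "/comma,comma.txt",
    "/comma%2ccomma.txt",
    "/comma%25%32%63comma.txt",
    "/semicolon;semicolon.txt",
    "/semicolon%3bsemicolon.txt",
    "/semicolon%25%33%62semicolon.txt",
)

for p in paths:
    # The doubly encoded forms need two rounds to reach the bare delimiter.
    print(p, decode_levels(p))
```

In this sketch comma and semicolon are handled identically by the decoder,
which matches the observation above that the two behave the same.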

>> Comma and exclamation (which are sub-delims like semicolon) are apparently
>> used in dmarc-uri's rule.  The preceding DMARC section says:
>>
>>    DMARC records follow the extensible "tag-value" syntax for DNS-based
>>    key records defined in DKIM [DKIM].
>>
>> However, DKIM production rules don't seem to be formally imported.  If they
>> are imported, semicolon exclusion is implied by the definition:
>>
>>    VALCHAR   =  %x21-3A / %x3C-7E
>>                      ; EXCLAMATION to TILDE except SEMICOLON
> 
> They aren't formally imported, and I'm not sure that's necessary here.  The
> ABNF we have should be comprehensive over DMARC tag-value sets.  The prose
> you cited is merely meant to convey that they follow the same style.

Right.  The question is if implementations can reuse DKIM parsers.
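
As a rough illustration of what reusing a DKIM-style parser would mean, the
sketch below splits a record on semicolons and then on the first "=" of each
item.  It is a simplification, not the full RFC 6376 ABNF; the function name
parse_tag_list and the example.com / example.org addresses are placeholders
invented for this example.

```python
def parse_tag_list(record):
    """Split a DKIM-style tag-list into a dict (simplified sketch)."""
    tags = {}
    for item in record.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"tag without '=': {item!r}")
        tags[name.strip()] = value.strip()
    return tags

# Placeholder record; the addresses are hypothetical.
record = "v=DMARC1; p=none; rua=mailto:a@example.com,mailto:b@example.org;"
print(parse_tag_list(record))

# A raw semicolon inside a URI is cut at the tag level before any URI
# parsing happens, so this parser rejects the leftover fragment.
try:
    parse_tag_list("rua=mailto:a@example.com?subject=x;y")
except ValueError as err:
    print("rejected:", err)
```

With such a parser a semicolon can only survive inside a value if it is written
as %3b, which is the exclusion that VALCHAR implies.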

>> How about the other two questions?  I didn't survey but a few DMARC
>> records, but RFC6068 exemplifies the following:
>>
>>    Also note that it is syntactically valid to specify both <to> and an
>>    <hfname> whose value is "to".  That is,
>>
>>    <mailto:addr1@an.example,addr2@an.example>
>>
>>    is equivalent to
>>
>>    <mailto:?to=addr1@an.example,addr2@an.example>
>>
>>    is equivalent to
>>
>>    <mailto:addr1@an.example?to=addr2@an.example>
>>
>>    However, the latter form is NOT RECOMMENDED because different user
>>    agents handle this case differently.  In particular, some existing
>>    clients ignore "to" <hfvalue>s.
>>
>> Yahoo instead uses 1st level syntax:
>>
>>    rua=mailto:[email protected], mailto:[email protected];
> 
> Your question is "Are they equivalent?"  I believe they are.  Although it
> might be ideal to have a specification so tight that there's exactly one
> way to do something, in the end I don't think it's harmful to have two ways
> to say the same thing.  It's more of a concern if there's two ways to
> interpret a single thing; that's when we arguably have something to fix.

I tried the "NOT RECOMMENDED" syntax quoted above.  Dmarcian[1] doesn't raise a
brow, and the RFC3986-compliant uriparser[2] ingests it smoothly.  However,
although I sent a test message to a gmail account, I received no report.  I
guess Google's implementation doesn't deploy a proper URI parser, but just
looks for "mailto:" followed by a plain path consisting of a single[3]
addr-spec (as defined in RFC6068, i.e. w/o comments) with no query or fragment.
That's what I'd do myself, but I find no argument in the spec that helps
prove that such a record is bad.

[1] https://dmarcian.com/dmarc-inspector/torreinpietra.it
[2] http://uriparser.sourceforge.net/doc/html/
[3] haven't yet tried two %2c-separated addr-specs.
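
The difference between the two approaches can be sketched in Python.  Both
functions below are hypothetical illustrations, not any known implementation:
looks_like_plain_mailto is the naive check described above, and
mailto_recipients collects addresses from both the path and any "to" hfield.
The addresses are placeholders in the style of RFC6068's examples.  Splitting
on literal commas before percent-decoding is a choice made for this sketch, so
a %2c here stays inside an address rather than acting as a separator.

```python
import re
from urllib.parse import urlsplit, unquote

# Naive check: "mailto:" plus one address, no list, query or fragment.
PLAIN_MAILTO = re.compile(r"mailto:[^\s,;!?#@]+@[^\s,;!?#@]+")

def looks_like_plain_mailto(uri):
    return PLAIN_MAILTO.fullmatch(uri) is not None

def mailto_recipients(uri):
    """Collect recipients from the path and from any 'to' hfield."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI")
    raw = [a for a in parts.path.split(",") if a]
    for field in parts.query.split("&"):
        name, _, value = field.partition("=")
        if name.lower() == "to":
            raw += [a for a in value.split(",") if a]
    # Decode only after splitting, so an encoded comma is not a separator.
    return [unquote(a) for a in raw]

forms = (
    "mailto:addr1@an.example,addr2@an.example",
    "mailto:?to=addr1@an.example,addr2@an.example",
    "mailto:addr1@an.example?to=addr2@an.example",
)
for uri in forms:
    print(uri, mailto_recipients(uri), looks_like_plain_mailto(uri))
```

Under these assumptions all three forms yield the same two recipients, while
the naive check accepts none of them.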

> The goal in allowing a comma-separated list of URLs is that you might
> conceivably want to put an http and a mailto URL in there, in the "try A
> first, then try B" sense.  We need to allow for that possibility.  We also
> need to account for the possibility of a comma that is inside of a URL;
> those are the ones that need to be encoded.  Outside of a URL, they're
> delimiters.

The spec says a report "is normally sent to each".  How can a publisher express
that two URIs are meant to be either-or alternatives to each other?

> Unless I'm missing something, the ABNF for DMARC allows all three of the
> cited examples, as well as Yahoo's use, and all four of them mean the same
> thing.  That doesn't strike me as a bug.

Recall that it took years and an RFC revision to have mailto: URIs treated with
reasonable uniformity by web browsers.  Here we are specifying an entirely new
kind of client, which uses its own specific URI-transmission protocol.  IMHO,
whether we want %2c to be interpreted as an addr-spec separator or otherwise,
we ought to spell it out loud and clear.

It may also be worth requiring the domain in addr-spec to be an A-label, as
that simplifies verification and improves dn_ compression.  Such an idea
apparently conflicts with the example at the end of Section 6.3 of RFC6068,
where the IDN is percent-encoded instead.
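
For comparison, here is a small Python sketch of the two spellings of an
internationalized domain.  The domain "bücher.example" and the helper name
to_a_label are made up for illustration, and Python's built-in "idna" codec
implements the older IDNA 2003 rules, so treat this as an approximation.

```python
from urllib.parse import quote

def to_a_label(domain):
    """Convert each label of a domain to its ASCII (A-label) form."""
    return domain.encode("idna").decode("ascii")

domain = "bücher.example"  # hypothetical IDN

# A-label form: plain ASCII, comparable byte for byte.
print(to_a_label(domain))   # xn--bcher-kva.example

# Percent-encoded UTF-8 form, as in the RFC6068 style of example.
print(quote(domain))        # b%C3%BCcher.example
```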

Ale

_______________________________________________
dmarc mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dmarc
