On 10/8/24 22:35, Matthäus Wander wrote:
> On 2024-10-08 at 11:43, Alessandro Vesely wrote:
>> However clever, then, those expressions will never exclude RFC 1918 and
>> other addresses which are not valid or not useful in an aggregate
>> report. And maybe there are still bugs that exclude valid ones. So,
>> why don't we replace all that toilsome stuff with an easy one-liner:
>>
>> <xs:pattern value="[0-9a-fA-F.:]{2,45}"/>
>>
>> (string of two to forty-five hexadecimal digits, dots and colons)?
TL;DR
RFC 7489, in the DMARC XML Schema of Appendix C, only allowed the full
IPv6 address textual representation without zero bit compression (::)
([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}
And, judging from the aggregate reports we receive, that clearly did not
prevent the implementors from using zero bit compression of IPv6
addresses in the XML files when sending reports.
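As an illustration, here is a minimal Python sketch of what that RFC 7489
pattern accepts. Python's re module is only a stand-in for an XSD
validator (an assumption of this sketch); fullmatch() is used because XSD
patterns are implicitly anchored at both ends. The function name
rfc7489_accepts is made up for the example.

```python
import re

# Pattern from the RFC 7489 Appendix C schema, as quoted above.
RFC7489_IPV6 = re.compile(r"([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}")


def rfc7489_accepts(addr: str) -> bool:
    """Return True if the whole string matches the RFC 7489 IPv6 pattern."""
    return RFC7489_IPV6.fullmatch(addr) is not None


print(rfc7489_accepts("2001:db8:0:0:0:0:0:1"))  # full form: True
print(rfc7489_accepts("2001:db8::1"))           # "::" compressed: False
```

The compressed form that reporters actually send is rejected by this
pattern.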
I think it is prudent to specify what values we want to see in those
reports, and that is, in my opinion, values matching the canonical
textual representation format specified by RFC 5952.
I've prepared a pull request to fix up the IPv6 regular expressions to
validate according to that format, and to add comments telling that we
want to see one of:
* A globally routable IPv4 unicast address
  in the dotted-decimal format
* A globally routable IPv6 Global Unicast address
  in the canonical textual representation format;
  see RFC 5952 for details
https://github.com/ietf-wg-dmarc/draft-ietf-dmarc-aggregate-reporting/pull/20
Longer version
Investigating and writing this e-mail made my head spin a little, but I
think I've spent enough time to avoid any severe mistake. I have tested
locally, and using regex101.
> We have seen a few attempts at formulating precise regexps for syntax
> checking IP addresses, which turned out to be faulty in rare corner
> cases.
Indeed, and the current IPv6 regex patterns in the XSD are among them.
A comment near the list of the IPv6 regular expressions says: "RFC 5952
zero compression IPv6 (lax)", however
([A-Fa-f\d]{1,4}:){1,7}:
Matches up to 7 groups of up to 4 hex digits + ":", then a trailing ":".
With 7 groups, this is an invalid text representation, as RFC 5952
(#4.2.2) does not allow using "::" to compress just one 16-bit field.
([A-Fa-f\d]{1,4}:){1,6}:[A-Fa-f\d]{1,4}
([A-Fa-f\d]{1,4}:){1,5}:[A-Fa-f\d]{1,4}:[A-Fa-f\d]{1,4}
([A-Fa-f\d]{1,4}:){1,4}:([A-Fa-f\d]{1,4}:){1,2}[A-Fa-f\d]{1,4}
([A-Fa-f\d]{1,4}:){1,3}:([A-Fa-f\d]{1,4}:){1,3}[A-Fa-f\d]{1,4}
([A-Fa-f\d]{1,4}:){1,2}:([A-Fa-f\d]{1,4}:){1,4}[A-Fa-f\d]{1,4}
[A-Fa-f\d]{1,4}::([A-Fa-f\d]{1,4}:){1,5}[A-Fa-f\d]{1,4}
Similarly, all of these are too lax: the maximum repetition of the
first group plus the maximum repetition of the following parts is 7,
which allows "::" to stand in for just one group of 16 bits, and that
is invalid.
Also, they are hard to read.
::([A-Fa-f\d]{1,4}:){1,6}[A-Fa-f\d]{1,4}
"::" followed by up to 7 ":"-separated groups of hex digits; 7 is
invalid, see above.
::[A-Fa-f\d]{1,4}
loopback address, etc.
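The problem described above can be reproduced with a small Python
sketch. The patterns are copied verbatim from the list above; Python's
re module stands in for an XSD validator, and the helper names
lax_accepts and explicit_fields are made up for this example. Each
sample address spells out 7 fields, so its "::" replaces exactly one
16-bit field, which RFC 5952 (#4.2.2) forbids.

```python
import re

# The "lax" zero-compression alternatives quoted above, copied verbatim.
LAX = [
    r"([A-Fa-f\d]{1,4}:){1,7}:",
    r"([A-Fa-f\d]{1,4}:){1,6}:[A-Fa-f\d]{1,4}",
    r"([A-Fa-f\d]{1,4}:){1,5}:[A-Fa-f\d]{1,4}:[A-Fa-f\d]{1,4}",
    r"([A-Fa-f\d]{1,4}:){1,4}:([A-Fa-f\d]{1,4}:){1,2}[A-Fa-f\d]{1,4}",
    r"([A-Fa-f\d]{1,4}:){1,3}:([A-Fa-f\d]{1,4}:){1,3}[A-Fa-f\d]{1,4}",
    r"([A-Fa-f\d]{1,4}:){1,2}:([A-Fa-f\d]{1,4}:){1,4}[A-Fa-f\d]{1,4}",
    r"[A-Fa-f\d]{1,4}::([A-Fa-f\d]{1,4}:){1,5}[A-Fa-f\d]{1,4}",
    r"::([A-Fa-f\d]{1,4}:){1,6}[A-Fa-f\d]{1,4}",
    r"::[A-Fa-f\d]{1,4}",
]


def lax_accepts(addr: str) -> bool:
    """True if any of the lax alternatives matches the whole string."""
    return any(re.fullmatch(p, addr) for p in LAX)


def explicit_fields(addr: str) -> int:
    """Count the 16-bit fields that are written out explicitly."""
    return sum(1 for field in addr.split(":") if field)


# Seven explicit fields each, so "::" hides exactly one field.
SAMPLES = [
    "1:2:3:4:5:6:7::",
    "1:2:3:4:5:6::7",
    "1:2:3:4:5::6:7",
    "1:2:3:4::5:6:7",
    "1:2:3::4:5:6:7",
    "1:2::3:4:5:6:7",
    "1::2:3:4:5:6:7",
    "::1:2:3:4:5:6:7",
]

for sample in SAMPLES:
    print(sample, lax_accepts(sample), explicit_fields(sample))
```

Every sample is accepted even though none of them is a valid RFC 5952
representation.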
Here I present an alternative approach to the regex.
Starting from the legible:
* No zero-bit compression
([a-f\d]{1,4}:){7}[a-f\d]{1,4}
* Zero-bit compression
Maximum 6 groups of ":" separated hexdigits including one instance of
"::"; formatted for legibility:
([a-f\d]{1,4}:){1,6}:
([a-f\d]{1,4}:){1,5}(:[a-f\d]{1,4}){1,1}
([a-f\d]{1,4}:){1,4}(:[a-f\d]{1,4}){1,2}
([a-f\d]{1,4}:){1,3}(:[a-f\d]{1,4}){1,3}
([a-f\d]{1,4}:){1,2}(:[a-f\d]{1,4}){1,4}
([a-f\d]{1,4}:){1,1}(:[a-f\d]{1,4}){1,5}
:(:[a-f\d]{1,4}){1,6}
A tad easier to read and visually balanced, but there is superfluous
syntax, added for legibility, that can be eliminated.
Uppercase letters are disallowed by RFC 5952 (#4.3).
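To make the behaviour of the legible form concrete, here is a minimal
Python sketch that joins the alternatives listed above into one
expression and tries a few addresses. Python's re module is a stand-in
for an XSD validator, and the names PATTERNS, IPV6 and accepts are
assumptions of this example, not part of the schema.

```python
import re

# The legible alternatives from above, copied verbatim.
PATTERNS = [
    r"([a-f\d]{1,4}:){7}[a-f\d]{1,4}",            # no zero-bit compression
    r"([a-f\d]{1,4}:){1,6}:",
    r"([a-f\d]{1,4}:){1,5}(:[a-f\d]{1,4}){1,1}",
    r"([a-f\d]{1,4}:){1,4}(:[a-f\d]{1,4}){1,2}",
    r"([a-f\d]{1,4}:){1,3}(:[a-f\d]{1,4}){1,3}",
    r"([a-f\d]{1,4}:){1,2}(:[a-f\d]{1,4}){1,4}",
    r"([a-f\d]{1,4}:){1,1}(:[a-f\d]{1,4}){1,5}",
    r":(:[a-f\d]{1,4}){1,6}",
]
IPV6 = re.compile("|".join("(?:" + p + ")" for p in PATTERNS))


def accepts(addr: str) -> bool:
    """True if the whole string matches one of the alternatives."""
    return IPV6.fullmatch(addr) is not None


for addr in [
    "1:2:3:4:5:6:7:8",   # full form: accepted
    "2001:db8::1",       # "::" in the middle: accepted
    "2001:db8::",        # trailing "::": accepted
    "::1",               # leading "::": accepted
    "1:2:3:4:5:6:7::",   # "::" replaces one field only: rejected
    "1::2:3:4:5:6:7",    # same, "::" after the first field: rejected
    "2001:DB8::1",       # uppercase: rejected
]:
    print(addr, accepts(addr))
```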
With the redundant syntax removed:
([a-f\d]{1,4}:){7}[a-f\d]{1,4}
([a-f\d]{1,4}:){1,6}:
([a-f\d]{1,4}:){1,5}:[a-f\d]{1,4}
([a-f\d]{1,4}:){1,4}(:[a-f\d]{1,4}){1,2}
([a-f\d]{1,4}:){1,3}(:[a-f\d]{1,4}){1,3}
([a-f\d]{1,4}:){1,2}(:[a-f\d]{1,4}){1,4}
[a-f\d]{1,4}:(:[a-f\d]{1,4}){1,5}
:(:[a-f\d]{1,4}){1,6}
> Aiming for perfection has not worked here, so yes, let's replace
> it with Ale's simple and lax suggestion.
>
> I also wouldn't mind dropping the pattern altogether and just allowing
> any xs:string in the schema definition, with an informative comment that
> this field is expected to be an IP address. Implementations are free to
> implement a strict input validation if they need it.
RFC 5952 also disallows leading zeroes in a 16-bit field, unless the
field consists of a single zero. The regex proposed so far does not
enforce this.
To reject leading zeroes, we can replace
[a-f\d]{1,4}
with
(0|[a-f1-9][a-f\d]{0,3})
That is: Exactly '0' or something not starting with '0', with a maximum
length of 4.
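The effect of this substitution on a single 16-bit field can be checked
with a short Python sketch. Python's re module is a stand-in for an XSD
validator; the names OLD_FIELD, NEW_FIELD and field_ok are made up for
the example.

```python
import re

OLD_FIELD = re.compile(r"[a-f\d]{1,4}")
NEW_FIELD = re.compile(r"(0|[a-f1-9][a-f\d]{0,3})")


def field_ok(field: str, strict: bool) -> bool:
    """Check one 16-bit field; strict=True uses the leading-zero-free form."""
    pattern = NEW_FIELD if strict else OLD_FIELD
    return pattern.fullmatch(field) is not None


# Columns: field, accepted by old pattern, accepted by new pattern.
for field in ["0", "db8", "a0", "ffff", "0db8", "00", "0000"]:
    print(field, field_ok(field, strict=False), field_ok(field, strict=True))
```

Both patterns accept "0", "db8", "a0" and "ffff"; only the old one
accepts "0db8", "00" and "0000".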
The final regex becomes:
((0|[a-f1-9][a-f\d]{0,3}):){7}(0|[a-f1-9][a-f\d]{0,3})
((0|[a-f1-9][a-f\d]{0,3}):){1,6}:
((0|[a-f1-9][a-f\d]{0,3}):){1,5}:(0|[a-f1-9][a-f\d]{0,3})
((0|[a-f1-9][a-f\d]{0,3}):){1,4}(:(0|[a-f1-9][a-f\d]{0,3})){1,2}
((0|[a-f1-9][a-f\d]{0,3}):){1,3}(:(0|[a-f1-9][a-f\d]{0,3})){1,3}
((0|[a-f1-9][a-f\d]{0,3}):){1,2}(:(0|[a-f1-9][a-f\d]{0,3})){1,4}
(0|[a-f1-9][a-f\d]{0,3}):(:(0|[a-f1-9][a-f\d]{0,3})){1,5}
:(:(0|[a-f1-9][a-f\d]{0,3})){1,6}
Less legible again, but you can see the evolution of it, and hopefully
that will inspire some trust in its efficacy.
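For anyone who wants to try the final form outside regex101, here is a
minimal Python sketch. Python's re module is a stand-in for an XSD
validator, and the names F, FINAL, IPV6 and accepts are assumptions of
this example. The second alternative is written with a trailing ":", as
in the legible form earlier in this mail, so that it matches addresses
ending in "::".

```python
import re

# One 16-bit field without leading zeroes.
F = r"(0|[a-f1-9][a-f\d]{0,3})"

FINAL = [
    "(" + F + ":){7}" + F,
    "(" + F + ":){1,6}:",
    "(" + F + ":){1,5}:" + F,
    "(" + F + ":){1,4}(:" + F + "){1,2}",
    "(" + F + ":){1,3}(:" + F + "){1,3}",
    "(" + F + ":){1,2}(:" + F + "){1,4}",
    F + ":(:" + F + "){1,5}",
    ":(:" + F + "){1,6}",
]
IPV6 = re.compile("|".join("(?:" + p + ")" for p in FINAL))


def accepts(addr: str) -> bool:
    """True if the whole string matches one of the final alternatives."""
    return IPV6.fullmatch(addr) is not None


for addr in [
    "2001:db8::1",           # canonical: accepted
    "2001:db8::",            # trailing "::": accepted
    "2001:0db8::1",          # leading zero in a field: rejected
    "2001:DB8::1",           # uppercase: rejected
    "1:2:3:4:5:6:7::",       # "::" replaces one field only: rejected
    "2001:db8:0:0:0:0:0:1",  # not shortened, yet accepted
]:
    print(addr, accepts(addr))
```

The last sample is accepted although it could be shortened to
"2001:db8::1"; a regex of this shape cannot tell whether a run of zero
fields should have been compressed.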
Let's reach for the goal of doing the best we can. I believe the
proposed regular expression will take us there.
The final regex aligns with the recommendations in RFC 5952, section 4
as follows; I have noted where it falls short.
4.1. Handling Leading Zeros in a 16-Bit Field
Leading zeros MUST be suppressed.
A single 16-bit 0000 field MUST be represented as 0.
4.2. "::" Usage
4.2.1. Shorten as Much as Possible
Not enforced by the regexp
4.2.2. Handling One 16-Bit 0 Field
The symbol "::" MUST NOT be used to shorten just one
16-bit 0 field.
4.2.3. Choice in Placement of "::"
[...] the longest run of consecutive 16-bit 0 fields
MUST be shortened
Not enforced by the regexp
4.3. Lowercase
The characters [a-f] MUST be represented in lowercase.
All the IPv6 address strings to be put into the XML file are likely to
come from inet_ntop() and similar functions, which do the right thing
where I've tested, and hopefully elsewhere too. I'm not too worried
about 4.2.1 and 4.2.3 not being enforceable by the regex, and I've
prepared a pull request to this effect, see above.
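As an illustration of the inet_ntop() point, here is a minimal Python
sketch that round-trips an address through the socket module's
inet_pton() and inet_ntop() wrappers. The function name canonical_ipv6
is made up for the example, and the exact output depends on the
platform's implementation, so treat it as a way to check your own
system rather than as a guarantee.

```python
import socket


def canonical_ipv6(text: str) -> str:
    """Normalise an IPv6 string via inet_pton() followed by inet_ntop()."""
    packed = socket.inet_pton(socket.AF_INET6, text)
    return socket.inet_ntop(socket.AF_INET6, packed)


# Uppercase, a leading zero and an uncompressed run of zero fields.
print(canonical_ipv6("2001:0DB8:0:0:0:0:0:1"))
```

On the systems I would expect reporters to run, this prints the
lowercase, zero-compressed form "2001:db8::1".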
Daniel K.