On 23/08/2017 12:52, Vladimir Vassilev wrote:
On 08/21/2017 05:14 PM, Robert Wilton wrote:
Hi Acee,
That makes sense.
The other thing that I think that we have got wrong is modelling
regex pattern statements. I think that it would be much better if
these were written to be less exhaustive and much simpler.
E.g. the "route distinguisher" pattern in
draft-ietf-rtgwg-routing-types-09 is defined as this:
pattern
'(0:(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|'
+ '6[0-4][0-9]{3}|'
+ '[0-5]?[0-9]{0,3}[0-9]):(429496729[0-5]|'
+ '42949672[0-8][0-9]|'
+ '4294967[01][0-9]{2}|429496[0-6][0-9]{3}|'
+ '42949[0-5][0-9]{4}|'
+ '4294[0-8][0-9]{5}|429[0-3][0-9]{6}|'
+ '42[0-8][0-9]{7}|4[01][0-9]{8}|'
+ '[0-3]?[0-9]{0,8}[0-9]))|'
+ '(1:((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|'
+ '25[0-5])\.){3}([0-9]|[1-9][0-9]|'
+ '1[0-9]{2}|2[0-4][0-9]|25[0-5])):(6553[0-5]|'
+ '655[0-2][0-9]|'
+ '65[0-4][0-9]{2}|6[0-4][0-9]{3}|'
+ '[0-5]?[0-9]{0,3}[0-9]))|'
+ '(2:(429496729[0-5]|42949672[0-8][0-9]|'
+ '4294967[01][0-9]{2}|'
+ '429496[0-6][0-9]{3}|42949[0-5][0-9]{4}|'
+ '4294[0-8][0-9]{5}|'
+ '429[0-3][0-9]{6}|42[0-8][0-9]{7}|4[01][0-9]{8}|'
+ '[0-3]?[0-9]{0,8}[0-9]):'
+ '(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|'
+ '6[0-4][0-9]{3}|'
+ '[0-5]?[0-9]{0,3}[0-9]))|'
+ '(6(:[a-fA-F0-9]{2}){6})|'
+ '(([3-57-9a-fA-F]|[1-9a-fA-F][0-9a-fA-F]{1,3}):'
+ '[0-9a-fA-F]{1,12})';
}
But I think that it would be much easier to read, and quite possibly
more performant to execute, if the pattern regex was written
something like the following:
pattern:
'(0:[0-9]{1,5}:[0-9]{1,10})|
(1:([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})|
(2:[0-9]{1,10}:[0-9]{1,5})|
(6(:[a-fA-F0-9]{2}){6})';
Of course, this would allow more invalid values, but most servers
would be expected to reject those when they convert them into an
internal binary format anyway.
What do you, and others, think?
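To illustrate the "reject on conversion" point, here is a minimal Python
sketch, not taken from any real server. It assumes the 8-byte layout of
RFC 4364 (2-byte type, 6-byte value) and only handles types 0, 1 and 2;
the names RD_SHAPE and rd_to_bytes are made up for this example. The
regex only checks the shape, and the range checks fall out of packing
the fields into fixed-width integers.

```python
import ipaddress
import re
import struct

# Shape check only (hypothetical, deliberately loose): type, admin field, number.
RD_SHAPE = re.compile(r"([012]):([0-9.]+):([0-9]+)")


def rd_to_bytes(text: str) -> bytes:
    """Convert a type 0/1/2 route distinguisher string to 8 bytes."""
    match = RD_SHAPE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed route distinguisher: {text!r}")
    rd_type = int(match.group(1))
    admin = match.group(2)
    number = int(match.group(3))
    try:
        if rd_type == 0:
            # 2-byte AS number, 4-byte assigned number
            return struct.pack("!HHI", 0, int(admin), number)
        if rd_type == 1:
            # IPv4 address, 2-byte assigned number
            address = ipaddress.IPv4Address(admin).packed
            return struct.pack("!H4sH", 1, address, number)
        # Type 2: 4-byte AS number, 2-byte assigned number
        return struct.pack("!HIH", 2, int(admin), number)
    except (struct.error, ValueError) as exc:
        # struct.error: a number does not fit its field width.
        # ValueError: the admin field is not a valid integer or IPv4 address.
        raise ValueError(f"out-of-range route distinguisher: {text!r}") from exc
```

With this, "0:65000:100" converts cleanly, while "0:70000:100" matches
the loose shape but is rejected because 70000 does not fit in two bytes.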
You still need the
|(([3-57-9a-fA-F]|[1-9a-fA-F][0-9a-fA-F]{1,3}):[0-9a-fA-F]{1,12}) in
the end to not reject valid values though.
Sure, OK.
IMO a pattern statement has value if it absolutely defines the set of
valid strings.
It still has value if it also performs some simple checks and removes
obvious mistakes.
But even if a value passes the regex filter, that still doesn't guarantee
that the value is correct. Someone could put a typo in there, or
perhaps configure a multicast IP address where only unicast addresses
are allowed, or put the same IP address on two separate interfaces, or
use an IP address that they don't own, etc ...
In general I do not see the benefit of pattern statements that do not
reject all invalid string instances. I prefer the original pattern or
none at all.
OK, so some potential counter examples:
1) Email address. I understand that the full regex to validate all
email addresses is very complex, but checking that it at least contains
an @ symbol still has benefit. It would seem that a short imperfect
regex is better than a complete perfect regex.
2) A list of VLAN ranges, e.g. we want to allow strings that look like
this: "1-10,20-400,600,2000-3000", but only with non-overlapping values
in ascending order. It is easy to write a regex to check that the
structure is right, but AFAIK it is hard (impossible?) to write a regex
that ensures that the ranges don't overlap and are specified in
ascending order.
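As a rough sketch of how example 2 could be split between a regex and
ordinary code (Python, with made-up names VLAN_LIST_SHAPE and
parse_vlan_ranges, and assuming VLAN IDs 1 to 4094 are the valid range):
the regex checks only the structure, and a short loop checks the
ordering and overlap that the regex cannot easily express.

```python
import re

# Structure only: comma-separated items, each "N" or "N-M" (hypothetical pattern).
VLAN_LIST_SHAPE = re.compile(
    r"[0-9]{1,4}(-[0-9]{1,4})?(,[0-9]{1,4}(-[0-9]{1,4})?)*"
)


def parse_vlan_ranges(text: str) -> list[tuple[int, int]]:
    """Return (low, high) pairs, rejecting unordered or overlapping ranges."""
    if VLAN_LIST_SHAPE.fullmatch(text) is None:
        raise ValueError(f"malformed VLAN list: {text!r}")
    ranges = []
    previous_high = 0
    for item in text.split(","):
        low_text, _, high_text = item.partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
        # Assumed valid VLAN ID range for this sketch: 1..4094.
        if not 1 <= low <= high <= 4094:
            raise ValueError(f"bad VLAN range: {item!r}")
        # Each range must start above the end of the previous one.
        if low <= previous_high:
            raise ValueError(f"range out of order or overlapping: {item!r}")
        ranges.append((low, high))
        previous_high = high
    return ranges
```

Here "1-10,20-400,600,2000-3000" is accepted, while "20-400,1-10" and
"1-10,10-20" both pass the regex but are rejected by the loop.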
So, I propose that we use regexes for checking that the string is
structurally correct, but don't use regexes to perform numerical range
checks of string encoded numbers, since it makes the regexes hard to
read/verify, and doesn't improve the readability of the YANG file either.
Thanks,
Rob
Vladimir
Thanks,
Rob