Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Kent Watsen Wed, 30 Aug 2017 13:04:05 -0700

As Andy says, readability is #1, and it follows that a restricted subset would
be more understandable.  Standardizing this would require an update to RFC 7950
(read: not going to happen anytime soon).  Maybe we could start with just
having a tool detect when something outside the common subset is used.  Can a
"common subset" be well-defined?
  - "common" between how many engines?
  - would it be forever evolving?
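A minimal Python sketch of the kind of detection tool suggested above. The "common subset" it enforces is an assumption for illustration only: it flags Unicode category escapes (\p{..}/\P{..}), Unicode block escapes (\p{Is..}), the XSD-only name escapes \i, \c, \I, \C, and character class subtraction ("-[" inside a class). The function name non_common_features is hypothetical.

```python
def non_common_features(pattern: str) -> list[tuple[str, str]]:
    """Return (label, text) pairs for constructs outside a hypothetical common subset."""
    found = []
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in "pP" and i + 2 < n and pattern[i + 2] == "{":
                # \p{Name} or \P{Name}: names starting with "Is" are Unicode blocks.
                end = pattern.find("}", i + 3)
                stop = end if end != -1 else n
                name = pattern[i + 3:stop]
                label = "unicode-block" if name.startswith("Is") else "unicode-property"
                found.append((label, name))
                i = stop + 1
                continue
            if nxt in "icIC":
                found.append(("xsd-name-escape", "\\" + nxt))
            i += 2  # skip the whole escape so an escaped "[" or "-" is not misread
            continue
        if in_class:
            if ch == "-" and pattern.startswith("[", i + 1):
                found.append(("class-subtraction", "-["))
            elif ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        i += 1
    return found
```

For example, the zone suffix of the inet-types address patterns, r"(%[\p{N}\p{L}]+)?", yields [("unicode-property", "N"), ("unicode-property", "L")], while r"[a-z]+" yields an empty list. A real tool would need to agree on the subset first, which is exactly the open question above.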


K. // contributor


On 8/30/17, 12:44 PM, "netmod on behalf of Robert Wilton"
<[email protected] on behalf of [email protected]> wrote:

I actually think that XML RE is a good choice for YANG pattern statements
(because it is one of the simpler RE languages); I just don't think that we
need all of it.


First question: How many pattern statements in draft and standard IETF YANG
modules actually use Unicode properties (e.g. \p{})?
Answer: Just 2, both to add a zone at the end of an IPv4/IPv6 address.

E.g.:

      pattern
        '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
      + '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
      + '(%[\p{N}\p{L}]+)?';

This could quite possibly have been written as just
"(\d{1,3}\.){3}\d{1,3}(%\w+)?", without using Unicode properties at all.
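A quick Python comparison of the two forms. It assumes re.fullmatch as a stand-in for YANG's rule that a pattern must match the whole value, and approximates [\p{N}\p{L}] with [^\W_] (Unicode letters and digits), because the standard-library re module has no \p{...} support. STRICT and SHORT are illustrative names.

```python
import re

OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
# Address pattern quoted above; [^\W_] approximates [\p{N}\p{L}].
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET + r"(%[^\W_]+)?")
# Shorter alternative using only \d and \w.
SHORT = re.compile(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?")

for candidate in ["192.0.2.1", "192.0.2.1%eth0", "256.0.0.1", "01.2.3.4"]:
    # fullmatch: the whole string must match, as with a YANG pattern.
    print(candidate, bool(STRICT.fullmatch(candidate)), bool(SHORT.fullmatch(candidate)))
```

Both accept "192.0.2.1" and "192.0.2.1%eth0", but only the shorter form accepts "256.0.0.1" and "01.2.3.4", since it does not enforce the 0-255 range or reject leading zeros. Also note that \d and \w are Unicode-aware in both XSD and Python str patterns, so the shorter form is more readable rather than ASCII-only.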

There are a couple more occurrences of Unicode character classes in the vendor
models on GitHub, but only to restrict them to the ASCII character set (oh, the
irony), which I believe can be accomplished without resorting to Unicode
properties.


Another question: How often is character class subtraction (e.g. [A-Z-[PQ]])
used in the standard and GitHub YANG modules?
Answer: 0.  AFAICT, it isn't used anywhere ...
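A rough Python sketch of how counts like these could be reproduced over a set of YANG files. It assumes each pattern is written as the keyword pattern followed by one or more quoted strings joined with "+"; it does not process escapes inside double-quoted strings, skip comments, or handle unquoted arguments, so the numbers are approximate. pattern_arguments and survey are hypothetical names.

```python
import re
from typing import Iterable

# One quoted YANG string: single-quoted (no escapes) or double-quoted.
_SQ = r"'[^']*'"
_DQ = r'"(?:[^"\\]|\\.)*"'
_QSTR = f"(?:{_SQ}|{_DQ})"
# "pattern" followed by one or more quoted pieces concatenated with "+".
PATTERN_STMT = re.compile(r"\bpattern\s+(" + _QSTR + r"(?:\s*\+\s*" + _QSTR + r")*)")


def pattern_arguments(yang_text: str) -> list[str]:
    """Return each pattern argument with its quoted pieces concatenated."""
    args = []
    for m in PATTERN_STMT.finditer(yang_text):
        pieces = re.findall(_QSTR, m.group(1))
        # Drop the surrounding quotes; escape sequences are left untouched.
        args.append("".join(p[1:-1] for p in pieces))
    return args


def survey(texts: Iterable[str]) -> tuple[int, int, int]:
    """Count pattern arguments, those using \\p{..}/\\P{..}, and those containing '-['."""
    total = with_props = with_subtraction = 0
    for text in texts:
        for arg in pattern_arguments(text):
            total += 1
            if re.search(r"\\[pP]\{", arg):
                with_props += 1
            if "-[" in arg:  # naive: may also hit a literal '-' before a class
                with_subtraction += 1
    return total, with_props, with_subtraction
```

Usage could look like survey(p.read_text(encoding="utf-8") for p in pathlib.Path("yang").rglob("*.yang")), where the "yang" directory is a placeholder for wherever the modules are checked out.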



Now, I'm not proposing a different regex syntax for pattern statements,
just a sensible subset of XSD RE, so that it is easier for folks to read/review
pattern statements, and easier for client and server implementations to
translate them into other common regex dialects if they so wish.

Of course, as part of that translation, I would expect the translation function
to check the input regex and report an error if it cannot handle it (e.g. if it
uses an obscure unmatched Unicode property, a Unicode block, or character class
subtraction syntax).  This really doesn't seem hard to me.
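A minimal sketch of such a translation function, targeting Python's re. It handles only two dialect differences, XSD's implicit anchoring and the fact that ^ and $ are ordinary characters outside character classes in XSD, and rejects \p{...}, \P{...}, \i, \c, \I, \C and class subtraction with an error instead of translating them. xsd_to_python and UnsupportedPattern are hypothetical names, and the supported subset is an assumption, not a proposal.

```python
import re


class UnsupportedPattern(ValueError):
    """Raised when an XSD pattern uses a construct this sketch does not translate."""


def xsd_to_python(pattern: str) -> re.Pattern:
    out = []
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= n:
                raise UnsupportedPattern("trailing backslash")
            nxt = pattern[i + 1]
            if nxt in "pPicIC":
                raise UnsupportedPattern(f"unsupported escape \\{nxt} at offset {i}")
            out.append(pattern[i:i + 2])  # shared escapes such as \d, \., \- pass through
            i += 2
            continue
        if in_class:
            if ch == "-" and pattern.startswith("[", i + 1):
                raise UnsupportedPattern(f"character class subtraction at offset {i}")
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
            # A leading ^ negates the class in both dialects; copy it unchanged.
            if pattern.startswith("^", i + 1):
                out.append("^")
                i += 1
        elif ch in "^$":
            out.append("\\" + ch)  # literal in XSD, but an anchor in Python
        else:
            out.append(ch)
        i += 1
    # An XSD pattern must match the whole value, so anchor both ends.
    return re.compile(r"\A(?:" + "".join(out) + r")\Z")
```

For instance, xsd_to_python(r"\d+$") matches "42$" (as XSD would) but not "42", and xsd_to_python(r"[A-Z-[PQ]]") raises UnsupportedPattern rather than silently producing a different class.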

But the XML RE language has stuff in it that I don't think anyone is ever going
to use in a standardized network management YANG model.  Forcing everyone to
implement support for this stuff just seems like a complete waste of time and
effort.  Looking at the regex info website, there appear to be about 143
Unicode properties and blocks defined (the list may be incomplete), of which I
think 135+ probably have no relevance in network management YANG modules, and
the benefit of the remaining ones is pretty suspect.

I mean, how many network management YANG modules really need a pattern 
statement that only matches Runic characters?  Perhaps someone out there is 
busy defining "middle-earth.yang" ;-)

If I am the only person opposed to making life unnecessarily difficult for
readers of YANG models and for client/server tool implementors interacting with
YANG, then it is probably time to give up this discussion. ;-)

Python, quite likely a common tool for client-side network management, also
doesn't seem to have any support for Unicode properties or blocks in its
standard re module.  Perhaps implementations will hook it up to libxml2
instead, or write a full XML RE to Python RE translation tool.  But most people
will probably just feed the pattern statement into the native Python regex
engine, and my guess is that this will work 95% of the time.  The other 5% ...
who knows what will happen ... oh well, better to try and fail than not to try
at all.
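A sketch of the "just feed it to the native engine" approach. It relies on the standard-library re module (Python 3.7 and later) rejecting unknown ASCII-letter escapes such as \p with re.error, so the Unicode-property cases at least fail loudly; the third-party regex package does support \p{...}, but it is not part of the standard library. Other differences pass silently, for example $ is an anchor in Python but an ordinary character in XSD. native_match and IPV4_ZONE are hypothetical names.

```python
import re
from typing import Optional


def native_match(pattern: str, value: str) -> Optional[bool]:
    """Try a YANG pattern directly with Python's re.

    Returns True/False for the match result, or None when re cannot
    compile the pattern (e.g. because it uses XSD \\p{...} escapes).
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    # fullmatch approximates XSD's rule that the whole value must match.
    return compiled.fullmatch(value) is not None


IPV4_ZONE = (
    r"(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}"
    r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
    r"(%[\p{N}\p{L}]+)?"
)

print(native_match(IPV4_ZONE, "192.0.2.1"))  # None: re rejects the \p escape
print(native_match(r"[a-z]+", "abc"))        # True
print(native_match(r"\d+$", "42$"))          # False, although XSD treats $ as a literal
```

The first case is the loud kind of failure; the last one is the quiet kind that falls into the "who knows" 5%.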

Apologies if this email comes across as a rant.

Rob

_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
