Rob

Speaking as a contributor.


On August 30, 2017 12:44:42 PM Robert Wilton <[email protected]> wrote:

Hi,

On 30/08/2017 15:52, Andy Bierman wrote:


On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder
<[email protected]> wrote:

    On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
    >
    >
    > On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
    > > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
    > > > Hi Andy,
    > > >
    > > > What I am suggesting makes it easier for readers, because I
    am a proponent
    > > > of simpler regular expressions that are easy to read and
    understand.
    > > >
    > > > For example, I wonder how many YANG model readers would
    immediately
    > > > comprehend what this pattern statement means:
    > > >
    > > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
    > > >
    > > > Does allowing such patterns really make it easier for model
    readers?
    > > This is always difficult to judge but to be fair you have to
    show how
    > > you express _the same_ (and not a subset) with some other kind of
    > > regular expressions. (My understanding is that \p{Sc} is a
    currency
    > > symbol.)
    > Yes, the expression would cover a currency amount, along with
    associated
    > symbol (e.g. "$200.00").
    >
    > If I was writing a module, I would probably use the following
    pattern
    > statement instead, which I think a lot more people would likely
    be able to
    > comprehend:
    >
    > pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217,
    currency codes.  e.g. ("USD 200.00")

    But that is not the same. Apples versus oranges. (I expect people to
    tell me that (i) currency is irrelevant and (ii) that three ASCII
    letter currency acronyms are better than currency symbols anyway but
    this is a separate discussion I am not interested in.)

    > >
    > > > The proposed guidelines obviously make it easier (or at
    least no harder) for
    > > > tool makers.
    > > >
    > > > I agree that there is a minor impact to model writers, but
    really only in
    > > > the sense that the guidelines would be telling them not to
    use the esoteric
    > > > options of the XML regex syntax that they probably don't
    know about anyway.
    > > What is 'esoteric' largely depends on your language
    environment. What
    > > you are saying by 'do not use \p{}' is essentially 'do not use any
    > > unicode long live ASCII'.
    > No, that is not my intention, i.e. I'm not suggesting banning
    all use of
    > \p{}, but instead limiting it to the character classes that seem
    like they
    > may plausibly be used in standardized YANG modules.

    This is entirely subjective. And if you still allow some \p{}, what is
    the point of the exercise?

    > I'm not trying to change what 6020/7950 defines the pattern
    statement as,
    > just give what I perceive as some pragmatic guidance as to what
    parts of XML
    > RE it makes sense to use in standardized YANG modules, making it
    easier for
    > readers and implementations.
    >
    > I think that it is fine for companies, vendors, etc to use the
    full breadth
    > of XML RE if they wish.

    Implementations have to be prepared to handle XSD pattern if they
    claim compliance to YANG 1.0 and 1.1. So all this only helps
    non-compliant implementations. This may indeed be a goal - but then we
    should spell this out as such - this helps non-compliant
    implementations (and they may still fail on the first \p{} that
    you still allow).

    If implementations do not implement the YANG pattern statement but
    something else, then they should ignore patterns they can't
    understand and treat the pattern as if it would have been in a
    description clause - i.e., leave it to humans to write the code that
    implements the pattern correctly. Note that YANG does not say anything
    about how stuff is implemented.



This does not work.
There are 3 outcomes from the regex compiler

1) proper syntax was used and accepted; pattern matches correctly
2) improper syntax was used and accepted; pattern matches incorrectly
3) improper syntax was used and rejected; compiler error generated

Case (2) is the really bad one and we have seen it in bug reports.
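
As an illustration of the three outcomes listed above, here is a minimal sketch that assumes an XSD pattern is handed unchanged to Python's standard `re` module. The helper name `classify` and the sample strings are made up for the example; they do not come from any bug report mentioned in this thread.

```python
import re
import warnings


def classify(xsd_pattern, should_match, should_not_match):
    """Return 1, 2 or 3 for the outcome seen when Python's re gets an XSD pattern.

    1: accepted and every sample behaves as the XSD semantics require
    2: accepted, but at least one sample is matched incorrectly
    3: rejected with a compile error
    """
    try:
        with warnings.catch_warnings():
            # Ignore any syntax warnings; only hard errors count as rejection.
            warnings.simplefilter("ignore")
            compiled = re.compile(xsd_pattern)
    except re.error:
        return 3
    good = all(compiled.fullmatch(s) for s in should_match)
    bad = any(compiled.fullmatch(s) for s in should_not_match)
    return 1 if good and not bad else 2


# Plain subset of XSD RE: behaves the same in both languages.
print(classify(r"[0-9]{3}", ["123"], ["12a"]))
# Character class subtraction: compiles, but means something else in Python.
print(classify(r"[A-Z-[PQ]]", ["A"], ["P"]))
# Unicode property escape: Python's re refuses to compile it.
print(classify(r"\p{Nd}+", ["1"], ["a"]))
```

In the second call, Python reads the pattern as a character class followed by a literal `]`, so no error is reported even though the XSD meaning is lost.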

This issue was discussed in detail for almost 2 years and the
conclusion was
that a YANG extension would be used to specify other pattern types than
the XSD pattern mandated by the standard.
I actually think that XML RE is a good choice for YANG pattern
statements (because it is one of the more simple RE languages), I just
don't think that we need all of it.


First question: How many pattern statements in draft and standard IETF
YANG modules actually use Unicode properties (e.g. \p{})?
Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.

E.g.       pattern
         '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
       +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
       + '(%[\p{N}\p{L}]+)?';

This could quite possibly have been written just as
"(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.

There are a couple more occurrences of Unicode character classes in the
vendor models on github, but only to restrict them to the ASCII
character set (oh the irony), which I believe can be accomplished
without resorting to Unicode properties.
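
For the IPv4 zone pattern quoted above, a minimal Python sketch can show how a property-free rewrite behaves. It makes two assumptions: the shorthand is read as `(\d{1,3}\.){3}\d{1,3}(%\w+)?`, and the zone part `[\p{N}\p{L}]+` is approximated by `\w+`, which also admits an underscore. The names `OCTET`, `STRICT`, `LOOSE` and `accepted_by` are invented for the example.

```python
import re

# Octet alternation copied from the pattern quoted in the thread (0..255).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Octet-exact form, with the zone written as \w+ (an approximation).
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET + r"(%\w+)?")

# Shorthand form: any one to three digits per octet.
# Note: on str, Python's \d and \w also match non-ASCII digits and letters.
LOOSE = re.compile(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?")


def accepted_by(value):
    """Return the names of the patterns that accept the whole value."""
    names = []
    if STRICT.fullmatch(value):
        names.append("strict")
    if LOOSE.fullmatch(value):
        names.append("loose")
    return names


print(accepted_by("192.0.2.1%eth0"))
print(accepted_by("999.0.2.1"))
```

Both forms avoid `\p{}`, but the shorthand no longer range-checks each octet, so it is easier to read at the cost of accepting values such as "999.0.2.1".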


Another question: How often is character class subtraction (e.g.
[A-Z-[PQ]]) used in the standard & the github YANG modules?
Answer: 0.  AFAICT, it isn't used at all, anywhere ...



Now, I'm not proposing using a different regex syntax for pattern
statements, just a sensible subset of XSD RE, such that it is easier for
folks to read/review pattern statements, and it is easier for client and
server implementations to translate into other common regex
implementations if they so wish.

Of course, as part of that translation, I would expect a translation
function to check and generate an error if the translation cannot handle
the input regex (e.g. if it uses an obscure unmatched unicode property
or a unicode block, or character class subtraction syntax).  This really
doesn't seem hard to me.
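
A translation function of the kind described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the names `UnsupportedPattern` and `translate_xsd_pattern` are hypothetical, the target is Python's standard `re` module, and the checks are not exhaustive (for example, differences in what `\s`, `\w` and `.` match are not handled).

```python
import re


class UnsupportedPattern(ValueError):
    """Raised when an XSD pattern uses a construct this sketch cannot translate."""


def translate_xsd_pattern(xsd):
    """Translate a simple XSD pattern to a compiled Python regex, or raise."""
    out = []
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(xsd):
        ch = xsd[i]
        if ch == "\\":
            if i + 1 >= len(xsd):
                raise UnsupportedPattern("dangling backslash")
            esc = xsd[i + 1]
            if esc in "pP":
                raise UnsupportedPattern("Unicode property or block escape")
            if esc in "iIcC":
                raise UnsupportedPattern("XML name character escape")
            out.append(ch + esc)  # copy other escapes unchanged
            i += 2
            continue
        if ch == "[":
            if in_class:
                # An unescaped '[' inside a class is subtraction in XSD.
                raise UnsupportedPattern("character class subtraction")
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch in "^$" and not in_class:
            # Ordinary characters in XSD, but anchors in Python.
            out.append("\\" + ch)
            i += 1
            continue
        out.append(ch)
        i += 1
    return re.compile("".join(out))


currency = translate_xsd_pattern(r"[A-Z]{3}\s?\d+\.\d{2}")
# XSD patterns are implicitly anchored, so use fullmatch on the result.
print(currency.fullmatch("USD 200.00") is not None)
```

Raising an explicit error for `\p{...}` and for class subtraction turns the silent mismatch case into a reported failure, which is the behaviour the paragraph above asks for.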

But the XML RE language has stuff in it that I don't think anyone is
ever going to use in a standardized network management YANG model.
Forcing everyone to implement support for this stuff just seems like a
complete waste of time and effort.  Looking at the regex info website it
looks like there are about 143 unicode properties and blocks defined (it
may be incomplete), of which I think that 135+ of these probably have no
relevance in network management YANG modules, and the benefit of the
remaining ones is pretty suspect.

I mean, how many network management YANG modules really need a pattern
statement that only matches Runic characters?  Perhaps someone out there
is busy defining "middle-earth.yang" ;-)

If I am the only person opposed to making life unnecessarily difficult
for readers of YANG models, and client/server tool implementors
interacting with YANG then it is probably time to give up this
discussion. ;-)


I agree with you 100%

And I see Xufeng's proposal for 6087bis as an attempt at putting some language together to support this desire. Perhaps you can suggest alternate language.

Lou

Python, quite likely a common tool for client side network management,
also doesn't seem to have any support for unicode properties or blocks.
Perhaps implementations will hook it up to libxml2 instead, or write a
full XML RE to Python RE translation tool.  But probably most
people will just feed the pattern statement into the native Python regex
engine, and my guess is that this will probably work 95% of the time. 
The other 5% ... who knows what will happen ... oh well, better to try
and fail than to not try at all.
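
One way the uncertain remainder can arise, even when the pattern compiles cleanly, is anchoring: an XSD pattern must match the whole value, whereas Python's `re.match` only anchors at the start. The sketch below assumes the standard `re` module; the helper names `yang_match` and `naive_match` are invented for the example.

```python
import re


def yang_match(pattern, value):
    """Match the way a YANG pattern is meant to: against the whole value."""
    return re.fullmatch(pattern, value) is not None


def naive_match(pattern, value):
    """Common shortcut: re.match accepts any value with a matching prefix."""
    return re.match(pattern, value) is not None


amount = r"\d+\.\d{2}"
print(yang_match(amount, "200.00"))      # whole value matches
print(naive_match(amount, "200.00abc"))  # prefix matches, trailing junk ignored
print(yang_match(amount, "200.00abc"))   # whole-value match rejects it
```

Using `re.fullmatch` (or an equivalent explicit anchor) keeps the native engine closer to the XSD semantics for the patterns it can handle.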

Apologies if this email comes across as a rant.

Rob




    /js


Andy

    --
    Juergen Schoenwaelder           Jacobs University Bremen gGmbH
    Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
    Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>






----------
_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
