Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Robert Wilton Mon, 04 Sep 2017 09:08:45 -0700

Hi Lada,

On 04/09/2017 15:59, Ladislav Lhotka wrote:

Robert Wilton wrote on Mon, 04 Sep 2017 at 15:05 +0100:

Hi Andy,


On 02/09/2017 17:46, Andy Bierman wrote:

On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder 
<[email protected]> wrote:

On Sat, Sep 02, 2017 at 10:39:57AM +0000, Acee Lindem (acee) wrote:

This is not an effort to change or bifurcate the YANG 1.1. It is simply to
RECOMMEND a proper subset of XSD pattern that is more portable.

If you implement YANG as it is defined, patterns are portable. Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset do not
even agree on what that subset is. Perhaps people pushing for this
should go and write an I-D that explains why a 'more portable' subset
is needed (which problems are we fixing), that defines such a 'more
portable subset', and which includes the reasoning how the subset has
been determined.


I do not agree that the YANG pattern contains a string that is both a POSIX and 
XSD regular expression.
The RFC is very clear it contains an XSD expression. Pretending it is both is a 
hack that does not even seem
to work 100%, so it is not reliable.

  I am not suggesting that the YANG pattern is both a POSIX and XSD regular 
expression.

I am only suggesting that the guidelines recommend that authors use a subset of 
XSD, to make it easier to programmatically *convert* the 'XSD subset compliant 
regular expression' into a functionally equivalent regular expression for 
whatever regular expression engine the tooling decides to use.

And that's the point, I think: each developer needs to get a library function so
as to translate the XSD pattern into a native regex of whatever programming
language he/she is currently using. So I guess what we really need is to
identify libraries for common languages that do it correctly - or write simple
translators ourselves if none is available.

Yes, exactly.

Looking at http://www.regular-expressions.info/ then XML RE does look like a
good standard choice of RE language for YANG pattern statements because it is
generally one of the most basic RE languages, and hence it should be feasible
to convert an XML RE into a form usable by most RE languages.


But converting some parts of the XML RE syntax would probably be laborious:

1) E.g. the unicode property '\p{Nd}' that is equivalent to '\d' matches 590
characters (http://www.fileformat.info/info/unicode/category/Nd/list.htm).
There are approx 32 unicode properties, presumably these could also be
extended over time as well.

2) There are currently 105 unicode blocks, each of which is a discrete range
of characters (e.g. \p{InTibetan}: U+0F00–U+0FFF).

3) Handling the character class subtraction is also possible, but probably
tedious to implement, since it requires the translation to fully understand
the set of characters in the character class so it can form an equivalent
character class without any subtractions.

These were the three parts of the XML RE that I was hoping to discourage in
the YANG author guidelines so that performing a translation is much easier.
Spotting these 3 parts in the regex should be simple, so the translation
would still be robust, even if not complete.
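A minimal sketch, in Python, of what spotting the three parts listed above might look like. The function name `find_discouraged_constructs` and the wording of its report strings are hypothetical, chosen only for illustration. XSD spells block escapes with an `Is` prefix (e.g. `\p{IsTibetan}`); the sketch also accepts the `In` spelling used in the example above. It is a simple scan, not a full XSD regex parser.

```python
def find_discouraged_constructs(xsd_pattern):
    """Return a list describing \\p{...} escapes and class subtractions found."""
    found = []
    depth = 0  # nesting depth of [...] character classes
    i = 0
    n = len(xsd_pattern)
    while i < n:
        ch = xsd_pattern[i]
        if ch == "\\":
            # \p{...} and \P{...} are Unicode category or block escapes.
            if xsd_pattern[i + 1:i + 3] in ("p{", "P{"):
                end = xsd_pattern.find("}", i + 3)
                name = xsd_pattern[i + 3:end] if end != -1 else xsd_pattern[i + 3:]
                kind = "block" if name[:2] in ("Is", "In") else "property"
                found.append("unicode " + kind + ": " + name)
            i += 2  # skip the escaped character in every case
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == "-" and depth > 0 and xsd_pattern[i + 1:i + 2] == "[":
            # "-[" inside a class starts a character class subtraction.
            found.append("character class subtraction")
        i += 1
    return found
```

A tool could call this before translating a pattern and report the findings as unsupported, while translating everything else.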

There are other conversions that may also need to be performed (depending on
the target RE engine):

1) Character class shorthands (e.g. \d, \w) need to be converted to
represent the Unicode set equivalent, since for a lot of engines they only
match ASCII characters. For '\s' it must match ASCII whitespace only.

2) If the engine supports greedy alternation (e.g. POSIX basic/extended
regex), then alternations need to be converted to an eager form if required.

3) The syntax for escaping characters seems to differ in XML RE from other
common languages.

4) Linebreak match handling seems to differ.

These conversions would need to be done regardless, but would seem to be
much quicker/simpler to implement than the ones above.
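As an illustration of the '\s' case from the list above, here is a rough Python sketch. In Python's `re` module, `\s` in a str pattern also matches non-ASCII whitespace such as U+00A0, whereas XSD defines `\s` as space, tab, line feed and carriage return only. The helper name `convert_whitespace_escape` is hypothetical, and the sketch assumes the input already avoids character class subtraction (it tracks classes with a single flag).

```python
import re

# XSD \s: space, tab, line feed, carriage return (kept as regex escapes).
XSD_SPACE = r" \t\n\r"


def convert_whitespace_escape(xsd_pattern):
    """Rewrite \\s and \\S so Python's re matches only XSD whitespace."""
    out = []
    in_class = False  # assumption: no nested classes (no subtraction)
    i = 0
    n = len(xsd_pattern)
    while i < n:
        ch = xsd_pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = xsd_pattern[i + 1]
            if nxt == "s":
                out.append(XSD_SPACE if in_class else "[" + XSD_SPACE + "]")
            elif nxt == "S":
                if in_class:
                    raise ValueError("\\S inside a class is not handled here")
                out.append("[^" + XSD_SPACE + "]")
            else:
                out.append(ch + nxt)  # leave every other escape untouched
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


converted = convert_whitespace_escape(r"a\sb")
# XSD patterns match the whole value, hence fullmatch rather than search.
print(bool(re.fullmatch(converted, "a b")))
print(bool(re.fullmatch(converted, "a\u00a0b")))
```

The other shorthands (\d, \w) would need a similar rewrite in the opposite direction for engines where they are ASCII-only.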


Thanks,
Rob

E.g. this seems to be the approach used by "libyang" that uses libpcre as the backend RE 
library rather than libxml.  Unfortunately, I think that the libyang library would currently fail 
if the pattern statement contained "[[A-Z]-[P-R]]" because it looks like the PCRE2 
language does not support character class subtraction.  AFAICT, no standard YANG modules currently 
use character class subtraction, so the authors of libyang have a choice here:

Note that your example is incorrect, it should be [A-Z-[P-R]]. FWIW, Python
module PyXB (that I used in Yangson library) does support this.

Lada

   (i) write a block of code that most likely nobody is going to use, or
   (ii) document the limitation, spot character class subtraction in the regex, 
and flag that it is not supported (or perhaps just ignore it).
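For a sense of what option (i) involves in the simplest case, the following Python sketch expands a subtraction over explicit character ranges into an equivalent class without subtraction, so that [A-Z-[P-R]] becomes [A-OS-Z]. The function name `subtract_ranges` and its input format (lists of inclusive first/last character pairs) are assumptions made for this illustration; parsing the class out of the pattern, and handling shorthands or Unicode properties inside it, is the tedious part and is not shown.

```python
import re


def subtract_ranges(base, removed):
    """Build a class matching the characters in base but not in removed.

    base, removed: lists of (first, last) inclusive character pairs.
    """
    keep = set()
    for first, last in base:
        keep.update(range(ord(first), ord(last) + 1))
    for first, last in removed:
        keep.difference_update(range(ord(first), ord(last) + 1))
    if not keep:
        raise ValueError("subtraction leaves an empty character class")

    # Collapse the remaining code points into contiguous runs.
    runs = []
    run_start = prev = None
    for cp in sorted(keep):
        if prev is not None and cp == prev + 1:
            prev = cp
            continue
        if run_start is not None:
            runs.append((run_start, prev))
        run_start = prev = cp
    runs.append((run_start, prev))

    parts = []
    for lo, hi in runs:
        if lo == hi:
            parts.append(re.escape(chr(lo)))
        else:
            parts.append(re.escape(chr(lo)) + "-" + re.escape(chr(hi)))
    return "[" + "".join(parts) + "]"


print(subtract_ranges([("A", "Z")], [("P", "R")]))  # [A-OS-Z]
```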

If the community wants to support both XSD and POSIX expressions, then the 
proper engineering
solution is to introduce a new statement that is defined to contain a POSIX 
expression.
This can be done with a YANG extension now and added to YANG 2.0 later.

  I think that this is an inferior solution:
- there are many languages that YANG tools could be written in: C/C++, Python, 
Java, Go, Rust, Javascript are all reasonably plausible choices.
- they all have similar regular expression flavours, with small differences 
(according to http://www.regular-expressions.info/reference.html).
- Personally, I see no inherent advantage of the POSIX Extended Regex over XML 
RE.   In fact, given that it doesn't support Unicode at all, it would seem to 
be a somewhat strange choice for a second pattern statement.
- Nor does it seem pragmatic to introduce lots of different flavors of pattern 
statements into YANG each supporting a different regex syntax.

I also don't like the solution that every YANG tool maker has to either link 
against libxml2,  or write their own efficient regular expression engine.  I'm 
not convinced that what the world needs is yet more regular expression 
implementations :-)

So, I still think the better technical solution is to always define the 
pattern statements only in the XML RE language, but to strongly encourage folks 
to use a subset of that language for standards models (which they appear to be 
doing anyway) to make it easier to convert the regular expression into 
compatible versions for other engines.

Thanks,
Rob

/js

Andy

--
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>

_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
