[email protected] (Dale R. Worley) wrote:
> From: Juergen Schoenwaelder <[email protected]>
> >     An unquoted string is any sequence of characters and does not
> >     contain any space, tab, or newline characters, a single or double
> >     quote character, a semicolon (";"), braces ("{" or "}"), or
> >     comment sequences ("//", "/*", or "*/").
> > 
> >     Note that any keyword can legally appear as an unquoted string.
> 
> That text seems to be quite clear to me.  As Juergen notes, it directly
> points out to the implementor an important fact about lexing Yang.
> (Similarly, some programming languages allow keywords to be used as
> identifiers; they are not reserved.)

Ok, let's use Juergen's proposed text.
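
As a rough illustration of that rule, here is a minimal Python sketch.
The function name and the decision to reject an empty token are
assumptions of the sketch, not part of the proposed text:

```python
# Characters that may not appear in an unquoted string according to the
# proposed text: space, tab, newline, both quote characters, ";", "{", "}".
FORBIDDEN_CHARS = set(" \t\n'\";{}")
# Comment sequences that may not appear either.
FORBIDDEN_SEQUENCES = ("//", "/*", "*/")


def is_unquoted_string(text: str) -> bool:
    """Return True if `text` could be written as one unquoted string."""
    if not text:
        return False  # assumption: an empty token is not an unquoted string
    if any(ch in FORBIDDEN_CHARS for ch in text):
        return False
    return not any(seq in text for seq in FORBIDDEN_SEQUENCES)
```

Note that a keyword such as type passes this check like any other
unquoted string, which is the point of the last sentence of the text.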

> Martin Bjorklund <[email protected]> writes:
> >> If I understand correctly, the tokens of Yang (as the term is usually
> >> used in programming languages) are:
> >> 
> >>     whitespace (which is ignored)
> >>     comments (which are ignored)
> >>     single-quoted strings
> >>     double-quoted strings
> >>     unquoted strings (including keywords)
> >>     ;
> >>     {
> >>     }
> >> 
> >> From the point of view of the tokenizer, these tokens fall into the
> >> obvious classes:
> >> 
> >>      type    unquoted string
> >>      "type"  double-quoted string
> >>      abc     unquoted string
> >>      "abc"   double-quoted string
> >>      '---'   single-quoted string
> >> 
> >> I'm not quite sure how they are classified from the parser's point of
> >> view, though.
> >> 
> >>                              type    "type"  abc     "abc"   '---'
> >> 
> >> Is a string?                 ?       Y       ?       Y       Y
> >> (Can it appear as the
> >> argument of "description"?)
> >> 
> >> Is a keyword?                Y       ?       N       N       N
> >> (Can it appear as the first
> >> token of some statement?)
> >> 
> >> Is an identifier?            Y       ?       Y       ?       N
> >> (Can it appear as the second
> >> token of a type statement?)
> >> 
> >> Usually programming languages use the particular syntax of different
> >> types of tokens to determine where they can be used in the
> >> context-free grammar.  Yang seems to be more relaxed, but I'm not sure
> >> whether it is so relaxed that any of the types of string tokens can be
> >> used anywhere.
> >
> > No; a keyword must be written w/o quotes, so it is special.
> 
> OK, that answers one '?'.  But there are effectively two that remain:
> 
> Is abc   valid as a string?

Yes.

> Is "abc" valid as an identifier?

Yes.

All YANG statements have the same structure:

  statement = keyword [argument] (";" / "{" *statement "}")

All arguments are strings.  Strings can be unquoted or quoted (and
possibly concatenated).
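
The statement rule above can be sketched as a small recursive parser
over an already-scanned token list.  This is only an illustration: the
(kind, text) token representation, the Statement class and the function
name are assumptions of the sketch, and a quoted token is assumed to
carry its value with the quotes already removed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical token shape: kind is "unquoted", "quoted" or "punct".
Token = Tuple[str, str]


@dataclass
class Statement:
    keyword: str
    argument: Optional[str] = None
    substatements: List["Statement"] = field(default_factory=list)


def parse_statement(tokens: List[Token], pos: int = 0) -> Tuple[Statement, int]:
    """Parse one statement at tokens[pos]; return it and the next position."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind != "unquoted":
        # A keyword must be written without quotes.
        raise SyntaxError(f"keyword must be unquoted, got {kind} {text!r}")
    stmt = Statement(keyword=text)
    pos += 1

    # Optional argument: any string token, quoted or unquoted.
    if pos < len(tokens) and tokens[pos][0] in ("unquoted", "quoted"):
        stmt.argument = tokens[pos][1]
        pos += 1

    # The statement ends with ";" or with a block of substatements.
    if pos < len(tokens) and tokens[pos] == ("punct", ";"):
        return stmt, pos + 1
    if pos < len(tokens) and tokens[pos] == ("punct", "{"):
        pos += 1
        while pos < len(tokens) and tokens[pos] != ("punct", "}"):
            child, pos = parse_statement(tokens, pos)
            stmt.substatements.append(child)
        if pos >= len(tokens):
            raise SyntaxError("missing '}'")
        return stmt, pos + 1
    raise SyntaxError("expected ';' or '{'")
```

In this sketch the only place where the quoted/unquoted distinction
matters to the parser is the keyword position; the argument accepts
either form.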

> I *think* the answer is No for both of those, but I can't put my finger
> on the rules that make that definite.
> 
> >> There are two types that don't have a canonical form, identityref and
> >> instance-identifier.  It seems that comparisons in XPath expressions
> >> are inexact if the type doesn't have a canonical form (section 6.4).
> >> But if I understand you correctly, the implicit comparisons in leafref
> >> are done based on the abstract values involved, not the lexical
> >> representation.
> >
> > Yes.
> 
> I'm willing to take that as understood.
> 
> >> > > The current ABNF doesn't allow for "+" for joining quoted strings.
> >> > > Also, it doesn't show that \" can be included in a double quoted string
> >> > > to include a literal ", and allows the string contents to continue --
> >> > > the current ABNF "DQUOTE string DQUOTE" matches "abcd\", despite that
> >> > > the latter is not a proper double-quoted string.
> >> > 
> >> > Note that the prose text (within <...>) says "a string that
> >> > matches...".  That string can be any YANG token string, for example
> >> > one of:
> >> > 
> >> >    "hello"
> >> >    "he" + "llo"
> >> 
> >> If I haven't gotten confused, you're referring to
> >> 
> >>    string              = < an unquoted string as returned by >
> >>                          < the scanner, that matches the rule >
> >>                          < yang-string >
> >> 
> >>    yang-string        = *yang-char
> >> 
> >>    ;; any Unicode or ISO/IEC 10646 character including tab, carriage
> >>    ;; return, and line feed, but excluding the other C0 control
> >>    ;; characters, the surrogate blocks, and the noncharacters.
> >>    yang-char = %x09 / %x0A / %x0D / %x20-D7FF /
> >>                                ; exclude surrogate blocks %xD800-DFFF
> >>               %xE000-FDCF /    ; exclude noncharacters %xFDD0-FDEF
> >>               %xFDF0-FFFD /    ; exclude noncharacters %xFFFE-FFFF
> >>               %x10000-1FFFD /  ; exclude noncharacters %x1FFFE-1FFFF
> >>               %x20000-2FFFD /  ; exclude noncharacters %x2FFFE-2FFFF
> >>               %x30000-3FFFD /  ; exclude noncharacters %x3FFFE-3FFFF
> >>               %x40000-4FFFD /  ; exclude noncharacters %x4FFFE-4FFFF
> >>               %x50000-5FFFD /  ; exclude noncharacters %x5FFFE-5FFFF
> >>               %x60000-6FFFD /  ; exclude noncharacters %x6FFFE-6FFFF
> >>               %x70000-7FFFD /  ; exclude noncharacters %x7FFFE-7FFFF
> >>               %x80000-8FFFD /  ; exclude noncharacters %x8FFFE-8FFFF
> >>               %x90000-9FFFD /  ; exclude noncharacters %x9FFFE-9FFFF
> >>               %xA0000-AFFFD /  ; exclude noncharacters %xAFFFE-AFFFF
> >>               %xB0000-BFFFD /  ; exclude noncharacters %xBFFFE-BFFFF
> >>               %xC0000-CFFFD /  ; exclude noncharacters %xCFFFE-CFFFF
> >>               %xD0000-DFFFD /  ; exclude noncharacters %xDFFFE-DFFFF
> >>               %xE0000-EFFFD /  ; exclude noncharacters %xEFFFE-EFFFF
> >>               %xF0000-FFFFD /  ; exclude noncharacters %xFFFFE-FFFFF
> >>               %x100000-10FFFD  ; exclude noncharacters %x10FFFE-10FFFF
> >> 
> >> But if that's taken at face value, you can lex as single "string"s not only
> >> 
> >>     "hello"
> >> 
> >> and
> >> 
> >>     "he" + "llo"
> >> 
> >> but also
> >> 
> >>     "The MTU of the interface."; myext:c-define "MY_MTU"
> >
> > Why would this be treated as a single string?  Note that line breaks are
> > not special in YANG, so by the same logic, this example:
> >
> >   description
> >     "The MTU of the interface.";
> >   reference
> >     "RFC XYZ";
> >
> > would be scanned as the three tokens:
> >
> >   description
> >
> >   "The MTU of the interface.";
> >   reference
> >     "RFC XYZ"
> >
> >   ;  
> 
> Actually, I've gotten completely lost in regard to how the grammar
> specifies how strings can be written.

The grammar doesn't specify how strings can be written.  This is
defined in the section "Lexical Tokenization":

   unquoted
   quoted
   quoted and concatenated
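
To make the scanner's role concrete, here is a minimal tokenizer sketch
in Python.  The regular expression and the token kind names are
assumptions of the sketch; it does not join "+" concatenations (a "+"
comes out as an ordinary unquoted token) and it leaves quoted tokens
exactly as written.

```python
import re

# One alternative per token class; whitespace and comments are skipped.
TOKEN = re.compile(
    r"""
      (?P<ws>\s+)
    | (?P<line_comment>//[^\n]*)
    | (?P<block_comment>/\*.*?\*/)
    | (?P<dquoted>"(?:[^"\\]|\\.)*")
    | (?P<squoted>'[^']*')
    | (?P<punct>[;{}])
    | (?P<unquoted>(?:(?!//|/\*|\*/)[^\s'";{}])+)
    """,
    re.VERBOSE | re.DOTALL,
)

SKIPPED = ("ws", "line_comment", "block_comment")


def tokenize(text: str):
    """Return a list of (kind, text) pairs, ignoring whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot scan input at offset {pos}")
        pos = m.end()
        if m.lastgroup not in SKIPPED:
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

With this sketch, the description/reference example quoted above scans
as six tokens: two unquoted strings, two double-quoted strings, and two
";" tokens.  The ";" always ends up as a token of its own.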


> Let me take a single example.
> There are these productions:
> 
>    organization-stmt   = organization-keyword sep string stmtend
> 
>    string              = < an unquoted string as returned by >
>                          < the scanner, that matches the rule >
>                          < yang-string >
> 
>    yang-string        = *yang-char
> 
> (I won't quote the production for yang-char.)  As written, this seems to
> say that in an organization statement, there is the organization
> keyword, a sep, a string, and then a stmtend.  But the text above says
> that a string is "an unquoted string as returned by the scanner

This means that in all these examples, the resulting string is
the same:

   hello
   "hello"
   'hello'
   'he' + "llo"

and the resulting string is a string with 5 characters.
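
A small Python sketch of that mapping from written form to value.  The
function names are assumptions of the sketch, only the escapes \n, \t,
\" and \\ are handled, unknown escapes are left untouched, and the
whitespace trimming of multi-line double-quoted strings is not
implemented:

```python
import re

# A double-quoted piece (with backslash escapes) or a single-quoted piece.
_PIECE = re.compile(r'"((?:[^"\\]|\\.)*)"|\'([^\']*)\'', re.DOTALL)
_PLUS = re.compile(r"\s*\+\s*")
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unescape(body: str) -> str:
    """Substitute backslash escapes inside a double-quoted piece."""
    return re.sub(
        r"\\(.)",
        lambda m: _ESCAPES.get(m.group(1), m.group(0)),
        body,
        flags=re.DOTALL,
    )


def string_value(written: str) -> str:
    """Return the value denoted by an unquoted, quoted or concatenated string."""
    written = written.strip()
    if not written.startswith(('"', "'")):
        return written  # unquoted: the value is the text itself
    parts = []
    pos = 0
    while True:
        m = _PIECE.match(written, pos)
        if m is None:
            raise ValueError(f"bad quoted string at offset {pos}")
        if m.group(1) is not None:
            parts.append(_unescape(m.group(1)))  # double-quoted piece
        else:
            parts.append(m.group(2))  # single-quoted piece, taken literally
        pos = m.end()
        plus = _PLUS.match(written, pos)
        if plus is None:
            break
        pos = plus.end()
    if pos != len(written):
        raise ValueError("trailing text after string")
    return "".join(parts)
```

All four forms listed above yield the same five-character value in this
sketch, while an input such as "abcd\" is rejected because the final
quote is consumed by the escape.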

>, that
> matches the rule yang-string".  But looking at the example:
> 
>        organization "Example Inc.";
> 
> I see that the string involved is quoted, which doesn't seem to be
> allowed by the production for "string", because the production says "an
> unquoted string".
> 
> Now I suspect that what the production for "string" really wants to be
> is "a quoted or unquoted string whose *value* matches the rule
> yang-string".  That is, what is *written* isn't allowed to be any
> *yang-char, but the denoted value must be *yang-char.

Yes, but it is more than quoted or unquoted; it is also concatenation,
escape character substitution, and whitespace trimming.
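
Under that reading, the yang-string rule is a check on the resulting
value, not on the written form.  A minimal Python sketch of such a
check, following the yang-char ranges quoted earlier (the function
names are assumptions of the sketch):

```python
def is_yang_char(ch: str) -> bool:
    """Return True if the single character `ch` matches yang-char."""
    cp = ord(ch)
    if cp in (0x09, 0x0A, 0x0D):
        return True  # tab, line feed, carriage return
    if cp < 0x20:
        return False  # the other C0 control characters
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogate blocks
    if 0xFDD0 <= cp <= 0xFDEF:
        return False  # noncharacters
    if cp & 0xFFFE == 0xFFFE:
        return False  # xFFFE and xFFFF at the end of every plane
    return True


def is_yang_string(value: str) -> bool:
    """Return True if every character of the value matches yang-char."""
    return all(is_yang_char(ch) for ch in value)
```

The check would be applied after unquoting, concatenation and escape
substitution, so the value Example Inc. is tested, not the text
"Example Inc." with its quotes.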

> (Some of the
> other productions seem to have the same issue.)
> 
> Another point which is confusing me is how quoted strings are specified
> by the grammar.  The only appearance of DQUOTE is in
> 
>    quoted-string       = (DQUOTE string DQUOTE) / (SQUOTE string SQUOTE)
> 
> and the only appearance of quoted-string is in
> 
>    key-predicate-expr  = node-identifier *WSP "=" *WSP quoted-string
> 
>    leaf-list-predicate-expr = "." *WSP "=" *WSP quoted-string

Note that these are quoted strings *after* the scanner's processing of
the input, so e.g., you might have:

  path '/foo[bar="hi"]';
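
As a sketch of what that means: once the scanner has removed the outer
single quotes, the argument value is /foo[bar="hi"], and the
quoted-string in key-predicate-expr is matched inside that value.  The
Python below assumes a simplified node-identifier (letters, digits,
"_", ".", "-" and an optional prefix with ":"); the names are
hypothetical.

```python
import re

# "[" node-identifier *WSP "=" *WSP quoted-string "]", simplified.
KEY_PREDICATE = re.compile(
    r'''\[\s*([\w.:-]+)[ \t]*=[ \t]*(?:"([^"]*)"|'([^']*)')\s*\]'''
)


def key_predicates(path_value: str):
    """Return (node, value) pairs for each key predicate in a scanned path."""
    result = []
    for m in KEY_PREDICATE.finditer(path_value):
        # Exactly one of the two quoted alternatives matched.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        result.append((m.group(1), value))
    return result
```

Here the input is the string value after scanning, so the double quotes
around hi are ordinary characters of the argument that the
key-predicate-expr rule then interprets.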


/martin





> 
> which leaves me wondering how quoted strings are allowed elsewhere in
> the language.
> 
> Dale
> 

_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
