Hi,

[email protected] (Dale R. Worley) wrote:
> (This is the second part of my response.)
> 
> > > > > - section 6.1
> > > > > 
> > > > >    This section details the rules for recognizing tokens from an input
> > > > >    stream.
> > > > > 
> > > > > Generally, language definitions intersperse the narrative text with
> > > > > the relevant grammar definitions.  Yang's statement grammar is simple
> > > > > enough that one doesn't need to see the context-free part of the
> > > > > grammar to understand the narrative for statements.  But when reading
> > > > > about tokenization, not having the grammar presented at the same time
> > > > > is quite a burden.  I'd recommend duplicating the relevant productions
> > > > > from section 14 into the subsections of section 6.
> > > > > 
> > > > > There is some sort of exposition problem.  The result of
> > > > > "tokenization" is that the sequence of characters of the source is
> > > > > converted into a sequence of "tokens".  Then some subset of the tokens
> > > > > is discarded as being non-significant (e.g., whitespace and comments),
> > > > > and the remainder is parsed with a context-free grammar.  Here I can't
> > > > > figure out what the set of tokens is.  Looking at the grammar in
> > > > > section 14, it seems to be a context-free grammar on characters.  But
> > > > > that implies that there is no separate tokenization phase.
> > > > > 
> > > > > An example that shows the problems:
> > > > > 
> > > > >    mod:ext
> > > > > 
> > > > > Is this one token, which is also an extension keyword, or is it a
> > > > > sequence of three tokens?
> > > > 
> > > > The text says:
> > > > 
> > > >   A token in YANG is either a keyword, a string, a semicolon (";"), or
> > > >   braces ("{" or "}").
> > > > 
> > > > and:
> > > > 
> > > >   A keyword is [...] or a prefix identifier, followed by a colon
> > > >   (":"), followed by a language extension keyword.
> > > > 
> > > > So "mod:ext" is one token.
> > > 
> > > Certainly it can be one token.  My question is how do I verify that it is
> > > not a string?  I think the origin of my confusion here is
> > > that I haven't spotted a clear syntax for unquoted strings.  In most
> > > programming languages, mod:ext would be parsed as an identifier, a
> > > colon, and an identifier.  In YANG, identifiers are usually tokenized as
> > > strings, so I ask whether YANG tokenizes it as a string, a colon, and a
> > > string.
> > > 
> > > Looking at the beginning of 6.1.3, it doesn't appear that an unquoted
> > > string is forbidden from containing a colon.
> > > 
> > > I think that the underlying problem is that I'm not clear on what gets
> > > tokenized as an unquoted string.
> > 
> > Note that this is legal YANG:
> > 
> >    leaf type {
> >      type string;
> >    }
> 
> So keywords aren't reserved; they can also be used as identifiers.

Yes.

> > I think there are two ways to look at this.  Either we describe the
> > tokenizer as being context-dependent, or we describe the "argument" in
> > a "statement" to be a "string or keyword".
> > 
> > In the latter case maybe we can do:
> > 
> > OLD:
> > 
> >   If a string contains any space, tab, or newline characters, a single
> >   or double quote character, a semicolon (";"), braces ("{" or "}"),
> >   or comment sequences ("//", "/*", or "*/"), then it MUST be enclosed
> >   within double or single quotes.
> > 
> > NEW:
> > 
> >   An unquoted string is any sequence of characters that does not start
> >   with a double or single quote character, is not a keyword, and does
> >   not contain any space, tab, or newline characters, a single or
> >   double quote character, a semicolon (";"), braces ("{" or "}"), or
> >   comment sequences ("//", "/*", or "*/").
> 
> That's a lot clearer.  Though you can shorten it to:
> 
>    An unquoted string is any sequence of characters that is not a
>    keyword, and does not contain any space, tab, or newline
>    characters, a single or double quote character, a semicolon (";"),
>    braces ("{" or "}"), or comment sequences ("//", "/*", or "*/").

Thanks, better.
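
A minimal sketch of the character rule in the shortened text above, in
Python.  The name needs_quoting and the two constants are hypothetical,
chosen only for illustration.  The "is not a keyword" clause is
deliberately left out, since it depends on where the token appears, and
treating the empty string as needing quotes is an assumption of the
sketch, not something the quoted text states.

```python
# Characters that may not appear in an unquoted string, per the text above.
FORBIDDEN_CHARS = set(" \t\n'\";{}")
# Comment sequences that may not appear in an unquoted string.
COMMENT_SEQS = ("//", "/*", "*/")


def needs_quoting(s: str) -> bool:
    """Return True if s cannot be written as an unquoted string."""
    if s == "":
        # Assumption of this sketch: an empty argument has to be quoted.
        return True
    if any(c in FORBIDDEN_CHARS for c in s):
        return True
    return any(seq in s for seq in COMMENT_SEQS)
```

Under this rule a colon is not special, so "mod:ext" passes as a single
unquoted string, while "hello world" and "a//b" have to be quoted.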

> > In section 6.3 we must also do:
> > 
> > OLD:
> > 
> >    The argument is a string, as defined in Section 6.1.2.
> > 
> > NEW:
> > 
> >    The argument is a string or a keyword, as defined in Section 6.1.2.
> 
> If I understand correctly, the tokens of Yang (as the term is usually
> used in programming languages) are:
> 
>     whitespace (which is ignored)
>     comments (which are ignored)
>     single-quoted strings
>     double-quoted strings
>     unquoted strings (including keywords)
>     ;
>     {
>     }
> 
> From the point of view of the tokenizer, these tokens fall into the
> obvious classes:
> 
>       type    unquoted string
>       "type"  double-quoted string
>       abc     unquoted string
>       "abc"   double-quoted string
>       '---'   single-quoted string
> 
> I'm not quite sure how they are classified from the parser's point of
> view, though.
> 
>                               type    "type"  abc     "abc"   '---'
> 
> Is a string?                  ?       Y       ?       Y       Y
> (Can it appear as the
> argument of "description"?)
> 
> Is a keyword?                 Y       ?       N       N       N
> (Can it appear as the first
> token of some statement?)
> 
> Is an identifier?             Y       ?       Y       ?       N
> (Can it appear as the second
> token of a type statement?)
> 
> Usually programming languages use the particular syntax of different
> types of tokens to determine where they can be used in the
> context-free grammar.  Yang seems to be more relaxed, but I'm not sure
> whether it is so relaxed that any of the types of string tokens can be
> used anywhere.

No; a keyword must be written w/o quotes, so it is special.
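
To make the token list above concrete, here is a rough tokenizer sketch
in Python.  It is only an illustration of the classes Dale lists, not
the scanner the draft defines: TOKEN_RE, tokenize, and the class names
(dq, sq, punct, unquoted) are hypothetical, and the double-quoted
pattern accepts any backslash escape, which is more permissive than the
draft.  Keywords are not a separate class here; they come out as
unquoted strings and it is left to the parser to decide by position.

```python
import re

# One alternative per token class; whitespace and comments are matched
# so they can be skipped.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<dq>"(?:[^"\\]|\\.)*")
  | (?P<sq>'[^']*')
  | (?P<punct>[;{}])
  | (?P<unquoted>(?:(?!//|/\*|\*/)[^\s'";{}])+)
""", re.VERBOSE | re.DOTALL)


def tokenize(src: str) -> list:
    """Return (class, text) pairs, dropping whitespace and comments."""
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        if m.lastgroup not in ("ws", "comment"):
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out
```

With this sketch, "mod:ext" is a single unquoted token, and in
"leaf type { type string; }" both occurrences of "type" are tokenized
the same way; only the parser can tell the argument from the keyword.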

> > > > > -- it must be an unquoted string.
> > > > > 
> > > > >    If a double-quoted string contains a line break followed by space or
> > > > >    tab characters that are used to indent the text according to the
> > > > >    layout in the YANG file, this leading whitespace is stripped from the
> > > > >    string, up to and including the column of the double quote character,
> > > > >    or to the first non-whitespace character, whichever occurs first.  In
> > > > >    this process, a tab character is treated as 8 space characters.
> > > > > 
> > > > > This description isn't quite careful enough.  Better:
> > > > > 
> > > > >    If a double-quoted string contains a line break followed by space or
> > > > >    tab characters, an initial part of this whitespace is removed from the
> > > > >    string.  The amount removed is the longest prefix whose width is no
> > > > >    larger than the width of the prefix of the Yang source line containing
> > > > >    the opening double quote character of the string, up to and including
> > > > >    the opening double quote character.  For this purpose, the width of a
> > > > >    tab character is 8 and the width of any other character is 1.
> > > > > 
> > > > > This does assume that all tabs are considered to have width 8, that
> > > > > is, tabs do not have the usual semantics of "advance to the next
> > > > > column that is divisible by 8".  That will sometimes cause unexpected
> > > > > results, e.g., if some source lines start with SPC TAB.  (Consider
> > > > > that whitespace before a line break is removed, which suggests the
> > > > > intention is that the value of the string should depend only on its
> > > > > visual appearance.)
> > > > > 
> > > > > Also, we're using the convention that "whitespace" does NOT include CR
> > > > > or LF, which is not always how the term is used.  Perhaps a definition
> > > > > of "whitespace" should be put in section 3.
> > > > > 
> > > > > There is also the special case:
> > > > > 
> > > > >    SPC " LF
> > > > >    TAB x "
> > > > > 
> > > > > Is the initial TAB of the second line to be removed or not?  There is
> > > > > no whitespace removal in the second line that will exactly reach the
> > > > > opening double quote.  As I've written it, the TAB is not removed.
> > > 
> > > Don't forget this ugly special case.
> > 
> > So, let's follow the rules.  We need to trim to the column of the
> > double quote character (2).  The second line starts with "space or
> > tab" so we do whitespace trimming, while treating the tab as 8
> > spaces.  So from 8 spaces we subtract 2, and get the resulting string
> > of 6 characters:
> > 
> >   LF SPC SPC SPC SPC SPC SPC x
> 
> OK, but that process wasn't clear to me.  I take it that any tab that
> appears before the starting double-quote counts as 8 spaces, and any
> tab that needs to be examined for deletion is turned into 8 spaces --
> but any other tabs in the string are unconverted.
> 
> I think it would be clearer to insert "starting" where I've indicated
> it, and replace the final sentence:
> 
>    If a double-quoted string contains a line break followed by space or
>    tab characters that are used to indent the text according to the
>    layout in the YANG file, this leading whitespace is stripped from the
>    string, up to and including the column of the >starting< double quote character,
>    or to the first non-whitespace character, whichever occurs first.
>    In this process, any tab character before the starting double quote
>    character is treated as 8 spaces.  Any tab character in a succeeding
>    line that must be examined for stripping is first converted into 8
>    spaces.

Ok, fixed.
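
For what it's worth, the agreed rule can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the
names TAB_WIDTH, quote_column, strip_line, and strip_indent are
hypothetical, the input is taken to be the raw text between the double
quotes, and neither escape processing nor the removal of whitespace
before a line break is attempted.

```python
TAB_WIDTH = 8


def quote_column(prefix: str) -> int:
    """1-based column of the starting quote; prefix is the text before it.

    Any tab before the starting double quote counts as 8 spaces.
    """
    return sum(TAB_WIDTH if c == "\t" else 1 for c in prefix) + 1


def strip_line(line: str, quote_col: int) -> str:
    """Strip leading whitespace up to and including column quote_col."""
    removed = 0
    while removed < quote_col and line[:1] in (" ", "\t"):
        if line[0] == "\t":
            # A tab that must be examined is first converted into 8 spaces.
            line = " " * TAB_WIDTH + line[1:]
        line = line[1:]
        removed += 1
    return line


def strip_indent(body: str, quote_col: int) -> str:
    """Apply strip_line to every line after the first one."""
    lines = body.split("\n")
    return "\n".join([lines[0]] + [strip_line(l, quote_col) for l in lines[1:]])
```

For the special case above (SPC before the quote, so column 2, and a
second line starting with TAB), the leading TAB comes out as six
spaces, which agrees with the result worked out by hand.  A tab that
lies beyond the quote column is never examined and stays a tab.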

> > > Actually, there is a somewhat subtle problem:  If I say "the system can
> > > sort them any way it wants", I am asserting that *there is a sorting
> > > order*.  Which means that if value A is put before value B at one time,
> > > then if values A and B are in the list at some other time, A will
> > > precede B.
> > 
> > The next sentences say:
> > 
> >   An implementation SHOULD use the same order for the same data,
> >   regardless of how the data were created.  Using a deterministic
> >   order will make comparisons possible using simple tools like "diff".
> 
> OK, I'm willing to go with that.  I mis-read the application of those
> sentences through an even more arcane ambiguity in the term "the same
> data".  But I'm willing to ignore that.
> 
> > > > > - section 7.21.4
> > > > > 
> > > > >    The "reference" statement takes as an argument a string ...
> > > > > 
> > > > > Perhaps s/a string/a human-readable string/.
> > > > 
> > > > "string" refers to the YANG token "string".  The same wording is used
> > > > across the document for all arguments.
> > > 
> > > I was thinking that it is a string, but in this particular case, it is
> > > supposed to be human-readable, whereas strings in other contexts aren't
> > > expected to be.
> > 
> > Ok.  Maybe:
> > 
> > OLD:
> > 
> >   The "reference" statement takes as an argument a string that is used
> >   to specify a textual cross-reference to an external document,
> > 
> > NEW:
> > 
> >   The "reference" statement takes as an argument a string that is used
> >   to specify a human-readable cross-reference to an external document,
> 
> Or even "is a human-readable cross-reference ...", but either is OK
> with me.

Ok.

> > > > > - section 7.21.5
> > > > > 
> > > > > Note that if a data definition has both an "if-feature" and a "when",
> > > > > then the "if-feature" is tested first.
> > > > > 
> > > > >    If the XPath expression references any node that also has associated
> > > > >    "when" statements, these "when" expressions MUST be evaluated first.
> > > > >    There MUST NOT be any circular dependencies in these "when"
> > > > >    expressions.
> > > > > 
> > > > > I think this could be better phrased:
> > > > > 
> > > > >    If the XPath expression references any node that also has
> > > > >    associated "when" statements, then the "when" expressions of the
> > > > >    referenced nodes MUST be evaluated first.  There MUST NOT be any
> > > > >    circular dependencies among "when" expressions.
> > > > 
> > > > Ok to the last sentence.  Do you think that the word "these" in the
> > > > first sentence is ambiguous?
> > > 
> > > I must have thought it was unclear when I read it, otherwise I would not
> > > have suggested changing it.  But reading it again, I think that there is
> > > no ambiguity.  Perhaps it would be a little clearer to use 'those "when"
> > > expressions' rather than 'these "when" expressions'.  (I can't explain
> > > clearly why "those" seems less ambiguous than "these".)
> > 
> > Ok, as a non-native English speaker I trust you that "those" is better.
> 
> I can't tell that you're non-native.  Perhaps leave it as is and let
> the RFC Editor review it.
> 
> > > By implication, the leafref's value is considered to be a pointer to a
> > > particular leaf instance, the one with the matching value.  But that
> > > idea is not embedded in the Yang semantics of leafref types in any way
> > > (other than the output of the deref function), so the fact that there
> > > might be more than one matching leaf instance does not matter.
> > > 
> > > As stated in 9.9.4 and 9.9.5, the lexical representations of its values
> > > are the same as those of the referenced nodes.
> > > 
> > > How is the leafref's value compared to the values of the referenced
> > > nodes?  I can see that question getting ugly for the more complex types
> > > (e.g., anyxml)
> > 
> > You can't have a leafref to an anyxml node; just to a leaf or
> > leaf-list.
> > 
> > > which do not have canonical forms.  I suspect the
> > > intention is that values are equal if they have the same canonical form
> > 
> > No, the idea is that they are equal if their *value* is equal,
> > regardless of the lexical representation.
> 
> There are two types that don't have a canonical form, identityref and
> instance-identifier.  It seems that comparisons in XPath expressions
> are inexact if the type doesn't have a canonical form (section 6.4).
> But if I understand you correctly, the implicit comparisons in leafref
> are done based on the abstract values involved, not the lexical
> representation.

Yes.

> > > The current ABNF doesn't allow for "+" for joining quoted strings.
> > > Also, it doesn't show that \" can be included in a double quoted string
> > > to include a literal ", and allows the string contents to continue --
> > > the current ABNF "DQUOTE string DQUOTE" matches "abcd\", even though
> > > the latter is not a proper double-quoted string.
> > 
> > Note that the prose text (within <...>) says "a string that
> > matches...".  That string can be any YANG token string, for example
> > one of:
> > 
> >    "hello"
> >    "he" + "llo"
> 
> If I haven't gotten confused, you're referring to
> 
>    string              = < an unquoted string as returned by >
>                          < the scanner, that matches the rule >
>                          < yang-string >
> 
>    yang-string        = *yang-char
> 
>    ;; any Unicode or ISO/IEC 10646 character including tab, carriage
>    ;; return, and line feed, but excluding the other C0 control
>    ;; characters, the surrogate blocks, and the noncharacters.
>    yang-char = %x09 / %x0A / %x0D / %x20-D7FF /
>                                ; exclude surrogate blocks %xD800-DFFF
>               %xE000-FDCF /    ; exclude noncharacters %xFDD0-FDEF
>               %xFDF0-FFFD /    ; exclude noncharacters %xFFFE-FFFF
>               %x10000-1FFFD /  ; exclude noncharacters %x1FFFE-1FFFF
>               %x20000-2FFFD /  ; exclude noncharacters %x2FFFE-2FFFF
>               %x30000-3FFFD /  ; exclude noncharacters %x3FFFE-3FFFF
>               %x40000-4FFFD /  ; exclude noncharacters %x4FFFE-4FFFF
>               %x50000-5FFFD /  ; exclude noncharacters %x5FFFE-5FFFF
>               %x60000-6FFFD /  ; exclude noncharacters %x6FFFE-6FFFF
>               %x70000-7FFFD /  ; exclude noncharacters %x7FFFE-7FFFF
>               %x80000-8FFFD /  ; exclude noncharacters %x8FFFE-8FFFF
>               %x90000-9FFFD /  ; exclude noncharacters %x9FFFE-9FFFF
>               %xA0000-AFFFD /  ; exclude noncharacters %xAFFFE-AFFFF
>               %xB0000-BFFFD /  ; exclude noncharacters %xBFFFE-BFFFF
>               %xC0000-CFFFD /  ; exclude noncharacters %xCFFFE-CFFFF
>               %xD0000-DFFFD /  ; exclude noncharacters %xDFFFE-DFFFF
>               %xE0000-EFFFD /  ; exclude noncharacters %xEFFFE-EFFFF
>               %xF0000-FFFFD /  ; exclude noncharacters %xFFFFE-FFFFF
>               %x100000-10FFFD  ; exclude noncharacters %x10FFFE-10FFFF
> 
> But if that's taken at face value, you can lex as single "string"s not only
> 
>     "hello"
> 
> and
> 
>     "he" + "llo"
> 
> but also
> 
>     "The MTU of the interface."; myext:c-define "MY_MTU"

Why would this be treated as a single string?  Note that line breaks are
not special in YANG, so by the same logic, this example:

  description
    "The MTU of the interface.";
  reference
    "RFC XYZ";

would be scanned as the three tokens:

  description

  "The MTU of the interface.";
  reference
    "RFC XYZ"

  ;  


/martin



> Doing that would allow the incorrect lexing of
> 
>          leaf mtu {
>            type uint32;
>            description "The MTU of the interface."; myext:c-define "MY_MTU";
>          }
> 
> as having a long description (starting with 'The MTU' and ending with
> 'MY_MTU') and no myext:c-define statement.
> 
> What we need is a production that matches "strings possibly combined
> with +" and nothing else.  That is, including '"he" + "llo"' but not the
> last example.
> 
> Dale
> 
> _______________________________________________
> netmod mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/netmod
> 
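
As a rough illustration of the production Dale asks for ("strings
possibly combined with +" and nothing else), here is a Python regular
expression sketch.  It is not taken from the draft's ABNF: the names
DQ, SQ, QSTR, CONCAT_RE, and is_string_argument are hypothetical, any
backslash escape is accepted inside double quotes, and comments between
the pieces are not handled.

```python
import re

# A double-quoted string: no bare quote inside, backslash escapes allowed.
DQ = r'"(?:[^"\\]|\\.)*"'
# A single-quoted string: everything up to the next single quote.
SQ = r"'[^']*'"
QSTR = f"(?:{DQ}|{SQ})"
# One quoted string, optionally followed by "+ quoted string" repeats.
CONCAT_RE = re.compile(rf"{QSTR}(?:\s*\+\s*{QSTR})*", re.DOTALL)


def is_string_argument(text: str) -> bool:
    """True if text is exactly one quoted string or a '+' concatenation."""
    return CONCAT_RE.fullmatch(text) is not None
```

This accepts "hello" and "he" + "llo", and rejects both the
description/myext:c-define example and the unterminated "abcd\" case
mentioned earlier in the thread.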
