Hi,
[email protected] (Dale R. Worley) wrote:
> (This is the second part of my response.)
>
> > > > > - section 6.1
> > > > >
> > > > > This section details the rules for recognizing tokens from an input
> > > > > stream.
> > > > >
> > > > > Generally, language definitions intersperse the narrative text with
> > > > > the relevant grammar definitions. Yang's statement grammar is simple
> > > > > enough that one doesn't need to see the context-free part of the
> > > > > grammar to understand the narrative for statements. But when reading
> > > > > about tokenization, not having the grammar presented at the same time
> > > > > is quite a burden. I'd recommend duplicating the relevant productions
> > > > > from section 14 into the subsections of section 6.
> > > > >
> > > > > There is some sort of exposition problem. The result of
> > > > > "tokenization" is that the sequence of characters of the source is
> > > > > converted into a sequence of "tokens". Then some subset of the tokens
> > > > > is discarded as being non-significant (e.g., whitespace and comments),
> > > > > and the remainder is parsed with a context-free grammar. Here I can't
> > > > > figure out what the set of tokens is. Looking at the grammar in
> > > > > section 14, it seems to be a context-free grammar on characters. But
> > > > > that implies that there is no separate tokenization phase.
> > > > >
> > > > > An example that shows the problems:
> > > > >
> > > > > mod:ext
> > > > >
> > > > > Is this one token, which is also an extension keyword, or is it a
> > > > > sequence of three tokens?
> > > >
> > > > The text says:
> > > >
> > > > A token in YANG is either a keyword, a string, a semicolon (";"), or
> > > > braces ("{" or "}").
> > > >
> > > > and:
> > > >
> > > > A keyword is [...] or a prefix identifier, followed by a colon
> > > > (":"), followed by a language extension keyword.
> > > >
> > > > So "mod:ext" is one token.
> > >
> > > Certainly it can be one token. My question is how do I verify that it is
> > > not a string? I think the origin of my confusion here is that I haven't
> > > spotted a clear syntax for unquoted strings. In most
> > > programming languages, mod:ext would be parsed as an identifier, a
> > > colon, and an identifier. In YANG, identifiers are usually tokenized as
> > > strings, so I ask whether YANG tokenizes it as a string, a colon, and a
> > > string.
> > >
> > > Looking at the beginning of 6.1.3, it doesn't appear that an unquoted
> > > string is forbidden from containing a colon.
> > >
> > > I think that the underlying problem is that I'm not clear on what gets
> > > tokenized as an unquoted string.
> >
> > Note that this is legal YANG:
> >
> > leaf type {
> > type string;
> > }
>
> So keywords aren't reserved; they can also be used as identifiers.
Yes.
> > I think there are two ways to look at this. Either we describe the
> > tokenizer as being context-dependent, or we describe the "argument" in
> > a "statement" to be a "string or keyword".
> >
> > In the latter case maybe we can do:
> >
> > OLD:
> >
> > If a string contains any space, tab, or newline characters, a single
> > or double quote character, a semicolon (";"), braces ("{" or "}"),
> > or comment sequences ("//", "/*", or "*/"), then it MUST be enclosed
> > within double or single quotes.
> >
> > NEW:
> >
> > An unquoted string is any sequence of characters that does not start
> > with a double or single quote character, is not a keyword, and does
> > not contain any space, tab, or newline characters, a single or
> > double quote character, a semicolon (";"), braces ("{" or "}"), or
> > comment sequences ("//", "/*", or "*/").
>
> That's a lot clearer. Though you can shorten it to:
>
> An unquoted string is any sequence of characters that is not a
> keyword, and does not contain any space, tab, or newline
> characters, a single or double quote character, a semicolon (";"),
> braces ("{" or "}"), or comment sequences ("//", "/*", or "*/").
Thanks, better.
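
As a sanity check of the shortened wording, here is a minimal Python
sketch of the test it describes. The names is_unquoted_string and
keywords are made up for illustration, and rejecting the empty string
is an assumption of mine; the proposed text does not say so explicitly.

```python
# Characters that force quoting, per the proposed text: space, tab,
# newline, single or double quote, semicolon, and braces.
FORBIDDEN_CHARS = set(" \t\n\"';{}")
COMMENT_SEQUENCES = ("//", "/*", "*/")


def is_unquoted_string(text, keywords=frozenset()):
    """Return True if `text` fits the proposed unquoted-string definition.

    `keywords` is a hypothetical set of keywords for the current context;
    it defaults to empty because the keyword list is not part of this sketch.
    """
    if not text:  # assumption: an empty sequence cannot be a token
        return False
    if text in keywords:
        return False
    if any(ch in FORBIDDEN_CHARS for ch in text):
        return False
    return not any(seq in text for seq in COMMENT_SEQUENCES)
```

With this reading, "mod:ext" passes (a colon is not forbidden), while
"a b" and "a//b" do not.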
> > In section 6.3 we must also do:
> >
> > OLD:
> >
> > The argument is a string, as defined in Section 6.1.2.
> >
> > NEW:
> >
> > The argument is a string or a keyword, as defined in Section 6.1.2.
>
> If I understand correctly, the tokens of Yang (as the term is usually
> used in programming languages) are:
>
> whitespace (which is ignored)
> comments (which are ignored)
> single-quoted strings
> double-quoted strings
> unquoted strings (including keywords)
> ;
> {
> }
>
> From the point of view of the tokenizer, these tokens fall into the
> obvious classes:
>
> type     unquoted string
> "type"   double-quoted string
> abc      unquoted string
> "abc"    double-quoted string
> '---'    single-quoted string
>
> I'm not quite sure how they are classified from the parser's point of
> view, though.
>
>                                 type   "type"   abc   "abc"   '---'
>
> Is a string?                     ?       Y       ?      Y       Y
> (Can it appear as the
> argument of "description"?)
>
> Is a keyword?                    Y       ?       N      N       N
> (Can it appear as the first
> token of some statement?)
>
> Is an identifier?                Y       ?       Y      ?       N
> (Can it appear as the second
> token of a type statement?)
>
> Usually programming languages use the particular syntax of different
> types of tokens to determine where they can be used in the
> context-free grammar. Yang seems to be more relaxed, but I'm not sure
> whether it is so relaxed that any of the types of string tokens can be
> used anywhere.
No; a keyword must be written w/o quotes, so it is special.
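
To make the token classes listed above concrete, here is a rough Python
sketch of a context-free scanner. It is not the grammar from section 14;
the names TOKEN_RE and tokenize are invented for illustration, and the
unquoted-string rule follows the NEW text proposed earlier in this
thread. Deciding whether an unquoted token is a keyword is left to the
parser.

```python
import re

# One named alternative per token class; order matters (comments are
# tried before unquoted strings).
TOKEN_RE = re.compile(r'''
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<dq>"(?:\\.|[^"\\])*")
  | (?P<sq>'[^']*')
  | (?P<punct>[;{}])
  | (?P<unquoted>(?:(?!//|/\*|\*/)[^\s;{}"'])+)
''', re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return (class, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Running tokenize on 'leaf type { type string; }' classifies both
occurrences of type as "unquoted", and "mod:ext" comes out as a single
unquoted token, so the keyword/identifier distinction has to be made
after scanning.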
> > > > > -- it must be an unquoted string.
> > > > >
> > > > > > If a double-quoted string contains a line break followed by space or
> > > > > > tab characters that are used to indent the text according to the
> > > > > > layout in the YANG file, this leading whitespace is stripped from the
> > > > > > string, up to and including the column of the double quote character,
> > > > > > or to the first non-whitespace character, whichever occurs first. In
> > > > > > this process, a tab character is treated as 8 space characters.
> > > > >
> > > > > This description isn't quite careful enough. Better:
> > > > >
> > > > > > If a double-quoted string contains a line break followed by space or
> > > > > > tab characters, an initial part of this whitespace is removed from the
> > > > > > string. The amount removed is the longest prefix whose width is no
> > > > > > larger than the width of the prefix of the Yang source line containing
> > > > > > the opening double quote character of the string, up to and including
> > > > > > the opening double quote character. For this purpose, the width of a
> > > > > > tab character is 8 and the width of any other character is 1.
> > > > >
> > > > > This does assume that all tabs are considered to have width 8, that
> > > > > is, tabs do not have the usual semantics of "advance to the next
> > > > > column that is divisible by 8". That will sometimes cause unexpected
> > > > > results, e.g., if some source lines start with SPC TAB. (Consider
> > > > > that whitespace before a line break is removed, which suggests the
> > > > > intention is that the value of the string should depend only on its
> > > > > visual appearance.)
> > > > >
> > > > > Also, we're using the convention that "whitespace" does NOT include CR
> > > > > or LF, which is not always how the term is used. Perhaps a definition
> > > > > of "whitespace" should be put in section 3.
> > > > >
> > > > > There is also the special case:
> > > > >
> > > > > SPC " LF
> > > > > TAB x "
> > > > >
> > > > > Is the initial TAB of the second line to be removed or not? There is
> > > > > no whitespace removal in the second line that will exactly reach the
> > > > > opening double quote. As I've written it, the TAB is not removed.
> > >
> > > Don't forget this ugly special case.
> >
> > So, let's follow the rules. We need to trim to the column of the
> > double quote character (2). The second line starts with "space or
> > tab" so we do whitespace trimming, while treating the tab as 8
> > spaces. So from 8 spaces we subtract 2, and get the resulting string
> > of 6 characters:
> >
> > LF SPC SPC SPC SPC SPC SPC x
>
> OK, but that process wasn't clear to me. I take it that any tab that
> appears before the starting double-quote counts as 8 spaces, and any
> tab that needs to be examined for deletion is turned into 8 spaces --
> but any other tabs in the string are unconverted.
>
> I think it would be clearer to insert "starting" where I've indicated
> it, and replace the final sentence:
>
> If a double-quoted string contains a line break followed by space or
> tab characters that are used to indent the text according to the
> layout in the YANG file, this leading whitespace is stripped from the
> string, up to and including the column of the >starting< double quote
> character, or to the first non-whitespace character, whichever occurs
> first. In this process, any tab character before the starting double
> quote character is treated as 8 spaces. Any tab character in a succeeding
> line that must be examined for stripping is first converted into 8
> spaces.
Ok, fixed.
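
For what it's worth, here is a minimal Python sketch of the stripping
rule as reworded above, checked against the SPC/TAB special case. The
function names quote_column and strip_indentation are made up for
illustration, and the sketch ignores the separate rule about whitespace
before a line break.

```python
TAB_WIDTH = 8  # a tab counts as 8 columns for this rule


def quote_column(prefix):
    """Column (1-based) of the starting double quote, given the text
    that precedes it on the same source line."""
    return sum(TAB_WIDTH if ch == "\t" else 1 for ch in prefix) + 1


def strip_indentation(content, column):
    """Strip leading whitespace from every line after the first, up to
    and including `column`, or to the first non-whitespace character."""
    first, *rest = content.split("\n")
    out = [first]
    for line in rest:
        remaining = column  # columns still to be stripped
        i = 0
        pad = ""
        while remaining > 0 and i < len(line) and line[i] in " \t":
            width = TAB_WIDTH if line[i] == "\t" else 1
            if width > remaining:
                # An examined tab is converted to 8 spaces; the columns
                # beyond the quote column survive as spaces.
                pad = " " * (width - remaining)
                remaining = 0
            else:
                remaining -= width
            i += 1
        out.append(pad + line[i:])
    return "\n".join(out)
```

For the special case, quote_column(" ") is 2, and
strip_indentation("\n\tx", 2) gives LF followed by six spaces and x,
matching the result worked out above. A tab that lies entirely beyond
the quote column is never examined and stays a tab.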
> > > Actually, there is a somewhat subtle problem: If I say "the system can
> > > sort them any way it wants", I am asserting that *there is a sorting
> > > order*. Which means that if value A is put before value B at one time,
> > > then if values A and B are in the list at some other time, A will
> > > precede B.
> >
> > The next sentences say:
> >
> > An implementation SHOULD use the same order for the same data,
> > regardless of how the data were created. Using a deterministic
> > order will make comparisons possible using simple tools like "diff".
>
> OK, I'm willing to go with that. I mis-read the application of those
> sentences through an even more arcane ambiguity in the term "the same
> data". But I'm willing to ignore that.
>
> > > > > - section 7.21.4
> > > > >
> > > > > The "reference" statement takes as an argument a string ...
> > > > >
> > > > > Perhaps s/a string/a human-readable string/.
> > > >
> > > > "string" refers to the YANG token "string". The same wording is used
> > > > across the document for all arguments.
> > >
> > > I was thinking that it is a string, but in this particular case, it is
> > > supposed to be human-readable, whereas strings in other contexts aren't
> > > expected to be.
> >
> > Ok. Maybe:
> >
> > OLD:
> >
> > The "reference" statement takes as an argument a string that is used
> > to specify a textual cross-reference to an external document,
> >
> > NEW:
> >
> > The "reference" statement takes as an argument a string that is used
> > to specify a human-readable cross-reference to an external document,
>
> Or even "is a human-readable cross-reference ...", but either is OK
> with me.
Ok.
> > > > > - section 7.21.5
> > > > >
> > > > > Note that if a data definition has both an "if-feature" and a "when",
> > > > > then the "if-feature" is tested first.
> > > > >
> > > > > > If the XPath expression references any node that also has associated
> > > > > > "when" statements, these "when" expressions MUST be evaluated first.
> > > > > > There MUST NOT be any circular dependencies in these "when"
> > > > > > expressions.
> > > > >
> > > > > I think this could be better phrased:
> > > > >
> > > > > If the XPath expression references any node that also has
> > > > > associated "when" statements, then the "when" expressions of the
> > > > > referenced nodes MUST be evaluated first. There MUST NOT be any
> > > > > circular dependencies among "when" expressions.
> > > >
> > > > Ok to the last sentence. Do you think that the word "these" in the
> > > > first sentence is ambiguous?
> > >
> > > I must have thought it was unclear when I read it, otherwise I would not
> > > have suggested changing it. But reading it again, I think that there is
> > > no ambiguity. Perhaps it would be a little clearer to use 'those "when"
> > > expressions' rather than 'these "when" expressions'. (I can't explain
> > > clearly why "those" seems less ambiguous than "these".)
> >
> > Ok, as a non-native English speaker I trust you that "those" is better.
>
> I can't tell that you're non-native. Perhaps leave it as is and let
> the RFC Editor review it.
>
> > > By implication, the leafref's value is considered to be a pointer to a
> > > particular leaf instance, the one with the matching value. But that
> > > idea is not embedded in the Yang semantics of leafref types in any way
> > > (other than the output of the deref function), so the fact that there
> > > might be more than one matching leaf instance does not matter.
> > >
> > > As stated in 9.9.4 and 9.9.5, the lexical representations of its values
> > > are the same as those of the referenced nodes.
> > >
> > > How is the leafref's value compared to the values of the referenced
> > > nodes? I can see that question getting ugly for the more complex types
> > > (e.g., anyxml)
> >
> > You can't have a leafref to an anyxml node; just to a leaf or
> > leaf-list.
> >
> > > which do not have canonical forms. I suspect the
> > > intention is that values are equal if they have the same canonical form
> >
> > No, the idea is that they are equal if their *value* is equal,
> > regardless of the lexical representation.
>
> There are two types that don't have a canonical form, identityref and
> instance-identifier. It seems that comparisons in XPath expressions
> are inexact if the type doesn't have a canonical form (section 6.4).
> But if I understand you correctly, the implicit comparisons in leafref
> are done based on the abstract values involved, not the lexical
> representation.
Yes.
> > > The current ABNF doesn't allow for "+" for joining quoted strings.
> > > Also, it doesn't show that \" can be included in a double quoted string
> > > to include a literal ", and allows the string contents to continue --
> > > the current ABNF "DQUOTE string DQUOTE" matches "abcd\", despite that
> > > the latter is not a proper double-quoted string.
> >
> > Note that the prose text (within <...>) says "a string that
> > matches...". That string can be any YANG token string, for example
> > one of:
> >
> > "hello"
> > "he" + "llo"
>
> If I haven't gotten confused, you're referring to
>
> string = < an unquoted string as returned by >
> < the scanner, that matches the rule >
> < yang-string >
>
> yang-string = *yang-char
>
> ;; any Unicode or ISO/IEC 10646 character including tab, carriage
> ;; return, and line feed, but excluding the other C0 control
> ;; characters, the surrogate blocks, and the noncharacters.
> yang-char = %x09 / %x0A / %x0D / %x20-D7FF /
> ; exclude surrogate blocks %xD800-DFFF
> %xE000-FDCF / ; exclude noncharacters %xFDD0-FDEF
> %xFDF0-FFFD / ; exclude noncharacters %xFFFE-FFFF
> %x10000-1FFFD / ; exclude noncharacters %x1FFFE-1FFFF
> %x20000-2FFFD / ; exclude noncharacters %x2FFFE-2FFFF
> %x30000-3FFFD / ; exclude noncharacters %x3FFFE-3FFFF
> %x40000-4FFFD / ; exclude noncharacters %x4FFFE-4FFFF
> %x50000-5FFFD / ; exclude noncharacters %x5FFFE-5FFFF
> %x60000-6FFFD / ; exclude noncharacters %x6FFFE-6FFFF
> %x70000-7FFFD / ; exclude noncharacters %x7FFFE-7FFFF
> %x80000-8FFFD / ; exclude noncharacters %x8FFFE-8FFFF
> %x90000-9FFFD / ; exclude noncharacters %x9FFFE-9FFFF
> %xA0000-AFFFD / ; exclude noncharacters %xAFFFE-AFFFF
> %xB0000-BFFFD / ; exclude noncharacters %xBFFFE-BFFFF
> %xC0000-CFFFD / ; exclude noncharacters %xCFFFE-CFFFF
> %xD0000-DFFFD / ; exclude noncharacters %xDFFFE-DFFFF
> %xE0000-EFFFD / ; exclude noncharacters %xEFFFE-EFFFF
> %xF0000-FFFFD / ; exclude noncharacters %xFFFFE-FFFFF
> %x100000-10FFFD ; exclude noncharacters %x10FFFE-10FFFF
>
> But if that's taken at face value, you can lex as single "string"s not only
>
> "hello"
>
> and
>
> "he" + "llo"
>
> but also
>
> "The MTU of the interface."; myext:c-define "MY_MTU"
Why would this be treated as a single string?  Note that line breaks are
not special in YANG, so by the same logic, this example:

  description
    "The MTU of the interface.";
  reference
    "RFC XYZ";

would be scanned as the three tokens:

  description
  "The MTU of the interface.";
    reference
      "RFC XYZ"
  ;

/martin
> Doing that would allow the incorrect lexing of
>
> leaf mtu {
> type uint32;
> description "The MTU of the interface."; myext:c-define "MY_MTU";
> }
>
> as having a long description (starting with 'The MTU' and ending with
> 'MY_MTU') and no myext:c-define statement.
>
> What we need is a production that matches "strings possibly combined
> with +" and nothing else. That is, including '"he" + "llo"' but not the
> last example.
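
One way to picture such a production is the following Python sketch. It
is only an illustration under my own assumptions (the names DQ, SQ,
CONCAT_RE, and is_string_expression are invented, and it is a regular
expression rather than ABNF), not proposed text for the draft.

```python
import re

# A double-quoted string: a backslash escapes the next character, so \"
# does not end the string.
DQ = r'"(?:\\.|[^"\\])*"'
# A single-quoted string: no escapes, ends at the next single quote.
SQ = r"'[^']*'"
QUOTED = rf"(?:{DQ}|{SQ})"

# One quoted string, optionally followed by "+ quoted string" repeats.
CONCAT_RE = re.compile(rf"{QUOTED}(?:\s*\+\s*{QUOTED})*", re.DOTALL)


def is_string_expression(text):
    """True if `text` is exactly quoted strings joined by '+'."""
    return CONCAT_RE.fullmatch(text) is not None
```

This accepts "hello" and "he" + "llo", and rejects both "abcd\" and the
description/myext:c-define example.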
>
> Dale
>
> _______________________________________________
> netmod mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/netmod
>