[email protected] (Dale R. Worley) wrote:
> From: Juergen Schoenwaelder <[email protected]>
>
> >   An unquoted string is any sequence of characters and does not
> >   contain any space, tab, or newline characters, a single or double
> >   quote character, a semicolon (";"), braces ("{" or "}"), or
> >   comment sequences ("//", "/*", or "*/").
> >
> >   Note that any keyword can legally appear as an unquoted string.
>
> That text seems to be quite clear to me.  As Juergen notes, it directly
> points out to the implementor an important fact about lexing Yang.
> (Similarly, some programming languages allow keywords to be used as
> identifiers, they're not reserved.)

Ok, let's use Juergen's proposed text.

> Martin Bjorklund <[email protected]> writes:
> >> If I understand correctly, the tokens of Yang (as the term is usually
> >> used in programming languages) are:
> >>
> >>     whitespace (which is ignored)
> >>     comments (which are ignored)
> >>     single-quoted strings
> >>     double-quoted strings
> >>     unquoted strings (including keywords)
> >>     ;
> >>     {
> >>     }
> >>
> >> From the point of view of the tokenizer, these tokens fall into the
> >> obvious classes:
> >>
> >>     type        unquoted string
> >>     "type"      double-quoted string
> >>     abc         unquoted string
> >>     "abc"       double-quoted string
> >>     '---'       single-quoted string
> >>
> >> I'm not quite sure how they are classified from the parser's point of
> >> view, though.
> >>
> >>                               type  "type"  abc  "abc"  '---'
> >>
> >> Is a string?                   ?      Y      ?     Y      Y
> >> (Can it appear as the
> >> argument of "description"?)
> >>
> >> Is a keyword?                  Y      ?      N     N      N
> >> (Can it appear as the first
> >> token of some statement?)
> >>
> >> Is an identifier?              Y      ?      Y     ?      N
> >> (Can it appear as the second
> >> token of a type statement?)
> >>
> >> Usually programming languages use the particular syntax of different
> >> types of tokens to determine where they can be used in the
> >> context-free grammar.  Yang seems to be more relaxed, but I'm not sure
> >> whether it is so relaxed that any of the types of string tokens can be
> >> used anywhere.
> >
> > No; a keyword must be written w/o quotes, so it is special.
>
> OK, that answers one '?'.  But there are effectively two that remain:
>
>     Is abc valid as a string?

Yes.

>     Is "abc" valid as an identifier?

Yes.

All YANG statements have the same structure:

  statement = keyword [argument] (";" / "{" *statement "}")

All arguments are strings.  Strings can be unquoted or quoted (and
possibly concatenated).

> I *think* the answer is No for both of those, but I can't put my finger
> on the rules that make that definite.
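
To make the uniform statement structure above concrete, here is a
minimal Python sketch of a tokenizer and a recursive-descent parser for
it.  It is an illustration under simplifying assumptions, not the
behaviour of any particular YANG implementation: the names TOKEN_RE,
tokenize and parse_statements are hypothetical, "+" concatenation and
escape processing are left out, and arguments are kept exactly as
written.

```python
import re

# Hypothetical token classes, following the list of tokens discussed above.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|//[^\n]*|/\*.*?\*/)          # whitespace and comments are ignored
  | (?P<dq>"(?:[^"\\]|\\.)*")                 # double-quoted string
  | (?P<sq>'[^']*')                           # single-quoted string
  | (?P<punct>[;{}])                          # statement end and block braces
  | (?P<unq>(?:(?!//|/\*|\*/)[^\s'";{}])+)    # unquoted string (keywords included)
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return a list of (kind, text) pairs, dropping whitespace and comments."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at offset {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_statements(tokens, i=0, in_block=False):
    """Parse *statement; return (list of (keyword, argument, children), next index)."""
    stmts = []
    while i < len(tokens):
        kind, text = tokens[i]
        if (kind, text) == ("punct", "}"):
            if not in_block:
                raise SyntaxError("unexpected '}'")
            return stmts, i + 1
        if kind != "unq":
            # A keyword must be written without quotes.
            raise SyntaxError(f"keyword must be unquoted, got {text}")
        keyword, i = text, i + 1
        argument = None
        if i < len(tokens) and tokens[i][0] in ("unq", "dq", "sq"):
            # Any kind of string token is accepted as the argument.
            argument, i = tokens[i][1], i + 1
        if i >= len(tokens):
            raise SyntaxError("expected ';' or '{'")
        if tokens[i] == ("punct", ";"):
            stmts.append((keyword, argument, []))
            i += 1
        elif tokens[i] == ("punct", "{"):
            children, i = parse_statements(tokens, i + 1, True)
            stmts.append((keyword, argument, children))
        else:
            raise SyntaxError(f"expected ';' or '{{', got {tokens[i][1]}")
    if in_block:
        raise SyntaxError("missing '}'")
    return stmts, i


print([kind for kind, _ in tokenize("type \"type\" abc \"abc\" '---'")])
# ['unq', 'dq', 'unq', 'dq', 'sq']
```

In this sketch, `type "abc";` parses because the argument may be any
string token, while `"type" abc;` is rejected because the keyword
position only accepts an unquoted token.
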
>
> >> There are two types that don't have a canonical form, identityref and
> >> instance-identifier.  It seems that comparisons in XPath expressions
> >> are inexact if the type doesn't have a canonical form (section 6.4).
> >> But if I understand you correctly, the implicit comparisons in leafref
> >> are done based on the abstract values involved, not the lexical
> >> representation.
> >
> > Yes.
>
> I'm willing to take that as understood.
>
> >> > > The current ABNF doesn't allow for "+" for joining quoted strings.
> >> > > Also, it doesn't show that \" can be included in a double quoted string
> >> > > to include a literal ", and allows the string contents to continue --
> >> > > the current ABNF "DQUOTE string DQUOTE" matches "abcd\", despite that
> >> > > the latter is not a proper double-quoted string.
> >> >
> >> > Note that the prose text (within <...>) says "a string that
> >> > matches...".  That string can be any YANG token string, for example
> >> > one of:
> >> >
> >> >   "hello"
> >> >   "he" + "llo"
> >>
> >> If I haven't gotten confused, you're referring to
> >>
> >>    string      = < an unquoted string as returned by >
> >>                  < the scanner, that matches the rule >
> >>                  < yang-string >
> >>
> >>    yang-string = *yang-char
> >>
> >>    ;; any Unicode or ISO/IEC 10646 character including tab, carriage
> >>    ;; return, and line feed, but excluding the other C0 control
> >>    ;; characters, the surrogate blocks, and the noncharacters.
> >>    yang-char = %x09 / %x0A / %x0D / %x20-D7FF /
> >>                                ; exclude surrogate blocks %xD800-DFFF
> >>                %xE000-FDCF /   ; exclude noncharacters %xFDD0-FDEF
> >>                %xFDF0-FFFD /   ; exclude noncharacters %xFFFE-FFFF
> >>                %x10000-1FFFD / ; exclude noncharacters %x1FFFE-1FFFF
> >>                %x20000-2FFFD / ; exclude noncharacters %x2FFFE-2FFFF
> >>                %x30000-3FFFD / ; exclude noncharacters %x3FFFE-3FFFF
> >>                %x40000-4FFFD / ; exclude noncharacters %x4FFFE-4FFFF
> >>                %x50000-5FFFD / ; exclude noncharacters %x5FFFE-5FFFF
> >>                %x60000-6FFFD / ; exclude noncharacters %x6FFFE-6FFFF
> >>                %x70000-7FFFD / ; exclude noncharacters %x7FFFE-7FFFF
> >>                %x80000-8FFFD / ; exclude noncharacters %x8FFFE-8FFFF
> >>                %x90000-9FFFD / ; exclude noncharacters %x9FFFE-9FFFF
> >>                %xA0000-AFFFD / ; exclude noncharacters %xAFFFE-AFFFF
> >>                %xB0000-BFFFD / ; exclude noncharacters %xBFFFE-BFFFF
> >>                %xC0000-CFFFD / ; exclude noncharacters %xCFFFE-CFFFF
> >>                %xD0000-DFFFD / ; exclude noncharacters %xDFFFE-DFFFF
> >>                %xE0000-EFFFD / ; exclude noncharacters %xEFFFE-EFFFF
> >>                %xF0000-FFFFD / ; exclude noncharacters %xFFFFE-FFFFF
> >>                %x100000-10FFFD ; exclude noncharacters %x10FFFE-10FFFF
> >>
> >> But if that's taken at face value, you can lex as single "string"s not only
> >>
> >>     "hello"
> >>
> >> and
> >>
> >>     "he" + "llo"
> >>
> >> but also
> >>
> >>     "The MTU of the interface."; myext:c-define "MY_MTU"
> >
> > Why would this be treated as a single string?  Note that line breaks are
> > not special in YANG, so by the same logic, this example:
> >
> >   description
> >     "The MTU of the interface.";
> >   reference
> >     "RFC XYZ";
> >
> > would be scanned as the three tokens:
> >
> >   description
> >
> >   "The MTU of the interface.";
> >   reference
> >     "RFC XYZ"
> >
> >   ;
>
> Actually, I've gotten completely lost in regard to how the grammar
> specifies how strings can be written.

The grammar doesn't specify how strings can be written.
This is defined in the section "Lexical Tokenization":

  unquoted
  quoted
  quoted and concatenated

> Let me take a single example.
> There are these productions:
>
>    organization-stmt = organization-keyword sep string stmtend
>
>    string      = < an unquoted string as returned by >
>                  < the scanner, that matches the rule >
>                  < yang-string >
>
>    yang-string = *yang-char
>
> (I won't quote the production for yang-char.)  As written, this seems to
> say that in an organization statement, there is the organization
> keyword, a sep, a string, and then a stmtend.  But the text above says
> that a string is "an unquoted string as returned by the scanner

This means that in all these examples, the resulting string is the
same:

  hello
  "hello"
  'hello'
  'he' + "llo"

and the resulting string is a string with 5 characters.

>, that
> matches the rule yang-string".  But looking at the example:
>
>     organization "Example Inc.";
>
> I see that the string involved is quoted, which doesn't seem to be
> allowed by the production for "string", because the production says "an
> unquoted string".
>
> Now I suspect that what the production for "string" really wants to be
> is "a quoted or unquoted string whose *value* matches the rule
> yang-string".  That is, what is *written* isn't allowed to be any
> *yang-char, but the denoted value must be *yang-char.

Yes, but it is more than quoted or unquoted; it is also concatenation,
escape char substitution and whitespace trimming.

> (Some of the
> other productions seem to have the same issue.)
>
> Another point which is confusing me is how quoted strings are specified
> by the grammar.  The only appearance of DQUOTE is in
>
>    quoted-string = (DQUOTE string DQUOTE) / (SQUOTE string SQUOTE)
>
> and the only appearance of quoted-string is in
>
>    key-predicate-expr = node-identifier *WSP "=" *WSP quoted-string
>
>    leaf-list-predicate-expr = "." *WSP "=" *WSP quoted-string

Note that these are quoted strings *after* the scanner's processing of
the input, so e.g., you might have:

  path '/foo[bar="hi"]';


/martin

>
> which leaves me wondering how quoted strings are allowed elsewhere in
> the language.
>
> Dale

_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
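
As a rough illustration of the point made in the reply above, that
hello, "hello", 'hello' and 'he' + "llo" all denote the same
five-character string, here is a small Python sketch that maps the
written form of an argument to its value.  The names PART_RE, ESCAPES,
unescape and string_value are hypothetical.  The sketch assumes the
escape sequences \n, \t, \" and \\ inside double quotes, leaves any
other backslash pair untouched, does not validate unquoted strings, and
omits the whitespace trimming of multi-line double-quoted strings.

```python
import re

# One quoted part of a (possibly concatenated) string, with optional
# surrounding whitespace.
PART_RE = re.compile(r"""
    \s*(?:
        "(?P<dq>(?:[^"\\]|\\.)*)"   # double-quoted: backslash escapes apply
      | '(?P<sq>[^']*)'             # single-quoted: contents taken literally
    )\s*
""", re.VERBOSE | re.DOTALL)

# Escape sequences assumed for double-quoted strings.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(body):
    """Substitute the known backslash sequences; leave unknown ones as written."""
    return re.sub(r"\\(.)",
                  lambda m: ESCAPES.get(m.group(1), m.group(0)),
                  body, flags=re.DOTALL)


def string_value(written):
    """Return the value denoted by an unquoted, quoted, or concatenated string."""
    written = written.strip()
    if written[:1] not in ("'", '"'):
        return written  # unquoted: the value is the text itself
    pos, parts = 0, []
    while True:
        m = PART_RE.match(written, pos)
        if m is None:
            raise ValueError(f"expected a quoted string at offset {pos}")
        dq = m.group("dq")
        parts.append(unescape(dq) if dq is not None else m.group("sq"))
        pos = m.end()
        if pos == len(written):
            return "".join(parts)
        if written[pos] != "+":
            raise ValueError(f"expected '+' at offset {pos}")
        pos += 1  # skip the "+" and read the next quoted part


for written in ["hello", '"hello"', "'hello'", "'he' + \"llo\""]:
    assert string_value(written) == "hello"
```

With the same function, the argument of the path example,
'/foo[bar="hi"]', evaluates to /foo[bar="hi"]: the inner double quotes
survive because single-quoted contents are taken literally, which is the
form the quoted-string rule then sees.
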
