From: Juergen Schoenwaelder <[email protected]>
> An unquoted string is any sequence of characters and does not
> contain any space, tab, or newline characters, a single or double
> quote character, a semicolon (";"), braces ("{" or "}"), or
> comment sequences ("//", "/*", or "*/").
>
> Note that any keyword can legally appear as an unquoted string.

That text seems to be quite clear to me. As Juergen notes, it directly
points out to the implementor an important fact about lexing Yang.
(Similarly, some programming languages allow keywords to be used as
identifiers; they are not reserved.)

Martin Bjorklund <[email protected]> writes:
>> If I understand correctly, the tokens of Yang (as the term is usually
>> used in programming languages) are:
>>
>>     whitespace (which is ignored)
>>     comments (which are ignored)
>>     single-quoted strings
>>     double-quoted strings
>>     unquoted strings (including keywords)
>>     ;
>>     {
>>     }
>>
>> From the point of view of the tokenizer, these tokens fall into the
>> obvious classes:
>>
>>     type       unquoted string
>>     "type"     double-quoted string
>>     abc        unquoted-string
>>     "abc"      double-quoted string
>>     '---'      single-quoted string
>>
>> I'm not quite sure how they are classified from the parser's point of
>> view, though.
>>
>>                                   type  "type"  abc  "abc"  '---'
>>
>>     Is a string?                   ?      Y      ?     Y      Y
>>     (Can it appear as the
>>     argument of "description"?)
>>
>>     Is a keyword?                  Y      ?      N     N      N
>>     (Can it appear as the first
>>     token of some statement?)
>>
>>     Is an identifier?              Y      ?      Y     ?      N
>>     (Can it appear as the second
>>     token of a type statement?)
>>
>> Usually programming languages use the particular syntax of different
>> types of tokens to determine where they can be used in the
>> context-free grammar. Yang seems to be more relaxed, but I'm not sure
>> whether it is so relaxed that any of the types of string tokens can be
>> used anywhere.
>
> No; a keyword must be written w/o quotes, so it is special.

OK, that answers one '?'. But there are effectively two that remain:

    Is abc valid as a string?
    Is "abc" valid as an identifier?

I *think* the answer is No for both of those, but I can't put my finger
on the rules that make that definite.

>> There are two types that don't have a canonical form, identityref and
>> instance-identifier.
>> It seems that comparisons in XPath expressions
>> are inexact if the type doesn't have a canonical form (section 6.4).
>> But if I understand you correctly, the implicit comparisons in leafref
>> are done based on the abstract values involved, not the lexical
>> representation.
>
> Yes.

I'm willing to take that as understood.

>> > > The current ABNF doesn't allow for "+" for joining quoted strings.
>> > > Also, it doesn't show that \" can be included in a double quoted string
>> > > to include a literal ", and allows the string contents to continue --
>> > > the current ABNF "DQUOTE string DQUOTE" matches "abcd\", despite that
>> > > the latter is not a proper double-quoted string.
>> >
>> > Note that the prose text (within <...>) says "a string that
>> > matches...". That string can be any YANG token string, for example
>> > one of:
>> >
>> >   "hello"
>> >   "he" + "llo"
>>
>> If I haven't gotten confused, you're referring to
>>
>>    string              = < an unquoted string as returned by >
>>                          < the scanner, that matches the rule >
>>                          < yang-string >
>>
>>    yang-string         = *yang-char
>>
>>    ;; any Unicode or ISO/IEC 10646 character including tab, carriage
>>    ;; return, and line feed, but excluding the other C0 control
>>    ;; characters, the surrogate blocks, and the noncharacters.
>>    yang-char = %x09 / %x0A / %x0D / %x20-D7FF /
>>                                ; exclude surrogate blocks %xD800-DFFF
>>                %xE000-FDCF /   ; exclude noncharacters %xFDD0-FDEF
>>                %xFDF0-FFFD /   ; exclude noncharacters %xFFFE-FFFF
>>                %x10000-1FFFD / ; exclude noncharacters %x1FFFE-1FFFF
>>                %x20000-2FFFD / ; exclude noncharacters %x2FFFE-2FFFF
>>                %x30000-3FFFD / ; exclude noncharacters %x3FFFE-3FFFF
>>                %x40000-4FFFD / ; exclude noncharacters %x4FFFE-4FFFF
>>                %x50000-5FFFD / ; exclude noncharacters %x5FFFE-5FFFF
>>                %x60000-6FFFD / ; exclude noncharacters %x6FFFE-6FFFF
>>                %x70000-7FFFD / ; exclude noncharacters %x7FFFE-7FFFF
>>                %x80000-8FFFD / ; exclude noncharacters %x8FFFE-8FFFF
>>                %x90000-9FFFD / ; exclude noncharacters %x9FFFE-9FFFF
>>                %xA0000-AFFFD / ; exclude noncharacters %xAFFFE-AFFFF
>>                %xB0000-BFFFD / ; exclude noncharacters %xBFFFE-BFFFF
>>                %xC0000-CFFFD / ; exclude noncharacters %xCFFFE-CFFFF
>>                %xD0000-DFFFD / ; exclude noncharacters %xDFFFE-DFFFF
>>                %xE0000-EFFFD / ; exclude noncharacters %xEFFFE-EFFFF
>>                %xF0000-FFFFD / ; exclude noncharacters %xFFFFE-FFFFF
>>                %x100000-10FFFD ; exclude noncharacters %x10FFFE-10FFFF
>>
>> But if that's taken at face value, you can lex as single "string"s not only
>>
>>     "hello"
>>
>> and
>>
>>     "he" + "llo"
>>
>> but also
>>
>>     "The MTU of the interface."; myext:c-define "MY_MTU"
>
> Why would this be treated as a single string? Note that line breaks are
> not special in YANG, so by the same logic, this example:
>
>   description
>     "The MTU of the interface.";
>   reference
>     "RFC XYZ";
>
> would be scanned as the three tokens:
>
>   description
>
>   "The MTU of the interface.";
>   reference
>     "RFC XYZ"
>
>   ;

Actually, I've gotten completely lost in regard to how the grammar
specifies how strings can be written. Let me take a single example.
There are these productions:

   organization-stmt   = organization-keyword sep string stmtend

   string              = < an unquoted string as returned by >
                         < the scanner, that matches the rule >
                         < yang-string >

   yang-string         = *yang-char

(I won't quote the production for yang-char.)

As written, this seems to say that in an organization statement, there
is the organization keyword, a sep, a string, and then a stmtend. But
the text above says that a string is "an unquoted string as returned by
the scanner, that matches the rule yang-string". But looking at the
example:

     organization "Example Inc.";

I see that the string involved is quoted, which doesn't seem to be
allowed by the production for "string", because the production says "an
unquoted string".

Now I suspect that what the production for "string" really wants to be
is "a quoted or unquoted string whose *value* matches the rule
yang-string". That is, what is *written* isn't allowed to be any
*yang-char, but the denoted value must be *yang-char. (Some of the
other productions seem to have the same issue.)

Another point which is confusing me is how quoted strings are specified
by the grammar. The only appearance of DQUOTE is in

   quoted-string       = (DQUOTE string DQUOTE) / (SQUOTE string SQUOTE)

and the only appearance of quoted-string is in

   key-predicate-expr  = node-identifier *WSP "=" *WSP quoted-string

   leaf-list-predicate-expr = "." *WSP "=" *WSP quoted-string

which leaves me wondering how quoted strings are allowed elsewhere in
the language.

Dale
_______________________________________________
netmod mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/netmod
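
For concreteness, the distinction this message keeps returning to (the
*written* form of a token versus the *value* it denotes) can be sketched
as a small scanner. This is only an illustrative Python sketch under
stated assumptions, not the scanner the YANG specification defines: the
names TOKEN_RE, ESCAPES, unescape, raw_tokens and tokenize and the token
kinds "unquoted", "quoted" and "punct" are all hypothetical, and leaving
an unknown backslash escape untouched is an assumption. It does not
check the resulting values against yang-char.

```python
import re

# Hypothetical token classes; the group names are illustrative only.
TOKEN_RE = re.compile(
    r"""
    (?P<ws>\s+)
  | (?P<line_comment>//[^\n]*)
  | (?P<block_comment>/\*.*?\*/)
  | (?P<dq>"(?:[^"\\]|\\.)*")
  | (?P<sq>'[^']*')
  | (?P<punct>[;{}])
  | (?P<unquoted>(?:(?!//|/\*|\*/)[^\s'";{}])+)
    """,
    re.VERBOSE | re.DOTALL,
)

# Escapes recognised inside double quotes. Assumption: any other
# backslash sequence is left exactly as written.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(body):
    """Turn the written body of a double-quoted string into its value."""
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(0)),
        body,
        flags=re.DOTALL,
    )


def raw_tokens(text):
    """Yield (kind, value) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("ws", "line_comment", "block_comment"):
            continue
        if kind == "dq":
            yield ("quoted", unescape(lexeme[1:-1]))
        elif kind == "sq":
            yield ("quoted", lexeme[1:-1])  # no escapes in single quotes
        else:
            yield (kind, lexeme)


def tokenize(text):
    """Return the token list, joining "a" + "b" into one quoted token."""
    toks = list(raw_tokens(text))
    out = []
    i = 0
    while i < len(toks):
        kind, value = toks[i]
        while (
            kind == "quoted"
            and i + 2 < len(toks)
            and toks[i + 1] == ("unquoted", "+")
            and toks[i + 2][0] == "quoted"
        ):
            value += toks[i + 2][1]
            i += 2
        out.append((kind, value))
        i += 1
    return out


print(tokenize('type "type"'))   # [('unquoted', 'type'), ('quoted', 'type')]
print(tokenize('"he" + "llo"'))  # [('quoted', 'hello')]
```

In this sketch the scanner remembers whether a token was written with
quotes, so a parser could insist that keywords be "unquoted" while still
accepting either kind where a string value is wanted. The input "abcd\"
is rejected with ValueError, because the backslash consumes the closing
quote.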
