On Mon, 22 Oct 2007, Wojciech Polak wrote:

> On 2007-10-21 at 17:46 -0400, Joel E. Denny wrote:
>
> > Currently, Bison puts a terminal's user number (the one returned by yylex)
> > in its XML "number" attribute. I think we should rename that to
> > "user-number" and add a "number" attribute for Bison's internal symbol
> > number. This would be more consistent with nonterminals.
> > I'd be happy to write the patch. Is all this agreeable to you, Wojciech?
>
> Can you write more about the practical goal (and its further usage)
> of having two numbers, especially Bison's internal symbol number?
> Maybe it's okay to switch, but to have only one kind of number,
> thus changing nonterminal, and not terminal?

While terminals have both user numbers and internal numbers, nonterminals
only have internal numbers. Thus, the only way to change the nonterminal
element that I know of is to eliminate its @number altogether. Is that
what you mean?

On the one hand, I suppose we could argue that the user never really needs
to know any of the symbol numbers for normal and clean usage of the
generated parser. On the other hand, when developing and debugging Bison's
front end, I know I've found all the numbers useful at different times.
The user might find them helpful during low-level debugging of the
generated parser code as well.

At the moment, I'm mainly bothered that @number isn't guaranteed to have a
unique value for each symbol since it seems like it should. If we make
the change I'm suggesting, it will.

Unique @number values are important if someone wants to use @number rather
than @name for symbol references. For example, consider a URI fragment
identifier (like s103 in http://www.example.com/index.html#s103). @name
might be long and it might contain special characters that would have to
be escaped in order to be placed there. @number usually requires less
space and could be placed there with no extra processing. Of course, the
user could use "n" and "t" prefixes to make @number based fragment
identifiers unique, but my point is that it seems unintuitive that @number
isn't already unique.

I suppose the user could use generate-id() or position() instead of
@number in that scenario. However, I'm guessing there might be situations
when the user is debugging with the aid of some custom report he generated
from Bison's XML. It might be less confusing if the number representing a
symbol is guaranteed to be consistent between his customized report and
the C parser tables he's examining. Maybe.

Researchers have been known to instrument Bison and its generated parsers
for various purposes. They might find the numbers in the XML output useful
for generating code that depends on the C parser tables.

Well, I'm brainstorming, so some of my arguments may be flimsy. In
general, it seems like there are scenarios when it would be more
convenient, more consistent, and cleaner for the user to be able to access
all the symbol numbers than to have to resort to other techniques. What
do you think?
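
To make the uniqueness concern above concrete, here is a rough Python sketch. The XML excerpt in it is made up for illustration: the element layout and the symbols are assumptions, not actual Bison output, and the helper names (symbols_by_number, fragment_id) are hypothetical. It only shows how a terminal's user number can coincide with a nonterminal's internal number when both sit in @number, and what the "t"/"n" prefix workaround would look like.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt (assumed layout, not real Bison output).  The
# terminal '\n' carries its user number 10, while the nonterminal exp
# carries its internal symbol number, which also happens to be 10.
SAMPLE = r"""
<grammar>
  <terminals>
    <terminal number="10" name="'\n'"/>
    <terminal number="258" name="NUM"/>
  </terminals>
  <nonterminals>
    <nonterminal number="9" name="line"/>
    <nonterminal number="10" name="exp"/>
  </nonterminals>
</grammar>
"""


def symbols_by_number(xml_text):
    """Group (kind, name) pairs by the value of their @number attribute."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text)
    for kind in ("terminal", "nonterminal"):
        for sym in root.iter(kind):
            groups[sym.get("number")].append((kind, sym.get("name")))
    return groups


def fragment_id(kind, number):
    """Workaround: prefix @number with "t" or "n" to keep identifiers unique."""
    return kind[0] + number


groups = symbols_by_number(SAMPLE)
# Keep only the @number values shared by more than one symbol.
collisions = {n: syms for n, syms in groups.items() if len(syms) > 1}
print(collisions)
```

With this excerpt, the value 10 is reported for both a terminal and a nonterminal, so a bare @number could not serve as a fragment identifier without the prefix (or without the rename I'm proposing).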
