Hi Branden, Season's greetings.
On 20.12.25 18:43, G. Branden Robinson wrote: > I believe you're thinking of shift/reduce versus reduce/reduce > conflicts. I've never heard of a shift/shift conflict. Yes, it doesn't do to type faster than one thinks. ... > My training is that reduce/reduce conflicts indicate a "broken" grammar, > one where the parser thinks it knows enough to reduce the symbol, but > has multiple rules for doing so, and can't decide between them. For > ease of both human and machine parsing, language designers have a bias > toward context-free grammars. Ya, well, in my limited experience, avoiding conflicts is far from straightforward. Having long thought C's ';' separator on every line a bit of fluff, I found reason to imitate in one of my grammars, where first switching from left to right recursion reduced: "source/hr2gc.y: conflicts: 47 shift/reduce, 5 reduce/reduce" to "source/hr2gc.y: conflicts: 26 shift/reduce, 4 reduce/reduce" But introducing a separator: commands: command | commands ';' command ; made it: source/hr2gc.y: conflicts: 5 shift/reduce, 3 reduce/reduce They say language forms our thinking. I find that yacc/bison forms the grammar. > (As I understand it, C famously has a deeply inherent reduce/reduce > conflict; most identifiers that aren't reserved words can reduce as type > names _or_ as symbol names. How did C survive this? Well, this was > Bell Labs, where people thought nothing of entangling their lexical > analysis with their parsing, as nroff did and does. The solution is > known as "the lexer hack".[1]) Taking a peek at my most recent lexer, from 2012, confirms that I tend to make it a state machine, changing lexer rules and actions as suits ... and, yes, I handle quoted text _entirely_ in the lexer - it seemed most natural. (It's then one token - zero grammar there. That's no hack, IME.) ... > Here is a notional example for contemplation purposes, not necessarily > resembling any real parser. Without context (or "lookahead"), the C > language would have shift/reduce conflicts with the "*" symbol I chose > above in the following contexts. > > const int b = 4 * 2; > int d = *c; > int f = e /* my comment */ + 2; > > ...but it doesn't, because any C parser has context. In the first > example an identifier token is a "terminal" symbol and is reduced, or > put onto the syntax tree.[2] > > In the second example, the parser encounters a star but the most recent > terminal symbol is not an identifier or literal, so, contextually, > multiplication is foreclosed, and the '*' reduces to a dereferencing > operator, which is unary and right-associative. My lexer changes state on comment-initiation, consumes till comment-end, and passes one token of type QUOTED_TEXT. So such problems cannot arise. (I do like simplicity.) ... > I have read that a parser for RFC 822 dates, the classic Unix date > format(s), has at least one shift/reduce conflict, but since no human or > program constructs dates in an ambiguous form, we simply live with any > shift/reduce conflicts our parser generators dutifully warn us about. Yes, well, ppic.ypp appears to have a bunch, even though the grammar subordinate to line 1456 looks harmless. (When I had a similar problem, flattening the grammar fixed it - dunno precisely why, though.) > That's some parser theory as told by a ham-fisted practitioner. > Compiler experts and CS professors might, as we speak, be drawing hot > baths with Calgon to take them away. In a previous life, I had to develop and institute a software development quality system, to achieve accreditation. One thing that the auditors taught me was "The people doing the work *are* the experts." It certainly ain't management. ... > [1] https://en.wikipedia.org/wiki/Lexer_hack > > This could have been avoided via the simple expedient of introducing > a prefix sigil to mark either symbol or type identifiers (the other > could be left unadorned). > > If I ever get my 1.2T mini electric time machine working, I'm going > to drive that thing across the country like Alvin F. Straight to > Murray Hill in 1973 and give Thompson and Ritchie a piece of my > mind. Here's how I imagine it going. > > GBR: It's gross to have `cc` check a list the parser has built of > already-seen type names to disambiguate identifiers. Just stick a > dollar sign in front of either type or variable names. It's easy. > > K&T: That sounds like more typing. More typing is bad. Haven't you > noticed that all the best Unix commands have only two-letter names? > And that one of us avoids pressing the shift key wherever possible? > > GBR: Fine, use the sigil only for type names. Types are named less > often than symbols in most code. > > K&T: No. It's still too much. > > GBR: Even though you guys don't use anything but a machine integer > for anything that isn't floating point or a struct? No Booleans, no > bounds on enums-- > > K&T: That's right. The very notion of sticking a dollar sign in > front of a type name is laughable. There's no money in strong > typing. > > GBR: ...uh, I'll just be getting back in my time machine now. > > K&T: That's right, kid--nobody should have to live through the > 16-bit address space era more than once. > > <vworp vworp vworp> They say that writing believable engaging dialogue is difficult - but you have it down pat! And your transition to action is instinctive. Nevertheless, I'm with them - probably due to having been habituated to absence of avoidable swear-word symbols. Borrowing a bunch of open-source hash-table code to to add efficient symbol table capability was quick and easy, I found. Don't try to offload the CPU, it's mostly idling anyway. ;-) Cheers, Erik P.S. Found the rear half of a 1.5 foot lizard in the septic system's transfer tank on Saturday, while swapping out the non-functioning pump. I think I know where the other half is. P.P.S. I'll hide here, the fact that in a Structured English to multi-threaded multi-state-machine translator, I did require all nouns to be capitalised (as in German), and all verbs to be defined, with noun association allowing verb overloading. (It was experimental - just 1800 lines of Awk, whose associative arrays solved symbol table lookup in half a line.)
