Hi Branden,

Season's greetings.

On 20.12.25 18:43, G. Branden Robinson wrote:

> I believe you're thinking of shift/reduce versus reduce/reduce

> conflicts.  I've never heard of a shift/shift conflict.

Yes, it doesn't do to type faster than one thinks.

...

> My training is that reduce/reduce conflicts indicate a "broken" grammar,

> one where the parser thinks it knows enough to reduce the symbol, but

> has multiple rules for doing so, and can't decide between them.  For

> ease of both human and machine parsing, language designers have a bias

> toward context-free grammars.

Ya, well, in my limited experience, avoiding conflicts is far from 
straightforward. Having long thought C's ';' separator on every line a bit of 
fluff, I found reason to imitate in one of my grammars, where first switching 
from left to right recursion reduced:

"source/hr2gc.y: conflicts: 47 shift/reduce, 5 reduce/reduce"        to

"source/hr2gc.y: conflicts: 26 shift/reduce, 4 reduce/reduce"

But introducing a separator:     commands:   command

|   commands ';' command

;

made it:

source/hr2gc.y: conflicts: 5 shift/reduce, 3 reduce/reduce

They say language forms our thinking. I find that yacc/bison forms the

grammar.

> (As I understand it, C famously has a deeply inherent reduce/reduce

> conflict; most identifiers that aren't reserved words can reduce as type

> names _or_ as symbol names.  How did C survive this?  Well, this was

> Bell Labs, where people thought nothing of entangling their lexical

> analysis with their parsing, as nroff did and does.  The solution is

> known as "the lexer hack".[1])

Taking a peek at my most recent lexer, from 2012, confirms that I tend to make 
it a state machine, changing lexer rules and actions as suits ... and, yes, I 
handle quoted text _entirely_ in the lexer - it seemed most natural. (It's then 
one token - zero grammar there. That's no hack, IME.)

...

> Here is a notional example for contemplation purposes, not necessarily

> resembling any real parser.  Without context (or "lookahead"), the C

> language would have shift/reduce conflicts with the "*" symbol I chose

> above in the following contexts.

>

>   const int b = 4 * 2;

>   int d = *c;

>   int f = e /* my comment */ + 2;

>

> ...but it doesn't, because any C parser has context.  In the first

> example an identifier token is a "terminal" symbol and is reduced, or

> put onto the syntax tree.[2]

>

> In the second example, the parser encounters a star but the most recent

> terminal symbol is not an identifier or literal, so, contextually,

> multiplication is foreclosed, and the '*' reduces to a dereferencing

> operator, which is unary and right-associative.

My lexer changes state on comment-initiation, consumes till comment-end, and 
passes one token of type QUOTED_TEXT. So such problems cannot arise. (I do like 
simplicity.)

...

> I have read that a parser for RFC 822 dates, the classic Unix date

> format(s), has at least one shift/reduce conflict, but since no human or

> program constructs dates in an ambiguous form, we simply live with any

> shift/reduce conflicts our parser generators dutifully warn us about.

Yes, well, ppic.ypp appears to have a bunch, even though the grammar 
subordinate to line 1456 looks harmless. (When I had a similar problem, 
flattening the grammar fixed it - dunno precisely why, though.)

> That's some parser theory as told by a ham-fisted practitioner.

> Compiler experts and CS professors might, as we speak, be drawing hot

> baths with Calgon to take them away.

In a previous life, I had to develop and institute a software development 
quality system, to achieve accreditation. One thing that the auditors taught me 
was "The people doing the work *are* the experts." It certainly ain't 
management.

...

> [1] https://en.wikipedia.org/wiki/Lexer_hack

>

>     This could have been avoided via the simple expedient of introducing

>     a prefix sigil to mark either symbol or type identifiers (the other

>     could be left unadorned).

>

>     If I ever get my 1.2T mini electric time machine working, I'm going

>     to drive that thing across the country like Alvin F. Straight to

>     Murray Hill in 1973 and give Thompson and Ritchie a piece of my

>     mind.  Here's how I imagine it going.

>

>     GBR: It's gross to have `cc` check a list the parser has built of

>     already-seen type names to disambiguate identifiers.  Just stick a

>     dollar sign in front of either type or variable names.  It's easy.

>

>     K&T: That sounds like more typing.  More typing is bad.  Haven't you

>     noticed that all the best Unix commands have only two-letter names?

>     And that one of us avoids pressing the shift key wherever possible?

>

>     GBR: Fine, use the sigil only for type names.  Types are named less

>     often than symbols in most code.

>

>     K&T: No.  It's still too much.

>

>     GBR: Even though you guys don't use anything but a machine integer

>     for anything that isn't floating point or a struct?  No Booleans, no

>     bounds on enums--

>

>     K&T: That's right.  The very notion of sticking a dollar sign in

>     front of a type name is laughable.  There's no money in strong

>     typing.

>

>     GBR: ...uh, I'll just be getting back in my time machine now.

>

>     K&T: That's right, kid--nobody should have to live through the

>     16-bit address space era more than once.

>

>     <vworp vworp vworp>

They say that writing believable engaging dialogue is difficult - but you have 
it down pat! And your transition to action is instinctive.

Nevertheless, I'm with them - probably due to having been habituated to absence 
of avoidable swear-word symbols. Borrowing a bunch of open-source hash-table 
code to to add efficient symbol table capability was quick and easy, I found. 
Don't try to offload the CPU, it's mostly idling anyway. ;-)

Cheers,

Erik

P.S. Found the rear half of a 1.5 foot lizard in the septic system's

transfer tank on Saturday, while swapping out the non-functioning pump.

I think I know where the other half is.

P.P.S. I'll hide here, the fact that in a Structured English to multi-threaded 
multi-state-machine translator, I did require all nouns to be capitalised (as 
in German), and all verbs to be defined, with noun association allowing verb 
overloading. (It was experimental - just 1800 lines of Awk, whose associative 
arrays solved symbol table lookup in half a line.)
  • Re: shift/reduce conf... dvalin--- via GNU roff typesetting system discussion

Reply via email to