Hi Erik, Happy New Year!
At 2025-12-30T02:00:17+0000, dvalin--- via GNU roff typesetting system
discussion wrote:
> > My training is that reduce/reduce conflicts indicate a "broken"
> > grammar, one where the parser thinks it knows enough to reduce the
> > symbol, but has multiple rules for doing so, and can't decide
> > between them. For ease of both human and machine parsing, language
> > designers have a bias toward context-free grammars.
>
> Ya, well, in my limited experience, avoiding conflicts is far from
> straightforward. Having long thought C's ';' separator on every line a
> bit of fluff, I found reason to imitate it in one of my grammars, where
> first switching from left to right recursion reduced:
>
> "source/hr2gc.y: conflicts: 47 shift/reduce, 5 reduce/reduce" to
> "source/hr2gc.y: conflicts: 26 shift/reduce, 4 reduce/reduce"
>
> But introducing a separator:
>
> commands: command
>         | commands ';' command
>         ;
>
> made it:
>
> source/hr2gc.y: conflicts: 5 shift/reduce, 3 reduce/reduce
>
> They say language forms our thinking. I find that yacc/bison forms the
> grammar.
Prompted by our recent discussion I went back and read the original YACC
paper (and had a light bulb moment about dc(1), but I'll save that story
for another time). There's a lot in there I had forgotten about. An
important element is that YACC uses the ordering of rules to lend
priority to productions, which can silently resolve some conflicts.
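
A minimal sketch of those defaults, as I understand them from the paper and assuming no precedence declarations (%left, %right, %nonassoc) are in play: a shift/reduce conflict is resolved by shifting, and a reduce/reduce conflict by reducing with whichever rule appears earlier in the grammar file. The names Shift, Reduce, and resolve are invented for illustration; none of this is YACC's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    state: int  # state to enter after shifting (hypothetical numbering)


@dataclass(frozen=True)
class Reduce:
    rule: int  # 1-based position of the rule in the grammar file


def resolve(actions):
    """Pick one action from a conflicting set using the default rules.

    Precedence and associativity declarations are deliberately ignored.
    """
    shifts = [a for a in actions if isinstance(a, Shift)]
    if shifts:
        # shift/reduce conflict: the default is to shift
        return shifts[0]
    # reduce/reduce conflict: the rule listed earlier in the grammar wins
    return min(actions, key=lambda a: a.rule)
```

So resolve([Reduce(7), Shift(12)]) picks the shift, and resolve([Reduce(9), Reduce(4)]) picks rule 4. Either way a parser comes out; the generator only reports the count, which is why reordering rules can change behavior without any change in the warnings.
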
> > (As I understand it, C famously has a deeply inherent reduce/reduce
> > conflict; most identifiers that aren't reserved words can reduce as
> > type names _or_ as symbol names. How did C survive this? Well,
> > this was Bell Labs, where people thought nothing of entangling their
> > lexical analysis with their parsing, as nroff did and does. The
> > solution is known as "the lexer hack".[1])
>
> Taking a peek at my most recent lexer, from 2012, confirms that I tend
> to make it a state machine, changing lexer rules and actions as suits
> ... and, yes, I handle quoted text _entirely_ in the lexer - it seemed
> most natural. (It's then one token - zero grammar there. That's no
> hack, IME.)
I'll take your word for it. :)
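
For anyone who hasn't met the lexer hack mentioned above, here is a toy sketch of the idea, not any real C compiler's code. The lexer consults a table of typedef names that the parser fills in as it goes, so the same spelling can come back as IDENTIFIER before its typedef and TYPE_NAME after. The token names, the three-word keyword list, and the classify() driver standing in for a parser are all assumptions made for illustration.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"typedef", "int", "char"}  # deliberately tiny


class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.typedef_names = set()  # written by the parser, read by the lexer

    def next_token(self):
        # Skip whitespace between tokens.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return ("EOF", "")
        m = IDENT_RE.match(self.text, self.pos)
        if m:
            self.pos = m.end()
            word = m.group()
            if word in KEYWORDS:
                return ("KEYWORD", word)
            # The hack: classification depends on parser-maintained state.
            if word in self.typedef_names:
                return ("TYPE_NAME", word)
            return ("IDENTIFIER", word)
        ch = self.text[self.pos]
        self.pos += 1
        return ("PUNCT", ch)


def classify(source):
    """Stand-in for a parser: registers the name declared by each typedef."""
    lexer = Lexer(source)
    tokens = []
    in_typedef = False
    declared = None
    while True:
        kind, text = lexer.next_token()
        if kind == "EOF":
            return tokens
        tokens.append((kind, text))
        if (kind, text) == ("KEYWORD", "typedef"):
            in_typedef = True
        elif in_typedef and kind == "IDENTIFIER":
            declared = text
        elif in_typedef and (kind, text) == ("PUNCT", ";"):
            if declared is not None:
                lexer.typedef_names.add(declared)
            in_typedef, declared = False, None
```

Feeding it "typedef int T; T x;" yields T as IDENTIFIER the first time and TYPE_NAME the second. The feedback loop from parser to lexer is the entanglement being complained about.
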
> My lexer changes state on comment-initiation, consumes till
> comment-end, and passes one token of type QUOTED_TEXT. So such
> problems cannot arise. (I do like simplicity.)
Simplicity is a lodestar. We must guard it jealously.
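
A guess at the shape of such a lexer, as a minimal sketch: two states, with everything between the delimiters handed to the parser as a single QUOTED_TEXT token. I haven't seen the lexer being described, so the delimiter, the WORD token type, and the tokenize() name are assumptions.

```python
def tokenize(text, quote='"'):
    """Split text into WORD tokens and single QUOTED_TEXT tokens."""
    tokens = []
    state = "NORMAL"
    buf = []

    def flush_word():
        # Emit any pending unquoted word.
        if buf:
            tokens.append(("WORD", "".join(buf)))
            buf.clear()

    for ch in text:
        if state == "NORMAL":
            if ch == quote:
                flush_word()
                state = "QUOTED"  # switch rules until the closing quote
            elif ch.isspace():
                flush_word()
            else:
                buf.append(ch)
        else:  # state == "QUOTED"
            if ch == quote:
                tokens.append(("QUOTED_TEXT", "".join(buf)))
                buf.clear()
                state = "NORMAL"
            else:
                buf.append(ch)  # separators and spaces are just text here

    if state == "QUOTED":
        raise ValueError("unterminated quoted text")
    flush_word()
    return tokens
```

With input like move "to the; left" now, the ';' never reaches the grammar as a separator, because it arrives buried inside one QUOTED_TEXT token.
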
Simplifying GNU troff's grammar has been a (largely unplanned) theme of
my work in the 1.24.0 release cycle.
I know I'm gonna get yelled at for it, though.
> > I have read that a parser for RFC 822 dates, the classic Unix date
> > format(s), has at least one shift/reduce conflict, but since no
> > human or program constructs dates in an ambiguous form, we simply
> > live with any shift/reduce conflicts our parser generators dutifully
> > warn us about.
>
> Yes, well, ppic.ypp appears to have a bunch, even though the grammar
> subordinate to line 1456 looks harmless. (When I had a similar
> problem, flattening the grammar fixed it - dunno precisely why,
> though.)
Maybe I'll learn pic's grammar this year. Doug is giving me a gentle
nudge in that direction by creating a Savannah user account and filing a
ticket against groff's version of it. :)
> > That's some parser theory as told by a ham-fisted practitioner.
> > Compiler experts and CS professors might, as we speak, be drawing
> > hot baths with Calgon to take them away.
>
> In a previous life, I had to develop and institute a software
> development quality system, to achieve accreditation. One thing that
> the auditors taught me was "The people doing the work *are* the
> experts." It certainly ain't management.
Agreed. My favorite job, though, was at a research lab. I think you
need both scientists and engineers, because each can anticipate
different ways that things won't work.
Junior practitioners are full of ideas, overly optimistic notions, and
unrealistic expectations of scheduling, which, apart from the cheaper
wages they tolerate, is why bad managers love working with them. They provide
plenty of smoke a first-line manager can inhale and then go blow up the
rears of the bigger bosses. And when the schedule slips, as it almost
always does, the junior engineers are there to take the blame for their
poor estimation skills. When you're the manager and it comes time to
lay off staff, they've already given you all the excuse you need to chop
them--you don't need to do any thinking. Thinking bad!
That said, once in a while a practitioner, junior or otherwise, comes up
with a neat idea. At its best, one of these is a win for everybody.
Management gets a deliverable, the practitioner gets satisfaction and
(ideally) peer respect, and the scientists have something new they can
adapt to purposes unforeseen by others and--the best part--get to try to
figure out how they can break it.
At one point when feeling unchallenged at work, I started going back to
school for a degree in applied math. I didn't reach my goal but I
learned a lot and increased my (still meager) capabilities. I like to
keep a foot in each "camp" of "theory" and "practice", which in a good
environment are not opposed as is popularly depicted. (You've gotta be
careful of philosophical dualists. Trust only the non-dualists.)
> They say that writing believable engaging dialogue is difficult - but
> you have it down pat!
Thank you! For that and for overlooking my failure to write "R&T" or
"T&R". I'm over-accustomed to sticking Kernighan's initial in things.
Like many, I'm a frustrated writer.
"Everybody does have a book in them, but in most cases that's where it
should stay." -- Christopher Hitchens
> Nevertheless, I'm with them - probably due to having been habituated
> to absence of avoidable swear-word symbols.
Well, hell, you should have been writing in Ada, not C. ;-)
> Borrowing a bunch of open-source hash-table code to add efficient
> symbol table capability was quick and easy, I found. Don't try to
> offload the CPU, it's mostly idling anyway. ;-)
I think Thompson's many, many admirers would disagree. He hashed
everything he encountered on the front end as a first resort. That
doesn't detract from any of his accomplishments; I suspect that in many
cases, his habits of thought enabled those accomplishments.[1]
> P.S. Found the rear half of a 1.5 foot lizard in the septic system's
> transfer tank on Saturday, while swapping out the non-functioning
> pump. I think I know where the other half is.
A twist on the classic case of sabotage! Critter was using its head.
> P.P.S. I'll hide here, the fact that in a Structured English to
> multi-threaded multi-state-machine translator, I did require all nouns
> to be capitalised (as in German), and all verbs to be defined, with
> noun association allowing verb overloading. (It was experimental -
> just 1800 lines of Awk, whose associative arrays solved symbol table
> lookup in half a line.)
I find AWK fascinating. To me, it seems close to a sweet spot in the
programming language design space. I should do some more thinking and
figure out why. Maybe it has enough features to be widely adaptable to
problems but at the same time it refuses to yield to the siren song of a
module system or "standard library". It seems inevitable that any
language offering that facility becomes weighted down by reference
material. In my career I've been disappointed that most of the fun of
learning a PL is front-loaded. You get the basic concepts/modeling/
"paradigms" and, often, you can see how it's good for something and can
help you solve problems. (And other people have seen that too, which is
how you hear of it in the first place.)
But, to "succeed", a language needs to be "enterpriseable", meaning it
needs a big standard library because we can't have engineers reinventing
wheels all the time. (Well, we can, but engineers simply must be
directed by their betters--by members of the professional-managerial
class with access to funds. Thus Java, Go, Rust.)
And that's not even wrong. No, usually, you _don't_ want engineers
reinventing wheels all the time. Except where a language's unique
features enable new wheels. But the PMCs never know where those places
are going to be--maybe no one does. The PMCs may do more poorly than
chance because they look where the venture capitalists and private
equity firms _want_ to find them. And since those people mostly ape
each other's ideas, everybody crowds into the same few spaces--or just
one. They simultaneously congratulate themselves as "innovators".
Almost every small language bloats into a big one. Even my beloved Ada
was once (on the chunky side of) lean. Niklaus Wirth kept inventing new
languages, I guess pursuing an instinct to escape that fate. People
will say that didn't happen to C, but it did. The language's authors
simply quit publishing on their creation.
But AWK? AWK hasn't bloated. Maybe Perl "stole its thunder", by which
we mean "took the bullet".
Or--maybe--AWK occupies, or is in near propinquity to, that really sweet
spot in n-dimensional programming language design space I spoke of.
So if I ever write my own AWK, I have a name for it. AWKupy.
Mascot? A porcupine with a Guy Fawkes mask.
Regards,
Branden
[1] Unlike Thompson, I'm useless at chess. As soon as I learned, pretty
young, how much came from studying patterns of masters, with many
techniques named for the same, or for geographical names like wines
or cheeses, I became disaffected. When I stumbled upon
"Fischerandom" chess, a.k.a. Chess960, that seemed much more
interesting, but I've still never played that variant. It's
nevertheless more appealing to the sort of mental skills that I
think chess _should_ be exercising. Of course as a non-practitioner
in that area, my opinion is worth nothing. But I wonder what sort
of prodigies we might uncover in the chess clubs of the schools if
we cultivated _those_ skills. On the other hand, someone will
no doubt point out that we have AIs for solving chess now.
