On Thu, Feb 6, 2025 at 10:43 PM Christoph M. Becker <cmbecke...@gmx.de> wrote:
>
> On 06.02.2025 at 20:24, Larry Garfield wrote:
>
> > On Thu, Feb 6, 2025, at 3:05 AM, Valentin Udaltsov wrote:
> >
> >> Are there any plans to upgrade the parser to bypass these limitations?
> >> I remember Nikita shared some thoughts on why this is not trivial in
> >> https://wiki.php.net/rfc/arrow_functions_v2. Maybe something has
> >> changed since then?
> >
> > I'm not aware of any plans to change the parser.  That would be a rather 
> > dramatic and invasive change.
>
> There have been ideas to use some more powerful features of bison[1],
> like GLR, so that would not necessarily be a drastic and invasive
> change.  I'm not aware of any concrete plans, and these more powerful
> features are not without downsides.

I don't think there's a big incentive to switch to a GLR parser right
now. First, I don't believe it actually solves the ambiguity problem
we've described in this thread
(`class C { public $prop = 42 is Foo{}; }`), which is not a lookahead
limitation but a full-blown syntax ambiguity. *Technically* it could
be solved in our current LALR(1) parser by duplicating the expr
production, removing pattern matching from the duplicate and using it
solely for property initializers, but this is a bad long-term solution.
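To make the "full-blown ambiguity" point concrete, here is a toy
backtracking parser over a heavily simplified, hypothetical grammar. It
assumes the two competing readings are `42 is Foo` followed by an empty
property-hook block, versus `42 is Foo{}` as an object pattern with no
property constraints. The grammar and the function names are
illustrative only, not php-src's actual grammar. Because two complete
parse trees exist for the same tokens, no amount of lookahead picks a
winner; a GLR parser would also reach both and still need an explicit
tie-breaking rule.

```python
from typing import List, Tuple

# Hypothetical, heavily simplified grammar (NOT php-src's real one):
#   prop    := VAR '=' expr [ '{' '}' ]   -- optional (empty) hook block
#   expr    := NUM | NUM 'is' pattern
#   pattern := NAME | NAME '{' '}'        -- type pattern / empty object pattern
Tree = tuple  # parse trees are nested tuples


def parse_pattern(toks: List[str], i: int) -> List[Tuple[Tree, int]]:
    """Return every (tree, next_index) for a pattern starting at toks[i]."""
    results: List[Tuple[Tree, int]] = []
    if i < len(toks) and toks[i].isidentifier() and toks[i] != "is":
        name = toks[i]
        results.append((("type_pattern", name), i + 1))
        if toks[i + 1:i + 3] == ["{", "}"]:
            results.append((("object_pattern", name), i + 3))
    return results


def parse_expr(toks: List[str], i: int) -> List[Tuple[Tree, int]]:
    """Return every (tree, next_index) for an expr starting at toks[i]."""
    results: List[Tuple[Tree, int]] = []
    if i < len(toks) and toks[i].isdigit():
        num: Tree = ("num", toks[i])
        results.append((num, i + 1))
        if i + 1 < len(toks) and toks[i + 1] == "is":
            for pat, j in parse_pattern(toks, i + 2):
                results.append((("is", num, pat), j))
    return results


def parse_prop(toks: List[str]) -> List[Tree]:
    """Return every parse of a property declaration that uses all tokens."""
    parses: List[Tree] = []
    if len(toks) < 3 or not toks[0].startswith("$") or toks[1] != "=":
        return parses
    for expr, j in parse_expr(toks, 2):
        if j == len(toks):
            parses.append(("prop", toks[0], expr, "no_hooks"))
        elif toks[j:] == ["{", "}"]:
            parses.append(("prop", toks[0], expr, "empty_hook_block"))
    return parses


TOKENS = ["$prop", "=", "42", "is", "Foo", "{", "}"]

if __name__ == "__main__":
    # Both readings consume every token, so the input is genuinely ambiguous.
    for tree in parse_prop(TOKENS):
        print(tree)
```

This prints two trees. Dropping the trailing `{}` from the input leaves
exactly one parse, which is what separates a true ambiguity from a mere
lookahead limitation.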

Secondly, grammars that need only a single token of lookahead are
easier for machines and humans to understand. Unfortunately, it's hard
to predict future syntax changes, but I believe we have managed to find
acceptable compromises so far. It's worth noting that some newer
languages also strive to avoid grammars that need more than one token
of lookahead. As an example, see Rust's turbofish syntax (e.g.
`Vec::<u32>`), used for generics in general expression context to
avoid confusion with the `<` less-than operator.

Also worth noting: switching to a GLR parser could mean a significant
amount of work for nikic/PHP-Parser, which is built on
ircmaxell/php-yacc, a generator that can only produce LALR(1) parsers.
It might cause even more problems for token-based tools. Sticking with
the generics example, `[bar < Bar, Baz > ()]` requires scanning a long
way ahead to decide whether the spaces between `bar` and `<` should be
removed. The `::<` turbofish syntax, on the other hand, immediately
indicates generics.
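A rough sketch of why the turbofish is cheaper for token-based tools:
the hypothetical `classify_lt` helper below (not taken from
nikic/PHP-Parser or any real formatter) decides whether a `<` opens
generic arguments. With `::<`, the preceding token settles it
immediately; without it, the helper has to scan to the matching `>` and
peek at the following token. The stop-token set and the "followed by
`(`" rule are simplifying assumptions; a real tool would need much more
context.

```python
from typing import List, Tuple

# Tokens at which this toy scan gives up (a simplifying assumption).
STOP = {"]", ";", ")"}


def classify_lt(toks: List[str], i: int) -> Tuple[str, int]:
    """Decide whether the '<' at toks[i] opens generic arguments.

    Returns (kind, lookahead): kind is "generic" or "comparison", and
    lookahead counts the tokens after '<' inspected before deciding.
    """
    if toks[i] != "<":
        raise ValueError("toks[i] must be '<'")
    # Turbofish: the token *before* '<' settles it, no lookahead needed.
    if i > 0 and toks[i - 1] == "::":
        return "generic", 0
    # Otherwise scan ahead for the matching '>' ...
    inspected = 0
    depth = 1
    for j in range(i + 1, len(toks)):
        inspected += 1
        tok = toks[j]
        if tok in STOP:
            break
        if tok == "<":
            depth += 1
        elif tok == ">":
            depth -= 1
            if depth == 0:
                # ... and peek at the next token: '(' suggests a generic call.
                if j + 1 < len(toks):
                    inspected += 1
                    if toks[j + 1] == "(":
                        return "generic", inspected
                return "comparison", inspected
    return "comparison", inspected


PLAIN = ["[", "bar", "<", "Bar", ",", "Baz", ">", "(", ")", "]"]
TURBOFISH = ["[", "bar", "::", "<", "Bar", ",", "Baz", ">", "(", ")", "]"]

if __name__ == "__main__":
    print(classify_lt(PLAIN, 2))      # ('generic', 5)
    print(classify_lt(TURBOFISH, 3))  # ('generic', 0)
```

The lookahead in the plain case grows with the length of the argument
list, while the turbofish case stays constant.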

Anyway, it seems we have gone slightly off-topic. :)

Ilija
