On Thu, Feb 6, 2025 at 10:43 PM Christoph M. Becker <cmbecke...@gmx.de> wrote:
>
> On 06.02.2025 at 20:24, Larry Garfield wrote:
>
> > On Thu, Feb 6, 2025, at 3:05 AM, Valentin Udaltsov wrote:
> >
> >> Are there any plans to upgrade the parser to bypass these limitations?
> >> I remember Nikita shared some thoughts on why this is not trivial in
> >> https://wiki.php.net/rfc/arrow_functions_v2. Maybe something has
> >> changed since then?
> >
> > I'm not aware of any plans to change the parser. That would be a rather
> > dramatic and invasive change.
>
> There have been ideas to use some more powerful features of bison[1],
> like GLR, so that would not necessarily be a drastic and invasive
> change. I'm not aware of any concrete plans, and these more powerful
> features are not without downsides.

I don't think there's a big incentive to switch to a GLR parser right now.

First off, I don't believe it actually solves the ambiguity problem we've described in this thread (`class C { public $prop = 42 is Foo{}; }`), which is not a lookahead limitation but a full-blown syntactic ambiguity. *Technically* it could be solved in our current LALR(1) parser by duplicating the expr production, removing pattern matching from the copy, and using that copy solely for property initializers, but this is a poor long-term solution.

Secondly, single-lookahead grammars are easier for both machines and humans to understand. Unfortunately, it's hard to predict future syntax changes, but I believe we have managed to find acceptable compromises so far. It's worth noting that some newer languages also strive to avoid grammars that need more than one token of lookahead. As an example, see Rust's turbofish syntax (e.g. `Vec::<u32>`), used for generics in the general expression context to avoid confusion with the `<` less-than operator.

Also worth noting: switching to a GLR parser might cause a significant amount of work for nikic/PHP-Parser, which is based on ircmaxell/php-yacc, which can only generate LALR(1) parsers. It might cause even more problems for token-based tools. Sticking with the generics example, `[bar < Bar, Baz > ()]` would require a lot of scanning to work out whether the space between `bar` and `<` can be removed. The `::<` turbofish syntax, on the other hand, immediately indicates generics.

Anyway, it seems we have gone slightly off-topic. :)

Ilija
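
For reference, here is a minimal Rust sketch of the turbofish point made above. The function names (`empty_u32_vec`, `parse_u32`, `is_less`) are made up for illustration and do not come from any project mentioned in this thread. The idea it is meant to show: in expression context a bare `<` is read as the less-than operator, so generic arguments have to be introduced with `::<`, which a parser or a token-based tool can recognise without scanning ahead.

```rust
/// Expression context: `::<` (turbofish) introduces generic arguments.
/// Writing `Vec<u32>::new()` here would not parse as a generic
/// instantiation, because `<` in an expression starts a comparison.
pub fn empty_u32_vec() -> Vec<u32> {
    Vec::<u32>::new()
}

/// Turbofish on a method call: the target type of `parse` is spelled
/// out right after `::<`, so no later token is needed to classify `<`.
pub fn parse_u32(text: &str) -> Option<u32> {
    text.trim().parse::<u32>().ok()
}

/// Without `::`, `<` between two expressions is the less-than operator.
pub fn is_less(a: u32, b: u32) -> bool {
    a < b
}
```

In type position (such as the `Vec<u32>` return type above) no turbofish is needed, since a comparison cannot occur there. The extra `::` is only required where an expression is expected.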