On 19-7-2024 1:09, Bob Weinand wrote:
Hey Christoph,

Am 19.07.2024 um 00:51 schrieb Christoph M. Becker <cmbecke...@gmx.de>:

Hi Bob!

On 18.07.2024 at 15:41, Bob Weinand wrote:

Moreover, it can - at least - be worked around in tooling by special casing the T_YIELD_FROM token and extracting the comment from the raw parsed string:

var_dump(token_get_all('<?php yield /* comment */ from $foo;'));

will contain:

[1]=> array(3) { [0]=> int(270) [1]=> string(24) "yield /* comment */ from" [2]=> int(1) }

It's not optimal, but probably the least bad solution to leave it unchanged in PHP 8.3, have tooling special case it and properly fix it in PHP 8.4.

And what about "code" like <https://3v4l.org/4CLhM>?  Is Codesniffer
supposed to scan the result of <https://3v4l.org/dKDcs> for possible CS
violations?

Cheers,
Christoph

I suppose you mean https://3v4l.org/IMi8Y, (you missed the <?php tag).
If you want to scan that, it's quite easy to strip the leading yield and trailing from, and tokenize that again to extract all comments.

Sure, it's a hack, but it'll work: https://3v4l.org/8eAiV.

Bob


Hi Bob,

Of course, everything can be hacked around, but that still leaves the question what should be the "proper tokenization". Having this change in PHP 8.3 and then - as you suggest - yet another in PHP 8.4, makes it mighty hard to have a consistent token stream in tooling, especially as it is unclear what the "proper tokenization" should/would be.

More than anything, I find it concerning that this change sets a precedent for tokens to include comments.

Just as an example: what does this mean for the PHP 8.0 nullsafe object operator ? Should we now suddenly allow that to be written as `? /*comment*/ ->` ? Or what about a cast token ? Should that be allowed to be `(string /*for reasons*/)` ?

Allowing this change to stay in, without having the discussion about what the "proper tokenization" should be, feels off and random to me and opens the door for more random changes.

As for the impact on tooling: a change in the tokenization of any token has an impact not only on tooling like PHPCS itself, but also on every single external standard build on top of it and is a breaking change. To give you some perspective - for PHPCS we even went as far as to "undo" the PHP 8.0 tokenization of namespaced names for the time being (in the PHPCS 3.x releases) and we'll only change the PHPCS tokenizer to use the PHP 8.0 tokenization in the PHPCS 4.0 release as it would otherwise break too many existing sniffs. [1]

Smile,
Juliette

1: https://github.com/squizlabs/PHP_CodeSniffer/issues/3041

Reply via email to