On Sat, Jul 6, 2024 at 4:25 AM Mike Schinkel <m...@newclarity.net> wrote:

>
> On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pa...@gmail.com> wrote:
>
> On Jun 25, 2024, at 4:36 PM, Gina P. Banyard <intern...@gpb.moe> wrote:
> https://wiki.php.net/rfc/deprecations_php_8_4
>
>
> * About strtok(): An exact replacement of `strtok()` that is reasonably
> performant may be constructed with a sequence of strspn(...) and
> strcspn(...) calls; here is an implementation using a generator in order to
> keep the state: https://3v4l.org/926tC
>
>
> Well your modern_strtok() function is not an _exact_ replacement as it
> requires using a generator and thus forces the restructure of the code that
> calls strtok().
>
> So not a drop-in — search-and-replace — replacement for strtok(). But it
> is a reasonable replacement for those who are motivated to do the
> restructure.
>
>
>
I looked into this a bit. Taking the idea further, we could also define a
StringTokenizer class that keeps the generator as internal state:

class StringTokenizer {
    private \Generator $tokenGenerator;
    public function __construct(public readonly string $string) {
    }

    public function nextToken(string $characters): string|null {
        if (!isset($this->tokenGenerator)) {
            $this->tokenGenerator = $this->generator($characters);
            return $this->tokenGenerator->current();
        }
        return $this->tokenGenerator->send($characters);
    }

    private function generator(string $characters): \Generator {
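        $pos = 0;
        $end = \strlen($this->string);
        while ($pos < $end) {
            // Skip leading delimiters, then measure the token that follows.
            $pos += \strspn($this->string, $characters, $pos);
            $len = \strcspn($this->string, $characters, $pos);
            if ($len === 0) {
                return;
            }
            $token = \substr($this->string, $pos, $len);
            // Like strtok(), also step past the delimiter that ended the token.
            $pos += $len + 1;
            $characters = yield $token;
        }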
    }
}
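
For readers less familiar with the strspn()/strcspn() pair that the generator
relies on, here is a minimal sketch of a single tokenization step. The helper
name firstToken() is hypothetical and not part of the proposal; it only
illustrates how one token and the position after it are found.

```php
<?php
// Hypothetical helper (illustration only): find one token starting at $pos.
// Returns [token, position just after the token], or null if no token is left.
function firstToken(string $string, string $characters, int $pos = 0): ?array {
    // Skip any leading delimiter characters.
    $pos += strspn($string, $characters, $pos);
    // Measure the run of non-delimiter characters that follows.
    $len = strcspn($string, $characters, $pos);
    if ($len === 0) {
        return null; // only delimiters (or nothing) left
    }
    return [substr($string, $pos, $len), $pos + $len];
}

[$token, $next] = firstToken("  hello world", " ");
// $token === "hello", $next === 7
```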


and if we define a wrapper function:

function strtok2(string $string, ?string $token = null): string|false {
    static $tokenizer = null;
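    // Two arguments start a new tokenization. "0" is a valid delimiter set,
    // so compare against null instead of relying on truthiness.
    if ($token !== null) {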
        $tokenizer = new StringTokenizer($string);
        return $tokenizer->nextToken($token) ?? false;
    }
    if (!isset($tokenizer)) {
        return false;
    }
    return $tokenizer->nextToken($string) ?? false;
}


I think this might be a drop-in replacement, since it keeps the calling
convention of strtok().
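
One way to check that claim is to run the same sequence of calls through
strtok() and through the candidate function, then compare the results. Below
is a minimal sketch of such a check; collectTokens() is a hypothetical helper,
and only the native strtok() is exercised here. Delimiter sets that change
between calls and the delimiter "0" are worth including as edge cases.

```php
<?php
// Hypothetical helper: drive any strtok()-compatible callable and collect
// what it returns. $steps is a list of delimiter sets, one per call.
function collectTokens(callable $tok, string $string, array $steps): array {
    $out = [];
    foreach ($steps as $i => $delims) {
        // The first call passes the string and the delimiters;
        // later calls pass only the delimiters.
        $out[] = $i === 0 ? $tok($string, $delims) : $tok($delims);
    }
    return $out;
}

// Reference results from the native function.
$changing = collectTokens('strtok', "a,b;c", [",", ";", ";", ";"]);
// $changing === ["a", "b", "c", false]
$zero = collectTokens('strtok', "a0b", ["0", "0"]);
// $zero === ["a", "b"]
```

Passing the name of the candidate function instead of 'strtok' and comparing
the two arrays with === would show any behavioral difference.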

If we want, we could implement StringTokenizer in core, which would give
strtok() users an object-oriented replacement.

If we don't want to do this at this stage, we can completely avoid the
class for now, using an anonymous class:

function strtok2(string $string, ?string $token = null): string|false {
    static $tokenizer = null;
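    // Two arguments start a new tokenization. "0" is a valid delimiter set,
    // so compare against null instead of relying on truthiness.
    if ($token !== null) {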
        $tokenizer = new class($string)  {
            private \Generator $tokenGenerator;
            public function __construct(public readonly string $string) {
            }
            public function nextToken(string $characters): string|null {
                if (!isset($this->tokenGenerator)) {
                    $this->tokenGenerator = $this->generator($characters);
                    return $this->tokenGenerator->current();
                }
                return $this->tokenGenerator->send($characters);
            }
            private function generator(string $characters): \Generator {
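                $pos = 0;
                $end = \strlen($this->string);
                while ($pos < $end) {
                    // Skip leading delimiters, then measure the token that follows.
                    $pos += \strspn($this->string, $characters, $pos);
                    $len = \strcspn($this->string, $characters, $pos);
                    if ($len === 0) {
                        return;
                    }
                    $token = \substr($this->string, $pos, $len);
                    // Like strtok(), also step past the delimiter that ended the token.
                    $pos += $len + 1;
                    $characters = yield $token;
                }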
            }
        };
        return $tokenizer->nextToken($token) ?? false;
    }
    if (!isset($tokenizer)) {
        return false;
    }
    return $tokenizer->nextToken($string) ?? false;
}

What do you think?
Mike, would you mind benchmarking this as well, to make sure it is about as
fast as Claude's initial suggestion?
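
For a rough comparison, something like the following could serve as a starting
point. It is only a sketch: bench() is a hypothetical helper, the input string
and iteration count are arbitrary, and only the native strtok() is timed here.
The same closure shape could wrap any of the candidates.

```php
<?php
// Hypothetical helper: average wall-clock nanoseconds per run of $fn.
function bench(callable $fn, int $iterations = 1000): float {
    $fn(); // warm-up run, not timed
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations;
}

// Example workload: count the space-separated tokens of an arbitrary input.
$input = str_repeat("lorem ipsum dolor sit amet ", 100);
$native = bench(function () use ($input): int {
    $count = 0;
    for ($t = strtok($input, " "); $t !== false; $t = strtok(" ")) {
        $count++;
    }
    return $count;
});
printf("strtok(): %.0f ns per run\n", $native);
```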

I hope this can be simplified further. In any case, I think the RFC should
suggest a userland replacement and, ideally, PHP 9.0 should ship a class that
can replace strtok(), similar to the StringTokenizer above.

Regards,
Alex
