Hi Rob

On Sat, Nov 23, 2024 at 2:12 PM Rob Landers <rob@bottled.codes> wrote:
>
> Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I 
> would like to introduce to you a competing RFC: Data Classes 
> (https://wiki.php.net/rfc/dataclass).

As others have pointed out, your RFC is very similar to my proposal
for struct. I don't quite understand the reason to compete and race
each other to the finish line. Combined efforts are usually better.

One of the bigger differences between our proposals is the addition of
mutating methods in my proposal compared to yours. You show the
following example in your RFC:

```php
data class Rectangle {
    public function __construct(public int $width, public int $height) {}

    public function resize(int $width, int $height): static {
        $this->height = $height;
        $this->width = $width;
        return $this;
    }
}
```

The resize method here modifies the instance and thus implicitly
creates a copy. That's _fine_ for such a small structure. However,
note that this still leads to the performance issues we have
previously discussed for growable data structures.

```php
data class Vector {
    public function append(mixed $value): static {
        /* Internal implementation, $values is some underlying storage. */
        $this->values[] = $value;
        return $this;
    }
}
```

Calling `$vector->append(42);` will increase the refcount of
`$vector`, and cause separation on `$this->values[] = ...;`. If
`$vector->values` is a big storage, cloning will be very expensive.
Appending thus becomes an O(n) operation (because each element in
the vector is copied to the new structure), and appending to a
vector in a loop will tank your performance. That's the reason for the
introduction of the `$vector->append!(42)` syntax in my proposal. It
separates the value at call-site when necessary, and avoids separation
on `$this` in methods altogether.
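
To make the separation cost visible with today's PHP, here is a rough
sketch using plain arrays rather than the proposed data class or struct
syntax. The variable names are purely illustrative and not taken from
either RFC; it only shows that assignment is cheap while the first
write to a shared array copies everything.

```php
<?php
// Illustrative only: plain arrays in current PHP, not the proposed syntax.
$values = range(1, 1_000_000);   // one large, refcounted array

$start = memory_get_usage();
$copy = $values;                 // no copy yet, both variables share the storage
$afterAssign = memory_get_usage();

$copy[] = 42;                    // refcount > 1, so this write separates (copies every element)
$afterAppend = memory_get_usage();

$assignCost = $afterAssign - $start;
$appendCost = $afterAppend - $afterAssign;

printf("assignment: %d bytes, first append: %d bytes\n", $assignCost, $appendCost);
```

The same thing happens to `$this->values` inside a method whenever the
storage is shared with another value at the time of the write.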

There might be some general confusion on the performance issue. In one
of your e-mails in the last thread, you have mentioned:

> Like Ilija mentioned in their email, there are significant performance 
> optimizations to be had here that are simply not possible using regular 
> (readonly) classes. I didn't go into detail as to how it works because it 
> feels like an implementation detail, but I will spend some time distilling 
> this and its consequences, into the RFC, over the coming days. As a simple 
> illustration, there can be significant memory usage improvements:
>
> 100,000 arrays: https://3v4l.org/Z4CcV
> 100,000 readonly classes: https://3v4l.org/1vhNp

First off, the array example only uses less memory because [1, 2] is a
constant array, so all 100,000 entries share it. When you make the
arrays dynamic, they become much less memory-efficient than objects:
https://3v4l.org/pETM9
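
For readers who don't want to follow the links, a minimal sketch of
that comparison could look like the following. `Pair` and `memoryFor()`
are hypothetical names I made up for illustration, not the code behind
the 3v4l links, and the exact byte counts will differ between PHP
versions.

```php
<?php
// Hypothetical stand-in for the readonly class in the linked example.
final class Pair {
    public function __construct(
        public readonly int $a,
        public readonly int $b,
    ) {}
}

// Builds $count items with $make and returns the memory they occupy, in bytes.
function memoryFor(callable $make, int $count = 100_000): int {
    $before = memory_get_usage();
    $items = [];
    for ($i = 0; $i < $count; $i++) {
        $items[] = $make($i);
    }
    return memory_get_usage() - $before;
}

$constant = memoryFor(fn(int $i) => [1, 2]);               // literal array, shared by all entries
$dynamic  = memoryFor(fn(int $i) => [$i, $i + 1]);         // a fresh array per entry
$objects  = memoryFor(fn(int $i) => new Pair($i, $i + 1)); // a fresh object per entry

printf("constant arrays: %d\ndynamic arrays: %d\nobjects: %d\n", $constant, $dynamic, $objects);
```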

But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:

https://3v4l.org/BEsYT

```php
class Vector {
    private $values;

    public function populate() {
        $this->values = range(1, 1_000_000);
    }

    public function appendMutable() {
        $this->values[] = 100_000_001;
    }

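    public function appendImmutable() {
        // Modify and return the clone so $this stays untouched. The write
        // still separates the shared array, which is the expensive part.
        $new = clone $this;
        $new->values[] = 100_000_001;
        return $new;
    }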
}
```

> appendMutable(): float(8.106231689453125E-6)
> appendImmutable(): float(0.012187957763671875)

That's a factor of roughly 1,500 for an array containing 1 million
numbers. Obviously, concrete numbers will vary, but the problem grows
the bigger the array becomes.
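
The effect compounds in a loop, because every single append pays for a
full copy. As a rough sketch under my own assumptions (`ImmutableList`
and its `with()` method are hypothetical names, not part of either
RFC), this compares building a list through clone-and-append against
appending to a plain array in place:

```php
<?php
// Hypothetical immutable list: every append returns a modified clone.
final class ImmutableList {
    public function __construct(private array $values = []) {}

    public function with(mixed $value): static {
        $new = clone $this;
        // $new->values is still shared with $this here, so this write
        // copies all existing elements before appending.
        $new->values[] = $value;
        return $new;
    }

    public function count(): int {
        return count($this->values);
    }
}

$n = 10_000;

$start = hrtime(true);
$list = new ImmutableList();
for ($i = 0; $i < $n; $i++) {
    $list = $list->with($i);     // O(current size) per append, O(n^2) overall
}
$immutableNs = hrtime(true) - $start;

$start = hrtime(true);
$array = [];
for ($i = 0; $i < $n; $i++) {
    $array[] = $i;               // in place, amortized O(1) per append
}
$mutableNs = hrtime(true) - $start;

printf("immutable: %.3f ms, mutable: %.3f ms\n", $immutableNs / 1e6, $mutableNs / 1e6);
```

Doubling `$n` should roughly quadruple the first timing while only
doubling the second.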

Ilija
