On Sat, Nov 23, 2024, at 23:10, Ilija Tovilo wrote: > Hi Rob > > On Sat, Nov 23, 2024 at 2:12 PM Rob Landers <rob@bottled.codes> wrote: > > > > Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I > > would like to introduce to you a competing RFC: Data Classes > > (https://wiki.php.net/rfc/dataclass). > > As others have pointed out, your RFC is very similar to my proposal > for struct. I don't quite understand the reason to compete and race > each other to the finish line. Combined efforts are usually better.
This isn't a race or competition, and I meant "competing with records" and not "competing with Ilija." Though, I see how it could be interpreted that way. To be honest, I don't like this RFC. I think records and/or structs are the right answer (dedicated behavior vs. trying to stuff more behavior into classes). As mentioned elsewhere in the thread, the behavior here is more of an emergent property than any actual thinking through it. So, the fact that it is similar to your proposal is merely coincidental; it wasn't planned that way (nor did I catch it before submitting the RFC). This mailing list is the only real interaction I have with other php-src devs (and the occasional PR to php-src), so please forgive me if I don't notice something. Yes, that isn't a real excuse, but not having other people to bounce ideas off of usually means I'll make an idiot out of myself, eventually. That being said, there are a ton of edge cases I've uncovered where everything breaks down in this RFC: • PHP references are "evil", so I've discovered. • "new" generates some strange opcodes—I'm currently investigating this—that make dealing with value types difficult. I'm trying to solve this in some way that doesn't require changing the generated opcodes, but that might be impossible. This solution would allow using "new" with records and solve the constructor problem in this RFC. • Did I mention PHP references are evil? In any case, I'd much rather help with your structs proposal, as the more I work on this, the more I don't like it. > > One of the bigger differences between our proposals is the addition of > mutating methods in my proposal compared to yours. You show the > following example in your RFC: > > ```php > data class Rectangle { > public function __construct(public int $width, public int $height) {} > > public function resize(int $width, int $height): static { > $this->height = $height; > $this->width = $width; > return $this; > } > } > ``` > > The resize method here modifies the instance and thus implicitly > creates a copy. That's _fine_ for such a small structure. However, > note that this still leads to the performance issues we have > previously discussed for growable data structures. > > ```php > data class Vector { > public function append(mixed $value): static { > /* Internal implementation, $values is some underlying storage. */ > $this->values[] = $value; > return $this; > } > } > ``` > > Calling `$vector->append(42);` will increase the refcount of > `$vector`, and cause separation on `$this->values[] = ...;`. If > `$vector->values` is a big storage, cloning will be very expensive. > Hence, appending becomes an O(n) operation (because each element in > the vector is copied to the new structure), and hence appending to an > array in a loop will tank your performance. That's the reason for the > introduction of the `$vector->append!(42)` syntax in my proposal. It > separates the value at call-site when necessary, and avoids separation > on `$this` in methods altogether. As I mentioned to Larry in the records discussion, the biggest problem with a "data class" is that it really needs dedicated syntax to be done properly (such as the ones in structs, and I plan to remove some syntax features from records since I got a lot of negative feedback about the syntax there; shout out to reddit). I don't think that would belong to a general class but rather a dedicated type. There really isn't a "one size fits all" solution here. > > There might be some general confusion on the performance issue. In one > of your e-mails in the last thread, you have mentioned: > > > Like Ilija mentioned in their email, there are significant performance > > optimizations to be had here that are simply not possible using regular > > (readonly) classes. I didn't go into detail as to how it works because it > > feels like an implementation detail, but I will spend some time distilling > > this and its consequences, into the RFC, over the coming days. As a simple > > illustration, there can be significant memory usage improvements: > > > > 100,000 arrays: https://3v4l.org/Z4CcV > > 100,000 readonly classes: https://3v4l.org/1vhNp > > First off, the array example only uses less memory because [1, 2] is a > constant array. When you make it dynamic, they will become way less > efficient than objects. https://3v4l.org/pETM9 > > But this is not the point I was trying to make either. Rather, when it > comes to immutable, growable data structures, every mutation becomes > an extremely expensive operation because the entire data structure, > including its underlying storage, needs to be copied. For example: > > https://3v4l.org/BEsYT > > ```php > class Vector { > private $values; > > public function populate() { > $this->values = range(1, 1_000_000); > } > > public function appendMutable() { > $this->values[] = 100_000_001; > } > > public function appendImmutable() { > $new = clone $this; > $this->values[] = 100_000_001; > } > } > ``` > > > appendMutable(): float(8.106231689453125E-6) > > appendImmutable(): float(0.012187957763671875) > > That's a factor of 1 500 difference for an array containing 1 million > numbers. Obviously, concrete numbers will vary, but the problem grows > the bigger the array becomes. > > Ilija > I'm not quite focused on actual performance (yet), but I understand your point. To that, I have an idea I've been playing around with to have "layered hashmaps" where a copy/clone is just a layer on the original (made immutable) hashmap, thus having a near zero cost for copies. There's still extra indirection involved, so it potentially could be worse 🤷. I would think an average case O(1) would be faster than a copy, at least for large maps, but wall-clock time and Big O don't always correlate. I also note that the benchmarks in the repo are based on operations, not time, so it is quite difficult to show that a feature that increases the number of operations decreases the overall time. In other words, you may add 100 operations with a strong cache locality and remove 10 with poor cache locality -- quality vs. quantity, an age-old problem. Anyway, there's no way to tell the exact performance characteristics until I finish the implementation. I'd love to discuss this more, as it's actually pretty neat and interesting to solve in a performant way. It could also be like my zend_string refactor I spent a few months on over the summer, where the performance isn't affected (at least for wall-clock time), and the only added benefits showing when you perform lots of string modifications. Other benefits won't make sense without already having an RFC I was working on (that is still in the draft stage). So, for now, that branch will just gather dust. Things like this are largely why I didn't put too much information about the performance characteristics of records in my RFC. It's really an implementation detail, and I didn't want people's decisions to be tied to an implementation detail that could change drastically. In other words, maybe the performance would be poor now, but I'm confident it can be improved later. IMHO, we should focus on whether we want the feature in the first place rather than worrying about how fast or slow it is. Many improvements don't make sense for the sake of improving them, and it isn't until there is a problem to be solved (such as a poor performing feature) that the improvement becomes worthwhile. — Rob