On 31/03/2025 23:03, Niels Dossche wrote:
Hi internals!
I'm excited to share what I've been working on!
I had an epiphany. I realized what we truly need to revolutionize PHP: a new
operator.
Hear me out.
We live in an imperfect world, and we often approximate data, but neither `==`
nor `===` is an ideal comparison operator for this kind of data.
Introducing: the "approximately equal" (or "approx-equal") operator `~=` (to
imitate the maths symbol ≃).
This combines the power of type coercion with approximate equality.
Who cares if things are actually equal? Close enough, amirite?
First of all, if `$a == $b` holds, then `$a ~= $b` holds too, obviously.
The true power lies in cases where the data is not exactly the same, but "close enough"!
Here are some examples:
We've all had situations where we wanted to compare two floating-point numbers,
only to find that, due to their inexact representation, seemingly equal numbers
don't match! Those days are gone, because the `~=` operator nicely rounds the
numbers for you before comparing them.
This also means that the "Fundamental Theorem of Engineering" now holds,
i.e. 2.7 ~= 3 and 3.14 ~= 3. Of course, 2.7 ~= 3.14 holds too. But 2 ~= 1 is,
obviously, false.
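The rounding rule can be mimicked in userland to see why these examples come out the way they do. This is a minimal sketch, not the PoC's code: `approx_equal_num()` is a hypothetical name, and it assumes both operands are rounded to the nearest integer with PHP's default `round()` mode (halves away from zero) before being compared with `==`.

```php
<?php
// Hypothetical userland stand-in for the numeric branch of `~=`.
// Assumption: both operands are rounded to the nearest integer using
// round()'s default mode (PHP_ROUND_HALF_UP, i.e. halves away from zero),
// and the rounded values are then compared with ==.
function approx_equal_num(int|float $a, int|float $b): bool
{
    return round($a) == round($b);
}

var_dump(approx_equal_num(2.7, 3));     // bool(true):  3.0 == 3.0
var_dump(approx_equal_num(3.14, 3));    // bool(true):  3.0 == 3.0
var_dump(approx_equal_num(2.7, 3.14));  // bool(true):  3.0 == 3.0
var_dump(approx_equal_num(2, 1));       // bool(false): 2.0 != 1.0
var_dump(approx_equal_num(-1.5, -1.8)); // bool(true):  both round to -2.0
```

Under this assumption the sketch also agrees with the `1.4 ~= 1` and `-1.4 ~= -1` cases from the test file, but the rounding rule is inferred from the examples rather than taken from the PR.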
Ever had trouble with users mistyping something? Say no more!
"This is a tpyo" ~= "This is a typo". It's typo-resistant!
However, if the strings are too different, then they're not approx-equal.
For example: "vanilla" ~= "strawberry" gives false.
How does this work?
* Two strings are approx-equal if their Levenshtein ratio is <= 50%, so the
threshold adapts to the string length.
* If the ratio is > 50%, the shorter string comes first in the comparison, so
that if we ever get a `~<` operator, "vanilla" ~< "strawberry".
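The equality rule from the first bullet can be sketched in userland with PHP's built-in `levenshtein()`. The function name `approx_equal_str()` is hypothetical, and the definition of "Levenshtein ratio" is an assumption: here it is the edit distance divided by the length of the longer string, which the PoC may define differently.

```php
<?php
// Hypothetical userland stand-in for the string branch of `~=`.
// Assumption: "Levenshtein ratio" = edit distance / length of the longer
// string; strings count as approx-equal when that ratio is <= 0.5.
// Note: levenshtein() and strlen() operate on bytes, not Unicode characters.
function approx_equal_str(string $a, string $b): bool
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return true; // two empty strings are trivially equal
    }
    return levenshtein($a, $b) / $maxLen <= 0.5;
}

var_dump(approx_equal_str("This is a tpyo", "This is a typo")); // bool(true):  2/14
var_dump(approx_equal_str("Wtf bro", "Wtf sis"));               // bool(true):  3/7
var_dump(approx_equal_str("vanilla", "strawberry"));            // bool(false)
var_dump(approx_equal_str("something", "different"));           // bool(false)
```

Normalising by the longer string is what makes the threshold "adaptive to the length": a two-character typo is tolerated in a long sentence but not in a three-letter word. The ordering rule for a hypothetical `~<` is not modelled here.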
There is of course a PoC implementation available at:
https://github.com/php/php-src/pull/18214
You can see more examples in the tests on GitHub; here is a copy:
```php
// Number compares
var_dump(2 ~= 1); // false
var_dump(1.4 ~= 1); // true
var_dump(-1.4 ~= -1); // true
var_dump(-1.5 ~= -1.8); // true
var_dump(random_int(1, 1) ~= 1.1); // true
// Array compares (just compares the lengths)
var_dump([1, 2, 3] ~= [2, 3, 4]); // true
var_dump([1, 2, 3] ~= [2, 3, 4, 5]); // false
// String / string compares
var_dump("This is a tpyo" ~= "This is a typo"); // true
var_dump("something" ~= "different"); // false
var_dump("Wtf bro" ~= "Wtf sis"); // true
// String / different type compares
var_dump(-1.5 ~= "-1.a"); // true
var_dump(-1.5 ~= "-1.aaaaaaa"); // false
var_dump(NULL ~= "blablabla"); // false
```
Note that this does not support all possible Opcache optimizations _yet_, nor
does it support the JIT yet.
However, there are no real blockers to add support for that.
I look forward to hearing from you!
Have a nice first day of the month ;)
Kind regards
Niels
For the float case it's fine (because epsilon is well defined), but I
think overloading the operator for strings is not. The hard-coded 50%
distance is subjective and users may well want to configure it, so an
operator is not suitable, not to mention that Levenshtein has very
limited application. If there is any sense in doing string comparisons
with this operator, I think the proposed case is not it.
The array case is also not good in my view: it just compares lengths,
and I see no use for that whatsoever. What it _should_ do instead is
compare without regard to order, i.e. [1, 2, 3] ~= [3, 2, 1], similar
to PHPUnit's assertEqualsCanonicalizing [1].
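The order-insensitive comparison suggested above can be sketched in a few lines. `approx_equal_array()` is a hypothetical name. The sketch sorts flat copies of both arrays and compares them with loose `==`, in the spirit of assertEqualsCanonicalizing. Unlike PHPUnit's comparator, it does not canonicalize nested arrays.

```php
<?php
// Hypothetical sketch of an order-insensitive array comparison:
// sort both arrays (sort() discards keys and reindexes), then compare with ==.
function approx_equal_array(array $a, array $b): bool
{
    // $a and $b are passed by value, so sorting them leaves the caller's arrays untouched.
    sort($a);
    sort($b);
    return $a == $b;
}

var_dump(approx_equal_array([1, 2, 3], [3, 2, 1]));    // bool(true)
var_dump(approx_equal_array([1, 2, 3], [2, 3, 4]));    // bool(false)
var_dump(approx_equal_array([1, 2, 3], [2, 3, 4, 5])); // bool(false)
```

Note that `[1, 2, 3]` and `[2, 3, 4]`, which the length-only rule treats as approx-equal, are rejected here.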
Cheers,
Bilge
[1]: https://github.com/sebastianbergmann/comparator/blob/d67eceae47e3956aa28ab0c6e43e5a6765f45779/src/ArrayComparator.php#L43-L46