On Sat, 11 Nov 2023 at 20:43, Andreas Hennings <andr...@dqxtech.net> wrote:
>
> Hello David,
>
> On Sat, 11 Nov 2023 at 20:04, David Gebler <davidgeb...@gmail.com> wrote:
> >
> > On Sat, Nov 11, 2023 at 6:05 PM Andreas Hennings <andr...@dqxtech.net>
> > wrote:
> >
> > > Hello internals,
> > > I noticed that array functions like array_diff(), array_intersect()
> > > etc use weak comparison.
> > >
> > >
> > That's not quite correct. Using the example of array_diff, the comparison
> > is a strict equality check on a string cast of the values. So
> > array_diff([""], [false]) will indeed be empty
> > but array_diff(["0"],[false]) will return ["0"].
>
> Thanks, good to know!
> So in other words, it is still some kind of weak comparison, but with
> different casting rules than '=='.
> Still this is not desirable in many cases.
>
> >
> > Tbh any use case for whatever array function but with strict comparison is
> > such an easy thing to implement in userland[1] I'm not bothered about
> > supporting it in core. But that's just me. I don't generally like the idea
> > of adding new array_* or str_* functions to the global namespace without
> > very good cause. There is a precedent for it though, in terms of changes
> > which have gone through in PHP 8, such as array_is_list or str_starts_with.
>
>
> I would argue that the strict variants of these functions would be
> about as useful as the non-strict ones.
> Or in my opinion, they would become preferable over the old functions
> for most use cases.
>
> In other words, we could say the old/existing functions should not
> have been added to the language.
> (of course this does not mean we can or should remove them now)
>
> Regarding performance, I measure something like factor 2 for a diff of
> range(0, 500) minus [5], comparing array_diff() vs array_diff_strict()
> as proposed here.
> So for large arrays or repeated calls it does make a difference.

Some more results on this.
With the right array having only one element, I can actually optimize
the userland function to be almost as fast as the native function.
However, if I pump up the right array, the difference becomes quite bad.


function array_diff_userland(array $array1, array $array2 = [], array ...$arrays): array {
    if ($arrays) {
        // Process additional arrays only when they exist.
        $arrays = array_map('array_values', $arrays);
        $array2 = array_merge($array2, ...$arrays);
    }
    // This is actually slower, it seems.
    #return array_filter($array1, fn ($value) => !in_array($value, $array2, TRUE));
    $diff = [];

    foreach ($array1 as $k => $value) {
        // Use non-strict in_array(), to get a fair comparison with
        // the native function.
        if (!in_array($value, $array2)) {
            $diff[$k] = $value;
        }
    }
    return $diff;
}

$arr = range(0, 500);
$arr2 = range(0, 1500, 2);

$dts = [];

$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// Run both again to detect differences due to warm-up.
$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland.1'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native.1'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// Now use a right array that has no overlap with the left array.
// Build the array before starting the timer, so that range() is not measured.
$arr2 = range(501, 1500, 2);
$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland.2'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native.2'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// microtime(TRUE) returns seconds, so * 1000 * 1000 gives microseconds.
var_export(array_map(fn ($dt) => $dt * 1000 * 1000 . ' µs', $dts));

With the larger right array, I see the userland function being slower
than the native one by a factor of 5 to 10.

So to me, this alone is an argument to implement this natively.
The other argument is that it is kind of sad how the current functions
don't behave as one would expect.
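
To make the surprising behavior concrete, here is a minimal sketch of what
the string-cast comparison David described means in practice. The input
values are made up for illustration; the results follow from array_diff()
treating two values as equal when (string) $a === (string) $b.

```php
<?php
// (string) false is "", so "" is considered present in [false] and removed.
$a = array_diff([""], [false]);

// "0" is not "", so it is kept, even though "0" == false is true.
$b = array_diff(["0"], [false]);

// (string) 1 is "1", so int 1 and string "1" are treated as the same value.
$c = array_diff([1], ["1"]);

var_export([$a, $b, $c]);
```

Note that non-strict in_array() uses '==' instead, so in_array("0", [false])
is true. The userland function above therefore only matches array_diff()
for inputs like the integer ranges used in the benchmark.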


>
> Regarding the cost of more native functions:
> Is the concern more about polluting the global namespace, or about
> adding more functions that need to be maintained?
> I can see both arguments, but I don't have a clear opinion how these
> costs should be weighed.

The most straightforward option seems to be to just name the new functions
array_diff_strict() etc.
But I am open to other proposals.
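
For reference, a userland version with the intended semantics could look
roughly like the sketch below. The name array_diff_strict_sketch() is a
placeholder for illustration only, not a proposed signature. Unlike the
example in [1], it keeps the keys of the first array, the way array_diff()
does.

```php
<?php
/**
 * Hypothetical userland sketch of a strict array_diff().
 * Values are compared with ===, and keys of $array are preserved.
 */
function array_diff_strict_sketch(array $array, array ...$excludes): array
{
    $diff = [];
    foreach ($array as $key => $value) {
        foreach ($excludes as $exclude) {
            if (in_array($value, $exclude, true)) {
                // Found an identical (===) value, so drop this element.
                continue 2;
            }
        }
        // Not found in any of the other arrays: keep it under its key.
        $diff[$key] = $value;
    }
    return $diff;
}

// "" and "0" are kept, because neither is identical to false or int 0.
$result = array_diff_strict_sketch(["", "0", 0, 1, "a"], [false, 0], ["a"]);
var_export($result);
```

This still does a linear in_array() scan per element, so it has the same
performance problem as the benchmarked function when the right array is
large.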

>
> Cheers
> Andreas
>
>
>
> >
> > [1] Example:
> >
> > function array_diff_strict(array $array1, array ...$arrays): array
> >     {
> >         $diff = [];
> >         foreach ($array1 as $value) {
> >             $found = false;
> >             foreach ($arrays as $array) {
> >                 if (in_array($value, $array, true)) {
> >                     $found = true;
> >                     break;
> >                 }
> >             }
> >             if (!$found) {
> >                 $diff[] = $value;
> >             }
> >         }
> >         return $diff;
> >     }

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
