On Wed, Mar 4, 2020 at 9:39 PM Dmitry Stogov <dsto...@perforce.com> wrote:

> Hi Nikita,
>
> I suspect, this way of stable sorting won't come for free.
> Most probably, It'll require more comparisons and swaps.
>
> I would prefer, extend sort() functions with optional $stable argument,
> that would lead to use stable sorting algorithm in first place.
>

I've updated the implementation to fix some performance issues (turns out,
comparison is extremely sensitive to code layout details), and did some
simple micro benchmarks using the script at
https://gist.github.com/nikic/5d44cb5d0d7c1f414f455090a0193567. It's
sorting an array with varying fraction of duplicate elements.

UNSTABLE

0.00: float(1.5692839622497559)
0.01: float(1.572052001953125)
0.05: float(1.5424528121948242)
0.10: float(1.5431289672851562)
0.20: float(1.4708871841430664)
0.50: float(1.2815279960632324)
1.00: float(0.9397969245910645)

STABLE

0.00: float(1.5857720375061035)
0.01: float(1.577707052230835)
0.05: float(1.581636905670166)
0.10: float(1.588440179824829)
0.20: float(1.600801944732666)
0.50: float(1.5892488956451416)
1.00: float(1.0026319026947021)

This shows that for arrays without duplicate elements (so unstable / stable
sort makes no difference on the result), the performance impact is ~1%, so
negligible. At 10% duplicate elements, it's 3%. At 50% duplicate elements,
it's 25%. (Generally the effect is that the stable sort performance stays
mostly stable regardless of number of duplicates, while unstable sort
improves with number of duplicates. As one would expect.)

For real-world use cases, I expect that it's unusual to have a very large
fraction of duplicate elements, so the impact should be towards the 1-3%
range.

I think as long as the impact is not too large, it is better to have a
single behavior, rather than adding more flags to our (many) sort functions.

Regards,
Nikita



> ------------------------------
> *From:* Nikita Popov <nikita....@gmail.com>
> *Sent:* Wednesday, March 4, 2020 19:42
> *To:* PHP internals <internals@lists.php.net>
> *Subject:* [PHP-DEV] Make sorting stable
>
> Hi internals,
>
> Sorting function in PHP currently do not guarantee stability, which means
> that the result order of elements that compare equal is not defined.
>
> To achieve a stable sort, you need to do something like this (untested):
>
> $arrayAndPos = [];
> $pos = 0;
> foreach ($array as $value) {
>     $arrayAndPos[] = [$value, $pos++];
> }
> usort($arrayAndPos, function($a, $b) use($compare) {
>     return $compare($a[0], $b[0]) ?: $a[1] <=> $b[1];
> });
> $array = [];
> foreach ($arrayAndPos as $elem) {
>     $array[] = $elem[0];
> }
>
> This is both cumbersome and very inefficient. Those temporary arrays
> significantly increase memory usage, and hurt performance of the sort.
>
> I believe that it is important for us to provide a stable sorting option in
> the standard library. We could introduce a separate stable function(s) for
> this purpose, but I believe we can just as well make all our existing sorts
> stable.
>
> This does not require actually switching sorting algorithms to something
> like Timsort. We can essentially do the same as the above PHP code, just
> much more efficiently. Due to certain implementation details, we can do
> this without memory usage increase, and with minimal performance impact.
> I've put https://github.com/php/php-src/pull/5236 up as a prototype.
>
> The only issue I ran into is that this change has a negative impact on code
> using illegal comparison callbacks like this:
>
> usort($array, function($a, $b) {
>     return $a > $b;
> });
>
> The return value of the sorting callback should be $a <=> $b, but using $a
> > $b also works right now, due to implementation details of the sorting
> implementation (only the outcome of $compare($a, $b) > 0 ends up being used
> by the sort).
>
> This kind of incorrect code will break under the proposed implementation,
> because we will now compare by original position if the comparison function
> reports equality. Because the comparator reports equality inconsistently
> (it says that $a == $b, but $b != $a), the sort results are also
> inconsistent.
>
> What do people think about this? Is there interest in making sorting
> stable? Is it okay to break code using illegal comparison callbacks?
>
> Regards,
> Nikita
>
>
> CAUTION: This email originated from outside of the organization. Do not
> click on links or open attachments unless you recognize the sender and know
> the content is safe.
>
> This e-mail may contain information that is privileged or confidential. If
> you are not the intended recipient, please delete the e-mail and any
> attachments and notify us immediately.
>
>

Reply via email to