Hi!

> Hi Stas,
> 
> On Wed, Sep 21, 2016 at 11:26 AM, Stanislav Malyshev
> <smalys...@gmail.com> wrote:
>>
>>> I think we are better to limit max collisions.
>>> I'm +1 for Nikita's proposal does this.
>>
>> Max collision per what? How much would be the limit?
> 
> Collision by keys.

Not sure I understand. What would be counted - number of collision per
key? Per hashtable? Per process? Per request?

> It would be nice to have configurable limit like regex stack/backtrack limit.
> That said, wouldn't 1000 enough for almost all apps?

Certainly not. Not even nearly enough. Collisions are pretty frequent
with short strings, for example, and for a big long-running application
1000 hash collisions is nothing. I think you severely underestimate how
frequent hash collisions are, with simple function like we're using,
over millions and millions of hash accesses that we're doing routinely.

I did a quick check, and just running run-tests.php -h (without any
tests!) produces about 5K collisions. Running composer (without doing
anything) - 8K collisions. Running composer update on a simple project -
400K (!) collisions. Now these are pretty simple cases compared to what
a complex modern PHP application does. So I think you are
underestimating it by about 4-5 orders of magnitude.

> 
> Anyway, we have two choices
>  - Simply limit the number of collisions. (Fast and has no impact to code)

Fast: true, no impact: not so much. If the limit is too low for your app
(and you probably have no idea how many collisions your app has, and it
is probably highly varied too) your app breaks in some very creative
ways, including dropping dead in the middle of transactions, being
unable to handle errors, etc.

>  - Use crypt safe hash and salt. (Slow and has impact to opcache/etc)

These not the only two choices.

> Limiting something is good to have sometimes.
> Python even limits number of recursions to 1000 by default.

Recursion limits are completely different. With recursion, with any
proper algorithm, the deeper you go the probability of having next
recursion step drops dramatically (usually exponentially - that's how
you build performant algorithms). However, hash collisions are
completely independent, so having one does not change the chance you
have another (in fact, it may make it higher if the same data is
processed again).

Which means, for recursion, it is very probable there is a limit that
most of your code will never reach. Even then we are cautious of it as
there's no good number for a natural limit.

But with hash collisions, I just see no way to tell "10K collisions is
enough for anything" and at least my quick checks - unless I massively
messed up the test - shows that it's not easy to find a practical limit,
at least with current hash function.

> We have PCRE stack/backtrack limits. (We'll have mbregex stack limit soon)
> Collision limit is good one also.

Does not follow, as per above.
-- 
Stas Malyshev
smalys...@gmail.com

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to