[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Yoni Lavi
I stand by what I said, there is absolutely nothing disingenious about it.

> Over on the Discuss threads, 
> you have made it clear that the primary reason why you want hash(None) 
> to return a constant value is so that set iteration order will be 
> consistent from one run to another.
No, it's so the entire program behaves the same the next time I run it on the 
same data.
There is nothing wrong with what sets are doing, it's the hashing function that 
injects non-determinism, in this case for no good reason at all.

Again, for the n'th time, read what I wrote above:
Under this definition, sets in Python are deterministic, and _always_ have been.

I was just stating a fact, same as it is in all other languages I know. You are 
welcome to verify it by the way, if you can.
That's why I don't request any change from set. It works fine and always has.

>  you have barely mentioned that the same applies to 
> Ellipsis and NotImplemented
That's wrong, I did address that many times. What makes None special is that 
Optional as it is implemented in Python (`T | None`) relies on `None` for one 
of its possible states, and `Optional` fields are in common use on key types 
that perform structural hashing

I'd also change other sentinels to hash to a constant if it were up to me to 
decide, but there is no practical significance to that so I don't bother dying 
on that hill (especially now, given the mob reaction). That doesn't make my 
argument inconsistent in any way. Sorry.

> Let's consider a thought-experiment: suppose we agree to your proposal 
> to make hash(None) return a constant, but at the same time modify the 
> set iteration algorithm so that it starts from a different position each 
> time you iterate, making set iteration order even more unpredictable and 
> deterministically chaotic than it is now.

I address this already. I said it is a strongly unlikely FUD scenario. Sets 
have been behaving deterministically for decades now,  across all languages. 
There's a pretty good reason for that, even if this reason doesn't happen to be 
codified into the requirements of Python. So as I said, I will take my chances. 

Again it's something I already addressed here:
> Yes, in theory it is possible that a new version or implementation of Python 
> will make sets non-deterministic - shuffle
> themselves in the background independently from their history of operations, 
> or what have you
> and then my change loses all > of its value
> Like I said, anything is possible. But I think these last two points are 
> essentially FUD.

_Please_ stop reiterating the same argument over and over and pretending I 
didn't already make a case against it. It doesn't help, just turns this 
discussion into a shouting match.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BWJSBBQDN3C72OXO5XDD2PIJRCRJ4CG6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Chris Angelico
On Wed, 30 Nov 2022 at 10:48, Steven D'Aprano  wrote:
> Let's consider a thought-experiment: suppose we agree to your proposal
> to make hash(None) return a constant, but at the same time modify the
> set iteration algorithm so that it starts from a different position each
> time you iterate, making set iteration order even more unpredictable and
> deterministically chaotic than it is now.

Let's consider an equally-likely thought experiment: suppose that
every call to set.__iter__() causes a short busy-wait, spinning the
CPU for a little while for absolutely no purpose. There's nothing in
the docs that mandates performance, so it would be perfectly
reasonable for a future version of Python to do this, right? But it's
not something that I would consider *likely*.

And, in fact, modifying the iteration algorithm to be completely
unpredictable would have the same cost: an RNG lookup for no value
whatsoever.

It's notable that set() is about the only data type where this is
possible. You can't randomize dict iteration order (couldn't do that
even before it was locked in, because iterating over keys and values
was always required to return them in the same order), and a lot of
arbitrary-order collections are built on top of dictionaries. So your
thought experiment would be special-casing sets and making them
unnecessarily difficult to work with, just so you can point to that
arbitrariness and say "hah! GOTCHA! You silly fools shouldn't have
been depending on iteration order!!".

> Personally, and I don't expect that this will be a popular opinion, I
> wish that data structures that don't guarantee their iteration order
> would actively mix up their iteration order so that people wouldn't be
> tempted to rely on whatever accidental order they happen to get.

Which data structures are those? Please enumerate.

> Sure, it would cost a little bit of code to implement, but think of the
> savings in unproductive arguments and discussions like this one :-(

So basically, it's a bit of useless code, a bit of useless CPU usage,
all just so you can win an argument. Makes sense.

ChrisA
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E4EVMJRFTQIKZIQMEHEB2SF7FORMZUO5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Christopher Barker
> it [randomising iteration order or sets] would also break the invariant
that `repr(data) == repr(data)` but it
is times like this that I feel that it would be worth it.

But it wouldn't -- equality of sets doesn't depend on iteration order. And
even if you are talking about other types (this *is* a thought experiment),
any type in which equality depends on order certainly shouldn't be jiggled
in this way.

And I think this is relevant -- I understand that it's handy that order be
deterministic -- but even if it is made so in the current implementation,
it still may not be consistent across different Python versions and
implementations. Which is why I would think that relying on determinism of
order is a bad idea anyway -- yes, the same computation should yield the
same result -- but what does "same" mean? As far as sets go -- "same" is
equality, and if you are checking anything else about them (such as
iteration order), I think it's a mistake.

Which I think is why Guido (or maybe someone else introduced the idea) was
talking about order-preserving sets -- THAT would provide a larger benefit,
and would change the specification of sets so that you *could* count on it
across versions and implementations, like we can for the modern dict.

So yes, the OP wasn't asking for that, but I think the point is that what
the OP is asking for is not useful enough to do, whereas a larger proposal
might be -- but only if it can be done without loss of performance, which,
alas, no one knows how to do.

Final confusion -- IIUC, even if the hash of None is made constant, you'd
still need to turn off string hash munging to get deterministic order for
sets with strings in them, yes? Which really seems to make this even less
useful.

-CHB


On Tue, Nov 29, 2022 at 3:48 PM Steven D'Aprano  wrote:

> On Tue, Nov 29, 2022 at 08:51:09PM -, Yoni Lavi wrote:
>
> > It does make your argument invalid though, since it's based on this
> > assumption that I was asking for a requirement on iteration order
> > (e.g. like dict's iteration order = insertion order guarantee), which
> > is not the case.
>
> Yoni, I think this answer is disingenious. Over on the Discuss threads,
> you have made it clear that the primary reason why you want hash(None)
> to return a constant value is so that set iteration order will be
> consistent from one run to another.
>
> Sure, you may *also* have some philosophical preference for None having
> a fixed hash, but you have barely mentioned that the same applies to
> Ellipsis and NotImplemented (and then only in passing), and that's not
> your driving motivation for this proposal.
>
> Let's consider a thought-experiment: suppose we agree to your proposal
> to make hash(None) return a constant, but at the same time modify the
> set iteration algorithm so that it starts from a different position each
> time you iterate, making set iteration order even more unpredictable and
> deterministically chaotic than it is now.
>
> Would that satisfy you? From your posts on Discuss, I don't expect that
> you will be happy with this outcome.
>
> Personally, and I don't expect that this will be a popular opinion, I
> wish that data structures that don't guarantee their iteration order
> would actively mix up their iteration order so that people wouldn't be
> tempted to rely on whatever accidental order they happen to get.
>
> Sure, it would cost a little bit of code to implement, but think of the
> savings in unproductive arguments and discussions like this one :-(
>
> It would also break the invariant that `repr(data) == repr(data)` but it
> is times like this that I feel that it would be worth it.
>
>
> --
> Steve
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/JIR4WQ5B432BQ4R27EMWZP6HCNFLFPMU/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CN535F4YTEHX2DRVBNSAOGRVSZEJ5HH4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Steven D'Aprano
On Tue, Nov 29, 2022 at 08:51:09PM -, Yoni Lavi wrote:

> It does make your argument invalid though, since it's based on this 
> assumption that I was asking for a requirement on iteration order 
> (e.g. like dict's iteration order = insertion order guarantee), which 
> is not the case.

Yoni, I think this answer is disingenious. Over on the Discuss threads, 
you have made it clear that the primary reason why you want hash(None) 
to return a constant value is so that set iteration order will be 
consistent from one run to another.

Sure, you may *also* have some philosophical preference for None having 
a fixed hash, but you have barely mentioned that the same applies to 
Ellipsis and NotImplemented (and then only in passing), and that's not 
your driving motivation for this proposal.

Let's consider a thought-experiment: suppose we agree to your proposal 
to make hash(None) return a constant, but at the same time modify the 
set iteration algorithm so that it starts from a different position each 
time you iterate, making set iteration order even more unpredictable and 
deterministically chaotic than it is now.

Would that satisfy you? From your posts on Discuss, I don't expect that 
you will be happy with this outcome.

Personally, and I don't expect that this will be a popular opinion, I 
wish that data structures that don't guarantee their iteration order 
would actively mix up their iteration order so that people wouldn't be 
tempted to rely on whatever accidental order they happen to get.

Sure, it would cost a little bit of code to implement, but think of the 
savings in unproductive arguments and discussions like this one :-(

It would also break the invariant that `repr(data) == repr(data)` but it 
is times like this that I feel that it would be worth it.


-- 
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JIR4WQ5B432BQ4R27EMWZP6HCNFLFPMU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Yoni Lavi
It does make your argument invalid though, since it's based on this assumption 
that I was asking for a requirement on iteration order (e.g. like dict's 
iteration order = insertion order guarantee), which is not the case.

Again, determinism means that given all input data and commands fed to a data 
structure is the same, it will arrive at the same observable state, any time 
you start from scratch and replay this workload. In the context of sets, "all 
input data" includes the hashing function itself, and "observable state" also 
includes the order in which items will be returned if iterated. Note that there 
is NO requirement here on what that order might be.

Under this definition, sets in Python are deterministic, and _always_ have 
been. And even outside of Python, there are aren't many cases where people 
willingly want to use data structures with non deterministic behavior. It 
usually involves concurrency (in the form of multithreading) and extreme 
performance requirements. And it's never the "standard" choice even in 
languages that do offer this. Determinism is generally considered as a valuable 
property in computation, at least when it is feasible to maintain it.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5Z3SOH4JDHRGYF4NTLND4E2UFM7QIXTL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Brett Cannon
On Mon, Nov 28, 2022 at 5:38 PM Steven D'Aprano  wrote:

> On Mon, Nov 28, 2022 at 11:13:34PM +, Oscar Benjamin wrote:
> > On Mon, 28 Nov 2022 at 22:56, Brett Cannon  wrote:
>
> > > That's actually by design. Sets are not meant to be deterministic
> > > conceptually as they are essentially a bag of stuff. If you want
> > > deterministic ordering you should convert it to a list and sort the
> > > list.
> >
> > What does "sets are not meant to be deterministic" even mean?
>
> I'm not Brett, so I'm not answering for him, but many people (sometimes
> including me) misuse "not deterministic" to refer to set order and the
> old dict order.


Yep, that's what I mean. And my comment about dicts earlier is old
knowledge from before we made dict iteration deterministic based on
insertion order since I'm been doing this for too long. 

-Brett


> Of course set order is deterministic, it is determined
> by the combination of the set implementation, the hashing algorithm, and
> the history of the set -- all the items that have ever appeared in the
> set (both those removed and those that remain).
>
> Set order is deterministic in the same way that roulette wheels are
> deterministic.
>
> We don't have a good term for this: the items don't appear in random
> order. But it is not predictable either, except in very special cases.
> "Arbitrary" is not right either, since that implies we can easily choose
> whatever set order we want.
>
> I think that physicists call this "deterministic chaos"
>
>
> https://www.quora.com/Theoretical-Physics-What-is-deterministic-but-unpredictable
>
> (sorry for the quora link) so I guess we might say that set iteration
> order is "deterministically chaotic" if you want to be precise, but life
> is too short for that level of pedantry (and that's coming from me, a
> Grade A Pedant *wink*) so people often describe it as random, arbitrary,
> or non-deterministic when it's none of those things :-).
>
> Maybe we could call set order "pseudo-random".
>
> Getting back to the design part, I think that what Brett is trying to
> get across is not that the chaotic set order is in and of itself a
> requirement, but that given the other requirements, chaotic set order is
> currently considered a necessary condition.
>
> As I understand it, we could make sets ordered, but only at the cost of
> space (much more memory) or time (slower) or both.
>
> I am sure that Guido is correct that **if** somebody comes up with a
> fast, efficient ordered set implementation that doesn't perform worse
> than the current implementation, we will happily swap to giving sets a
> predictable order, as we did with dicts. (Practicality beats purity --
> even if sets are *philosophically* unordered, preserving input order is
> too useful to give up unless we gain something in return.)
>
> But I don't think it is fair or kind to call Brett's argument FUD. At
> the very least it is uncharitable interpretation of Brett's position.
>
> > It would be useful to have a straight-forward way to sort a set into a
> > deterministic ordering but no such feature exists after the Py3K
> > changes (sorted used to do this in Python 2.x).
>
> `sorted()` works fine on homogeneous sets. It is only heterogeneous sets
> that are a problem, and in practice, that usually means None mixed in
> with some other type.
>
>
> --
> Steve
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/UAVNSMS3XFAQQ6ZRD27IXPTWZFCHP6M4/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZUACMM6HDUOG4VIAPAK7W5Q6WQRRLI2U/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Yoni Lavi
Looks like it's just miscommunication.

There is the original proposal I made, strictly about how None is ought to be 
hashed, and then there is the separate topic of changing the stability 
properties of iteration on sets, and whether that can be made with/without a 
performance regression...
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MX32ODGSEGRB2FSJS3CCR55GHXNY644X/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-29 Thread Yoni Lavi
> I can't see any, but then I couldn't see the security consequences of
> predictable string hashes until they were pointed out to me. So it would
> be really good to have some security experts comment on whether this is
> safe or not.

I can't either. I can point out that the complexity attack via hash collisions 
is not possible on None specifically because it is a singleton, so:
1. There may only be one None in a dict at most, 
2. For composite types that depend on None and other things, like say a tuple 
of string and an optional int, they become as resistant to attack as the same 
types without None, in this case string and int.
3. Python only explicitly added this security (hash randomization) feature to 
string and bytes hashes, and if your composite key type depends on those types, 
it will still be protected regardless of what None does


> Because this entire discussion is motivated by the OP who wants
> consistent set order across multiple runs of his Python application.
> That's what he needs; having None hash to a constant value is just a
> means to that end.

Not entirely. I do explain in my doc why there is a foundational reason why, 
once the choice was made to have None represent the `None` of Optional[T], it 
entered the family of types that you would expect to have deterministic 
behavior. And as the hashing function is intrinsic to types in Python, it is 
included in that.


> Even if we grant None a constant hash, that still does not guarantee
> consistent set order across runs. At best, we might get such consistent
> order as an undocumented and changeable implementation detail, until we
> change the implementation of hashing, or of sets, or of something
> seemingly unrelated like address randomisation.

ASLR will not cause any trouble so long as I keep objects with identity based 
hashing out of my sets. Or at least, any sets I later iterate on and run 
imperative code with side-effects under the loop.
The possibility of disabling ASLR is not a reason to dismiss this change. No 
one guarantees a user of Python is in a position to make infosec related 
decisions on the computer system they are working on.

Regarding the possibilities of hashes changing for the worse (in this regard) - 
sure. Anything is possible.

Regarding sets - the set implementation is deterministic, and always has been 
(for decades).
Yes, in theory it is possible that a new version or implementation of Python 
will make sets non-deterministic - shuffle themselves in the background 
independently from their history of operations, or what have you, and then my 
change loses all of its value.
Like I said, anything is possible. But I think these last two points are 
essentially FUD.

I made my proposal because I believe the FUD scenarios are strongly unlikely. 
(and even then, at worst we end up with a "practically useless" behavior on 
None, that can also be freely reverted along with such other changes anyway)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GD3N4DZQXISYM6UQQLMKJC6Z52N54IO3/
Code of Conduct: http://python.org/psf/codeofconduct/