[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-30 Thread Yoni Lavi
> Whether determinism is fundamentally good or fundamentally bad depends
> heavily on context.

Agreed 100%. Unfortunately in Python, you cannot choose your hashing function 
depending on context.

Also, once you've decided to violate determinism somewhere, it's gone. There is 
no way, in the general case, to bring it back. 

That's why it's important not to violate it willy-nilly in a manner that cannot 
even be prevented by users who _want_ their programs to exhibit deterministic 
behavior.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TGRZPF6XWOMCBDLA3N3YWUQJJ77UHZKG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-30 Thread Chris Angelico
On Thu, 1 Dec 2022 at 17:26, Yoni Lavi wrote:
>
> > the language makes no guarantee about hash consistency between
> > executions
>
> because it's futile in the general case, even if objects were to get a serial 
> `id` and hash by it for example, any change in the number of objects created 
> across all of Python (including its builtin modules and various libraries 
> unrelated to the user code) would make these hashes move.
>

For the record, Jython DOES use sequential numbers for ids. And it
doesn't reuse them even if the objects are disposed of.

>>> id(None)
2
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> lists = None
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

IIRC an id is not assigned to an object until one is requested.

> So it's not like it's even possible to require this generally for all objects.

Well, I mean, in theory you could require that objects whose hash
isn't otherwise defined get given the hash of zero. That doesn't
violate any of the actual rules of hashes, but it does make those
hashes quite suboptimal :)
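
A minimal sketch of that point, assuming a hypothetical class named ZeroHash (not something from this thread): every instance hashes to 0, which still satisfies the rule that equal objects share a hash, so sets keep working. They just lose their speed, because every lookup has to probe through one long collision chain.

```python
class ZeroHash:
    """Hypothetical class: identity equality, but every instance hashes to 0."""

    def __hash__(self):
        return 0  # legal: objects that compare equal still share a hash


objs = [ZeroHash() for _ in range(100)]
bucket = set(objs)

# All 100 distinct objects are stored and can be found again,
# even though they all collide on the same hash value.
stored = len(bucket)
all_found = all(o in bucket for o in objs)
```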

It's interesting how id() and hash() have opposite requirements (id
must return a unique number among concurrently-existing objects, hash
must return the same number among comparing-equal objects), yet a hash
can be built on an id.
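
As a rough illustration of the two requirements, here is a sketch using a hypothetical class ById whose hash is built directly on id(). It only relies on the documented contracts of id() and hash(), not on how any particular implementation computes them.

```python
class ById:
    """Hypothetical class: hash is built on id(), equality stays identity-based."""

    def __hash__(self):
        return id(self)


a, b = ById(), ById()

# id(): unique among objects that exist at the same time.
ids_unique = id(a) != id(b)

# hash(): equal objects must hash alike; 1 and 1.0 are different objects
# of different types, but they compare equal.
equal_share_hash = (1 == 1.0) and (hash(1) == hash(1.0))

# An id-based hash is consistent with identity-based equality:
# the only object equal to `a` is `a` itself.
stable = hash(a) == hash(a)
```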

> Besides, do other languages require it?
> Is it required for the language to behave in a manner that makes sense?
>
> Or maybe you think it's by pure accident that such an overwhelming majority 
> of languages and software libraries implement/use deterministic hashing 
> functions for primitive types or aggregates that consist of such types?
> I can't figure out if you think it's actually a bad property for the language 
> to have, or really just arguing that it's bad for the sake of it.

Determinism is usually the easiest option. True randomness takes a lot
of effort compared to a deterministic PRNG, hence web servers having
true entropy but old game consoles relying on PRNGs.

Whether determinism is fundamentally good or fundamentally bad depends
heavily on context. Question: Is the metaquestion "is determinism
good" deterministic (ie can it be answered entirely from predictable
facts), or is it itself entropic? I believe the former, but I'm
curious if anyone disagrees!

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NLDV5PAONVETRJL5QDITFEN34JDKR7T2/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-30 Thread Yoni Lavi
> the language makes no guarantee about hash consistency between
> executions

Because it's futile in the general case. Even if objects were to get a serial 
`id` and hash by it, for example, any change in the number of objects created 
across all of Python (including its builtin modules and various libraries 
unrelated to the user code) would make these hashes move.

So it's not like it's even possible to require this generally for all objects.

None of that makes deterministic structural hashing any less useful in 
practice, though.
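
The serial-id scenario can be sketched as follows. The helper make_serial_class and the class Serial are hypothetical names made up for this illustration; each call to the helper stands in for a fresh interpreter run with its own counter.

```python
import itertools


def make_serial_class():
    """Return a hypothetical class whose instances hash by creation order."""
    counter = itertools.count(1)

    class Serial:
        def __init__(self):
            self.serial = next(counter)

        def __hash__(self):
            return self.serial

    return Serial


# "Run 1": the user's object is the first one created.
Serial = make_serial_class()
user_obj = Serial()
run1_hash = hash(user_obj)

# "Run 2": some unrelated library creates one extra object first,
# so the same user object now gets a different serial and hash.
Serial = make_serial_class()
_unrelated = Serial()
user_obj = Serial()
run2_hash = hash(user_obj)
```

Here run1_hash and run2_hash differ although the user code is identical, which is the sense in which identity-style hashes cannot be pinned down across executions.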

Besides, do other languages require it?
Is it required for the language to behave in a manner that makes sense?

Or maybe you think it's by pure accident that such an overwhelming majority of 
languages and software libraries implement/use deterministic hashing functions 
for primitive types or aggregates that consist of such types? 
I can't figure out if you think it's actually a bad property for the language 
to have, or really just arguing that it's bad for the sake of it.

> set order is not guaranteed
Maybe not. In practice it has fully deterministic behavior, always has across 
all versions of Python since its inception. I don't care about what the order 
is, only that it's deterministic, and it is.

Rejecting my change because someone could technically get away with breaking 
this, after 30+ years, seems highly suspect.

Imagine I came in with a 0.5% perf improvement. You would reject it, citing that 
Python's requirements do not mandate that sets have good performance, and also 
that since they don't, someone else might come in with a change that slows down 
sets by an arbitrary amount, so there's no reason to believe my change will 
help at all.

Yes, if we tried really hard, we could always make the language worse. That's a 
pretty awful reason to reject the change though.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CDEDUAMJENE5TMSEMEP4PM3JXF6WBXQP/


[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-30 Thread Brett Cannon
On Tue, Nov 29, 2022 at 12:58 PM Yoni Lavi wrote:

> It does make your argument invalid though,


It makes that single sentence invalid, but the rest of my points still
hold, e.g. the language makes no guarantee about hash consistency between
executions, set order is not guaranteed, etc. are all still valid points.
And as I said earlier, I also agree with the points made in the issue you
linked to, so I'm still -0 on this.

-Brett


> since it's based on this assumption that I was asking for a requirement on
> iteration order (e.g. like dict's iteration order = insertion order
> guarantee), which is not the case.
>
> Again, determinism means that given all input data and commands fed to a
> data structure is the same, it will arrive at the same observable state,
> any time you start from scratch and replay this workload. In the context of
> sets, "all input data" includes the hashing function itself, and
> "observable state" also includes the order in which items will be returned
> if iterated. Note that there is NO requirement here on what that order
> might be.
>
> Under this definition, sets in Python are deterministic, and _always_ have
> been. And even outside of Python, there aren't many cases where people
> willingly want to use data structures with non-deterministic behavior. It
> usually involves concurrency (in the form of multithreading) and extreme
> performance requirements. And it's never the "standard" choice even in
> languages that do offer this. Determinism is generally considered as a
> valuable property in computation, at least when it is feasible to maintain
> it.
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5Z3SOH4JDHRGYF4NTLND4E2UFM7QIXTL/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LTX5IT5ST3BU7T4CZ5M4G4Z4Q2X2CTNV/


[Python-Dev] Re: RFC: expose ssl internals for use by ctypes/cffi

2022-11-30 Thread Gregory P. Smith
On Wed, Nov 30, 2022 at 12:47 PM Steve Dower wrote:

> On 11/30/2022 4:52 PM, chris...@weinigel.se wrote:
> > Does this seem like a good idea?  As I said, I feel that it is a bit
> > ugly, but it does mean that if someone wants to use some
> > SSL_really_obscure_function in libcrypto or libssl they can do that without
> > having to rebuild all of CPython themselves.
>
> Broadly, no, I don't think it's a good idea. We don't like encouraging
> users to do things that make it hard to support them in the future.
>
> Nonetheless, it's one that I've had to do, and so realistically I think
> it's okay to *enable* the hack without endorsing it. This is one of the
> reasons I switched the Windows builds to dynamically linked OpenSSL
> builds (they used to be statically linked, which meant there was no way
> to get at the unused exports). So now you can use `import _ssl;
> ctypes.CDLL("libssl-1_1")` to get at other exports from the module if
> you need them, and there's a similar trick to get the raw handle that I
> don't recall off the top of my head.
>
> But the only reason I'd ever want to document this is to tell people not
> to rely on it. If you control your environment well enough that you can
> guarantee it'll work for you, that's great. Nobody else should ever
> think they're doing the "right thing".
>

+1 ... and in general if you want access to other OpenSSL APIs not already
in the ssl module, getting them via non-stdlib packages on PyPI would be a
better idea.

https://pypi.org/project/cryptography/ is very well supported.
https://pypi.org/project/oscrypto/ exists and is quite interesting.
the old https://pypi.org/project/M2Crypto/ package still exists and seems
to be maintained (wow).

More context: We don't like the ssl module in the standard library - it is
already too tightly tied to OpenSSL:
https://discuss.python.org/t/our-future-with-openssl/21486

So if you want specific OpenSSL APIs that are not exposed, seeking to see
them added to the standard library where they would then become features
that need to be supported for a very long time, is going to be the most
difficult approach as there'd need to be a very good reason to have them in
the stdlib. Third party libraries that can provide what you need, or
rolling your own libssl API wrappings however you choose to implement them,
are better bets.

-Greg
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5E62GYXUJLTPQWJAPEOLQDHY3F2IDRER/


[Python-Dev] Re: RFC: expose ssl internals for use by ctypes/cffi

2022-11-30 Thread Steve Dower

On 11/30/2022 4:52 PM, chris...@weinigel.se wrote:

> Does this seem like a good idea?  As I said, I feel that it is a bit ugly, but 
> it does mean that if someone wants to use some SSL_really_obscure_function in 
> libcrypto or libssl they can do that without having to rebuild all of CPython 
> themselves.


Broadly, no, I don't think it's a good idea. We don't like encouraging 
users to do things that make it hard to support them in the future.


Nonetheless, it's one that I've had to do, and so realistically I think 
it's okay to *enable* the hack without endorsing it. This is one of the 
reasons I switched the Windows builds to dynamically linked OpenSSL 
builds (they used to be statically linked, which meant there was no way 
to get at the unused exports). So now you can use `import _ssl; 
ctypes.CDLL("libssl-1_1")` to get at other exports from the module if 
you need them, and there's a similar trick to get the raw handle that I 
don't recall off the top of my head.


But the only reason I'd ever want to document this is to tell people not 
to rely on it. If you control your environment well enough that you can 
guarantee it'll work for you, that's great. Nobody else should ever 
think they're doing the "right thing".


Cheers,
Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AJ3PQTOMD4GL27LH5XZR7XUDWU3XKUUE/


[Python-Dev] RFC: expose ssl internals for use by ctypes/cffi

2022-11-30 Thread christer
Hi all,

for quite some time I've been working on a Python implementation of a protocol 
called NTS which requires access to an API in OpenSSL which is not provided by 
the Python ssl module.  I added a patch for that which unfortunately for me the 
maintainer did not want to accept.  Some comments were made of a possible 
future way to give more generic access to the openssl library via ctypes/cffi 
but I have been unable to find more information about that.  I was home sick 
last week and decided to take a shot at it and have built something that I feel 
is a bit ugly but does seem to work.  I'd like to get some feedback on this 
approach.  

My patches can be found on github, based on the Python 3.11 tag:

https://github.com/python/cpython/compare/3.11...wingel:cpython:main

Here's a short description of each patch on this branch:

"bpo-37952: SSL: add support for export_keying_material" is my old patch which 
adds the method I need to the ssl library just for reference. 

The other commits add the necessary infrastructure with some example code.  
These commits are not ready for submission but hopefully they show what I have 
in mind.

"Add CRYPTO_DLL_PATH and SSL_DLL_PATH to the _ssl module. "

This commit adds two constants to the "_ssl" C module with the paths to 
libcrypto and libssl respectively.  On Linux dladdr and on Windows 
GetModuleHandle/GetModuleFileName are used on a symbol in each library to find 
the path to the corresponding DLL.  I've verified that this works on Debian 
Bullseye and on Windows 10 with Visual Studio 2017.  I don't own a Mac so I 
haven't been able to test this on macOS, but I believe dladdr is available on 
modern macOS so it might work out of the box.  With the paths it's possible to 
use ctypes or cffi to get a handle to these libraries.

"Add API to get the address of the SSL structure" then adds an API to an 
SSLSocket which returns the address of the corresponding "SSL" C structure.  
This address can be used by ctypes/cffi.  One would probably want to expose 
SSL_CTX, SSL_SESSION and BIO too but I started with just SSL since that's what 
my code needs right now.

"Add a small test program" is a small test program that uses the infrastructure 
from the two above commits to call C functions in libssl/libcrypto using both 
ctypes and cffi.  It's a bit ugly but hopefully it's not too hard to understand.

"Example of how to extend the ssl library using ctypes" is an example of how a 
Python module that extends the SSL library using ctypes could look.  First get 
a handle to libssl using ctypes, set up ctypes with the correct API for the 
export_keying_material function, wrap it in a more Pythonic function and then 
extend SSLSocket with the new function.  A simplified version looks like this:

import ssl, ctypes

# SSL_DLL_PATH and get_internal_addr() only exist with the patches on this
# branch; they are not part of a stock CPython.
ssl_lib = ctypes.CDLL(ssl._ssl.SSL_DLL_PATH)
ssl_lib.SSL_export_keying_material.argtypes = (
    ctypes.c_void_p,                   # SSL pointer
    ctypes.c_void_p, ctypes.c_size_t,  # out pointer, out length
    ctypes.c_void_p, ctypes.c_size_t,  # label buffer, label length
    ctypes.c_void_p, ctypes.c_size_t,  # context, context length
    ctypes.c_int)                      # use context flag
ssl_lib.SSL_export_keying_material.restype = ctypes.c_int

def SSL_export_keying_material(self, label, key_len, context=None):
    # label and context are expected to be bytes
    c_key = ctypes.create_string_buffer(key_len)
    c_label = ctypes.create_string_buffer(label, len(label))
    if context is None:
        c_context, context_len, use_context = None, 0, 0
    else:
        c_context = ctypes.create_string_buffer(context, len(context))
        context_len, use_context = len(context), 1
    # OpenSSL returns 1 on success and 0 or -1 on failure
    if ssl_lib.SSL_export_keying_material(
            self._sslobj.get_internal_addr(),
            c_key, key_len,
            c_label, len(label),
            c_context, context_len, use_context) == 1:
        return bytes(c_key)
    raise ssl.SSLError("SSL_export_keying_material failed")

ssl.SSLSocket.export_keying_material = SSL_export_keying_material

There's a final commit "Expose more OPENSSL_ variables" which adds some more 
constants to the ssl module which expose the cflags and build information from 
OpenSSL.  This patch is not really necessary, but it might be a good idea to 
compare these constants with the corresponding constants retrieved using 
ctypes/cffi to make sure that exactly the same version of the openssl library 
is used.
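
A rough sketch of such a version check is shown below. It only uses things that exist today, not the constants from the patch: ssl.OPENSSL_VERSION_NUMBER from the stdlib, ctypes.util.find_library, and the OpenSSL_version_num() function exported by libcrypto in OpenSSL 1.1.0 and later. The helper name openssl_versions_match is made up for illustration. The library located by find_library is an assumption too, since it is not necessarily the one _ssl was linked against, which is the mismatch this kind of check is meant to catch.

```python
import ctypes
import ctypes.util
import ssl


def openssl_versions_match(lib_name="crypto"):
    """Compare the ssl module's OpenSSL version with a ctypes-loaded libcrypto.

    Returns True or False, or None if no usable library could be loaded.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        version_num = lib.OpenSSL_version_num  # OpenSSL >= 1.1.0
    except (OSError, AttributeError):
        return None
    version_num.restype = ctypes.c_ulong
    version_num.argtypes = ()
    return version_num() == ssl.OPENSSL_VERSION_NUMBER


match = openssl_versions_match()
```

A result of False would suggest that ctypes picked up a different OpenSSL build than the one the ssl module uses, in which case passing raw pointers between the two would be unsafe.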

Does this seem like a good idea?  As I said, I feel that it is a bit ugly, but 
it does mean that if someone wants to use some SSL_really_obscure_function in 
libcrypto or libssl they can do that without having to rebuild all of CPython 
themselves.  Or if they want to integrate with some other C library that wants 
a raw pointer to a SSL socket.  Hopefully this would reduce the burden on the 
ssl module maintainers a bit.

Anyway, if you think this is a good approach I could clean up my patches, add 
support for SSL_CTX/SSL_SESSION/BIO, document all of this and make it into a 
proper pull request.

  /Christer

[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant

2022-11-30 Thread Oscar Benjamin
On Tue, 29 Nov 2022 at 23:46, Steven D'Aprano wrote:
>
> On Tue, Nov 29, 2022 at 08:51:09PM -, Yoni Lavi wrote:
>
> > It does make your argument invalid though, since it's based on this
> > assumption that I was asking for a requirement on iteration order
> > (e.g. like dict's iteration order = insertion order guarantee), which
> > is not the case.
>
> Yoni, I think this answer is disingenuous.

I don't think it is disingenuous. There are just a lot of people
talking past each other and not quite understanding what each person
means because there is confusion about even the intended meaning of
terms like "deterministic". I will expand here with enough detail that
we should hopefully be able to avoid misunderstanding each other.

There are probably other places where you could find mentions of this
in the docs but I just took a quick look in the Python 3.5 docs
(before hash randomisation) to find this mention of dictionary
iteration order:
https://docs.python.org/3.5/library/stdtypes.html#dictionary-view-objects

What it says is
"""
Keys and values are iterated over in an arbitrary order which is
non-random, varies across Python implementations, and depends on the
dictionary’s history of insertions and deletions.
"""
The key point is the use of the term "non-random" which here is
intended to mean that although no particular ordering is guaranteed
you can expect to rerun the same program and get the same result
deterministically. A different version or implementation of Python
might give a different order but rerunning the same program twice
without changing anything should give the same result even if that
result depends in some way on the iteration order of some
dictionaries. I can't immediately find a similar statement about sets
but in practice the same behaviour applied to sets as well. Note
carefully that it is this *narrow* form of determinism that Yoni is
interested in.

Of course there are some caveats to this and the obvious one is that
this statement does not apply if there are some objects that use
identity based hashing so this is not deterministic:

class A:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return 'A(%s)' % self.data

a1 = A(1)
a2 = A(2)

for a in {a1, a2}:
    print(a)

Running this gives:

$ python3.5 t.py
A(2)
A(1)
$ python3.5 t.py
A(1)
A(2)

On the other hand if all of the hashes themselves are deterministic
then the program as a whole will be as well so this is deterministic:

class A:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return 'A(%s)' % self.data
    def __hash__(self):
        return hash(self.data)
    def __eq__(self, other):
        return self.data == other.data

a1 = A(1)
a2 = A(2)

for a in {a1, a2}:
    print(a)

$ python3.5 t.py
A(1)
A(2)
$ python3.5 t.py
A(1)
A(2)

So we have two classes of hashable objects:

1. Those with deterministic hash
2. Those with non-deterministic hash

A program that avoids depending on the iteration order of sets or
dicts containing objects with non-deterministic hash could be
deterministic. It is not the case that the program would depend on the
iteration order for its *correctness* but just that the behaviour of
the program is *reproducible* which is useful in various ways e.g.:

- You could say to someone else "run this code with CPython 3.5 and
you should be able to reproduce exactly what I see when I run the
program". It is common practice e.g. in scientific programming to
record things like random seeds so that someone else can precisely
reproduce the results shown in a paper or some other work and this in
general requires that it is at least possible to make everything
deterministic.

- When debugging it is useful to be able to reproduce an error
condition precisely. Debugging non-deterministic failures can be
extremely difficult. In the same way that you might want to reproduce
correctly functioning code it is also very useful to be able to
reproduce bugs.
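
In the spirit of recording a random seed, here is a small sketch of how reproducibility across processes can be checked. It relies on CPython's PYTHONHASHSEED environment variable, which is not mentioned above and is brought in here as an assumption about the interpreter in use. The helper run_with_seed and the one-line SNIPPET program are made up for illustration.

```python
import os
import subprocess
import sys

# A tiny program whose output depends on set iteration order of strings.
SNIPPET = "print(list({'spam', 'eggs', 'ham', 'beans'}))"


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Two separate processes with the same seed should print the same order.
first = run_with_seed(0)
second = run_with_seed(0)
reproducible = first == second
```

Pinning the seed only covers the types whose hashes are salted. Objects that hash by identity, like the first class A example, are outside its reach.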

I can list more examples but really it shouldn't be necessary to
justify from first principles why determinism in programming is
usually a good thing. There can be reasons sometimes why determinism
is undesired or cannot or should not be guaranteed. It should not be
controversial though to say that all things being equal determinism is
usually a desirable feature and should be preferred by default. I
don't think that the 3.5 docs I quoted above used the words
"non-random" casually: it was an intended feature and people were
aware that breaking that behaviour would be problematic in many
situations.

Of course in Python 3.6 this determinism was broken with the
introduction of hash randomisation for strings. It was considered that
for security purposes it would be better to have some internal
non-deterministic behaviour to guard against attackers. Specifically
the hashes of three types (str, bytes and datetime) were made
non-deterministic between subsequent CPython processes. The effect was