[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
> Whether determinism is fundamentally good or fundamentally bad depends
> heavily on context.

Agreed 100%. Unfortunately, in Python you cannot choose your hashing
function depending on context. Also, once you've decided to violate
determinism somewhere, it's gone; there is no way, in the general case,
to bring it back. That's why it's important not to violate it
willy-nilly, in a manner that cannot be prevented even by users who
_want_ their programs to exhibit deterministic behavior.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TGRZPF6XWOMCBDLA3N3YWUQJJ77UHZKG/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Thu, 1 Dec 2022 at 17:26, Yoni Lavi wrote:
>
> > the language makes no guarantee about hash consistency between
> > executions
>
> because it's futile in the general case. Even if objects were to get a
> serial `id` and hash by it, for example, any change in the number of
> objects created across all of Python (including its builtin modules
> and various libraries unrelated to the user code) would make these
> hashes move.

For the record, Jython DOES use sequential numbers for ids. And it
doesn't reuse them even if the objects are disposed of.

>>> id(None)
2
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> lists = None
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

IIRC an id is not assigned to an object until one is requested.

> So it's not like it's even possible to require this generally for all
> objects.

Well, I mean, in theory you could require that objects whose hash isn't
otherwise defined get given a hash of zero. That doesn't violate any of
the actual rules of hashes, but it does make those hashes quite
suboptimal :)

It's interesting how id() and hash() have opposite requirements (id
must return a unique number among concurrently-existing objects; hash
must return the same number among comparing-equal objects), yet a hash
can be built on an id.

> Besides, do other languages require it? Is it required for the
> language to behave in a manner that makes sense?
>
> Or maybe you think it's by pure accident that such an overwhelming
> majority of languages and software libraries implement/use
> deterministic hashing functions for primitive types or aggregates
> that consist of such types?
>
> I can't figure out if you think it's actually a bad property for the
> language to have, or really just arguing that it's bad for the sake
> of it.

Determinism is usually the easiest option.
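The "opposite requirements" point can be checked directly in a CPython
session. A minimal sketch (this shows observed CPython behaviour, not a
language guarantee):

```python
# Equal values must hash equal - a language rule that holds everywhere.
assert hash((1, "a")) == hash((1, "a"))

# Plain objects with no __eq__/__hash__ fall back to an identity-derived
# hash, so two distinct (never-equal) objects get independent hashes.
a, b = object(), object()
assert a is not b
print(hash(a), hash(b))  # identity-derived; varies from run to run

# id() must be unique among live objects. An identity-derived hash
# satisfies the hash rule too, because such objects are only ever equal
# to themselves.
assert len({id(a), id(b)}) == 2
```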
True randomness takes a lot of effort compared to a deterministic PRNG,
hence web servers having true entropy but old game consoles relying on
PRNGs. Whether determinism is fundamentally good or fundamentally bad
depends heavily on context.

Question: Is the metaquestion "is determinism good" deterministic (ie
can it be answered entirely from predictable facts), or is it itself
entropic? I believe the former, but I'm curious if anyone disagrees!

ChrisA
[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
> the language makes no guarantee about hash consistency between
> executions

because it's futile in the general case. Even if objects were to get a
serial `id` and hash by it, for example, any change in the number of
objects created across all of Python (including its builtin modules and
various libraries unrelated to the user code) would make these hashes
move. So it's not like it's even possible to require this generally for
all objects. None of that makes deterministic structural hashing any
less useful in practice, though.

Besides, do other languages require it? Is it required for the language
to behave in a manner that makes sense? Or maybe you think it's by pure
accident that such an overwhelming majority of languages and software
libraries implement/use deterministic hashing functions for primitive
types or aggregates that consist of such types? I can't figure out if
you think it's actually a bad property for the language to have, or if
you are really just arguing that it's bad for the sake of it.

> set order is not guaranteed

Maybe not. In practice it has fully deterministic behavior, and always
has, across all versions of Python since its inception. I don't care
about what the order is, only that it's deterministic, and it is.
Rejecting my change because someone could technically get away with
breaking this after 30+ years seems highly suspect.

Imagine I came in with a 0.5% perf improvement. You would reject it,
citing that Python's requirements do not mandate that sets have good
performance, and also that since they don't, someone else might come in
with a change that slows down sets by an arbitrary amount, so there's
no reason to believe my change would help at all. Yes, if we tried
really hard, we could always make the language worse. That's a pretty
awful reason to reject the change, though.
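The de facto determinism being claimed here is easy to observe within a
single interpreter: replaying the same insertions yields the same
iteration order, even though that order is neither sorted order nor
insertion order in general. A minimal sketch (observed CPython
behaviour, not a guarantee):

```python
def build():
    # Same workload every time: insert the same ints in the same order.
    s = set()
    for x in (512, 2, 77, 1024, 3):
        s.add(x)
    return list(s)

# Identical workload, identical observable order - deterministic in
# practice, whatever that order happens to be.
assert build() == build()

# The order itself is an artifact of hashing, not of insertion.
print(build())
```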
[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, Nov 29, 2022 at 12:58 PM Yoni Lavi wrote:

> It does make your argument invalid though,

It makes that single sentence invalid, but the rest of my points still
hold: the language makes no guarantee about hash consistency between
executions, set order is not guaranteed, etc. are all still valid
points. And as I said earlier, I also agree with the points made in the
issue you linked to, so I'm still -0 on this.

-Brett

> since it's based on this assumption that I was asking for a
> requirement on iteration order (e.g. like dict's iteration order =
> insertion order guarantee), which is not the case.
>
> Again, determinism means that given that all input data and commands
> fed to a data structure are the same, it will arrive at the same
> observable state any time you start from scratch and replay this
> workload. In the context of sets, "all input data" includes the
> hashing function itself, and "observable state" also includes the
> order in which items will be returned if iterated. Note that there is
> NO requirement here on what that order might be.
>
> Under this definition, sets in Python are deterministic, and _always_
> have been. And even outside of Python, there aren't many cases where
> people willingly want to use data structures with non-deterministic
> behavior. It usually involves concurrency (in the form of
> multithreading) and extreme performance requirements. And it's never
> the "standard" choice, even in languages that do offer this.
> Determinism is generally considered a valuable property in
> computation, at least when it is feasible to maintain it.
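Yoni's definition - same workload in, same observable state (iteration
order included) out - can be phrased as a small replay check. An
illustrative sketch (the function and workload names are mine, not from
the thread):

```python
def replay(commands):
    """Replay a workload on a fresh set and return its observable
    state, including iteration order."""
    s = set()
    for op, value in commands:
        getattr(s, op)(value)  # op is 'add' or 'discard'
    return list(s)

workload = [("add", 10), ("add", 3), ("add", 99), ("discard", 10)]

# Determinism in this narrow sense: replaying the identical workload
# from scratch reproduces the identical state. No particular order is
# required - only that the order is reproducible.
assert replay(workload) == replay(workload)
```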
[Python-Dev] Re: RFC: expose ssl internals for use by ctypes/cffi
On Wed, Nov 30, 2022 at 12:47 PM Steve Dower wrote:

> On 11/30/2022 4:52 PM, chris...@weinigel.se wrote:
> > Does this seem like a good idea? As I said, I feel that it is a bit
> > ugly, but it does mean that if someone wants to use some
> > SSL_really_obscure_function in libcrypto or libssl they can do that
> > without having to rebuild all of CPython themselves.
>
> Broadly, no, I don't think it's a good idea. We don't like
> encouraging users to do things that make it hard to support them in
> the future.
>
> Nonetheless, it's one that I've had to do, and so realistically I
> think it's okay to *enable* the hack without endorsing it. This is
> one of the reasons I switched the Windows builds to dynamically
> linked OpenSSL builds (they used to be statically linked, which meant
> there was no way to get at the unused exports). So now you can use
> `import _ssl; ctypes.CDLL("libssl-1_1")` to get at other exports from
> the module if you need them, and there's a similar trick to get the
> raw handle that I don't recall off the top of my head.
>
> But the only reason I'd ever want to document this is to tell people
> not to rely on it. If you control your environment well enough that
> you can guarantee it'll work for you, that's great. Nobody else
> should ever think they're doing the "right thing".

+1 ... and in general, if you want access to other OpenSSL APIs not
already in the ssl module, getting them via non-stdlib packages on PyPI
would be a better idea. https://pypi.org/project/cryptography/ is very
well supported. https://pypi.org/project/oscrypto/ exists and is quite
interesting. The old https://pypi.org/project/M2Crypto/ package still
exists and seems to be maintained (wow).
More context: we don't like the ssl module in the standard library - it
is already too tightly tied to OpenSSL:
https://discuss.python.org/t/our-future-with-openssl/21486

So if you want specific OpenSSL APIs that are not exposed, seeking to
have them added to the standard library - where they would then become
features that need to be supported for a very long time - is going to
be the most difficult approach, as there'd need to be a very good
reason to have them in the stdlib. Third-party libraries that can
provide what you need, or rolling your own libssl API wrappings however
you choose to implement them, are better bets.

-Greg
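For illustration, "rolling your own libssl API wrappings" along the
lines Steve and Greg describe might start like this. This is a hedged
sketch, not a supported API: library names vary by platform and OpenSSL
version, and OpenSSL_version_num() only exists in OpenSSL 1.1.0+, so
everything is guarded.

```python
import ctypes
import ctypes.util

# Ask the platform loader for a libssl. On Windows one would instead
# load the DLL shipped alongside the _ssl module, as Steve describes.
path = ctypes.util.find_library("ssl")

version = None
if path is not None:
    libssl = ctypes.CDLL(path)
    try:
        fn = libssl.OpenSSL_version_num  # OpenSSL >= 1.1.0 only
    except AttributeError:
        fn = None  # e.g. an older or non-OpenSSL libssl
    if fn is not None:
        fn.restype = ctypes.c_ulong
        fn.argtypes = ()
        version = fn()  # packed version number, e.g. 0x30000090
        print(hex(version))
```

The same CDLL handle can then be used to declare argtypes/restype for
any other export, exactly as with the export_keying_material example
later in the thread.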
[Python-Dev] Re: RFC: expose ssl internals for use by ctypes/cffi
On 11/30/2022 4:52 PM, chris...@weinigel.se wrote:
> Does this seem like a good idea? As I said, I feel that it is a bit
> ugly, but it does mean that if someone wants to use some
> SSL_really_obscure_function in libcrypto or libssl they can do that
> without having to rebuild all of CPython themselves.

Broadly, no, I don't think it's a good idea. We don't like encouraging
users to do things that make it hard to support them in the future.

Nonetheless, it's one that I've had to do, and so realistically I think
it's okay to *enable* the hack without endorsing it. This is one of the
reasons I switched the Windows builds to dynamically linked OpenSSL
builds (they used to be statically linked, which meant there was no way
to get at the unused exports). So now you can use `import _ssl;
ctypes.CDLL("libssl-1_1")` to get at other exports from the module if
you need them, and there's a similar trick to get the raw handle that I
don't recall off the top of my head.

But the only reason I'd ever want to document this is to tell people
not to rely on it. If you control your environment well enough that you
can guarantee it'll work for you, that's great. Nobody else should ever
think they're doing the "right thing".

Cheers,
Steve
[Python-Dev] RFC: expose ssl internals for use by ctypes/cffi
Hi all,

For quite some time I've been working on a Python implementation of a
protocol called NTS, which requires access to an API in OpenSSL that is
not provided by the Python ssl module. I added a patch for that which,
unfortunately for me, the maintainer did not want to accept. Some
comments were made about a possible future way to give more generic
access to the OpenSSL library via ctypes/cffi, but I have been unable
to find more information about that.

I was home sick last week and decided to take a shot at it, and have
built something that I feel is a bit ugly but does seem to work. I'd
like some feedback on this approach. My patches can be found on GitHub,
based on the Python 3.11 tag:

https://github.com/python/cpython/compare/3.11...wingel:cpython:main

Here's a short description of each patch on this branch:

"bpo-37952: SSL: add support for export_keying_material" is my old
patch which adds the method I need to the ssl library, just for
reference. The other commits add the necessary infrastructure with some
example code. These commits are not ready for submission, but hopefully
they show what I have in mind.

"Add CRYPTO_DLL_PATH and SSL_DLL_PATH to the _ssl module." This commit
adds two constants to the "_ssl" C module with the paths to libcrypto
and libssl respectively. On Linux dladdr, and on Windows
GetModuleHandle/GetModuleFilename, are used on a symbol in each library
to find the path to the corresponding DLL. I've verified that this
works on Debian Bullseye and on Windows 10 with Visual Studio 2017. I
don't own a Mac so I haven't been able to test this on macOS, but I
believe dladdr is available on modern macOS, so it might work out of
the box. With the paths it's possible to use ctypes or cffi to get a
handle to these libraries.

"Add API to get the address of the SSL structure" then adds an API to
an SSLSocket which returns the address of the corresponding "SSL" C
structure. This address can be used by ctypes/cffi.
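The dladdr technique used by the patch can be sketched from Python
itself via ctypes. This is a hedged illustration for Linux/macOS only
(the patch does this in C inside _ssl); the helper name is mine, and
libm/cos stand in for libssl and one of its symbols:

```python
import ctypes
import ctypes.util

class Dl_info(ctypes.Structure):
    # Mirrors the POSIX Dl_info struct filled in by dladdr().
    _fields_ = [
        ("dli_fname", ctypes.c_char_p),  # path of the shared object
        ("dli_fbase", ctypes.c_void_p),  # base address it is mapped at
        ("dli_sname", ctypes.c_char_p),  # nearest symbol name
        ("dli_saddr", ctypes.c_void_p),  # address of that symbol
    ]

def library_path_of(func_ptr):
    """Return the file path of the shared object containing func_ptr,
    or None if it cannot be determined on this platform."""
    try:
        libc = ctypes.CDLL(None)  # dladdr lives in libc/libdl
        dladdr = libc.dladdr
    except (OSError, TypeError, AttributeError):
        return None
    dladdr.argtypes = (ctypes.c_void_p, ctypes.POINTER(Dl_info))
    dladdr.restype = ctypes.c_int
    info = Dl_info()
    if dladdr(ctypes.cast(func_ptr, ctypes.c_void_p),
              ctypes.byref(info)) and info.dli_fname:
        return info.dli_fname.decode()
    return None

name = ctypes.util.find_library("m")
if name:
    libm = ctypes.CDLL(name)
    print(library_path_of(libm.cos))  # e.g. a path ending in libm.so.6
```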
One would probably want to expose SSL_CTX, SSL_SESSION and BIO too, but
I started with just SSL since that's what my code needs right now.

"Add a small test program" is a small test program that uses the
infrastructure from the two commits above to call C functions in
libssl/libcrypto using both ctypes and cffi. It's a bit ugly, but
hopefully it's not too hard to understand.

"Example of how to extend the ssl library using ctypes" is an example
of how a Python module that extends the SSL library using ctypes could
look. First get a handle to libssl using ctypes, set up ctypes with the
correct API for the export_keying_material function, wrap it in a more
Pythonic function, and then extend SSLSocket with the new function. A
simplified version looks like this:

    import ssl, ctypes

    ssl_lib = ctypes.CDLL(ssl._ssl.SSL_DLL_PATH)
    ssl_lib.SSL_export_keying_material.argtypes = (
        ctypes.c_void_p,                   # SSL pointer
        ctypes.c_void_p, ctypes.c_size_t,  # out pointer, out length
        ctypes.c_void_p, ctypes.c_size_t,  # label buffer, label length
        ctypes.c_void_p, ctypes.c_size_t,  # context, context length
        ctypes.c_int)                      # use context flag
    ssl_lib.SSL_export_keying_material.restype = ctypes.c_int

    def SSL_export_keying_material(self, label, key_len, context=b''):
        c_key = ctypes.create_string_buffer(key_len)
        c_label = ctypes.create_string_buffer(label)
        c_context = ctypes.create_string_buffer(context, len(context))
        if ssl_lib.SSL_export_keying_material(
                self._sslobj.get_internal_addr(),
                c_key, key_len,
                c_label, len(label),
                c_context, len(context),
                1):
            return bytes(c_key)

    ssl.SSLSocket.export_keying_material = SSL_export_keying_material

There's a final commit, "Expose more OPENSSL_ variables", which adds
some more constants to the ssl module exposing the cflags and build
information from OpenSSL.
This patch is not really necessary, but it might be a good idea to
compare these constants with the corresponding constants retrieved
using ctypes/cffi to make sure that exactly the same version of the
OpenSSL library is used.

Does this seem like a good idea? As I said, I feel that it is a bit
ugly, but it does mean that if someone wants to use some
SSL_really_obscure_function in libcrypto or libssl they can do that
without having to rebuild all of CPython themselves. Or if they want to
integrate with some other C library that wants a raw pointer to an SSL
socket. Hopefully this would reduce the burden on the ssl module
maintainers a bit.

Anyway, if you think this is a good approach I could clean up my
patches, add support for SSL_CTX/SSL_SESSION/BIO, document all of this
and make it into a proper pull request.

/Christer
[Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, 29 Nov 2022 at 23:46, Steven D'Aprano wrote:
>
> On Tue, Nov 29, 2022 at 08:51:09PM -, Yoni Lavi wrote:
>
> > It does make your argument invalid though, since it's based on this
> > assumption that I was asking for a requirement on iteration order
> > (e.g. like dict's iteration order = insertion order guarantee),
> > which is not the case.
>
> Yoni, I think this answer is disingenuous.

I don't think it is disingenuous. There are just a lot of people
talking past each other and not quite understanding what each person
means, because there is confusion about even the intended meaning of
terms like "deterministic". I will expand here with enough detail that
we should hopefully be able to avoid misunderstanding each other.

There are probably other places where you could find mentions of this
in the docs, but I just took a quick look in the Python 3.5 docs
(before hash randomisation) to find this mention of dictionary
iteration order:

https://docs.python.org/3.5/library/stdtypes.html#dictionary-view-objects

What it says is:

"""
Keys and values are iterated over in an arbitrary order which is
non-random, varies across Python implementations, and depends on the
dictionary's history of insertions and deletions.
"""

The key point is the use of the term "non-random", which here is
intended to mean that although no particular ordering is guaranteed,
you can expect to rerun the same program and get the same result
deterministically. A different version or implementation of Python
might give a different order, but rerunning the same program twice
without changing anything should give the same result, even if that
result depends in some way on the iteration order of some dictionaries.

I can't immediately find a similar statement about sets, but in
practice the same behaviour applied to sets as well. Note carefully
that it is this *narrow* form of determinism that Yoni is interested
in.
Of course there are some caveats to this, and the obvious one is that
this statement does not apply if there are some objects that use
identity-based hashing, so this is not deterministic:

    class A:
        def __init__(self, data):
            self.data = data
        def __repr__(self):
            return 'A(%s)' % self.data

    a1 = A(1)
    a2 = A(2)

    for a in {a1, a2}:
        print(a)

Running this gives:

    $ python3.5 t.py
    A(2)
    A(1)
    $ python3.5 t.py
    A(1)
    A(2)

On the other hand, if all of the hashes themselves are deterministic
then the program as a whole will be as well, so this is deterministic:

    class A:
        def __init__(self, data):
            self.data = data
        def __repr__(self):
            return 'A(%s)' % self.data
        def __hash__(self):
            return hash(self.data)
        def __eq__(self, other):
            return self.data == other.data

    a1 = A(1)
    a2 = A(2)

    for a in {a1, a2}:
        print(a)

    $ python3.5 t.py
    A(1)
    A(2)
    $ python3.5 t.py
    A(1)
    A(2)

So we have two classes of hashable objects:

1. Those with deterministic hash
2. Those with non-deterministic hash

A program that avoids depending on the iteration order of sets or dicts
containing objects with non-deterministic hash can be deterministic. It
is not the case that the program depends on the iteration order for its
*correctness*, but rather that the behaviour of the program is
*reproducible*, which is useful in various ways, e.g.:

- You could say to someone else "run this code with CPython 3.5 and you
  should be able to reproduce exactly what I see when I run the
  program". It is common practice, e.g. in scientific programming, to
  record things like random seeds so that someone else can precisely
  reproduce the results shown in a paper or some other work, and this
  in general requires that it is at least possible to make everything
  deterministic.

- When debugging, it is useful to be able to reproduce an error
  condition precisely. Debugging non-deterministic failures can be
  extremely difficult. In the same way that you might want to reproduce
  correctly functioning code, it is also very useful to be able to
  reproduce bugs.
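In current CPython this kind of cross-process reproducibility can be
demonstrated, and controlled, with the PYTHONHASHSEED environment
variable: string hashes (and hence set/dict iteration order for
strings) are only stable across processes when the seed is pinned. A
small sketch, not from the thread:

```python
import os
import subprocess
import sys

def str_hash_in_fresh_process(seed):
    """Return hash('abc') as computed by a brand-new interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# With a fixed seed, str hashes are reproducible across processes...
assert str_hash_in_fresh_process("0") == str_hash_in_fresh_process("0")

# ...while the default randomisation generally gives each process a
# different hash, which is what breaks cross-run iteration order.
print(str_hash_in_fresh_process("random"))
```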
I can list more examples, but really it shouldn't be necessary to
justify from first principles why determinism in programming is usually
a good thing. There can sometimes be reasons why determinism is
undesired, or cannot or should not be guaranteed. It should not be
controversial, though, to say that all things being equal, determinism
is usually a desirable feature and should be preferred by default.

I don't think that the 3.5 docs I quoted above used the words
"non-random" casually: it was an intended feature, and people were
aware that breaking that behaviour would be problematic in many
situations.

Of course, in Python 3.6 this determinism was broken with the
introduction of hash randomisation for strings. It was considered that
for security purposes it would be better to have some internal
non-deterministic behaviour to guard against attackers. Specifically,
the hashes of three types (str, bytes and datetime) were made
non-deterministic between subsequent CPython processes. The effect was