cisaacstern commented on PR #29802: URL: https://github.com/apache/beam/pull/29802#issuecomment-1867047007
I've renamed the issue to reflect my current proposed path forward, which it turns out is a [PR to _dask_](https://github.com/dask/dask/pull/10734) (not distributed). This PR appears to fix non-deterministic hashing of both strings _and `None`_ (and I believe all other types as well), whereas the previously-referenced distributed PR only fixed string hashes. > Please also add a unit test to the dask runner test suite that reproduces the original issue. 👍 `test_groupby_string_keys` added in https://github.com/apache/beam/pull/29802/commits/9067a54baf3741de3d144d77bfc2ed943b4e8236 replicates the current issue. This test passes locally for me with dask installed from the branch provided in https://github.com/apache/beam/pull/29802/commits/d6c35eb1e85136bbd06110acec9dd59770e76f35. We will need to wait for the associated PR to progress before this will pass in CI (given tight coupling of distributed/dask releases, see comment in setup.py). We can also asses what a reasonable upper bound might be at that point. > We should also add a precommit/postcommit that runs the dask runner tests (but that could be a separate task). Agreed! IIUC, this is what is being tracked in https://github.com/apache/beam/issues/25696? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
