cisaacstern commented on PR #29802:
URL: https://github.com/apache/beam/pull/29802#issuecomment-1867047007

   I've renamed the issue to reflect my current proposed path forward, which it 
turns out is a [PR to _dask_](https://github.com/dask/dask/pull/10734) (not 
distributed).
   
   This PR appears to fix non-deterministic hashing of both strings _and 
`None`_ (and I believe all other types as well), whereas the 
previously-referenced distributed PR only fixed string hashes.
   
   > Please also add a unit test to the dask runner test suite that reproduces 
the original issue.
   
   👍 `test_groupby_string_keys` added in 
https://github.com/apache/beam/pull/29802/commits/9067a54baf3741de3d144d77bfc2ed943b4e8236
 replicates the current issue. This test passes locally for me with dask 
installed from the branch provided in 
https://github.com/apache/beam/pull/29802/commits/d6c35eb1e85136bbd06110acec9dd59770e76f35.
 We will need to wait for the associated PR to progress before this will pass 
in CI (given tight coupling of distributed/dask releases, see comment in 
setup.py). We can also asses what a reasonable upper bound might be at that 
point.
   
   > We should also add a precommit/postcommit that runs the dask runner tests 
(but that could be a separate task).
   
   Agreed! IIUC, this is what is being tracked in 
https://github.com/apache/beam/issues/25696?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to