Monica Song created BEAM-10920:
----------------------------------

             Summary: Investigate python hash libraries
                 Key: BEAM-10920
                 URL: https://issues.apache.org/jira/browse/BEAM-10920
             Project: Beam
          Issue Type: Bug
          Components: dependencies, sdk-py-core
            Reporter: Monica Song


stats.ApproximateUnique has an optional mmh3 dependency [1] (mmh3 is roughly 
9x faster than md5), but if that repository is problematic for users, we 
should look into alternatives.
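
For context, an optional hash dependency of this kind is commonly wired up with a guarded import and a standard-library fallback. The following is only a minimal sketch of that pattern, not the actual ApproximateUnique code; the helper name hash_to_uint64 is hypothetical, and the choice to keep the first 8 bytes of the md5 digest is an assumption made for illustration.

```python
import hashlib

try:
    import mmh3  # optional third-party dependency; may not be installed
except ImportError:
    mmh3 = None


def hash_to_uint64(data: bytes) -> int:
    """Map bytes to an unsigned 64-bit integer (hypothetical helper)."""
    if mmh3 is not None:
        # mmh3.hash64 returns a tuple of two 64-bit integers; keep the first
        # and mask it so the result is non-negative.
        return mmh3.hash64(data)[0] & 0xFFFFFFFFFFFFFFFF
    # Fallback (assumption): first 8 bytes of the md5 digest, big-endian.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")
```

Any alternative library would only need to replace the mmh3 branch, provided it returns a value that can be reduced to the same integer range.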

Other options: sklearn.utils.murmurhash3_32

  [1] https://github.com/hajimes/mmh3, https://pypi.org/project/mmh3/2.0/

 

cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
