[ 
https://issues.apache.org/jira/browse/BEAM-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223966#comment-17223966
 ] 

Valentyn Tymofieiev commented on BEAM-10920:
--------------------------------------------

mmh3 currently does not release Python wheels [1], which makes it challenging to 
install on some platforms, since installation needs a compiler (e.g. gcc). I 
reached out to the maintainer to see if they would consider adding wheels. 
Keeping it as an optional dependency for now sounds appropriate. Good to know 
that sklearn also has an implementation of this functionality.

[1] https://pypi.org/project/mmh3/#files
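
For illustration, a minimal sketch of the optional-dependency pattern discussed here: use mmh3 when it is importable and fall back to md5 from the standard library otherwise. The name hash_value and the choice to truncate the md5 digest to 64 bits are assumptions made for this example, not a description of what stats.ApproximateUnique does.

```python
import hashlib

try:
    # mmh3 is a C extension; without a prebuilt wheel it needs a compiler
    # at install time, so it may be absent on some platforms.
    import mmh3

    def hash_value(data: bytes) -> int:
        """Hypothetical helper: signed 64-bit MurmurHash3 of data."""
        # hash64 returns a tuple of two signed 64-bit ints; keep the first.
        return mmh3.hash64(data)[0]

except ImportError:

    def hash_value(data: bytes) -> int:
        """Hypothetical helper: md5 fallback truncated to a signed 64-bit int."""
        digest = hashlib.md5(data).digest()
        return int.from_bytes(digest[:8], "big", signed=True)


if __name__ == "__main__":
    print(hash_value(b"beam"))
```

The two branches produce different values for the same input, so hashes computed with one should not be compared against hashes computed with the other.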

> Investigate python hash libraries
> ---------------------------------
>
>                 Key: BEAM-10920
>                 URL: https://issues.apache.org/jira/browse/BEAM-10920
>             Project: Beam
>          Issue Type: Bug
>          Components: dependencies, sdk-py-core
>            Reporter: Monica Song
>            Priority: P3
>
> stats.ApproximateUnique has an optional mmh3 dependency [1] (mmh3 is roughly 
> 9x faster than md5), but if that repository is problematic for users, we 
> should look into alternatives.
> Other options: sklearn.utils.murmurhash3_32
>   [1] https://github.com/hajimes/mmh3, https://pypi.org/project/mmh3/2.0/
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
