vicaya commented on issue #2069: Use pure python implementation of MurmurHash
URL: https://github.com/apache/bookkeeper/pull/2069#issuecomment-492453989

@merlimat, where is your pymmh3 source repo? I just tried it. It works on Python 2.7.x but fails on Python 3.7.x:

```
Python 3.7.3 (default, Apr 8 2019, 12:02:14)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymmh3
>>> pymmh3.hash64("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pymmh3' has no attribute 'hash64'
>>>
```

This is due to a bug in `pymmh3/__init__.py`: `from pymmh3 import *` is missing a dot (it should be `from .pymmh3 import *`). On Python 3 the undotted form is an absolute import, so it resolves to the package itself rather than to the submodule that defines `hash64`.

A quick microbenchmark with `perf` shows that the pure Python implementation is 16x slower than mmh3 for a short message ("foo"), 44x slower for a medium message (42 chars), and 710x slower for a long message (1512 chars, the Gettysburg Address):

```
$ python3 simple.py
.....................
short mmh3: Mean +- std dev: 429 ns +- 26 ns
.....................
short pymmh3: Mean +- std dev: 6.85 us +- 0.08 us
.....................
medium mmh3: Mean +- std dev: 426 ns +- 8 ns
.....................
medium pymmh3: Mean +- std dev: 18.6 us +- 0.4 us
.....................
long mmh3: Mean +- std dev: 705 ns +- 14 ns
.....................
long pymmh3: Mean +- std dev: 501 us +- 21 us
```

The bk client only impacts pulsar functions, which are deployed inside the official broker containers. Those containers always have mmh3 installed, so there should be no perf impact in the usual deployment. The slowdown only happens when people want to custom-build smaller broker container images.
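The import bug can be reproduced without pymmh3 itself. The sketch below is only an illustration. It assumes the package layout is `pymmh3/__init__.py` next to `pymmh3/pymmh3.py`, and it uses a hypothetical throwaway package named `demo_mmh` with a placeholder `hash64` (not a real hash) so that it cannot clash with an installed pymmh3. It builds the package twice, once with the absolute import and once with the relative import, and checks whether `hash64` ends up in the package namespace on Python 3.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_mmh"  # hypothetical stand-in for the pymmh3 package


def import_with_init(init_line):
    """Build a throwaway package whose __init__.py is `init_line`, import it,
    and report whether hash64 is reachable from the package namespace."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, PKG)
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(init_line + "\n")
        # Submodule named like the package (assumed to mirror pymmh3/pymmh3.py).
        with open(os.path.join(pkg_dir, PKG + ".py"), "w") as f:
            f.write("def hash64(key):\n"
                    "    return (0, 0)  # placeholder, not a real hash\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(PKG)
            return hasattr(module, "hash64")
        finally:
            sys.path.remove(root)
            # Forget the package so the next call re-imports from scratch.
            stale = [n for n in sys.modules
                     if n == PKG or n.startswith(PKG + ".")]
            for name in stale:
                del sys.modules[name]


# Absolute import: on Python 3 this resolves to the package itself.
broken = import_with_init("from demo_mmh import *")
# Relative import: this resolves to the submodule inside the package.
fixed = import_with_init("from .demo_mmh import *")

print("absolute import exposes hash64:", broken)
print("relative import exposes hash64:", fixed)
```

On Python 3 the first call should report `False` and the second `True`, which matches the `AttributeError` in the session above.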
