Yun Ni created SPARK-18334:
------------------------------
Summary: MinHash should use binary hash distance
Key: SPARK-18334
URL: https://issues.apache.org/jira/browse/SPARK-18334
Project: Spark
Issue Type: Bug
Reporter: Yun Ni
Priority: Trivial
MinHash currently uses the same `hashDistance` function as
RandomProjection. This does not make sense for MinHash because the Jaccard
distance of two sets is unrelated to the absolute distance between their
hash bucket indices.
This bug could affect the accuracy of multi-probe nearest-neighbor (NN)
search for MinHash.
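
To illustrate the difference, here is a minimal Python sketch contrasting an absolute-difference hash distance with one possible reading of a "binary" hash distance (0 if any corresponding hash values match, 1 otherwise). The function names `absolute_hash_distance` and `binary_hash_distance` and the sample hash values are hypothetical and are not taken from the Spark code base; the actual fix may differ.

```python
from typing import Sequence


def absolute_hash_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Smallest absolute difference between corresponding hash values.

    Assumed stand-in for a distance that suits random projection, where
    neighboring bucket indices do mean "nearby" points.
    """
    return min(abs(a - b) for a, b in zip(x, y))


def binary_hash_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """0 if at least one pair of corresponding hash values is equal, else 1.

    Assumed interpretation of "binary hash distance": for MinHash only an
    exact collision is informative, so numeric closeness is ignored.
    """
    return min(0 if a == b else 1 for a, b in zip(x, y))


# Hypothetical MinHash signatures (one value per hash table).
query = [10, 200, 35]
near_miss = [11, 950, 36]   # numerically close values, but no collision
collision = [10, 7, 999]    # exact collision in the first table

print(absolute_hash_distance(query, near_miss), binary_hash_distance(query, near_miss))
print(absolute_hash_distance(query, collision), binary_hash_distance(query, collision))
```

Under the absolute-difference distance, `near_miss` looks almost as close to `query` as a true collision does, even though adjacent MinHash values say nothing about Jaccard similarity. The binary variate separates the two cases cleanly.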