[https://issues.apache.org/jira/browse/SPARK-19771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893097#comment-15893097]
Yun Ni edited comment on SPARK-19771 at 3/2/17 9:55 PM:
--------------------------------------------------------
[~merlin]
(1) The computation cost is O(NumHashFunctions) because we go through each index
only once. I don't know what N refers to in the memory overhead.
(2) The hash values are not necessarily limited to 0, 1 and -1.
(3) If we really want a hash function of Vector, why not use Vector.hashCode?
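To make points (1) and (3) concrete, here is a minimal sketch in plain Python, not Spark code. It assumes each row already carries NumHashFunctions numeric hash values; the helper name `bucket_key` is hypothetical. It combines them into one bucket key with a single pass over the indices, so the cost grows linearly with NumHashFunctions, and it accepts arbitrary values rather than only 0, 1 and -1. The accumulation is in the general style of a JVM `hashCode`, but it is not a reimplementation of Spark's `Vector.hashCode`.

```python
from typing import Sequence


def bucket_key(hash_values: Sequence[float]) -> int:
    """Combine per-function hash values into a single 32-bit bucket key.

    Hypothetical helper: each index is visited exactly once, so the cost
    is O(len(hash_values)), i.e. O(NumHashFunctions).
    """
    key = 17
    for v in hash_values:
        # Polynomial accumulation (multiply by 31, add the element hash),
        # masked to 32 bits; order matters, so [1, 2] != [2, 1].
        key = (31 * key + hash(v)) & 0xFFFFFFFF
    return key
```

Rows whose keys are equal land in the same bucket. In Spark itself, reusing the existing `Vector.hashCode` would avoid maintaining a custom scheme like this one.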
was (Author: yunn):
[~merlin]
(1) The computation cost is O(NumHashFunctions) because we go through each index
only once. I don't know what N refers to in the memory overhead.
(2) The hash values are not necessarily {0, 1, -1}.
(3) If we really want a hash function of Vector, why not use Vector.hashCode?
> Support OR-AND amplification in Locality Sensitive Hashing (LSH)
> ----------------------------------------------------------------
>
> Key: SPARK-19771
> URL: https://issues.apache.org/jira/browse/SPARK-19771
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Yun Ni
>
> The current LSH implementation only supports AND-OR amplification. We need to
> discuss the following questions before we move on to implementation:
> (1) Whether we should support OR-AND amplification
> (2) What API changes we need for OR-AND amplification
> (3) How we should fix approxNearestNeighbor and approxSimilarityJoin internally.
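For question (1), a minimal sketch of how the two amplification schemes differ, written in plain Python rather than against Spark's API. It assumes each point's signature is arranged as L hash tables with K hash values each; the names `and_or_match`, `or_and_match`, `and_or_prob`, `or_and_prob`, `K` and `L` are illustrative, not Spark parameters. Under AND-OR, a pair is a candidate if all K values agree in at least one table. Under OR-AND, it is a candidate if at least one of the K values agrees in every table. If a single hash function collides with probability p, the candidate probabilities are 1 - (1 - p^K)^L and (1 - (1 - p)^K)^L respectively.

```python
from typing import Sequence

# A signature is L tables, each holding K hash values.
Signature = Sequence[Sequence[int]]


def and_or_match(a: Signature, b: Signature) -> bool:
    # AND within a table (all K values agree), OR across tables (any table).
    return any(all(x == y for x, y in zip(ta, tb)) for ta, tb in zip(a, b))


def or_and_match(a: Signature, b: Signature) -> bool:
    # OR within a table (some value agrees), AND across tables (every table).
    return all(any(x == y for x, y in zip(ta, tb)) for ta, tb in zip(a, b))


def and_or_prob(p: float, k: int, l: int) -> float:
    # Probability that at least one of L tables matches on all K functions.
    return 1 - (1 - p ** k) ** l


def or_and_prob(p: float, k: int, l: int) -> float:
    # Probability that every one of L tables matches on at least one function.
    return (1 - (1 - p) ** k) ** l
```

For example, with a = [[1, 2], [3, 4]] and b = [[1, 2], [9, 9]], AND-OR accepts the pair (table 0 agrees fully) while OR-AND rejects it (table 1 has no agreement). This difference is why approxSimilarityJoin and approxNearestNeighbor would need different candidate logic for OR-AND.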
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)