Derrick Burns created SPARK-6002:
------------------------------------
Summary: MLLIB should support the RandomIndexing transform
Key: SPARK-6002
URL: https://issues.apache.org/jira/browse/SPARK-6002
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.2.1
Reporter: Derrick Burns
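MLlib offers HashingTF. However, this simple transform offers no
guarantees on the relationship between the input and the output: terms
that hash to the same bucket are simply merged, so distances between the
output vectors need not reflect distances between the input vectors.
Instead of HashingTF, MLlib should offer Random Indexing
(http://en.wikipedia.org/wiki/Random_indexing), which does offer such
guarantees: like other random projections, it approximately preserves
distances and inner products between vectors.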
The K-means clusterer at
https://github.com/derrickburns/generalized-kmeans-clustering includes an
implementation of the Random Indexing transform.
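
To illustrate the idea, here is a minimal sketch of Random Indexing in plain Python. It is not the generalized-kmeans-clustering implementation and not a Spark API. The function names (index_vector, random_index, cosine) and the default parameters (dim=1000, nonzeros=10) are hypothetical choices for illustration. Each term gets a sparse index vector with a few +1/-1 entries at random positions. The vector is seeded from a hash of the term, so, like HashingTF, no vocabulary needs to be stored. A document vector is the frequency-weighted sum of its terms' index vectors.

```python
import hashlib
import math
import random
from collections import Counter


def index_vector(term, dim, nonzeros, seed=0):
    """Sparse ternary index vector for `term`, as {position: +1 or -1}.

    Hypothetical helper: the RNG is seeded from a SHA-256 digest of the term,
    so the same term always maps to the same vector and no vocabulary has to
    be stored.
    """
    digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
    rng = random.Random(digest)
    positions = rng.sample(range(dim), nonzeros)  # distinct positions
    return {p: rng.choice((1, -1)) for p in positions}


def random_index(tokens, dim=1000, nonzeros=10, seed=0):
    """Map a bag of tokens to a dense `dim`-dimensional vector.

    Each term contributes its index vector, weighted by term frequency and
    scaled by 1/sqrt(nonzeros), so that inner products between output vectors
    match inner products between the term-frequency vectors in expectation
    (treating the hash-seeded index vectors as random).
    """
    scale = 1.0 / math.sqrt(nonzeros)
    vec = [0.0] * dim
    for term, count in Counter(tokens).items():
        for pos, sign in index_vector(term, dim, nonzeros, seed).items():
            vec[pos] += sign * count * scale
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = random_index("the quick brown fox jumps".split())
    b = random_index("the quick brown fox sleeps".split())
    c = random_index("lorem ipsum dolor sit amet".split())
    # Exact cosine of the binary term vectors: a vs b = 0.8, a vs c = 0.0.
    print(f"a~b: {cosine(a, b):.3f}  a~c: {cosine(a, c):.3f}")
```

The design difference from plain bucket hashing is that each term is spread over several signed positions. A partial overlap between two terms' index vectors therefore contributes a small, randomly signed error rather than merging the two terms outright. Increasing dim or nonzeros trades memory and compute for a tighter approximation.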
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)