Derrick Burns created SPARK-6002:
------------------------------------

             Summary: MLLIB should support the RandomIndexing transform
                 Key: SPARK-6002
                 URL: https://issues.apache.org/jira/browse/SPARK-6002
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.2.1
            Reporter: Derrick Burns


MLlib offers HashingTF.  However, this simple transform offers no 
guarantees on the relationship between the input and the output (for 
example, distinct terms can collide in the same output index). 
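
For illustration, here is a minimal sketch of the hashing trick that a 
transform like HashingTF is based on. It is not MLlib's code: the function 
name hashing_tf, the num_features default, and the use of CRC32 as the hash 
are all assumptions made for this example.

```python
import zlib


def hashing_tf(tokens, num_features=16):
    """Map a token list to a fixed-length term-frequency vector.

    Hypothetical helper: CRC32 is only a stand-in hash function, chosen
    because it is deterministic across runs. It is not the hash MLlib uses.
    """
    vec = [0] * num_features
    for tok in tokens:
        # Each token is assigned a bucket by hash value modulo the vector size.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        # Different tokens that land in the same bucket are simply added
        # together, so the output cannot tell them apart.
        vec[idx] += 1
    return vec


doc = ["spark", "mllib", "spark", "hashing"]
features = hashing_tf(doc)
```

The counts in the output always sum to the number of input tokens, but which 
terms share a bucket depends only on the hash function and the vector size.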

Instead of HashingTF, MLlib should offer Random Indexing 
(http://en.wikipedia.org/wiki/Random_indexing), which does offer such 
guarantees.
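
A rough sketch of the idea, as Random Indexing is usually described: each 
term gets a sparse random "index vector" with a few +1/-1 entries, and a 
document is represented by the sum of the index vectors of its terms. This is 
not the implementation from any particular library; the names index_vector 
and random_indexing and the parameters dim, nonzeros, and seed are 
hypothetical, chosen for this example.

```python
import random
import zlib


def index_vector(term, dim=64, nonzeros=4, seed=0):
    """Return a sparse ternary index vector for `term` as {position: sign}.

    Hypothetical helper: the vector is derived from a per-term seeded RNG,
    so the same term always gets the same vector without storing a table.
    """
    rng = random.Random(zlib.crc32(term.encode("utf-8")) ^ seed)
    positions = rng.sample(range(dim), nonzeros)  # distinct positions
    return {pos: rng.choice((-1, 1)) for pos in positions}


def random_indexing(tokens, dim=64, nonzeros=4, seed=0):
    """Sum the index vectors of all tokens into one dense `dim`-length vector."""
    vec = [0] * dim
    for tok in tokens:
        for pos, sign in index_vector(tok, dim, nonzeros, seed).items():
            vec[pos] += sign
    return vec


doc_a = ["spark", "mllib", "spark"]
doc_b = ["random", "indexing"]
vec_a = random_indexing(doc_a)
vec_b = random_indexing(doc_b)
# The transform is linear: the vector of the concatenated documents equals
# the element-wise sum of the two document vectors.
vec_ab = random_indexing(doc_a + doc_b)
```

Because the transform is a sparse random linear projection, it falls into 
the family of methods for which Johnson-Lindenstrauss-style arguments about 
approximately preserved distances are usually made; the sketch above only 
demonstrates the mechanics and does not verify any such bound.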

The K-means clusterer at 
https://github.com/derrickburns/generalized-kmeans-clustering includes an 
implementation of the Random Indexing transform.


