Derrick Burns created SPARK-5405:
------------------------------------

             Summary: Spark clusterer should support high dimensional data
                 Key: SPARK-5405
                 URL: https://issues.apache.org/jira/browse/SPARK-5405
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 1.2.0
            Reporter: Derrick Burns


The MLlib clusterer works well for low-dimensional data (fewer than 200 
dimensions). However, its running time grows linearly with the number of 
dimensions, so for practical purposes it is not very useful for 
high-dimensional data.
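
Assuming the clusterer in question is MLlib's k-means with Lloyd-style 
iterations, the linear dependence on dimension comes from the distance 
computation: each squared Euclidean distance between a point x and a center c 
sums d terms, and every iteration compares all n points against all K centers. 
The symbols n, K, and d are notation introduced only for this sketch.

```latex
\|x - c\|^2 = \sum_{j=1}^{d} (x_j - c_j)^2,
\qquad
\text{cost per iteration} \approx O(n \cdot K \cdot d)
```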

Depending on the data type, high-dimensional data can be embedded into a 
lower-dimensional space in a distance-preserving way. The Spark clusterer 
should support such embeddings.
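
As one illustration of such an embedding (the issue does not name a specific 
technique), here is a minimal sketch of a Gaussian random projection in the 
spirit of the Johnson-Lindenstrauss lemma. It assumes dense Euclidean vectors; 
the helper names make_projection, project, and euclidean are hypothetical and 
not part of MLlib. Each projected vector has m coordinates instead of d, so a 
distance computation on the projected data sums m terms rather than d, while 
pairwise Euclidean distances are approximately preserved.

```python
import math
import random


def make_projection(d, m, seed=0):
    """Hypothetical helper: return an m x d matrix with i.i.d. N(0, 1/m) entries.

    With this scaling, E[||R x||^2] = ||x||^2, so distances are preserved
    in expectation (Gaussian random projection).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(matrix, x):
    """Hypothetical helper: map a d-dimensional vector x to m dimensions, y = R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    d, m = 1000, 200
    R = make_projection(d, m, seed=42)
    rng = random.Random(7)
    x = [rng.random() for _ in range(d)]
    y = [rng.random() for _ in range(d)]
    # Ratio of projected distance to original distance; close to 1 for large m.
    print(euclidean(project(R, x), project(R, y)) / euclidean(x, y))
```

The projected vectors could then be clustered in place of the originals. A 
random projection is data-independent, which makes it easy to apply 
point-by-point in a distributed setting; other embeddings may suit other data 
types better.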



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
