[ https://issues.apache.org/jira/browse/SPARK-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-1212.
------------------------------------
Resolution: Fixed
> Support sparse data in MLlib
> ----------------------------
>
> Key: SPARK-1212
> URL: https://issues.apache.org/jira/browse/SPARK-1212
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 0.9.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Blocker
> Fix For: 1.0.0
>
>
> MLlib's NaiveBayes, SGD, and KMeans accept RDD[LabeledPoint] for training and
> RDD[Array[Double]] for prediction, where LabeledPoint is a wrapper around
> (Double, Array[Double]). A dense Array[Double] performs well, but sparse data
> appears quite often in practice, and storing every zero explicitly wastes
> memory and computation. I created this JIRA to discuss the plan for adding
> sparse data support to MLlib and to track its progress.
> The goal is to support sparse data for training and prediction in all
> existing algorithms in MLlib:
> * Gradient Descent
> * K-Means
> * Naive Bayes
> Previous discussions and pull requests:
> * https://github.com/mesos/spark/pull/736
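
To illustrate the dense-versus-sparse trade-off described in the issue, here is a minimal Scala sketch of a sparse vector that stores only its nonzero entries. The name SparseVec and its methods (dot, toDense, fromDense) are hypothetical and chosen for illustration; this is not the API that was added to MLlib, only one plausible shape for the idea.

```scala
// Hypothetical sketch, not MLlib's API: a sparse vector keeps only the
// nonzero entries as parallel arrays of indices and values.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must align")
  require(indices.forall(i => i >= 0 && i < size), "index out of range")

  // Dot product against dense weights; touches only the stored entries,
  // so the cost scales with the number of nonzeros, not with `size`.
  def dot(weights: Array[Double]): Double = {
    require(weights.length == size, "dimension mismatch")
    var sum = 0.0
    var k = 0
    while (k < indices.length) {
      sum += values(k) * weights(indices(k))
      k += 1
    }
    sum
  }

  // Expand back to the dense Array[Double] form mentioned in the issue.
  def toDense: Array[Double] = {
    val out = new Array[Double](size)
    var k = 0
    while (k < indices.length) {
      out(indices(k)) = values(k)
      k += 1
    }
    out
  }
}

object SparseVec {
  // Build a sparse vector from a dense array by dropping exact zeros.
  def fromDense(dense: Array[Double]): SparseVec = {
    val nonzero = dense.zipWithIndex.filter { case (v, _) => v != 0.0 }
    SparseVec(dense.length, nonzero.map(_._2), nonzero.map(_._1))
  }
}
```

With this layout, a gradient step or a Naive Bayes count update that needs a dot product or a per-feature accumulation would visit only the nonzero features, which is presumably the kind of saving the issue is after.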