[ https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-2372.
------------------------------
Resolution: Won't Fix
Sounds like a Won't Fix given the PR discussion.
> Grouped Optimization/Learning
> -----------------------------
>
> Key: SPARK-2372
> URL: https://issues.apache.org/jira/browse/SPARK-2372
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 1.0.1, 1.0.2, 1.1.0
> Reporter: Kyle Ellrott
>
> The purpose of this patch is to enable MLlib to better handle scenarios
> where the user wants to do learning on multiple feature/label sets at
> the same time. Rather than scheduling each learning task separately, this
> patch lets the user create a single RDD with an Int key that identifies
> the 'group' each set of entries belongs to.
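> As a rough sketch of the idea (sc is a SparkContext; trainingSetA and
> trainingSetB are hypothetical Seq[LabeledPoint] values, not names from
> the patch):
>
>     // Tag each example with the Int ID of the learning problem
>     // ("group") it belongs to, so one RDD carries several
>     // independent training sets at once.
>     val groupA = sc.parallelize(trainingSetA).map(p => (0, p))
>     val groupB = sc.parallelize(trainingSetB).map(p => (1, p))
>     val grouped = groupA.union(groupB)   // RDD[(Int, LabeledPoint)]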
> This patch establishes the GroupedOptimizer trait, for which
> GroupedGradientDescent has been implemented. It differs from the
> original Optimizer trait in its input type: where the original
> optimize method accepted RDD[(Double, Vector)], the new
> GroupedOptimizer accepts RDD[(Int, (Double, Vector))].
> The difference is that the GroupedOptimizer uses the 'group' ID key to
> multiplex multiple optimization operations within the same RDD.
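> A minimal sketch of what such a trait could look like, assuming a
> per-group weight vector is returned (this is an illustration based on
> the description above, not the code from the PR):
>
>     import org.apache.spark.mllib.linalg.Vector
>     import org.apache.spark.rdd.RDD
>
>     // Hypothetical grouped counterpart to Optimizer: the Int key
>     // selects which optimization problem an example feeds, and one
>     // weight vector is produced per group ID.
>     trait GroupedOptimizer extends Serializable {
>       def optimize(data: RDD[(Int, (Double, Vector))],
>                    initialWeights: Vector): Map[Int, Vector]
>     }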
> This patch also establishes the GroupedGeneralizedLinearAlgorithm trait,
> in which the 'run' method's RDD[LabeledPoint] input is replaced with
> RDD[(Int, LabeledPoint)].
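> Sketched against the shape of the existing GeneralizedLinearAlgorithm
> API, the grouped variant's entry point might look like the following
> (the Map return type is an assumption; the PR is the authority):
>
>     import org.apache.spark.mllib.regression.{GeneralizedLinearModel, LabeledPoint}
>     import org.apache.spark.rdd.RDD
>
>     trait GroupedGeneralizedLinearAlgorithm[M <: GeneralizedLinearModel] {
>       // Train one model per group ID from a single keyed RDD.
>       def run(input: RDD[(Int, LabeledPoint)]): Map[Int, M]
>     }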
> This patch also provides a unit test and a utility that takes the results
> of MLUtils.kFold and turns them into a single grouped RDD, ready for
> simultaneous learning.
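> One way such a utility could work, assuming MLUtils.kFold's
> Array[(training, validation)] return shape (the helper name
> foldsToGroupedRDD is hypothetical):
>
>     import org.apache.spark.mllib.regression.LabeledPoint
>     import org.apache.spark.mllib.util.MLUtils
>     import org.apache.spark.rdd.RDD
>
>     // Key each fold's training split with its fold index and union
>     // the splits into one RDD, so every fold can be fit in a single
>     // simultaneous learning pass.
>     def foldsToGroupedRDD(data: RDD[LabeledPoint], numFolds: Int,
>                           seed: Int): RDD[(Int, LabeledPoint)] = {
>       val folds = MLUtils.kFold(data, numFolds, seed)
>       folds.map(_._1).zipWithIndex
>         .map { case (train, i) => train.map(p => (i, p)) }
>         .reduce(_ union _)
>     }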
> https://github.com/apache/spark/pull/1292