[
https://issues.apache.org/jira/browse/SPARK-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-21152:
---------------------------------
Labels: bulk-closed (was: )
> Use level 3 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
> Key: SPARK-21152
> URL: https://issues.apache.org/jira/browse/SPARK-21152
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 2.1.1
> Reporter: Seth Hendrickson
> Priority: Major
> Labels: bulk-closed
>
> In the logistic regression gradient update, we currently compute the gradient
> contribution of each row individually. If we block rows together, we can do a
> blocked gradient update that leverages the level 3 BLAS GEMM operation.
> On high-dimensional dense datasets, I've observed ~10x speedups. The problem
> here, though, is that this likely won't improve the sparse case, so we need to
> keep both implementations around, and the blocked algorithm will require
> caching a new dataset of type:
> {code}
> BlockInstance(label: Vector, weight: Vector, features: Matrix)
> {code}
> In the past we have avoided caching anything besides the original dataset
> passed to train, because doing so adds memory overhead if the user has cached
> that original dataset for other reasons. Here, I'd like to discuss whether we
> think this patch is worth the investment, given that it improves only a
> subset of the use cases.
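A minimal NumPy sketch of the idea (my illustration, not the Spark implementation): the row-wise update does one dot product per instance, while the blocked update computes the margins and the gradient for a whole block with matrix products that BLAS can execute as GEMV/GEMM. All names here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rowwise_gradient(X, y, w):
    # Current style: one dot product per row (level 1/2 BLAS),
    # accumulated into the gradient vector.
    grad = np.zeros_like(w)
    for i in range(X.shape[0]):
        margin = X[i] @ w
        grad += (sigmoid(margin) - y[i]) * X[i]
    return grad

def blocked_gradient(X, y, w):
    # Blocked style: margins and gradient for the whole block via
    # matrix products. With a vector w this is GEMV; with a coefficient
    # matrix (e.g. multinomial logistic regression) X @ W is a true GEMM.
    margins = X @ w
    err = sigmoid(margins) - y
    return X.T @ err
```

Both functions compute the same gradient; the blocked form simply exposes the work as a few large matrix products, which is where the dense-case speedup would come from and why the sparse case is unlikely to benefit.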
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]