Seth Hendrickson commented on SPARK-17134:

This makes sense. In my initial testing I found that having to standardize the 
features in every iteration takes a non-trivial amount of time. Still, you 
mentioned the desire to not cache the standardized dataset since it can create 
unnecessary memory overhead. One solution is to allow users to specify that 
their data has already been standardized, so that we don't have to perform the 
extra divisions in the update method. Alternatively, we could do as you suggest 
above, but store the coefficients in column-major order so that we still 
maximize cache hits.

We'll need to benchmark both approaches to truly understand the trade-off.
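
As a rough illustration of the two options, here is a minimal sketch in plain 
Python. It is not the actual LogisticAggregator code: the function name 
`margins`, the `features_std` parameter, and the flat column-major coefficient 
array are all assumptions made for the example.

```python
def margins(features, coef, num_classes, features_std=None):
    """Compute the per-class margins for a single instance.

    Assumed layout (hypothetical): `coef` is a flat list in column-major
    order, so the num_classes weights for feature j are contiguous at
    coef[j * num_classes : (j + 1) * num_classes].

    If `features_std` is given, each feature is divided by its standard
    deviation on the fly, so no standardized copy of the data is cached.
    If it is None, the caller is asserting that the data has already been
    standardized, and the extra divisions are skipped.
    """
    out = [0.0] * num_classes
    for j, x in enumerate(features):
        if x == 0.0:
            continue  # zero entries contribute nothing
        if features_std is not None:
            if features_std[j] == 0.0:
                continue  # constant feature, nothing to scale
            x = x / features_std[j]
        base = j * num_classes
        # Inner loop walks contiguous memory thanks to the column-major layout.
        for k in range(num_classes):
            out[k] += coef[base + k] * x
    return out
```

Calling `margins(x, coef, K, features_std=std)` on raw data and 
`margins(x_standardized, coef, K)` on pre-standardized data should give the 
same margins; the second form simply avoids one division per nonzero feature.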

> Use level 2 BLAS operations in LogisticAggregator
> -------------------------------------------------
>                 Key: SPARK-17134
>                 URL: https://issues.apache.org/jira/browse/SPARK-17134
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
> Multinomial logistic regression uses the LogisticAggregator class for gradient 
> updates. We should look into refactoring MLOR to use level 2 BLAS operations 
> for the updates. Performance testing should be done to show improvements.
