[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515529#comment-15515529 ]
Seth Hendrickson edited comment on SPARK-17134 at 9/23/16 6:09 AM:
-------------------------------------------------------------------

This makes sense. In my initial testing I found that standardizing the features in every iteration takes a non-trivial amount of time. Still, you mentioned that you would rather not cache the standardized dataset, since doing so can create unnecessary memory overhead. One solution is to let users specify that their data has already been standardized, so that we can skip the extra divisions in the update method. Alternatively, we could do as you suggest above but store the coefficients in column-major order so that we still maximize cache hits. We'll need to test both approaches to truly understand the trade-offs.

> Use level 2 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
>                 Key: SPARK-17134
>                 URL: https://issues.apache.org/jira/browse/SPARK-17134
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Multinomial logistic regression uses the LogisticAggregator class for gradient
> updates. We should look into refactoring MLOR to use level 2 BLAS operations
> for the updates. Performance testing should be done to show improvements.
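A minimal Python sketch of the ideas in this thread, assuming dense feature vectors and omitting intercepts for brevity. It is not Spark's LogisticAggregator: the class name `MultinomialAggregatorSketch`, the `already_standardized` flag, and the flat column-major coefficient layout are hypothetical illustrations. The per-instance update is written as the two level 2 BLAS operations a refactor would likely use: a matrix-vector product for the margins (what `dgemv` computes) and a rank-1 outer-product update for the gradient (what `dger` computes). The nested loops only stand in for those BLAS calls.

```python
import math
from typing import Sequence


class MultinomialAggregatorSketch:
    """Hypothetical gradient aggregator for multinomial logistic regression.

    Coefficients for K classes and P features are stored column major:
    entry (k, j) lives at index j * K + k, so the K coefficients that
    touch feature j are contiguous in memory.
    """

    def __init__(self, coefficients: Sequence[float], num_classes: int,
                 num_features: int, features_std: Sequence[float],
                 already_standardized: bool = False) -> None:
        self.k = num_classes
        self.p = num_features
        self.coef = list(coefficients)  # length K * P, column major
        self.std = list(features_std)
        self.already_standardized = already_standardized  # hypothetical flag
        self.grad = [0.0] * (num_classes * num_features)
        self.loss_sum = 0.0
        self.weight_sum = 0.0

    def _scaled(self, j: int, value: float) -> float:
        # Standardize on the fly unless the caller says the data is already
        # standardized; zero-variance features contribute nothing.
        if self.already_standardized:
            return value
        s = self.std[j]
        return value / s if s != 0.0 else 0.0

    def add(self, features: Sequence[float], label: int,
            weight: float = 1.0) -> "MultinomialAggregatorSketch":
        k, p = self.k, self.p
        x = [self._scaled(j, features[j]) for j in range(p)]

        # margins = W x, the matrix-vector product a BLAS gemv call computes.
        # The inner loop walks K contiguous coefficients for feature j.
        margins = [0.0] * k
        for j in range(p):
            xj = x[j]
            if xj != 0.0:
                base = j * k
                for c in range(k):
                    margins[c] += self.coef[base + c] * xj

        # Softmax, shifted by the max margin for numerical stability.
        max_m = max(margins)
        exps = [math.exp(m - max_m) for m in margins]
        total = sum(exps)
        log_sum_exp = max_m + math.log(total)
        multipliers = [e / total for e in exps]
        multipliers[label] -= 1.0  # softmax(margins) - one_hot(label)

        # grad += weight * multipliers x^T, the rank-1 update a BLAS ger
        # call computes, again touching contiguous K-length columns.
        for j in range(p):
            xj = x[j]
            if xj != 0.0:
                base = j * k
                for c in range(k):
                    self.grad[base + c] += weight * multipliers[c] * xj

        self.loss_sum += weight * (log_sum_exp - margins[label])
        self.weight_sum += weight
        return self
```

With the flag set, the caller passes pre-standardized features and the per-feature divisions disappear, which is the saving the comment describes; the cost is that callers must standardize consistently themselves. Storing the coefficients column major keeps both the margin and the gradient loops iterating over contiguous memory for each nonzero feature.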
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)