[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428848#comment-15428848
 ] 

DB Tsai edited comment on SPARK-17134 at 8/19/16 9:21 PM:
----------------------------------------------------------

It may also be worth trying the following. I see some performance improvement 
when the number of classes is high, since we avoid doing the normalization 
again and again. With this change, however, the access pattern over the 
coefficients array is no longer sequential, which changes the CPU cache 
locality. As a result, I think the performance will vary case by case. Maybe 
we could store the coefficients in a transposed matrix, which may help the 
locality? 

We need more investigation to understand the problem.

{code:borderStyle=solid}
val margins = Array.ofDim[Double](numClasses)
features.foreachActive { (index, value) =>
  if (featuresStd(index) != 0.0 && value != 0.0) {
    var i = 0
    // Normalize once per active feature, then reuse it for every class.
    val temp = value / featuresStd(index)
    while (i < numClasses) {
      // Strided access: consecutive classes are numFeaturesPlusIntercept apart.
      margins(i) += coefficients(i * numFeaturesPlusIntercept + index) * temp
      i += 1
    }
  }
}

if (fitIntercept) {
  var i = 0
  // The intercept is stored right after the feature coefficients of each class.
  val length = features.size
  while (i < numClasses) {
    margins(i) += coefficients(i * numFeaturesPlusIntercept + length)
    i += 1
  }
}

val maxMargin = margins.max
val marginOfLabel = margins(label.toInt)
{code}
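
To make the transposed-layout idea concrete, here is a minimal sketch. It is 
not code from Spark and has not been benchmarked. The object name 
TransposedMargins, the helper transpose, and the (indices, values) pair that 
stands in for features.foreachActive are all hypothetical. It assumes the 
coefficients are stored feature-major, so the weight for (class i, feature j) 
sits at j * numClasses + i, with the intercepts in the last numClasses slots. 
With that layout the inner loop over classes reads a contiguous slice.

{code:borderStyle=solid}
object TransposedMargins {

  /** Copies a class-major array (class i, column j at i * stride + j)
    * into a feature-major one (same entry at j * numClasses + i). */
  def transpose(classMajor: Array[Double], numClasses: Int, stride: Int): Array[Double] = {
    val out = Array.ofDim[Double](classMajor.length)
    var i = 0
    while (i < numClasses) {
      var j = 0
      while (j < stride) {
        out(j * numClasses + i) = classMajor(i * stride + j)
        j += 1
      }
      i += 1
    }
    out
  }

  /** Margins for one sparse example given as parallel (indices, values) arrays.
    * `coefficients` is assumed to be feature-major; when fitIntercept is true,
    * the intercepts occupy the last numClasses slots. */
  def margins(
      indices: Array[Int],
      values: Array[Double],
      featuresStd: Array[Double],
      coefficients: Array[Double],
      numClasses: Int,
      fitIntercept: Boolean): Array[Double] = {
    val result = Array.ofDim[Double](numClasses)
    var k = 0
    while (k < indices.length) {
      val index = indices(k)
      val value = values(k)
      if (featuresStd(index) != 0.0 && value != 0.0) {
        // Normalize once per active feature, as in the snippet above.
        val temp = value / featuresStd(index)
        // All classes for this feature are adjacent in memory.
        val base = index * numClasses
        var i = 0
        while (i < numClasses) {
          result(i) += coefficients(base + i) * temp
          i += 1
        }
      }
      k += 1
    }
    if (fitIntercept) {
      // Intercepts follow the last feature's block.
      val base = featuresStd.length * numClasses
      var i = 0
      while (i < numClasses) {
        result(i) += coefficients(base + i)
        i += 1
      }
    }
    result
  }
}
{code}

Whether this actually helps would still need measuring, since the gradient 
update would have to use the same layout and the benefit probably depends on 
numClasses and on how sparse the features are.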



> Use level 2 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
>                 Key: SPARK-17134
>                 URL: https://issues.apache.org/jira/browse/SPARK-17134
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Multinomial logistic regression uses LogisticAggregator class for gradient 
> updates. We should look into refactoring MLOR to use level 2 BLAS operations 
> for the updates. Performance testing should be done to show improvements.


