[ https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986684#comment-14986684 ]

Nakul Jindal commented on SPARK-11439:
--------------------------------------

[~holdenk] [~lewuathe] - There are a couple of places where work could be saved:

1. [L144|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L144]
 - This is where the sparse feature data is first populated, so the index and
values arrays could be built and filled in at this line. The problem is that
this does not fit well with the blas.ddot call at
[L153|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L153]:
either a new weights array would need to be created, or the ddot call would
need to be rewritten.

2. [L162|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L162]
 - If done here instead, we would essentially be re-implementing what toSparse
already does internally, so nothing would be saved.
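The two options above can be sketched roughly as follows. This is a hypothetical, self-contained Scala sketch, not the actual LinearDataGenerator code: the object and method names (SparseFeatureSketch, generateSparseRow, sparseDot, denseToSparse) are made up for illustration, plain (indices, values) array pairs stand in for Spark's SparseVector, and the per-feature rescaling of a uniform draw to a given mean and variance is an assumption about how features are generated.

```scala
import scala.util.Random

// Hypothetical sketch, not the actual LinearDataGenerator code: all names
// below are illustrative. Plain arrays stand in for Spark's vector types.
object SparseFeatureSketch {

  // Option 1: build one row directly as (indices, values), zeroing entries
  // with probability `sparsity`, so no dense Array[Double] is ever allocated.
  // The uniform draw from `rnd` happens for every feature, zeroed or not.
  def generateSparseRow(
      rnd: Random,
      sparseRnd: Random,
      xMean: Array[Double],
      xVariance: Array[Double],
      sparsity: Double): (Array[Int], Array[Double]) = {
    val indices = Array.newBuilder[Int]
    val values = Array.newBuilder[Double]
    var i = 0
    while (i < xMean.length) {
      val u = rnd.nextDouble()
      if (sparseRnd.nextDouble() >= sparsity) {
        indices += i
        // Rescale a uniform [0, 1) draw to the requested mean and variance.
        values += (u - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)
      }
      i += 1
    }
    (indices.result(), values.result())
  }

  // The "rewrite ddot" alternative: a sparse-dense dot product that reads
  // weights only at the stored indices instead of calling blas.ddot.
  def sparseDot(indices: Array[Int], values: Array[Double], weights: Array[Double]): Double = {
    var sum = 0.0
    var k = 0
    while (k < indices.length) {
      sum += values(k) * weights(indices(k))
      k += 1
    }
    sum
  }

  // Option 2: packing an existing dense array, roughly what a dense-to-sparse
  // conversion such as toSparse has to do. The dense array must already
  // exist, so doing this by hand saves nothing.
  def denseToSparse(dense: Array[Double]): (Array[Int], Array[Double]) = {
    val nnz = dense.count(_ != 0.0) // first pass: size the outputs exactly
    val indices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var k = 0
    var i = 0
    while (i < dense.length) { // second pass: copy nonzeros and positions
      if (dense(i) != 0.0) {
        indices(k) = i
        values(k) = dense(i)
        k += 1
      }
      i += 1
    }
    (indices, values)
  }
}
```

Option 1 needs both generateSparseRow and something like sparseDot, which corresponds to rewriting the ddot call. Because the random draws here are made row by row, this sketch would not necessarily reproduce the current generator's output for a given seed. denseToSparse illustrates why option 2 saves nothing: its input is the dense array itself.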

Neither of these options makes sense to me.
Any suggestions on which direction to take?

> Optimization of creating sparse features without dense ones
> -----------------------------------------------------------
>
>                 Key: SPARK-11439
>                 URL: https://issues.apache.org/jira/browse/SPARK-11439
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Kai Sasaki
>            Priority: Minor
>
> Currently, sparse features generated in {{LinearDataGenerator}} require
> creating dense vectors first. It would be more cost-efficient to avoid
> generating dense features when creating sparse features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
