[ https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992828#comment-14992828 ]

Nakul Jindal commented on SPARK-11439:
--------------------------------------

I seem to be running into a problem.

1. [This|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L124-L165] is the current implementation.
2. [This|https://gist.github.com/nakul02/9341a9ed67cd192d98df] is the 
implementation that I tried first (and it passed all tests).
3. [This|https://gist.github.com/nakul02/4f5392c7d5997871da7b] is an improved 
implementation that doesn't form the "x" array, but it fails tests in these suites:
* org.apache.spark.ml.regression.LinearRegressionSuite
* org.apache.spark.ml.evaluation.RegressionEvaluatorSuite

The difference between 2 and 3 is the way in which the random number generator 
is used. Could this possibly cause the tests to fail? Maybe I am doing 
something obviously stupid here. 
This is frustrating and any insight would help!
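
To illustrate the question, here is a minimal Python sketch of how the order of draws from a seeded generator can change the generated data. It is not code from either gist or from {{LinearDataGenerator}}; the function names, the seed, and the sparsity value are hypothetical. Both functions use the same seed and the same keep/drop rule, but one draws every value before making any keep/drop decision, while the other interleaves the two kinds of draws.

```python
import random


def dense_then_filter(seed, n, sparsity):
    """Draw all n values first (the dense "x" array), then draw n keep/drop decisions."""
    rnd = random.Random(seed)
    values = [rnd.random() for _ in range(n)]
    # The keep/drop draws come after all value draws.
    return [(i, v) for i, v in enumerate(values) if rnd.random() <= sparsity]


def interleaved(seed, n, sparsity):
    """Draw a keep/drop decision per index and, only if kept, draw its value right away."""
    rnd = random.Random(seed)
    out = []
    for i in range(n):
        if rnd.random() <= sparsity:
            # The value draw happens immediately, so no dense array is built.
            out.append((i, rnd.random()))
    return out


a = dense_then_filter(42, 10, 0.5)
a_again = dense_then_filter(42, 10, 0.5)
b = interleaved(42, 10, 0.5)

print("same seed, same draw order -> identical:", a == a_again)
print("same seed, different draw order -> identical:", a == b)
```

Each variant is deterministic for a given seed, but the two consume the random stream in a different order, so they produce different indices and values. If a test suite compares against numbers derived from the exact data a given seed used to produce, a change like this could plausibly make it fail even though the new generator is statistically equivalent.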



> Optimization of creating sparse feature without dense one
> ---------------------------------------------------------
>
>                 Key: SPARK-11439
>                 URL: https://issues.apache.org/jira/browse/SPARK-11439
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Kai Sasaki
>            Priority: Minor
>
> Currently, generating sparse features in {{LinearDataGenerator}} requires 
> creating dense vectors first. It would be more cost efficient to avoid 
> generating dense features when creating sparse features.


