[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994861#comment-14994861
 ] 

Nakul Jindal commented on SPARK-11439:
--------------------------------------

This is the R code that is used as the reference for the test:

predictions <- predict(fit, newx=features)
residuals <- label - predictions
mean(residuals^2)         # MSE
mean(abs(residuals))      # MAD
cor(predictions, label)^2 # r^2
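
For reference, the three quantities above can be written out as below. This is only a restatement of the R snippet, assuming y_i is the i-th label, \hat{y}_i the i-th prediction, and n the number of rows (the symbols are my own notation and do not come from the test). Note that cor(...)^2 is the squared Pearson correlation between predictions and labels.

```latex
\begin{align*}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \\
\mathrm{MAD} &= \frac{1}{n}\sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert \\
r^2 &= \left(
  \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}
       {\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}\,
        \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\right)^{2}
\end{align*}
```

Since r^2 here is a squared correlation, it does not change when the predictions are shifted or rescaled by a constant, so it can be close to 1 even when the MSE and MAD are large.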

How do I create the "fit" object?

NOTE: I have no experience with R. What little I know comes from asking 
around and searching the internet.

I tried this:

In a Spark REPL:
import org.apache.spark.mllib.util.LinearDataGenerator
// Arguments: intercept, weights, xMean, xVariance, nPoints, seed, eps
val data = sc.parallelize(
  LinearDataGenerator.generateLinearInput(
    6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, 42, 0.1), 2)
// Write one "label, feature0, feature1" line per point into a single part file
data.map(x => x.label + ", " + x.features(0) + ", " + x.features(1)).
  coalesce(1).saveAsTextFile("path")

Then, in an R Shell:
library("glmnet")
d1 <- read.csv("path/part-00000", header=FALSE, stringsAsFactors=FALSE)
features <- as.matrix(data.frame(as.numeric(d1$V2), as.numeric(d1$V3)))
label <- as.numeric(d1$V1)
fit <- glmnet(features, label, family="gaussian", alpha = 0, lambda = 0)

I then used this fit object in the earlier snippet of R code. The results were 
way off:
> mean(residuals^2)
[1] 10885.15
> 
> mean(abs(residuals))
[1] 103.959
> 
>  cor(predictions, label)^2
        [,1]
s0 0.9998749


So, I guess, that is not how you create the "fit" object.

How do you create the "fit" object?

     


> Optimization of creating sparse feature without dense one
> ---------------------------------------------------------
>
>                 Key: SPARK-11439
>                 URL: https://issues.apache.org/jira/browse/SPARK-11439
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Kai Sasaki
>            Priority: Minor
>
> Currently, generating sparse features in {{LinearDataGenerator}} requires 
> creating dense vectors first. It would be more efficient to avoid generating 
> dense features when creating sparse features.


