[ https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994861#comment-14994861 ]
Nakul Jindal commented on SPARK-11439:
--------------------------------------
This is the piece of R code that is used as the reference for the test:
predictions <- predict(fit, newx=features)
residuals <- label - predictions
mean(residuals^2) # MSE
mean(abs(residuals)) # MAD
cor(predictions, label)^2 # r^2
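For readers who, like me, are not fluent in R, here is a minimal sketch of the same three metrics in plain Python. The function names (mse, mad, r_squared) are made up for illustration and are not part of the Spark test suite; the sketch assumes two equal-length sequences of numbers.

```python
def mse(labels, predictions):
    """Mean squared error, i.e. mean(residuals^2)."""
    residuals = [y - p for y, p in zip(labels, predictions)]
    return sum(r * r for r in residuals) / len(residuals)


def mad(labels, predictions):
    """Mean absolute deviation, i.e. mean(abs(residuals))."""
    residuals = [y - p for y, p in zip(labels, predictions)]
    return sum(abs(r) for r in residuals) / len(residuals)


def r_squared(labels, predictions):
    """Squared Pearson correlation, i.e. cor(predictions, label)^2."""
    n = len(labels)
    mean_y = sum(labels) / n
    mean_p = sum(predictions) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(labels, predictions))
    var_y = sum((y - mean_y) ** 2 for y in labels)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return (cov * cov) / (var_y * var_p)
```

Note that the last metric is a squared correlation, not 1 - SSE/SST, so it measures only how linearly the predictions track the labels.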
How do I create the "fit" object?
NOTE: I have no experience with R. What little I know comes from asking around
and searching the internet.
I tried this:
In a Spark REPL:
import org.apache.spark.mllib.util.LinearDataGenerator
val data = sc.parallelize(LinearDataGenerator.generateLinearInput(6.3,
Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, 42, 0.1), 2)
data.map(x=> x.label + ", " + x.features(0) + ", " +
x.features(1)).coalesce(1).saveAsTextFile("path")
Then, in an R Shell:
library("glmnet")
d1 <- read.csv("path/part-00000", header=FALSE, stringsAsFactors=FALSE)
features <- as.matrix(data.frame(as.numeric(d1$V2), as.numeric(d1$V3)))
label <- as.numeric(d1$V1)
fit <- glmnet(features, label, family="gaussian", alpha = 0, lambda = 0)
I then used this fit object in the earlier snippet of R code. The results were
way off:
> mean(residuals^2)
[1] 10885.15
>
> mean(abs(residuals))
[1] 103.959
>
> cor(predictions, label)^2
[,1]
s0 0.9998749
So I guess that is not how the "fit" object is created.
How should the "fit" object be created?
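One observation on reading the numbers above: a squared correlation close to 1 together with a very large MSE is what you would see if the predictions tracked the labels almost linearly but were shifted or scaled relative to them. The Python sketch below illustrates this with made-up data (the factor 2.0 and offset 100.0 are arbitrary assumptions, not values from my run); it does not identify the actual cause.

```python
def mse(labels, predictions):
    """Mean squared error between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def r_squared(labels, predictions):
    """Squared Pearson correlation between labels and predictions."""
    n = len(labels)
    mean_y = sum(labels) / n
    mean_p = sum(predictions) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(labels, predictions))
    var_y = sum((y - mean_y) ** 2 for y in labels)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return (cov * cov) / (var_y * var_p)


# Hypothetical labels, plus predictions that are an exact linear
# transform of them (scaled by 2 and shifted by 100).
labels = [float(i) for i in range(1, 101)]
shifted = [2.0 * y + 100.0 for y in labels]

# The correlation ignores the scale and offset; the MSE does not.
print("r^2:", r_squared(labels, shifted))
print("MSE:", mse(labels, shifted))
```

So it may be worth checking whether the features and labels passed to predict() line up with what the model was fitted on before concluding that the glmnet call itself is wrong.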
> Optimization of creating sparse feature without dense one
> ---------------------------------------------------------
>
> Key: SPARK-11439
> URL: https://issues.apache.org/jira/browse/SPARK-11439
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Kai Sasaki
> Priority: Minor
>
> Currently, generating sparse features in {{LinearDataGenerator}} requires
> creating dense vectors first. It would be more efficient to avoid creating
> the dense vectors when generating sparse features.