I'm working on LinearRegressionWithElasticNet using OWLQN now. It
will standardize the data internally, so the scaling is transparent to
users. With OWLQN, you don't have to choose stepSize manually. I will
send out a PR next week.
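
To illustrate why stepSize and standardization matter here, below is a
minimal sketch in plain Scala (no Spark). It is not the MLlib
implementation: StepSizeSketch, fit, standardize and mse are
hypothetical names used only for illustration, and the step size of 1.0
is an assumption chosen to mirror what I believe is the default of
LinearRegressionWithSGD.train(input, numIterations). It runs plain
full-batch gradient descent on the x=y dataset from the question, first
on the raw feature and then on the standardized one.

```scala
object StepSizeSketch {
  // Hypothetical helper: full-batch gradient descent for y ≈ w * x + b
  // under squared loss. Not Spark's LinearRegressionWithSGD.
  def fit(xs: Array[Double], ys: Array[Double],
          stepSize: Double, numIterations: Int): (Double, Double) = {
    val n = xs.length
    var w = 0.0
    var b = 0.0
    for (_ <- 1 to numIterations) {
      var gradW = 0.0
      var gradB = 0.0
      for (i <- 0 until n) {
        val err = w * xs(i) + b - ys(i)
        gradW += err * xs(i) // gradient contribution for the weight
        gradB += err         // gradient contribution for the intercept
      }
      w -= stepSize * gradW / n
      b -= stepSize * gradB / n
    }
    (w, b)
  }

  // Center to zero mean and scale to unit (population) standard deviation.
  def standardize(xs: Array[Double]): Array[Double] = {
    val mean = xs.sum / xs.length
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.length)
    xs.map(x => (x - mean) / std)
  }

  // Mean squared error of the fitted line on the given data.
  def mse(xs: Array[Double], ys: Array[Double], w: Double, b: Double): Double =
    xs.indices.map(i => math.pow(w * xs(i) + b - ys(i), 2)).sum / xs.length

  def main(args: Array[String]): Unit = {
    val xs = (1 to 10000).map(_.toDouble).toArray
    val ys = xs.clone() // perfectly linear data: y = x

    // Raw feature: the updates overshoot and the weights overflow.
    val (w1, b1) = fit(xs, ys, stepSize = 1.0, numIterations = 100)
    println(s"raw:          w=$w1 b=$b1 mse=${mse(xs, ys, w1, b1)}")

    // Standardized feature, same step size: the iteration converges.
    val z = standardize(xs)
    val (w2, b2) = fit(z, ys, stepSize = 1.0, numIterations = 100)
    println(s"standardized: w=$w2 b=$b2 mse=${mse(z, ys, w2, b2)}")
  }
}
```

On the raw feature the mean of x^2 is on the order of 10^7, so each
update with step size 1.0 multiplies the error by roughly that factor
and the weights stop being finite well before 100 iterations. On the
standardized feature the same step size converges. Note that the sketch
fits an intercept, which a centered feature needs when the label is not
centered.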

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 8:46 AM, devl.development
<[email protected]> wrote:
> From what I gather, you use LinearRegressionWithSGD to predict y or the
> response variable given a feature vector x.
>
> In a simple example I used a perfectly linear dataset such that x=y
> y,x
> 1,1
> 2,2
> ...
>
> 10000,10000
>
> Using the out-of-box example from the website (with and without scaling):
>
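>     import org.apache.spark.mllib.feature.StandardScaler
>     import org.apache.spark.mllib.linalg.Vectors
>     import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
>
>     val data = sc.textFile(file)
>
>     val parsedData = data.map { line =>
>       val parts = line.split(',')
>       // Each line is "y,x": the label y comes first, then the feature x
>       LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).toDouble))
>     }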
>     val scaler = new StandardScaler(withMean = true, withStd = true)
>       .fit(parsedData.map(x => x.features))
>     val scaledData = parsedData
>       .map(x => LabeledPoint(x.label, scaler.transform(x.features)))
>
>     // Building the model (trained on the same data that is evaluated below)
>     val numIterations = 100
>     val model = LinearRegressionWithSGD.train(scaledData, numIterations)
>
>     // Evaluate the model on training examples and compute the training error
>     // (tried using both scaledData and parsedData)
>     val valuesAndPreds = scaledData.map { point =>
>       val prediction = model.predict(point.features)
>       (point.label, prediction)
>     }
>     val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
>     println("training Mean Squared Error = " + MSE)
>
> Both scaled and unscaled attempts give:
>
> training Mean Squared Error = NaN
>
> I've even tried x, y + (noise sampled from a normal distribution with mean 0
> and stddev 1), and it still comes up with the same thing.
>
> Is this not supposed to work for x and y or 2 dimensional plots? Is there
> something I'm missing or wrong in the code above? Or is there a limitation
> in the method?
>
> Thanks for any advice.
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
