You probably need to scale the values in the data set so that all features have comparable ranges, and shift them so that their means are 0.

You can use a pyspark.mllib.feature.StandardScaler(withMean=True, withStd=True) object for that.
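As a quick illustration of what that scaling does (a sketch in plain NumPy with a made-up toy matrix, not your dataset; I believe MLlib's StandardScaler uses the sample standard deviation, i.e. ddof=1):

```python
import numpy as np

# Toy feature matrix with wildly different column ranges -- the kind of
# data that can make SGD diverge and produce nan weights.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

# StandardScaler(withMean=True, withStd=True) subtracts each column's
# mean and divides by its standard deviation:
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(X_scaled.mean(axis=0))            # columns now centered at 0
print(X_scaled.std(axis=0, ddof=1))     # and with unit standard deviation
```

In PySpark itself the pattern is to fit the scaler on your RDD of feature vectors and then transform them, e.g. model = StandardScaler(withMean=True, withStd=True).fit(features) followed by model.transform(features), before training LinearRegressionWithSGD.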

On 28.5.2015. 6:08, Maheshakya Wijewardena wrote:

Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the attached dataset. The code is attached. When I check the model's weight vector after training, it contains `nan` values:
[nan,nan,nan,nan,nan,nan,nan,nan]
But for some data sets, this problem does not occur. What might be the reason for this?
Is this an issue with the data I'm using or a bug?
Best regards.
--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
