How to perform basic statistics on a JSON file to explore my numeric and non-numeric variables?

2015-07-30 Thread SparknewUser
I've imported a JSON file which has this schema:

sqlContext.read.json(filename).printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
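For numeric columns, Spark's DataFrame `describe()` reports count, mean, stddev, min, and max; for non-numeric columns the usual tool is a frequency table via `groupBy(...).count()`. As a Spark-independent sketch of the same summaries, here is plain Python over made-up records mirroring the schema above (the values are hypothetical):

```python
import json
from collections import Counter
from statistics import mean, stdev

# Hypothetical records with the same shape as the schema above:
# a numeric top-level COL and a nested DATA array of structs.
records = [
    {"COL": 21, "DATA": [{"Crate": "2"}, {"Crate": "3"}]},
    {"COL": 25, "DATA": [{"Crate": "2"}]},
    {"COL": 30, "DATA": [{"Crate": "5"}, {"Crate": "2"}]},
]

# Numeric variable: the statistics DataFrame.describe() would report.
cols = [r["COL"] for r in records]
summary = {
    "count": len(cols),
    "mean": mean(cols),
    "stddev": stdev(cols),
    "min": min(cols),
    "max": max(cols),
}

# Non-numeric variable: a frequency table over the nested Crate field
# (in Spark: explode DATA, then groupBy("Crate").count()).
crate_counts = Counter(e["Crate"] for r in records for e in r["DATA"])

print(summary)
print(crate_counts)
```

The nested `DATA` array is the awkward part: in Spark it has to be flattened (e.g. with `explode`) before grouping, which is what the inner comprehension imitates here.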

How to read a JSON file with a specific format?

2015-07-29 Thread SparknewUser
I'm trying to read a JSON file which is like:

[ {IFAM:EQR, KTM:143000640, COL:21, DATA:[
    {MLrate:30, Nrout:0, up:null, Crate:2},
    {MLrate:30, Nrout:0, up:null, Crate:2},
    {MLrate:30, Nrout:0, up:null, Crate:2},
    {MLrate:30, Nrout:0, up:null, Crate:2},
    {MLrate:30, Nrout:0, up:null, Crate:2}
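The usual stumbling block with files like this: in Spark 1.x, `sqlContext.read.json` expects JSON Lines input, i.e. one complete, self-contained JSON object per physical line. A pretty-printed array spanning many lines comes back as corrupt records. (The unquoted keys in the snippet above would also have to be quoted to be valid JSON.) A common workaround is to pre-convert the array file to JSON Lines; a minimal Python sketch, using an in-memory buffer and invented values in place of the real file:

```python
import io
import json

# Simulated input: one pretty-printed JSON array spanning several lines
# (keys quoted here, since a JSON parser requires valid JSON).
array_text = """
[ {"IFAM": "EQR", "KTM": 143000640, "COL": 21,
   "DATA": [{"MLrate": 30, "Nrout": 0, "up": null, "Crate": 2},
            {"MLrate": 30, "Nrout": 0, "up": null, "Crate": 2}]} ]
"""

# Parse the whole array, then re-emit one object per line --
# the JSON Lines layout that sqlContext.read.json expects.
records = json.loads(array_text)
out = io.StringIO()
for rec in records:
    out.write(json.dumps(rec) + "\n")

json_lines = out.getvalue()
print(json_lines)
```

Each output line is now independently parseable, so Spark can split the file across partitions without seeing partial objects.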

How to get the best performance with LogisticRegressionWithSGD?

2015-05-27 Thread SparknewUser
I'm new to Spark and I'm getting poor performance from the classification methods in Spark MLlib (worse than R in terms of AUC). I am trying to set my own parameters rather than the defaults. Here is the method I want to use: train(RDD<LabeledPoint> input, int numIterations,
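The `train` overloads of MLlib's `LogisticRegressionWithSGD` expose `numIterations`, `stepSize`, and `miniBatchFraction` as tuning knobs, and poor AUC with the defaults usually means these need adjusting. To make concrete what each one controls, here is a minimal plain-Python sketch of SGD-trained logistic regression on a tiny invented dataset (this illustrates the algorithm, not MLlib's actual implementation):

```python
import math
import random

def train_sgd(data, num_iterations=200, step_size=0.5,
              mini_batch_fraction=1.0, seed=0):
    """Logistic regression via mini-batch SGD on (features, label) pairs.

    num_iterations      -- number of gradient steps taken
    step_size           -- scale of each gradient step (learning rate)
    mini_batch_fraction -- fraction of the data sampled per step
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    batch = max(1, int(len(data) * mini_batch_fraction))
    for _ in range(num_iterations):
        sample = rng.sample(data, batch)
        gw = [0.0] * dim
        gb = 0.0
        for x, y in sample:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss w.r.t. z
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        # Average the batch gradient and step against it.
        w = [wi - step_size * gi / batch for wi, gi in zip(w, gw)]
        b -= step_size * gb / batch
    return w, b

# Toy linearly separable data: label is 1 exactly when x0 > x1.
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
w, b = train_sgd(data)

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

print([predict(x) for x, _ in data])
```

Too small a `stepSize` or too few iterations leaves the model underfit (near-random AUC), while a large `stepSize` can make the loss oscillate, so these two are usually the first parameters to sweep.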

MLlib: how to get the best model with only the most significant explanatory variables in LogisticRegressionWithLBFGS or LogisticRegressionWithSGD?

2015-05-22 Thread SparknewUser
I am new to MLlib and to Spark (I use Scala). I'm trying to understand how LogisticRegressionWithLBFGS and LogisticRegressionWithSGD work. I usually use R for logistic regressions, but now I do it in Spark to be able to analyze Big Data. The model only returns weights and an intercept. My problem
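Unlike R's `summary(glm(...))`, MLlib's model reports no p-values, so "most significant variables" has to be approximated from the weights themselves. Two common workarounds are (a) ranking features by weight magnitude after standardizing the features, and (b) L1 regularization, which pushes weak weights to exactly zero and thereby performs variable selection. A plain-Python sketch of both ideas on hypothetical fitted weights (the names and values below are invented for illustration):

```python
# Hypothetical weights fitted on *standardized* features, so their
# magnitudes are comparable across features.
feature_names = ["x1", "x2", "x3", "x4"]
weights = [2.1, -0.03, 0.9, 0.001]

# (a) Rank features by |weight|: larger magnitude ~ stronger influence.
ranked = sorted(zip(feature_names, weights),
                key=lambda nw: abs(nw[1]), reverse=True)

# (b) One L1 proximal (soft-thresholding) step with strength lam:
# any weight inside [-lam, lam] collapses to exactly zero, which is
# how L1 regularization drops uninformative variables.
lam = 0.1
def soft_threshold(w, lam):
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

sparse_weights = [soft_threshold(w, lam) for w in weights]
selected = [n for n, w in zip(feature_names, sparse_weights) if w != 0.0]

print(ranked)
print(selected)
```

In MLlib itself, option (b) corresponds to training with an L1 updater/regularization parameter and then keeping only the features whose weights remain non-zero; the magnitude ranking in (a) is only meaningful after standardizing the features.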