I've imported a JSON file with this schema:

sqlContext.read.json("filename").printSchema

root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
I'm new to Spark and I want to perform some basic statistics:

* getting the min, max, mean, median and std of the numeric variables
* getting the value frequencies of the non-numeric variables.

My questions are:

- How do I change the type of variables in my schema from 'string' to 'numeric'? (Crate, MLrate and Nrout should be numeric variables.)
- How do I compute those basic statistics easily?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-perform-basic-statistics-on-a-Json-file-to-explore-my-numeric-and-non-numeric-variables-tp24077.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
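As a sketch of the computations being asked for, here is a small stand-in in plain Python (outside Spark), using hypothetical sample rows shaped like the elements of the DATA array above. The values are made up for illustration; only the field names (Crate, MLrate, Nrout, up) come from the schema.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical sample of DATA array elements; the numeric fields are
# still stored as strings, as in the schema above.
rows = [
    {"Crate": "1.5", "MLrate": "30", "Nrout": "0", "up": "yes"},
    {"Crate": "2.0", "MLrate": "31", "Nrout": "1", "up": "no"},
    {"Crate": "2.5", "MLrate": "29", "Nrout": "0", "up": "yes"},
]

# 1) "Cast" a string column to numeric by parsing each value.
crate = [float(r["Crate"]) for r in rows]

# 2) Basic statistics for a numeric variable.
stats = {
    "min": min(crate),
    "max": max(crate),
    "mean": mean(crate),
    "median": median(crate),
    "std": pstdev(crate),  # population standard deviation
}

# 3) Value frequencies for a non-numeric variable.
freq = Counter(r["up"] for r in rows)
```

In Spark itself, the usual route (if I remember the DataFrame API correctly) is to cast a column, e.g. `df.withColumn("Crate", df("Crate").cast("double"))`, and then call `df.describe()` for count/mean/stddev/min/max, and `groupBy(...).count()` for frequencies.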