Hi all,
I want to use SparkR or Spark MLlib to load a CSV file from HDFS and then
calculate the covariance between two columns. How can I do that?
Thanks.
Load the CSV file:

df <- read.df(sqlContext, "file-path", source = "com.databricks.spark.csv",
              header = "true")

Calculate the covariance (avoid naming the result "cov", which would shadow
the function):

covariance <- cov(df, "col1", "col2")
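For reference, cov on a Spark DataFrame returns the sample covariance
(normalized by n - 1). A minimal plain-Python sketch of the same quantity,
useful for sanity-checking Spark's answer on a small sample (the numbers
below are made up for illustration):

```python
def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences, normalized by n - 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Example: perfectly correlated columns, covariance = 10 / 3
print(sample_cov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```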
Cheers
Yanbo
2015-12-28 17:21 GMT+08:00 zhangjp <592426...@qq.com>:
> hi all,
> I want to use sparkR or spark MLlib load csv
Hi Yanbo,

I use spark.csv to load my data sets. I work with both Java and Python. I
would recommend printing the first couple of rows and the schema to make
sure your data is loaded as you expect. You might find the following code
example helpful. You may need to programmatically set the schema if the
inferred one is not what you want.

Make sure you add the spark-csv package as in this example, so that the
source parameter in R's read.df works:
https://spark.apache.org/docs/latest/sparkr.html#from-data-sources
From: Andy Davidson
Sent: Monday, December 28,