Hi all,
I want to use SparkR or Spark MLlib to load a CSV file from HDFS and then
calculate the covariance between two columns. How can I do that?
Thanks.
Load the CSV file:

df <- read.df(sqlContext, "file-path", source = "com.databricks.spark.csv",
              header = "true")

Calculate the covariance (avoid naming the result "cov", which would shadow
the function):

covariance <- cov(df, "col1", "col2")
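For reference, cov on a Spark DataFrame returns the sample covariance
(normalized by n - 1). A minimal plain-Python sketch of the same quantity,
useful for sanity-checking Spark's answer on a small sample (the numbers
below are made up for illustration):

```python
def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences, normalized by n - 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Example: perfectly correlated columns, covariance = 10 / 3
print(sample_cov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```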
Cheers
Yanbo
2015-12-28 17:21 GMT+08:00 zhangjp <592426...@qq.com>:
> hi all,
> I want to use sparkR or spark MLlib load csv
Hi Yanbo,

I use spark.csv to load my data sets. I work with both Java and Python. I
would recommend printing the first couple of rows and the schema to make
sure your data is loaded as you expect. You might find the following code
example helpful. You may need to programmatically set the schema if the
inferred one is not what you want.

Make sure you add the spark-csv package as in this example, so that the
source parameter in R's read.df works:
https://spark.apache.org/docs/latest/sparkr.html#from-data-sources
From: Andy Davidson
Sent: Monday, December 28,