Hi,

I ran the following code on the airquality dataset that ships with R,
which has some missing values. I want to omit the rows that contain NAs.

library(SparkR)

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')

sc <- sparkR.init("local",sparkHome =
"/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6")

sqlContext <- sparkRSQL.init(sc)

path<-"/Users/devesh/work/airquality/"

aq <- read.df(sqlContext,path,source = "com.databricks.spark.csv",
header="true", inferSchema="true")

head(dropna(aq,how="any"))

I am getting the following output:

  Ozone Solar_R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

The NAs still exist in the output. Am I missing something here?

-- 
Warm regards,
Devesh.
