Hi,

Yes, you are right.
I think the problem is in how the CSV file is read: read.df does not treat the NAs in the CSV file as missing values. So what would be a workable solution for dealing with NAs in CSV files?

On Mon, Jan 25, 2016 at 2:31 PM, Deborah Siegel <deborah.sie...@gmail.com> wrote:

> Hi Devesh,
>
> I'm not certain why that's happening, and it looks like it doesn't happen
> if you use createDataFrame directly:
>
> aq <- createDataFrame(sqlContext,airquality)
> head(dropna(aq,how="any"))
>
> If I had to guess: dropna(), I believe, drops null values. I suppose it's
> possible that createDataFrame converts R's <NA> values to null, so dropna()
> works with that. But perhaps read.df() does not convert R <NA>s to null, as
> those are most likely interpreted as strings when they come in from the
> csv. Just a guess, can anyone confirm?
>
> Deb
>
> On Sun, Jan 24, 2016 at 11:05 PM, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have applied the following code to the airquality dataset available in R,
>> which has some missing values. I want to omit the rows which have NAs.
>>
>> library(SparkR)
>> Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
>>
>> sc <- sparkR.init("local",sparkHome = "/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6")
>>
>> sqlContext <- sparkRSQL.init(sc)
>>
>> path<-"/Users/devesh/work/airquality/"
>>
>> aq <- read.df(sqlContext,path,source = "com.databricks.spark.csv",
>>               header="true", inferSchema="true")
>>
>> head(dropna(aq,how="any"))
>>
>> I am getting the output as
>>
>>   Ozone Solar_R Wind Temp Month Day
>> 1    41     190  7.4   67     5   1
>> 2    36     118  8.0   72     5   2
>> 3    12     149 12.6   74     5   3
>> 4    18     313 11.5   62     5   4
>> 5    NA      NA 14.3   56     5   5
>> 6    28      NA 14.9   66     5   6
>>
>> The NAs still exist in the output. Am I missing something here?
>>
>> --
>> Warm regards,
>> Devesh.

--
Warm regards,
Devesh.
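
A minimal sketch for checking the guess in the quoted reply, reusing the setup and paths from the original post. It assumes that read.df brought the NA entries in as the literal text "NA", so the affected columns were inferred as strings. The filter step is only one possible workaround under that assumption, not a confirmed fix, and the name aq_complete is made up for illustration.

```r
library(SparkR)

# Same session setup as in the original post.
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
sc <- sparkR.init("local", sparkHome = "/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6")
sqlContext <- sparkRSQL.init(sc)

path <- "/Users/devesh/work/airquality/"
aq <- read.df(sqlContext, path, source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")

# If "NA" was read as text, Ozone and Solar_R should be listed as string
# columns here rather than numeric ones.
printSchema(aq)

# Assumed workaround: keep only rows where neither column holds the
# literal string "NA". aq_complete is a hypothetical name.
aq_complete <- filter(aq, aq$Ozone != "NA" & aq$Solar_R != "NA")
head(aq_complete)
```

If printSchema shows numeric types for those columns instead, the assumption does not hold and the filter above is not the right approach.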