Re: NA value handling in sparkR

2016-01-27 Thread Hyukjin Kwon
R data.frame.
> https://eradiating.wordpress.com/2016/01/04/whats-new-in-sparkr-1-6-0/
> From: Devesh Raj Singh <raj.deves...@gmail.com>
> Sent: Wednesday, January 27, 2016 3:19 AM
> Subject: Re: NA value handling in sparkR
> To:

Re: NA value handling in sparkR

2016-01-27 Thread Felix Cheung
/04/whats-new-in-sparkr-1-6-0/
From: Devesh Raj Singh <raj.deves...@gmail.com>
Sent: Wednesday, January 27, 2016 3:19 AM
Subject: Re: NA value handling in sparkR
To: Deborah Siegel <deborah.sie...@gmail.com>
Cc: <user@spark.apache.o

Re: NA value handling in sparkR

2016-01-27 Thread Devesh Raj Singh
Hi, while dealing with missing values in R and SparkR I observed the following; please tell me whether I am right. Missing values in native R are represented by the logical constant NA. SparkR DataFrames represent missing values with NULL. If you use createDataFrame() to turn a local R

Re: NA value handling in sparkR

2016-01-26 Thread Deborah Siegel
While fitting the currently available SparkR models, such as glm for linear and logistic regression, columns that contain strings are one-hot encoded behind the scenes as part of parsing the RFormula. Does that help, or did you have something else in mind? > Thank you so much for

Re: NA value handling in sparkR

2016-01-26 Thread Devesh Raj Singh
Hi, if we want to create dummy variables from categorical columns for data-manipulation purposes, how would we do it in SparkR? On Wednesday, January 27, 2016, Deborah Siegel wrote: > While fitting the currently available sparkR models, such as glm for linear and
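The excerpt does not show a SparkR-side answer. For a local R data.frame (not a SparkR DataFrame), base R's model.matrix() is one way to build dummy columns. The sketch below is only an illustration under that assumption: `local_df`, `colour` and `value` are hypothetical names with made-up values, and this is not the approach discussed in the thread.

```r
# Hypothetical local data frame with one categorical column (not SparkR).
local_df <- data.frame(colour = factor(c("red", "green", "blue", "green")),
                       value  = c(1.5, 2.0, 3.2, 0.7))

# "~ colour - 1" removes the intercept, so model.matrix() emits one 0/1
# indicator column per factor level (levels are sorted alphabetically).
dummies <- model.matrix(~ colour - 1, data = local_df)

# Attach the indicator columns next to the original data.
encoded <- cbind(local_df, dummies)
print(encoded)
```

Keeping the intercept (`~ colour`) would instead drop the first level as the reference category, which is the usual encoding when the columns feed a regression model.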

Re: NA value handling in sparkR

2016-01-25 Thread Deborah Siegel
Maybe not ideal, but since read.df infers every CSV column that contains "NA" as a string column, one could filter those rows out rather than using dropna():

    filtered_aq <- filter(aq, aq$Ozone != "NA" & aq$Solar_R != "NA")
    head(filtered_aq)

Perhaps it would be better to have an option for
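The same two ideas (filter on the literal string "NA", or convert the text to numbers so "NA" becomes a real missing value) can be tried on a local data.frame before moving to SparkR. This sketch uses only base R, not SparkR's filter() or dropna(), and assumes a tiny hypothetical data frame `aq_txt` whose numeric columns arrived as text with missing values spelled "NA"; the values are made up.

```r
# Hypothetical local data frame mimicking the read.df result described
# above: numeric columns came back as character, missing values as "NA".
aq_txt <- data.frame(Ozone   = c("41", "36", "NA", "18"),
                     Solar_R = c("190", "118", "149", "NA"),
                     stringsAsFactors = FALSE)

# Option 1: the same row filter as in the SparkR example, on a local frame.
kept <- aq_txt[aq_txt$Ozone != "NA" & aq_txt$Solar_R != "NA", ]

# Option 2: convert every column to numeric ("NA" becomes a real NA),
# then keep only the rows with no missing values.
aq_num <- data.frame(lapply(aq_txt, function(col) suppressWarnings(as.numeric(col))))
aq_clean <- aq_num[complete.cases(aq_num), ]

print(kept)
print(aq_clean)
```

Option 2 has the side benefit that the surviving columns are numeric rather than character, which matters for any later arithmetic or modelling.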

Re: NA value handling in sparkR

2016-01-25 Thread Devesh Raj Singh
Hi, yes, you are right. I think the problem is with the reading of CSV files: read.df does not treat the NAs in the CSV file as missing values. So what would be a workable solution for dealing with NAs in CSV files? On Mon, Jan 25, 2016 at 2:31 PM, Deborah Siegel wrote: > Hi Devesh,

NA value handling in sparkR

2016-01-24 Thread Devesh Raj Singh
Hi, I have applied the following code to the airquality dataset available in R, which has some missing values. I want to omit the rows that have NAs:

    library(SparkR)
    Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
    sc <-
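Before comparing against a SparkR result, it can help to know what a purely local baseline gives. This sketch uses only base R on the built-in airquality data set (no Spark session, no CSV round trip) and drops every row that contains at least one NA.

```r
# Built-in data set from the datasets package; no Spark involved.
aq <- airquality

# na.omit() drops every row that has an NA in any column.
complete_aq <- na.omit(aq)

# complete.cases() selects the same rows via a logical index.
same_rows <- aq[complete.cases(aq), ]

cat("rows before:", nrow(aq), "\n")
cat("rows after: ", nrow(complete_aq), "\n")
```

This reports 153 rows before and 111 after; a SparkR pipeline that correctly recognizes the CSV's NAs as missing would be expected to keep the same 111 complete rows.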