Re: [datatable-help] Follow-up on subsetting data.table with NAs

Arunkumar Srinivasan Mon, 10 Jun 2013 00:11:39 -0700

Matthew,

Regarding the changes you suggested in response to Frank's post here: 
http://stackoverflow.com/a/17008872/559784 I find the proposal a bit more 
confusing and, frankly, not like SQL.


You wrote: "If I haven't understood correctly feel free to correct, otherwise 
the change will get made eventually. It will need to be done in a way that 
considers compound expressions; e.g., DT[colA=="foo" & colB!="bar"] should 
exclude rows with NA in colA but include rows where colA is non-NA but colB is 
NA. Similarly, DT[colA!=colB] should include rows where either colA or colB is 
NA but not both. And perhaps DT[colA==colB] should include rows where both colA 
and colB are NA (which it doesn't currently, I believe)."
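
To make the quoted rules concrete, here is a rough base R sketch of how I read 
them. The helpers eq_na and neq_na are hypothetical names made up for 
illustration; they are not data.table functions and this is not how 
data.table implements anything.

```r
# Hypothetical helpers (assumed names, base R only) sketching the quoted rules.

# eq_na: TRUE when both values are non-NA and equal, or when both are NA.
eq_na <- function(a, b) {
  (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
}

# neq_na: TRUE when both values are non-NA and differ,
# or when exactly one of the two is NA.
neq_na <- function(a, b) {
  xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
}

x <- c(1:3, NA)
y <- c(NA, 4:5, NA)

eq_na(x, y)   # FALSE FALSE FALSE  TRUE
neq_na(x, y)  #  TRUE  TRUE  TRUE FALSE
```

Under this reading, the row where only y is NA counts as "different", and the 
row where both are NA counts as "equal".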

Even though SQL (e.g. via sqldf) handles NAs differently from data.frame, it 
doesn't seem to treat NA as equal to NA. That is,

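library(sqldf)

df <- data.frame(x = c(1:3, NA), y = c(NA, 4:5, NA))

# rows where x equals y: the row with NA in both columns is not matched
sqldf("select * from df where x == y")
# returns empty data.frame

# rows where x differs from y: rows containing an NA are not returned
sqldf("select * from df where x != y")
#   x y
# 1 2 4
# 2 3 5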


That is, at least in the sqldf package, NA == NA does not match and neither 
does NA != NA, which is consistent with R's default, where NA == NA and 
NA != NA both give NA. But I don't think the result is considered FALSE here. 
It just acts like the "subset" function, where all entries that evaluate to NA 
are simply dropped. With the data.table philosophy, however, NA != NA should 
evaluate to TRUE, which I don't think (from my limited understanding of SQL) 
is what SQL does. Please correct me if I've got it wrong.
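
For reference, a small base R sketch of the behaviour I am describing, using 
the same df as above. It only uses base R, and the comments reflect what I 
expect base R to return:

```r
df <- data.frame(x = c(1:3, NA), y = c(NA, 4:5, NA))

# Comparisons involving NA give NA, not TRUE or FALSE.
NA == NA  # NA
NA != NA  # NA

cond <- df$x != df$y
cond      # NA TRUE TRUE NA

# data.frame's `[` keeps a row of NAs for every NA in the logical index.
kept <- df[cond, ]
nrow(kept)     # 4

# subset() drops the rows where the condition evaluates to NA.
dropped <- subset(df, x != y)
nrow(dropped)  # 2
```

So subset() returns the same two rows as the sqldf query, while plain 
data.frame indexing returns four rows, two of them all NA.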

I think it is clearer and simpler if "NAs are just dropped" after evaluating 
logical expressions. It would also be easy to document and easier to grasp, 
imho. This would also explain why the NA rows were removed in Frank's post. 

And if there is more consensus, perhaps an option such as 
"na.rm = TRUE/FALSE" could be added?
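
Just to illustrate what I mean by such an option, here is a minimal sketch on 
a plain data.frame. The function name filter_rows and its na.rm argument are 
hypothetical, invented for this example; nothing like this exists in 
data.table as far as this sketch is concerned.

```r
# Hypothetical helper (assumed name and argument), base R only.
# na.rm = TRUE : rows whose condition is NA are dropped (like subset()).
# na.rm = FALSE: rows whose condition is NA are kept as all-NA rows
#                (like data.frame's default `[`).
filter_rows <- function(df, cond, na.rm = TRUE) {
  stopifnot(is.logical(cond), length(cond) == nrow(df))
  if (na.rm) {
    df[which(cond), , drop = FALSE]  # which() ignores NA entries
  } else {
    df[cond, , drop = FALSE]
  }
}

df <- data.frame(x = c(1:3, NA), y = c(NA, 4:5, NA))

res_drop <- filter_rows(df, df$x != df$y)                 # 2 rows
res_keep <- filter_rows(df, df$x != df$y, na.rm = FALSE)  # 4 rows
```

The default na.rm = TRUE here corresponds to the "NAs are just dropped" rule; 
whether that should be the default is of course up for discussion.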

Arun

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
