Re: [datatable-help] Follow-up on subsetting data.table with NAs

Arunkumar Srinivasan Mon, 10 Jun 2013 01:29:15 -0700

> Hm, good point.  Is data.table consistent with SQL already, for both == and 
> !=, and so no change needed?


Yes, I believe it's already consistent with SQL. However, the current 
description in the documentation, that NA is treated as FALSE, is unnecessary 
and, imho, inaccurate (please see below).
 
> And it was correct for Frank to be mistaken.  

Yes, it seems like he was mistaken.

> Maybe just some more documentation and examples needed then.

It would be much more appropriate for the documentation to say that 
subsetting in data.table mimics the "subset" function (in order to be in line 
with SQL) by dropping rows where the logical expression evaluates to NA. The 
code I pasted a couple of posts back, where NAs are replaced with FALSE, is 
not necessary: `irows <- which(i)` makes it clear that `which` is used to get 
the indices before subsetting, and `which` already drops NAs. This fits 
perfectly well with the interpretation of NA in data.table.

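
A minimal sketch of the point about `which`, in base R only (the toy data 
frame and its column names are made up for illustration; this does not show 
data.table internals):

```r
# Toy data; names are hypothetical and chosen only for this illustration.
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

i <- df$x > 1                     # FALSE, NA, TRUE
irows <- which(i)                 # which() returns only the TRUE positions

by_which   <- df[irows, ]         # NA position dropped
by_subset  <- subset(df, x > 1)   # subset() also drops the NA case
by_logical <- df[i, ]             # plain logical indexing keeps an all-NA row
```

Here `by_which` and `by_subset` contain the same single row, while 
`by_logical` carries an extra row of NAs, which is the typical data.frame 
behaviour.
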
> Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :
> http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently

Ha, I like the idea behind the use of () in evaluating expressions. It's 
another nice layer towards simplicity in data.table. But I still think that 
equivalent logical operations should not give different results. If 
!(x == .) and x != . are indeed meant to differ, then I'd suggest replacing 
`!` with a more appropriate name, as it's much too easy to get confused 
otherwise.

In essence, either !(x == .) must evaluate to the same thing as (x != .), if 
the underlying meaning of the two is the same, or the `!` in `!(x == .)` must 
be replaced with something more appropriate for what it's supposed to do. 
Personally, I prefer the former. It would greatly tighten the structure and 
consistency.

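
For reference, a small base R sketch of why the two forms read as equivalent. 
It only shows plain logical-vector semantics, not how data.table interprets a 
leading `!` in `i`; the vector `x` is made up for illustration:

```r
x <- c(1, 2, NA)

a <- !(x == 1)        # FALSE, TRUE, NA
b <- x != 1           # FALSE, TRUE, NA

same_vector <- identical(a, b)                 # NA sits in the same position
same_rows   <- identical(which(a), which(b))   # and which() picks the same index
```

Since both expressions produce the same logical vector, any single rule for 
handling NA applied afterwards would select the same rows for both.
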
> "na.rm = TRUE/FALSE" sounds good to me.  I'd only considered nomatch before 
> in the context of joins, not logical subsets.

Yes, I think this option would give more control when evaluating expressions 
in `i`, by providing both the "subset" behaviour (the default) and the typical 
data.frame subsetting (na.rm = FALSE).
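
A rough sketch of what such a switch could look like, written against a plain 
data.frame. The helper `subset_rows` and its `na.rm` argument are hypothetical 
and exist only for this illustration; they are not part of data.table or 
base R:

```r
# Hypothetical helper: na.rm = TRUE mimics subset(), na.rm = FALSE mimics
# ordinary data.frame logical indexing.
subset_rows <- function(df, i, na.rm = TRUE) {
  stopifnot(is.logical(i), length(i) == nrow(df))
  if (na.rm) {
    df[which(i), , drop = FALSE]   # rows where i is NA are dropped
  } else {
    df[i, , drop = FALSE]          # rows where i is NA come back as NA rows
  }
}

df <- data.frame(x = c(1, NA, 3))
i  <- df$x != 1                                # FALSE, NA, TRUE

kept    <- subset_rows(df, i)                  # default: "subset" behaviour
with_na <- subset_rows(df, i, na.rm = FALSE)   # data.frame behaviour
```

With this toy input, `kept` has one row and `with_na` has two, the first of 
which is NA.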

Best regards,
 
Arun

