Re: [datatable-help] Follow-up on subsetting data.table with NAs

Arunkumar Srinivasan Mon, 10 Jun 2013 01:36:24 -0700

Hi Matthew,
My view (from the last reply) more or less reflects mnel's comments here: 
http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143


Pasted here for convenience:
data.table is mimicking subset in its handling of NA values in logical i 
arguments. The only issue is the ! prefix signifying a not-join, not the way 
one might expect. Perhaps the not-join prefix could have been NJ, not !, to 
avoid this confusion. This might be another discussion to have on the mailing 
list (I think it is a discussion worth having).
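
For anyone following along, here is a minimal base R sketch of the distinction 
being discussed. It deliberately does not call data.table, and the data frame 
`df` and its columns are made up purely for illustration:

```r
# Hypothetical illustration data; base R only, no data.table involved.
df <- data.frame(x = c(1, 2, NA), y = c("a", "b", "c"))

# `[` on a data.frame keeps a row for an NA condition (a row of NAs).
kept <- df[df$x != 1, ]
nrow(kept)     # 2: the x == 2 row plus an all-NA row

# subset() drops rows where the condition evaluates to NA.
dropped <- subset(df, x != 1)
nrow(dropped)  # 1: only the x == 2 row

# which() returns only the positions that are TRUE, so the NA is skipped.
idx <- which(df$x != 1)
idx            # 2

# As plain logical vectors, the two spellings agree, NA included.
same <- identical(!(df$x == 1), df$x != 1)
same           # TRUE
```

So in ordinary logical terms !(x == 1) and x != 1 are the same vector; any 
difference inside DT[...] would come from how the ! prefix is interpreted, not 
from the logic itself.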

Arun


On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:

> > Hm, good point.  Is data.table consistent with SQL already, for both == and 
> > !=, and so no change needed?  
> > 
> 
> Yes, I believe it's already consistent with SQL. However, the documentation's 
> current statement that NA is treated as FALSE is unnecessary and, imho, 
> inaccurate (please see below).
>  
> > And it was correct for Frank to be mistaken.  
> > 
> 
> Yes, it seems like he was mistaken.
> 
> > Maybe just some more documentation and examples needed then.
> > 
> 
> It'd be more appropriate for the documentation to say that logical 
> subsetting in data.table mimics the "subset" function (in line with SQL) by 
> dropping rows where the logical expression evaluates to NA. In the code I 
> pasted a couple of posts back, the step that replaces NAs with FALSE is not 
> necessary: `irows <- which(i)` makes clear that `which` is used to get the 
> indices before subsetting, and `which` already drops NAs. This fits 
> perfectly well with the interpretation of NA in data.table. 
> 
> > Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :
> > http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently
> > 
> 
> Ha, I like the idea behind the use of () in evaluating expressions. It's 
> another nice layer towards simplicity in data.table. But I still think 
> logically equivalent operations should not give different results. If 
> !(x == .) and x != . are indeed meant to differ, then I'd suggest replacing 
> `!` with a more appropriate name, as it's very easy to get confused 
> otherwise. 
> 
> In essence, either !(x == .) must evaluate to the same result as (x != .) 
> if the underlying meaning of these is the same, or the `!` in `!(x == .)` 
> must be replaced with something more appropriate for what it's supposed to 
> do. Personally, I prefer the former. It would greatly tighten the structure 
> and consistency.
> 
> > "na.rm = TRUE/FALSE" sounds good to me.  I'd only considered nomatch before 
> > in the context of joins, not logical subsets.
> > 
> 
> Yes, I think this option would give more control over how expressions in 
> `i` are evaluated, by offering both "subset"-style behaviour (the default) 
> and the typical data.frame subsetting (na.rm = FALSE).
> 
> Best regards,
>  
> Arun
>

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
