Re: [datatable-help] Follow-up on subsetting data.table with NAs

Arunkumar Srinivasan Mon, 10 Jun 2013 02:21:31 -0700

Matthew, 

> How about ~ instead of ! ? I ruled out - previously to leave + and - 
> available for future use. NJ() may be possible too.
Both "NJ()" and "~" are okay for me.


> That result makes perfect sense to me. I don't think of !(x==.) being the 
> same as x!=. ! is simply a prefix. It's all the rows that aren't 
> returned if the ! prefix wasn't there.


I understand that `DT[!(x)]` currently does what `data.table` is designed to 
do. What I failed to mention was that if `!(x==.)` were to be implemented as 
equivalent to `x != .`, then this behaviour would have to change. Let's set 
this point aside for a moment.
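
For reference, a minimal base R sketch (no data.table involved; the vector `x` is made up for illustration) of why the two forms look interchangeable as plain logical expressions. The difference discussed in this thread comes only from a leading `!` in `i` being read as a not-join prefix.

```r
# Base R only; `x` is a hypothetical vector containing one NA.
x <- c(0, 5, NA, 10)

neg_eq <- !(x == 5)   # negate the equality test
neq    <- x != 5      # direct inequality test

# As plain logical vectors the two agree element by element, NA included.
same_logical <- identical(neg_eq, neq)

# which() keeps only the TRUE positions, so the NA element is dropped
# in both cases.
rows_neg_eq <- which(neg_eq)
rows_neq    <- which(neq)
```

So any inconsistency between the two spellings cannot come from R's logical operators themselves.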

> That needs to be fixed.  But we're getting quite theoretical here and far 
> away from common use cases.  Why would we ever have row numbers of the table, 
> as a column of the table itself and want to select the rows by number not 
> mentioned in that column?

Perhaps I did not choose a good example. Suppose I have a data.table and I 
want to get all rows where "x == 0". Let's say:

library(data.table)

set.seed(45)
# y is sampled to the same length as x (10) so the columns line up
DT <- data.table(x = sample(c(0, 5, 10, 15), 10, replace = TRUE), y = sample(15, 10))

DF <- as.data.frame(DT)



To get all rows where x == 0, one can use DT[x == 0]. But in the context of 
data.frames, at least, it makes sense to write either of these equivalent forms:

DF[!(DF$x), ] (or) DF[DF$x == 0, ]

All I want to say is that I'd expect `DT[!(x)]` to give the same result as 
`DT[x == 0]` (even though I fully understand that this is not data.table's 
intended behaviour), as it's more intuitive and less confusing. 

So changing `!` to `~` or `NJ` is only one half of the issue for me. The other 
half is restoring the usual logical meaning of `!` in all contexts. I hope 
I've conveyed what I wanted to say better this time.

Best,

Arun




On Monday, June 10, 2013 at 10:52 AM, Matthew Dowle wrote:

>  
> Hi,
> How about ~ instead of ! ?      I ruled out - previously to leave + and - 
> available for future use.  NJ() may be possible too.
> Matthew
>  
> On 10.06.2013 09:35, Arunkumar Srinivasan wrote:
> > Hi Matthew,
> > My view (from the last reply) more or less reflects mnel's comments here: 
> > http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143
> >  
> > Pasted here for convenience:
> > data.table is mimicking subset in its handling of NA values in logical i 
> > arguments. -- the only issue is the ! prefix signifying a not-join, not the 
> > way one might expect. Perhaps the not join prefix could have been NJ not ! 
> > to avoid this confusion -- this might be another discussion to have on the 
> > mailing list -- (I think it is a discussion worth having) 
> > 
> > Arun 
> > 
> > On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:
> > 
> > > > Hm, good point.  Is data.table consistent with SQL already, for both == 
> > > > and !=, and so no change needed?  
> > > > 
> > > > 
> > > 
> > > Yes, I believe it's already consistent with SQL. However, the 
> > > documentation's current statement that NA is treated as FALSE is 
> > > unnecessary / inaccurate, imho (please see below).
> > >  
> > > > And it was correct for Frank to be mistaken.  
> > > > 
> > > > 
> > > 
> > > Yes, it seems like he was mistaken.
> > > 
> > > > Maybe just some more documentation and examples needed then.
> > > > 
> > > > 
> > > 
> > > It'd be much more appropriate if the documentation said that subsetting 
> > > in data.table mimics the "subset" function (in order to be in line with 
> > > SQL) by dropping rows where the logical evaluates to NA. In the code I 
> > > pasted a couple of posts back, replacing NAs with FALSE was not 
> > > necessary: `irows <- which(i)` makes it clear that `which` is used to 
> > > get the indices before subsetting, and this fits perfectly well with the 
> > > interpretation of NA in data.table. 
> > > > Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? 
> > > > :
> > > > http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently
> > > > 
> > > > 
> > > 
> > > Ha, I like the idea behind the use of () in evaluating expressions. It's 
> > > another nice layer towards simplicity in data.table. But I still think 
> > > that equivalent logical operations should not give different results. 
> > > If !(x == .) and x != . are indeed meant to differ, then I'd suggest 
> > > replacing `!` with a more appropriate name, as it's much too easy to 
> > > get confused otherwise. 
> > > In essence, either !(x == .) must evaluate to (x != .) if the underlying 
> > > meaning of these is the same, or the `!` in `!(x==.)` must be replaced 
> > > with something more appropriate for what it's supposed to do. 
> > > Personally, I prefer the former. It would greatly tighten the structure 
> > > and consistency.
> > > > "na.rm = TRUE/FALSE" sounds good to me.  I'd only considered nomatch 
> > > > before in the context of joins, not logical subsets.
> > > > 
> > > > 
> > > 
> > > Yes, I think this option would give more control when evaluating 
> > > expressions in `i`, by offering both "subset"-style behaviour (the 
> > > default) and the typical data.frame subsetting (na.rm = FALSE).
> > > Best regards,
> > >  
> > > Arun
> > > 
> > > 
> > > 
> > 
> > 
> 
>  
>  
> 
>
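
The quoted discussion contrasts "subset"-style handling of NA with typical data.frame subsetting. A minimal base R sketch of that contrast follows; the data.frame is hypothetical, and the `na.rm` argument mentioned in the thread is only a proposal, so it is not used here.

```r
# Hypothetical data with one NA in the column being tested.
DF <- data.frame(x = c(0, NA, 5), y = 1:3)

# `[` with a logical index keeps the NA position as an all-NA row.
bracket_rows <- DF[DF$x == 0, ]

# subset() treats an NA condition as FALSE and drops that row.
subset_rows <- subset(DF, x == 0)

# Wrapping the condition in which() gives `[` the same NA-dropping result.
which_rows <- DF[which(DF$x == 0), ]
```

The `which()` form corresponds to the `irows <- which(i)` line cited in the quoted message.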

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
