The problem with ~ is that it is using up a special character (of which there are only a few) for a case that does not occur much.
I can think of other things that ~ might be better used for. For example, perhaps ~ x could mean get(x). One aspect of data.table that tends to be difficult is when you don't know the variable name ahead of time, and this would give a way to specify it concisely.

On Mon, Jun 10, 2013 at 5:21 AM, Arunkumar Srinivasan <[email protected]> wrote:
> Matthew,
>
> How about ~ instead of ! ? I ruled out - previously to leave + and -
> available for future use. NJ() may be possible too.
>
> Both "NJ()" and "~" are okay for me.
>
> That result makes perfect sense to me. I don't think of !(x==.) being the
> same as x!=. ! is simply a prefix. It's all the rows that aren't
> returned if the ! prefix wasn't there.
>
> I understand that `DT[!(x)]` does what `data.table` is designed to do
> currently. What I failed to mention was that if one were to consider
> implementing `!(x==.)` as the same as `x != .` then this behaviour has to be
> changed. Let's forget this point for a moment.
>
> That needs to be fixed. But we're getting quite theoretical here and far
> away from common use cases. Why would we ever have row numbers of the
> table, as a column of the table itself and want to select the rows by number
> not mentioned in that column?
>
> Probably I did not choose a good example. Suppose that I've a data.table and
> I want to get all rows where "x == 0". Let's say:
>
> set.seed(45)
> DT <- data.table( x = sample(c(0,5,10,15), 10, replace=TRUE), y =
> sample(15))
>
> DF <- as.data.frame(DT)
>
> To get all rows where x == 0, it could be done with DT[x == 0]. But it makes
> sense, at least in the context of data.frames, to do equivalently,
>
> DF[!(DF$x), ] (or) DF[DF$x == 0, ]
>
> All I want to say is, I expect `DT[!(x)]` should give the same result as
> `DT[x == 0]` (even though I fully understand it's not the intended behaviour
> of data.table), as it's more intuitive and less confusing.
>
> So, changing `!` to `~` or `NJ` is one half of the issue for me.
> The other is to replace the actual function of `!` in all contexts. I hope I
> came across with what I wanted to say, better this time.
>
> Best,
>
> Arun
>
> On Monday, June 10, 2013 at 10:52 AM, Matthew Dowle wrote:
>
> Hi,
>
> How about ~ instead of ! ? I ruled out - previously to leave + and -
> available for future use. NJ() may be possible too.
>
> Matthew
>
> On 10.06.2013 09:35, Arunkumar Srinivasan wrote:
>
> Hi Matthew,
> My view (from the last reply) more or less reflects mnel's comments here:
> http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143
> Pasted here for convenience:
> data.table is mimicing subset in its handling of NA values in logical i
> arguments. -- the only issue is the ! prefix signifying a not-join, not the
> way one might expect. Perhaps the not join prefix could have been NJ not !
> to avoid this confusion -- this might be another discussion to have on the
> mailing list -- (I think it is a discussion worth having)
>
> Arun
>
> On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:
>
> Hm, good point. Is data.table consistent with SQL already, for both == and
> !=, and so no change needed?
>
> Yes, I believe it's already consistent with SQL. However, the current
> interpretation of NA (documentation) being treated as FALSE is not needed /
> untrue, imho (Please see below).
>
> And it was correct for Frank to be mistaken.
>
> Yes, it seems like he was mistaken.
>
> Maybe just some more documentation and examples needed then.
>
> It'd be much more appropriate if the documentation reflects the role of
> subsetting in data.table mimicking "subset" function (in order to be in line
> with SQL) by dropping NA evaluated logicals.
> From a couple of posts before, where I pasted the code where NAs are
> replaced to FALSE were not necessary as `irows <- which(i)` makes clear that
> `which` is being used to get indices and then subset, this fits perfectly
> well with the interpretation of NA in data.table.
>
> Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :
>
> http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently
>
> Ha, I like the idea behind the use of () in evaluating expressions. It's
> another nice layer towards simplicity in data.table. But I still think there
> should not be an inconsistency in equivalent logical operations to provide
> different results. If !(x== .) and x != . are indeed different, then I'd
> suppose replacing `!` with a more appropriate name as it's much easier to
> get confused otherwise.
> In essence, either !(x == .) must evaluate to (x != .) if the underlying
> meaning of these are the same, or the `!` in `!(x==.)` must be replaced to
> something that's more appropriate for what it's supposed to be. Personally,
> I prefer the former. It would greatly tighten the structure and consistency.
>
> "na.rm = TRUE/FALSE" sounds good to me. I'd only considered nomatch before
> in the context of joins, not logical subsets.
>
> Yes, I find this option would give more control in evaluating expressions
> with ease in `i`, by providing both "subset" (default) and the typical
> data.frame subsetting (na.rm = FALSE).
> Best regards,
>
> Arun
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
