Btw, since we're on the topic of join/not-join syntax: does this break others' expectations, or is it just me?
> dt = data.table(x = c(1,2,3))
> setkey(dt,x)
> dt[J(1)]
   x
1: 1
> dt[!J(1)]
   x
1: 2
2: 3
> dt[(!J(1))]
Error in eval(expr, envir, enclos) : could not find function "J"
> dt[(J(1))]
Error in eval(expr, envir, enclos) : could not find function "J"

I understand why this happens internally, because the function "()" is read as the head of the expression tree, but it's still pretty weird.

On Mon, Jun 10, 2013 at 9:55 AM, Frank Erickson <[email protected]> wrote:

> I prefer ~ and/or NJ() over -. The not-join operation is different from
> the subsetting operation usually associated with -.
>
> I don't know what characters are available for this sort of thing, but @x,
> @(x,y) seems natural enough as syntax for a getter.
>
>
> On Mon, Jun 10, 2013 at 9:35 AM, Matthew Dowle <[email protected]> wrote:
>
>> Hm, another good point. We need ~ for formulae, although I can't
>> imagine a formula in i (only in j). But in both i and j we might want to
>> get(x).
>>
>> I thought about ^ i.e. X[^Y] in the spirit of regular expression syntax,
>> but ^ doesn't parse with a RHS only. Needs to be parsable as a prefix.
>>
>> - maybe then? Consistent with - meaning in R. I don't think I actually
>> had a specific use in mind for - and +, to reserve them for, but at the
>> time it just seemed a shame to use up one of -/+ without defining the
>> other. If - does a not join, then, might + be more like merge() (i.e.
>> returning the union of the rows in x and i by join). I think I had
>> something like that in mind, but hadn't thought it through.
>>
>> Some might say it should be a new argument e.g. notjoin=TRUE, but my
>> thinking there is readability, since we often have many lines in i, j and
>> by in that order, and if the "notjoin=TRUE" followed afterwards it would be
>> far away from the i argument to which it applies. If we incorporate
>> merge() into X[Y] using X[+Y] then it might avoid adding yet more
>> parameters, too.
>>
>> On 10.06.2013 15:02, Gabor Grothendieck wrote:
>>
>>> The problem with ~ is that it is using up a special character (of
>>> which there are only a few) for a case that does not occur much.
>>>
>>> I can think of other things that ~ might be better used for. For
>>> example, perhaps ~ x could mean get(x). One aspect of data.table that
>>> tends to be difficult is when you don't know the variable name ahead
>>> of time and this would give a way to specify it concisely.
>>>
>>> On Mon, Jun 10, 2013 at 5:21 AM, Arunkumar Srinivasan
>>> <[email protected]> wrote:
>>>
>>>> Matthew,
>>>>
>>>> How about ~ instead of ! ? I ruled out - previously to leave + and -
>>>> available for future use. NJ() may be possible too.
>>>>
>>>> Both "NJ()" and "~" are okay for me.
>>>>
>>>> That result makes perfect sense to me. I don't think of !(x==.) being the
>>>> same as x!=. ! is simply a prefix. It's all the rows that aren't
>>>> returned if the ! prefix wasn't there.
>>>>
>>>> I understand that `DT[!(x)]` does what `data.table` is designed to do
>>>> currently. What I failed to mention was that if one were to consider
>>>> implementing `!(x==.)` as the same as `x != .` then this behaviour has to be
>>>> changed. Let's forget this point for a moment.
>>>>
>>>> That needs to be fixed. But we're getting quite theoretical here and far
>>>> away from common use cases. Why would we ever have row numbers of the
>>>> table, as a column of the table itself and want to select the rows by number
>>>> not mentioned in that column?
>>>>
>>>> Probably I did not choose a good example. Suppose that I've a data.table and
>>>> I want to get all rows where "x == 0". Let's say:
>>>>
>>>> set.seed(45)
>>>> DT <- data.table( x = sample(c(0,5,10,15), 10, replace=TRUE), y = sample(15))
>>>>
>>>> DF <- as.data.frame(DT)
>>>>
>>>> To get all rows where x == 0, it could be done with DT[x == 0].
>>>> But it makes sense, at least in the context of data.frames, to do
>>>> equivalently,
>>>>
>>>> DF[!(DF$x), ] (or) DF[DF$x == 0, ]
>>>>
>>>> All I want to say is, I expect `DT[!(x)]` should give the same result as
>>>> `DT[x == 0]` (even though I fully understand it's not the intended behaviour
>>>> of data.table), as it's more intuitive and less confusing.
>>>>
>>>> So, changing `!` to `~` or `NJ` is one half of the issue for me. The other
>>>> is to replace the actual function of `!` in all contexts. I hope I came
>>>> across with what I wanted to say, better this time.
>>>>
>>>> Best,
>>>>
>>>> Arun
>>>>
>>>>
>>>> On Monday, June 10, 2013 at 10:52 AM, Matthew Dowle wrote:
>>>>
>>>> Hi,
>>>>
>>>> How about ~ instead of ! ? I ruled out - previously to leave + and -
>>>> available for future use. NJ() may be possible too.
>>>>
>>>> Matthew
>>>>
>>>>
>>>> On 10.06.2013 09:35, Arunkumar Srinivasan wrote:
>>>>
>>>> Hi Matthew,
>>>> My view (from the last reply) more or less reflects mnel's comments here:
>>>>
>>>> http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143
>>>>
>>>> Pasted here for convenience:
>>>> data.table is mimicking subset in its handling of NA values in logical i
>>>> arguments. -- the only issue is the ! prefix signifying a not-join, not the
>>>> way one might expect. Perhaps the not join prefix could have been NJ not !
>>>> to avoid this confusion -- this might be another discussion to have on the
>>>> mailing list -- (I think it is a discussion worth having)
>>>>
>>>> Arun
>>>>
>>>> On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:
>>>>
>>>> Hm, good point. Is data.table consistent with SQL already, for both == and
>>>> !=, and so no change needed?
>>>>
>>>> Yes, I believe it's already consistent with SQL. However, the current
>>>> interpretation of NA (documentation) being treated as FALSE is not needed /
>>>> untrue, imho (Please see below).
>>>>
>>>> And it was correct for Frank to be mistaken.
>>>>
>>>> Yes, it seems like he was mistaken.
>>>>
>>>> Maybe just some more documentation and examples needed then.
>>>>
>>>> It'd be much more appropriate if the documentation reflects the role of
>>>> subsetting in data.table mimicking "subset" function (in order to be in line
>>>> with SQL) by dropping NA evaluated logicals. From a couple of posts before,
>>>> where I pasted the code where NAs are replaced to FALSE were not necessary
>>>> as `irows <- which(i)` makes clear that `which` is being used to get indices
>>>> and then subset, this fits perfectly well with the interpretation of NA in
>>>> data.table.
>>>>
>>>> Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :
>>>>
>>>> http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently
>>>>
>>>> Ha, I like the idea behind the use of () in evaluating expressions. It's
>>>> another nice layer towards simplicity in data.table. But I still think there
>>>> should not be an inconsistency in equivalent logical operations to provide
>>>> different results. If !(x== .) and x != . are indeed different, then I'd
>>>> suppose replacing `!` with a more appropriate name as it's much easier to
>>>> get confused otherwise.
>>>> In essence, either !(x == .) must evaluate to (x != .) if the underlying
>>>> meaning of these are the same, or the `!` in `!(x==.)` must be replaced to
>>>> something that's more appropriate for what it's supposed to be. Personally,
>>>> I prefer the former. It would greatly tighten the structure and consistency.
>>>>
>>>> "na.rm = TRUE/FALSE" sounds good to me. I'd only considered nomatch before
>>>> in the context of joins, not logical subsets.
>>>>
>>>> Yes, I find this option would give more control in evaluating expressions
>>>> with ease in `i`, by providing both "subset" (default) and the typical
>>>> data.frame subsetting (na.rm = FALSE).
>>>> Best regards,
>>>>
>>>> Arun
>>>>
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> [email protected]
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
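
P.S. On the parenthesis point at the top of this message, here is a minimal base-R sketch of what I mean by "(" being read as the head of the expression tree. It only inspects quoted calls and does not touch data.table at all; how data.table actually decides what to do with the head of `i` is my assumption from the behaviour shown above, and `head_of` is just a hypothetical helper name.

```r
# Base R only: look at the unevaluated calls that would be passed as `i`.
e_plain  <- quote(J(1))
e_not    <- quote(!J(1))
e_paren  <- quote((J(1)))
e_parnot <- quote((!J(1)))

# Hypothetical helper: name of the function at the head of a call.
head_of <- function(e) as.character(e[[1]])

heads <- vapply(list(e_plain, e_not, e_paren, e_parnot), head_of, character(1))
print(heads)
# "J" "!" "(" "("

# The original call is still there, one level down inside the parentheses.
inner <- deparse(e_paren[[2]])
print(inner)
# "J(1)"
```

So anything that special-cases `J` or `!` by looking only at the first element of the call would see "(" in the last two cases and fall through to ordinary evaluation, where no function called `J` is visible.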
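
For the NA discussion quoted in this thread, a rough base-R emulation of why "all the rows the subset did not return" can differ from `x != 0` once x contains NA. This is not data.table's implementation, only a sketch that uses `which()` for the NA-dropping subset and `setdiff()` for the not-join reading; the vector `x` and the variable names are made up for illustration.

```r
# Made-up data with one NA.
x <- c(0, NA, 5, 0, 10)

# Logical subsets in the style of subset(): which() drops NA.
keep_eq  <- which(x == 0)
keep_neq <- which(x != 0)

# Plain logical negation of a numeric vector, as in DF[!(DF$x), ].
keep_bang <- which(!x)

# Not-join reading of a "!" prefix: every row the x == 0 subset did not return.
keep_notjoin <- setdiff(seq_along(x), keep_eq)

print(keep_eq)       # 1 4
print(keep_neq)      # 3 5
print(keep_bang)     # 1 4
print(keep_notjoin)  # 2 3 5
```

The NA row (row 2) is dropped by both `x == 0` and `x != 0`, but it comes back under the not-join reading, which seems to be the inconsistency the Stack Overflow question is about.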
