> "Chris Neff" <[email protected]> wrote in message
> news:caauy0rvusa6a-xa8cz64cpdkrbvaqsgztdzgkp3uzfm+ggn...@mail.gmail.com...
> I think I understand the difference between DT$y <- TRUE and DT$y <-
> rnorm(10) now, and that is that the first is just an atomic element
> while the second is a vector. If instead I had done
> DT$y <- 0.5

Clarification in other reply.

> I get a coercion warning. I suppose this is an okay feature if the
> warning consistently shows up whenever coercion happens (even if it
> coerces successfully with no loss of precision). It is just different
> than data.frame and without the warning I didn't understand the logic.

One coercion warning is missing. Will add.

> I can see the speedup of not coercing, but from a user's point of view
> I expect DT$y <- 0.5 and DT$y <- rep(0.5, nrow(DT)) to behave
> identically thanks to the magic of vector recycling.

Same clarification in other reply.

> Similarly if I've decided to do DT$y[4] <- .2 and before y was an
> integer, I've clearly changed my mind about that aspect and want y to
> be a numeric.

I don't think that's clear at all. The most common case is

    DT$y[4] <- 1

In that case the user has forgotten his "L", and all of a sudden his carefully chosen integer column gets coerced to double (automatically, silently, and slowly). You shouldn't change your mind on large data. Get the types right up front and stick to them. If you do change your mind, then it's made (ok, I have deliberately made it) harder for you to change the type (which is the correct emphasis, I think); i.e. explicitly change your mind by creating a new large vector of the type you want and use := to "replace" the whole column. Clearer for the reader of your code that way (rather than a silent automatic column type change just because you forgot L).

> However, I'm able to work with the way it is as long as I'm warned
> about it. I can see this making terribly confusing bugs for people

I'd say they're being hidden from what's actually happening at the moment in data.frame, and they need to get their types correct up front. Obviously data.frame could never be changed in this regard because too much code depends on those coercion choices. Happy to be wrong, but let's get the behaviour of := correct, now. Which is why all your feedback has been so great so quickly! I don't think I'm wrong, yet.

> if they don't get a warning.

Agreed. Yes, the coercion to logical warning is missing. I'll make it like the coercion to integer warning. Also some documentation would help, wouldn't it ;)

> -Chris

On 4 August 2011 09:09, Chris Neff <[email protected]> wrote:
> I've run the following 3 different times in new sessions:
>
> install.packages("data.table",
> repos="http://R-Forge.R-project.org",type="source")
>
> and still DT[,z:=5] does nothing. Is there something I can check to make
> sure that the latest version is loaded?
>
>
> As for the coercion stuff, I feel that it is somewhat inconsistent
> right now. For instance:
>
>> DT <- data.table(x=1:10, y=1:10)
>
>> DT$y <- TRUE
>
>> sapply(DT, class)
>
>         x         y
> "integer" "integer"
>
>> DT$y <- rnorm(10)
>> sapply(DT, class)
>         x         y
> "integer" "numeric"
>
> So in the first case y silently coerces the logical to an integer
> without warning, but in the second case y happily turns into a numeric
> when need be. Why the difference?
>
> When I do something like DT$y <- foo, I expect that y should turn into
> foo regardless of what y was before. If there is some reason why DT[,
> y:=foo] should be different than DT$y <- foo, that is a secondary
> matter, but I get mightily confused when DT$y <- foo doesn't behave
> like data.frame.
>
> On 4 August 2011 08:50, Matthew Dowle <[email protected]> wrote:
>> Still doesn't seem to be latest version: DT[,z:=5] should add column (and
>> that's tested).
>> Otherwise correct and intended behaviour (although an informative warning
>> needs adding when 5 gets coerced to type of column (i.e. logical) -
>> thanks for spotting). Remember as.logical(5) is TRUE without warning. So,
>> try creating column with NA_integer_ or NA_real_ instead. Once the column
>> type is set, that's it. Columns aren't coerced to match type of RHS,
>> unlike data.frame [which if you think about it is a big hit if the data
>> is large].
>>
>> "Chris Neff" <[email protected]> wrote in message
>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=p6ndp+le...@mail.gmail.com...
>> Ignore this second one, restarting and refreshing my data.table
>> install now gives the proper error message when I try that. Sorry I'm
>> not used to being on the bleeding edge of these things and I forget to
>> update. However the first question is still mainly relevant:
>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>> DT[,z:=5]
>>        x y
>>  [1,]  1 1
>>  [2,]  2 2
>>  [3,]  3 1
>>  [4,]  4 2
>>  [5,]  5 1
>>  [6,]  6 2
>>  [7,]  7 1
>>  [8,]  8 2
>>  [9,]  9 1
>> [10,] 10 2
>>> DT[1:nrow(DT),z:=5]
>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>> Attempt to add new column(s) and set subset of rows at the same
>> time. Create the new column(s) first, and then you'll be able to
>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>> (no need, it's faster without).
>>> DT$z <- NA
>>> DT[, z:=5]
>>        x y    z
>>  [1,]  1 1 TRUE
>>  [2,]  2 2 TRUE
>>  [3,]  3 1 TRUE
>>  [4,]  4 2 TRUE
>>  [5,]  5 1 TRUE
>>  [6,]  6 2 TRUE
>>  [7,]  7 1 TRUE
>>  [8,]  8 2 TRUE
>>  [9,]  9 1 TRUE
>> [10,] 10 2 TRUE
>>
>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>> different, but still more uninformative than it is when I do
>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>
>> Thanks!
>>
>>
>> On 4 August 2011 08:18, Chris Neff <[email protected]> wrote:
>>> A second question while I'm playing with it. It seems from the FRs
>>> that it doesn't support multiple := in one select, but:
>>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>> DT$a = 0
>>> DT$z = 0
>>>
>>> DT[, list(a := y/sum(y), z := 5)]
>>>
>>> works just fine for me. An error gets thrown but afterwards the
>>> columns are modified as intended. Why the error?
>>>
>>>> DT[,list(z:=5,a:=y/sum(y))]
>>> z
>>> [1] 5
>>> [1] TRUE
>>> a
>>> y/sum(y)
>>> [1] TRUE
>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>> column or argument 1 is NULL
>>>> DT
>>>        x y z          a
>>>  [1,]  1 1 5 0.06666667
>>>  [2,]  2 2 5 0.13333333
>>>  [3,]  3 1 5 0.06666667
>>>  [4,]  4 2 5 0.13333333
>>>  [5,]  5 1 5 0.06666667
>>>  [6,]  6 2 5 0.13333333
>>>  [7,]  7 1 5 0.06666667
>>>  [8,]  8 2 5 0.13333333
>>>  [9,]  9 1 5 0.06666667
>>> [10,] 10 2 5 0.13333333
>>>
>>> -Chris
>>>
>>> On 4 August 2011 08:12, Chris Neff <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> If I do:
>>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>
>>>> Then try the following
>>>>
>>>> DT[, z:=5]
>>>>
>>>> I get:
>>>>
>>>>> DT[, z:=5]
>>>> z
>>>> [1] 5
>>>> [1] TRUE
>>>> NULL
>>>>
>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>> Alternatively if I do
>>>>
>>>> DT[1:10, z:=5]
>>>>
>>>> I get
>>>>
>>>>> DT=DT[1:nrow(DT),z:=5]
>>>> z
>>>> [1] 5
>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>> Error in `:=`(z, 5) :
>>>> Attempt to add new column(s) and set subset of rows at the same
>>>> time. Create the new column(s) first, and then you'll be able to
>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>> (no need, it's faster without).
>>>>
>>>>
>>>> Which is more informative. So I do as it instructs:
>>>>
>>>> DT$z <- NA
>>>>
>>>> DT[, z:=5]
>>>>
>>>> And as output I get:
>>>>
>>>>> DT
>>>>        x y    z
>>>>  [1,]  1 1 TRUE
>>>>  [2,]  2 2 TRUE
>>>>  [3,]  3 1 TRUE
>>>>  [4,]  4 2 TRUE
>>>>  [5,]  5 1 TRUE
>>>>  [6,]  6 2 TRUE
>>>>  [7,]  7 1 TRUE
>>>>  [8,]  8 2 TRUE
>>>>  [9,]  9 1 TRUE
>>>> [10,] 10 2 TRUE
>>>>
>>>>
>>>> Why isn't z 5 like assigned? I think it is because I assigned it as
>>>> NA, and data table didn't know to change it to integer (although why
>>>> it changed it to logical is another puzzle). If I instead do
>>>>
>>>> DT$z <- 0
>>>>
>>>> DT[, z:=5]
>>>>
>>>> It works fine.
>>>>
>>>> So my two points are:
>>>>
>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>> z:=5] with the error message.
>>>>
>>>> B) What went wrong with the NA assignment I did?
>>>>
>>>> Thanks!
>>>> Chris
>>>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
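
The coercion points raised in this thread can be checked in base R, without data.table installed. What follows is only a minimal sketch: the names `df`, `before` and `after` are illustrative and do not come from the thread, and it demonstrates data.frame and base R behaviour, not the internals of `:=`.

```r
# Base R only. All variable names here are illustrative assumptions.

# as.logical() maps any non-zero number to TRUE and gives no warning,
# which matches the TRUE values seen after DT$z <- NA; DT[, z:=5].
as.logical(5)            # TRUE

# A bare NA is logical; typed NA constants exist for other column types.
typeof(NA)               # "logical"
typeof(NA_integer_)      # "integer"
typeof(NA_real_)         # "double"

# Without the L suffix a numeric literal is a double, not an integer.
typeof(1)                # "double"
typeof(1L)               # "integer"

# data.frame silently changes an integer column to double on sub-assignment.
df <- data.frame(x = 1:10, y = 1:10)
before <- class(df$y)    # "integer"
df$y[4] <- 1             # the L was forgotten
after <- class(df$y)     # "numeric"
```

Writing `df$y[4] <- 1L` instead keeps the column as integer, which is the "get the types right up front" habit argued for above.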
