On 4 August 2011 10:18, Matthew Dowle <[email protected]> wrote:
>
>> "Chris Neff" <[email protected]> wrote in message
>> news:caauy0rvusa6a-xa8cz64cpdkrbvaqsgztdzgkp3uzfm+ggn...@mail.gmail.com...
>> I think I understand the difference between DT$y <- TRUE and
>> DT$y <- rnorm(10) now, and that is that the first is just an atomic
>> element while the second is a vector. If instead I had done
>> DT$y <- 0.5
> Clarification in other reply.
>
>> I get a coercion warning. I suppose this is an okay feature if the
>> warning consistently shows up whenever coercion happens (even if it
>> coerces successfully with no loss of precision). It is just different
>> than data.frame and without the warning I didn't understand the logic.
>
> One coercion warning is missing. Will add.
>
>> I can see the speedup of not coercing, but from a user's point of view
>> I expect DT$y <- 0.5 and DT$y <- rep(0.5, nrow(DT)) to behave
>> identically thanks to the magic of vector recycling.
>
> Same clarification in other reply.
>
>> Similarly if I've decided to do DT$y[4] <- .2 and before y was an
>> integer, I've clearly changed my mind about that aspect and want y to
>> be a numeric.
>
> I don't think that's clear at all. The most common case is DT$y[4] <- 1
> In that case the user has forgotten his "L", and all of a sudden his
> carefully chosen integer column gets coerced to double (automatically,
> silently, and slowly). You shouldn't change your mind on large data. Get
> the types right up front and stick to them. If you do change your mind,
> then it's made (ok I have deliberately made it) harder for you to change
> the type (which is the correct emphasis I think); i.e. explicitly change
> your mind by creating a new large vector of the type you want and use :=
> to "replace" the whole column. Clearer for the reader of your code that
> way (rather than a silent automatic column type change just because you
> forgot L).
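To make the forgotten "L" point concrete, here is a minimal sketch in base R only (no data.table involved), showing how a plain vector silently changes type on sub-assignment. The variable names are hypothetical, and the commented-out := line at the end is an assumption about how the explicit whole-column replacement described above might be written; it is not run here.

```r
# Base R illustration of the forgotten "L" (plain vectors, not data.table).
y <- 1:10                      # an integer vector

y_forgot <- y
y_forgot[4] <- 1               # 1 is a double literal, so the vector is promoted
class(y_forgot)                # "numeric"

y_kept <- y
y_kept[4] <- 1L                # 1L is an integer literal, so the type is kept
class(y_kept)                  # "integer"

# Changing your mind explicitly: build a new vector of the wanted type first.
y_new <- as.numeric(y)
y_new[4] <- 0.2
class(y_new)                   # "numeric"

# Assumed data.table equivalent of the explicit whole-column replacement
# (hypothetical usage, not executed in this sketch):
# DT[, y := as.numeric(y)]
```

The promotion in the first case happens without any message, which is the data.frame-style behaviour the reply above argues against for large data.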
>
>> However, I'm able to work with the way it is as long as I'm warned
>> about it. I can see this making terribly confusing bugs for people
>
> I'd say they're being hidden from what's actually happening at the
> moment in data.frame, and they need to get their types correct up front.
> Obviously data.frame could never be changed in this regard because too
> much code depends on those coercion choices. Happy to be wrong, but let's
> get the behaviour of := correct, now. Which is why all your feedback has
> been so great so quickly! I don't think I'm wrong, yet.

Okay, I think I do agree. It does make sense, and I think I've just grown
accustomed to doing bad things in R that it lets me get away with without
telling me how bad they are. So as of now I agree it is working as intended
(with the warning added :) ).

>> if they don't get a warning.
>
> Agreed. Yes the coercion to logical warning is missing. I'll make it like
> the coercion to integer warning. Also some documentation would help,
> wouldn't it ;)
>
>> -Chris
>
> On 4 August 2011 09:09, Chris Neff <[email protected]> wrote:
>> I've run the following 3 different times in new sessions:
>>
>> install.packages("data.table",
>> repos="http://R-Forge.R-project.org",type="source")
>>
>> and still DT[,z:=5] does nothing. Is there something I can check to make
>> sure that the latest version is loaded?
>>
>>
>> As for the coercion stuff, I feel that it is somewhat inconsistent
>> right now. For instance:
>>
>>> DT <- data.table(x=1:10, y=1:10)
>>
>>> DT$y <- TRUE
>>
>>> sapply(DT, class)
>>
>>         x         y
>> "integer" "integer"
>>
>>> DT$y <- rnorm(10)
>>> sapply(DT, class)
>>         x         y
>> "integer" "numeric"
>>
>> So in the first case y silently coerces the logical to an integer
>> without warning, but in the second case y happily turns into a numeric
>> when need be. Why the difference?
>>
>> When I do something like DT$y <- foo, I expect that y should turn into
>> foo regardless of what y was before. If there is some reason why
>> DT[, y:=foo] should be different than DT$y <- foo, that is a secondary
>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>> like data.frame.
>>
>> On 4 August 2011 08:50, Matthew Dowle <[email protected]> wrote:
>>> Still doesn't seem to be latest version: DT[,z:=5] should add column (and
>>> that's tested).
>>> Otherwise correct and intended behaviour (although an informative warning
>>> needs adding when 5 gets coerced to type of column (i.e. logical) -
>>> thanks for spotting). Remember as.logical(5) is TRUE without warning.
>>> So, try creating column with NA_integer_ or NA_real_ instead. Once the
>>> column type is set, that's it. Columns aren't coerced to match type of
>>> RHS, unlike data.frame [which if you think about it is a big hit if the
>>> data is large].
>>>
>>> "Chris Neff" <[email protected]> wrote in message
>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=p6ndp+le...@mail.gmail.com...
>>> Ignore this second one, restarting and refreshing my data.table
>>> install now gives the proper error message when I try that. Sorry I'm
>>> not used to being on the bleeding edge of these things and I forget to
>>> update. However the first question is still mainly relevant:
>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>> DT[,z:=5]
>>>        x y
>>>  [1,]  1 1
>>>  [2,]  2 2
>>>  [3,]  3 1
>>>  [4,]  4 2
>>>  [5,]  5 1
>>>  [6,]  6 2
>>>  [7,]  7 1
>>>  [8,]  8 2
>>>  [9,]  9 1
>>> [10,] 10 2
>>>> DT[1:nrow(DT),z:=5]
>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>> Attempt to add new column(s) and set subset of rows at the same
>>> time. Create the new column(s) first, and then you'll be able to
>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>> (no need, it's faster without).
>>>> DT$z <- NA
>>>> DT[, z:=5]
>>>        x y    z
>>>  [1,]  1 1 TRUE
>>>  [2,]  2 2 TRUE
>>>  [3,]  3 1 TRUE
>>>  [4,]  4 2 TRUE
>>>  [5,]  5 1 TRUE
>>>  [6,]  6 2 TRUE
>>>  [7,]  7 1 TRUE
>>>  [8,]  8 2 TRUE
>>>  [9,]  9 1 TRUE
>>> [10,] 10 2 TRUE
>>>
>>>
>>>
>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>> different, but still less informative than it is when I do
>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>
>>> Thanks!
>>>
>>>
>>> On 4 August 2011 08:18, Chris Neff <[email protected]> wrote:
>>>> A second question while I'm playing with it.
>>>> It seems from the FRs that it doesn't support multiple := in one
>>>> select, but:
>>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>> DT$a = 0
>>>> DT$z = 0
>>>>
>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>
>>>> works just fine for me. An error gets thrown but afterwards the
>>>> columns are modified as intended. Why the error?
>>>>
>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>> z
>>>> [1] 5
>>>> [1] TRUE
>>>> a
>>>> y/sum(y)
>>>> [1] TRUE
>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>> column or argument 1 is NULL
>>>>> DT
>>>>        x y z          a
>>>>  [1,]  1 1 5 0.06666667
>>>>  [2,]  2 2 5 0.13333333
>>>>  [3,]  3 1 5 0.06666667
>>>>  [4,]  4 2 5 0.13333333
>>>>  [5,]  5 1 5 0.06666667
>>>>  [6,]  6 2 5 0.13333333
>>>>  [7,]  7 1 5 0.06666667
>>>>  [8,]  8 2 5 0.13333333
>>>>  [9,]  9 1 5 0.06666667
>>>> [10,] 10 2 5 0.13333333
>>>>
>>>> -Chris
>>>>
>>>> On 4 August 2011 08:12, Chris Neff <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> If I do:
>>>>>
>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>
>>>>> Then try the following
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> I get:
>>>>>
>>>>>> DT[, z:=5]
>>>>> z
>>>>> [1] 5
>>>>> [1] TRUE
>>>>> NULL
>>>>>
>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>> Alternatively if I do
>>>>>
>>>>> DT[1:10, z:=5]
>>>>>
>>>>> I get
>>>>>
>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>> z
>>>>> [1] 5
>>>>>  [1]  1  2  3  4  5  6  7  8  9 10
>>>>> Error in `:=`(z, 5) :
>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>> (no need, it's faster without).
>>>>>
>>>>>
>>>>> Which is more informative.
>>>>> So I do as it instructs:
>>>>>
>>>>> DT$z <- NA
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> And as output I get:
>>>>>
>>>>>> DT
>>>>>        x y    z
>>>>>  [1,]  1 1 TRUE
>>>>>  [2,]  2 2 TRUE
>>>>>  [3,]  3 1 TRUE
>>>>>  [4,]  4 2 TRUE
>>>>>  [5,]  5 1 TRUE
>>>>>  [6,]  6 2 TRUE
>>>>>  [7,]  7 1 TRUE
>>>>>  [8,]  8 2 TRUE
>>>>>  [9,]  9 1 TRUE
>>>>> [10,] 10 2 TRUE
>>>>>
>>>>>
>>>>> Why isn't z 5 like assigned? I think it is because I assigned it as
>>>>> NA, and data table didn't know to change it to integer (although why
>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>
>>>>> DT$z <- 0
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> It works fine.
>>>>>
>>>>> So my two points are:
>>>>>
>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>> z:=5] with the error message.
>>>>>
>>>>> B) What went wrong with the NA assignment I did?
>>>>>
>>>>> Thanks!
>>>>> Chris
>>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
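
On point B in the thread (why a column created with NA shows TRUE after assigning 5), the following is a small sketch in base R only. It checks the types of the NA constants mentioned in the replies and the as.logical(5) fact quoted above. The vector names are hypothetical, and the last step only mimics, under that assumption, what fitting 5 into an existing logical column would produce; it does not call data.table.

```r
# Plain NA is logical, so a column created from it starts life as logical.
class(NA)            # "logical"
class(NA_integer_)   # "integer"
class(NA_real_)      # "numeric"

# Coercing 5 to logical gives TRUE and raises no warning.
as.logical(5)        # TRUE

# Hypothetical stand-ins for the column before and after the assignment:
z_before <- rep(NA, 10)                  # logical vector of NAs
z_after  <- as.logical(rep(5, 10))       # 5 squeezed into the logical type
all(z_after)                             # TRUE
```

Starting the column with NA_integer_ or NA_real_, as suggested in the 08:50 reply, fixes the intended type up front.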
