Dear Matthew,

Thank you for the help. I don't think I could have found this solution on my own.
Regards,
Cedric

-----Original Message-----
From: Matthew Dowle [mailto:[email protected]]
Sent: Wednesday, 8 February 2012 15:01
To: DUPREZ Cédric
Cc: [email protected]
Subject: Re: [datatable-help] Assignement by reference on a datatable subset

Setting mult="first" helps here. data.table doesn't know whether keys are unique. When joining to all the columns of a key that you know is unique, setting mult="first" is faster (mult="last" gives the same result). Also, a join with mult="first" (or "last") isn't treated as by-without-by, so := then works. For example,

> DT = data.table(a=1:2,b=1:4,key="a")
> DT
     a b
[1,] 1 1
[2,] 1 3
[3,] 2 2
[4,] 2 4
> DT[J(2),b:=5L]   # one group is ok
     a b
[1,] 1 1
[2,] 1 3
[3,] 2 5
[4,] 2 5
> DT[J(1:2),b:=6L]   # two or more isn't implemented when mult="all"
Error in `[.data.table`(DT, J(1:2), `:=`(b, 6L)) :
  combining bywithoutby with := in j is not yet implemented.
> DT[J(1:2),b:=6L,mult="first"]   # but, "first" works with :=
     a b
[1,] 1 6
[2,] 1 3
[3,] 2 6
[4,] 2 5
>

Now we know that,

> X = unique(DT[!is.na(val),list(id1,as.integer(val))])
> X
     id1 V2
[1,]  n1  2
[2,]  n1  7
[3,]  n1 11
> DT[X,val:=id2,mult="first"]
      id1 id2 val
 [1,]  n1   1  NA
 [2,]  n1   2   2
 [3,]  n1   3   2
 [4,]  n1   4   2
 [5,]  n1   5  NA
 [6,]  n1   6  NA
 [7,]  n1   7   7
 [8,]  n1   8   7
 [9,]  n1   9  NA
[10,]  n1  10  NA
[11,]  n1  11  11
[12,]  n1  12  11
[13,]  n2   1  NA
[14,]  n2   2  NA
[15,]  n2   3  NA
[16,]  n2   4  NA

(Thanks for the concise examples btw, helps a lot)

> Dear all,
>
> I have a new question about data completion within a datatable.
>
> Having the following datatable:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2")
> , 'id2'=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4)
> , val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA)
> , key = c("id1", "id2"))
>
> I get:
>       id1 id2 val
>  [1,]  n1   1  NA
>  [2,]  n1   2  NA
>  [3,]  n1   3   2
>  [4,]  n1   4   2
>  [5,]  n1   5  NA
>  [6,]  n1   6  NA
>  [7,]  n1   7   7
>  [8,]  n1   8   7
>  [9,]  n1   9  NA
> [10,]  n1  10  NA
> [11,]  n1  11  NA
> [12,]  n1  12  11
> [13,]  n2   1  NA
> [14,]  n2   2  NA
> [15,]  n2   3  NA
> [16,]  n2   4  NA
>
> The val column contains values of id2 per id1.
> For each id2 that is referenced by some val value, I would like to fill in
> its own val with its id2 if that val is still missing.
> In my example, the final datatable should look like this:
>       id1 id2 val
>  [1,]  n1   1  NA
>  [2,]  n1   2   2
>  [3,]  n1   3   2
>  [4,]  n1   4   2
>  [5,]  n1   5  NA
>  [6,]  n1   6  NA
>  [7,]  n1   7   7
>  [8,]  n1   8   7
>  [9,]  n1   9  NA
> [10,]  n1  10  NA
> [11,]  n1  11  11
> [12,]  n1  12  11
> [13,]  n2   1  NA
> [14,]  n2   2  NA
> [15,]  n2   3  NA
> [16,]  n2   4  NA
> As you can see, val on rows 2 and 11 has been filled in with the id2
> value.
>
> I tried it like this:
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = FALSE])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> DT[DT2, val:=val.1]
>
> But I get the following message: "combining bywithoutby with := in j is
> not yet implemented."
>
> Here is the solution I finally found:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2"), 'id2'=c(1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 1, 2, 3, 4), val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA,
> NA, NA, 11, NA, NA, NA, NA), key = c("id1", "id2"))
> noms <- names(DT)
> cle <- key(DT)
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = FALSE])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> X <- DT2[DT]
> X[is.na(val.1), val.1:=val]
> DT <- X[,list(id1, id2, val.1)]
> setnames(DT, 3, "val")
> setkeyv(DT, cle)
>
> Is there a faster way to complete my data?
>
> Thanks in advance for your help.
>
> Regards,
> Cedric
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
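
For readers on a more recent data.table, the same completion can probably be written as a single update join. This is only a sketch: it assumes a release that supports the on= argument, which was added after this thread, and it stores id2 and val as integers so that the join columns and the assigned values share a type. The i. prefix refers to columns of the table passed in i (here X).

```r
library(data.table)

# Same data as in the question, with id2 and val stored as integers
# (an assumption made for this sketch; the original used doubles).
DT <- data.table(
  id1 = rep(c("n1", "n2"), times = c(12L, 4L)),
  id2 = c(1:12, 1:4),
  val = as.integer(c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA)),
  key = c("id1", "id2")
)

# One row per (id1, id2) pair that is referenced by a non-missing val.
X <- unique(DT[!is.na(val), .(id1, id2 = val)])

# Update join: for each row of X, locate the matching row of DT
# and set its val to the referenced id2, by reference.
DT[X, on = .(id1, id2), val := i.id2]

print(DT)
```

Because val is not part of the key, the assignment should leave the key on (id1, id2) in place, so no setkeyv() call is needed afterwards.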
