On Mon, Sep 23, 2013 at 9:42 PM, Matthew Dowle <[email protected]> wrote:
>
> Hi,
> Basically, adding columns by reference to a data.table when it's a member
> of a list of data.tables is really difficult to handle internally. I had
> to special-case internally to get around list() copying, so that the
> binding can change inside the list on the shallow copy when [[ is used. A
> for loop is the way to add columns by reference inside a list of
> data.tables, and that should work OK using [[. But doing that via lapply
> and mapply is really stretching it.
>

That makes sense. I took a whack at it, but couldn't even come close.

> Even catching user expectations in this area is difficult. Ideally we'd
> catch mapply, yes, but really data.table likes to be rbindlist()-ed and
> then ops to work on a single large data.table.
>

Agreed. In the application where this came up, I am dealing with a list of
tables with different dims (hence not rbinding).

> We can add advice to the warning message not to use mapply or lapply to add
> columns by reference to a list of data.tables (use a for loop instead)?
>

Perhaps a warning that modifications to the DTs in the list are likely not to
have stuck, and to use rbindlist when possible?

> Matthew
>
>
> On 22/09/13 03:02, Ricardo Saporta wrote:
>
> Matthew,
>
> I did notice the warning, but something doesn't add up:
>
> If the issue is simply that it is being copied when created, then
> wouldn't we expect the same warning to arise when we try to modify the
> table using `mapply` or `lapply`? (The latter does not produce a warning.)
>
> If, on the other hand, the issue pertains specifically to mapply (which I
> assume it does), then why is it only a problem when we iterate over the
> list directly, whereas iterating indirectly by using an index does not
> produce any warnings?
>
> While overall this is minor if one is aware of the issue, I think it
> might allow unnoticed bugs to creep into someone's code.
> Specifically if using mapply to modify a list of DTs and the user not
> realizing that the modifications are not being held.
>
> That being said, I'm not sure how this could even be addressed if the
> root is in mapply, but is it worth trying to address?
>
> Rick
>
>
> On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle <[email protected]> wrote:
>
>> Does this sentence from the warning help?
>>
>> " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's
>> list() used to copy named objects); please upgrade to R>=v3.1.0 if that is
>> biting. "
>>
>> Matthew
>>
>>
>> On 20/09/13 19:01, Ricardo Saporta wrote:
>>
>> One warning per DT in the list
>> (I added the line breaks)
>> -Rick
>> =============================================
>> Warning messages:
>>
>> 1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>> Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table. Use set* syntax
>> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
>> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
>> doesn't help, please report to datatable-help so the root cause can be
>> fixed.
>>
>> 2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>> Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table.
>> Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr.
>> Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's list()
>> used to copy named objects); please upgrade to R>=v3.1.0 if that is biting.
>> If this message doesn't help, please report to datatable-help so the root
>> cause can be fixed.
>> =============================================
>>
>>
>> On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle
>> <[email protected]> wrote:
>>
>>>
>>> Hi,
>>>
>>> What's the warning?
>>>
>>> Matthew
>>>
>>>
>>> On 20/09/13 14:48, Ricardo Saporta wrote:
>>>
>>> I've encountered the following issue iterating over a list of
>>> data.tables.
>>> The issue is only with mapply, not with lapply.
>>>
>>>
>>> Given a list of data.table's, mapply'ing over the list directly
>>> cannot modify in place.
>>>
>>> Also if attempting to add a new column, we get an "Invalid
>>> .internal.selfref" warning.
>>> Modifying an existing column does not issue a warning, but still fails
>>> to modify-in-place
>>>
>>> WORKAROUND:
>>> ----------
>>> The workaround is to iterate over an index to the list, then to
>>> modify each data.table via list.of.DTs[[i]][ .. ]
>>>
>>> **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>>>
>>>
>>> EXAMPLE:
>>> --------
>>> # Given a list of DT's and two lists of vectors,
>>> # we want to add the corresponding vectors as columns to the DT.
>>>
>>> ## ---------------- ##
>>> ##   SAMPLE DATA:   ##
>>> ## ---------------- ##
>>> # list of data.tables
>>> list.DT <- list(
>>>   DT1=data.table(Col1=111:115, Col2=121:125),
>>>   DT2=data.table(Col1=211:215, Col2=221:225)
>>> )
>>>
>>> # lists of columns to add
>>> list.Col3 <- list(131:135, 231:235)
>>> list.Col4 <- list(141:145, 241:245)
>>>
>>>
>>> ## ------------------------------------ ##
>>> ##  Iterating over the list elements    ##
>>> ##  adding a new column                 ##
>>> ## ------------------------------------ ##
>>> ##  Will issue warning and              ##
>>> ##  will fail to modify in place        ##
>>> ## ------------------------------------ ##
>>> mapply (
>>>   function(DT, C3, C4)
>>>     DT[, c("Col3", "Col4") := list(C3, C4)],
>>>
>>>   list.DT, # iterating over the list
>>>   list.Col3, list.Col4,
>>>   SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note the lack of change
>>> list.DT
>>>
>>>
>>> ## ------------------------------------ ##
>>> ##  Iterating over an index             ##
>>> ## ------------------------------------ ##
>>> mapply (
>>>   function(i, C3, C4)
>>>     list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>>
>>>   seq(list.DT), # iterating over an index to the list
>>>   list.Col3, list.Col4,
>>>   SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note each DT _has_ been modified
>>> list.DT
>>>
>>> ## ------------------------------------ ##
>>> ##  Iterating over the list elements    ##
>>> ##  modifying existing column           ##
>>> ## ------------------------------------ ##
>>> ##  No warning issued, but              ##
>>> ##  Will fail to modify in place        ##
>>> ## ------------------------------------ ##
>>> mapply (
>>>   function(DT, C3, C4)
>>>     DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>>
>>>   list.DT, # iterating over the list
>>>   list.Col3, list.Col4,
>>>   SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note the lack of change (compare with output from `mapply`)
>>> list.DT
>>>
>>> ## ------------------------------------ ##
>>> ##                                      ##
>>> ##     `lapply` works as expected.      ##
>>> ##                                      ##
>>> ## ------------------------------------ ##
>>>
>>> ## NOW WITH lapply
>>> lapply(list.DT,
>>>   function(DT)
>>>     DT[, newCol := LETTERS[1:5]]
>>> )
>>>
>>> ## Note the new column:
>>> list.DT
>>>
>>>
>>> # ========================== #
>>>
>>> ## NON-WORKAROUNDS ##
>>> ##
>>> ## I also tried all of the following alternatives
>>> ## in hopes of being able to iterate over the list
>>> ## directly, using `mapply`.
>>> ## None of these worked.
>>>
>>> # (1) Creating the DTs First, then creating the list from them
>>> DT1 <- data.table(Col1=111:115, Col2=121:125)
>>> DT2 <- data.table(Col1=211:215, Col2=221:225)
>>>
>>> list.DT <- list(DT1=DT1, DT2=DT2)
>>>
>>>
>>> # (2) Same as 1, and using `copy()` in the call to `list()`
>>> list.DT <- list(DT1=copy(DT1),
>>>                 DT2=copy(DT2))
>>>
>>> # (3) lapply'ing `copy` and then iterating over that list
>>> list.DT <- lapply(list.DT, copy)
>>>
>>> # (4) Not naming the list elements
>>> list.DT <- list(DT1, DT2)
>>> # and tried
>>> list.DT <- list(copy(DT1), copy(DT2))
>>>
>>> ## All of the above still failed to modify in place
>>> ## (and also issued the same warning if trying to add a column)
>>> ## when iterating using mapply
>>>
>>> mapply(function(DT, C3, C4)
>>>          DT[, c("Col3", "Col4") := list(C3, C4)],
>>>        list.DT, list.Col3, list.Col4,
>>>        SIMPLIFY=FALSE)
>>>
>>>
>>> # ========================== #
>>>
>>>
>>> Ricardo Saporta
>>> Rutgers University, New Jersey
>>> e: [email protected]
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
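
For reference, the for-loop approach Matthew describes in this thread (adding columns by reference through `[[`) might look roughly like the following. This is only a minimal sketch, assuming R >= 3.1.0 and a reasonably recent data.table; it reuses the sample data from the quoted example, and the loop variable `k` and the temporaries `new3` and `new4` are names made up for illustration.

```r
library(data.table)

# Sample data mirroring the example quoted in this thread.
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Loop over positions rather than over the elements themselves, so that
# `[[` hands `:=` the data.table that is actually stored in the list.
for (k in seq_along(list.DT)) {
  new3 <- list.Col3[[k]]  # hypothetical temporaries, for readability only
  new4 <- list.Col4[[k]]
  list.DT[[k]][, c("Col3", "Col4") := list(new3, new4)]
}

# Inspect the column names of every table in the list.
lapply(list.DT, names)
```

Because each table is modified on its own, this should also suit a list of tables with different dimensions, where rbindlist() is not an option.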
