Hi. OK, thanks. Btw, just to check: you saw the new rbindlist() function?
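Mel's workflow below (clean each harvested table with lapply(), then combine everything) maps fairly directly onto rbindlist(). The following is a minimal sketch and assumes a current data.table release, because the `idcol` argument is not in the 1.8.x series discussed in this thread. `harvested` and `clean_table()` are hypothetical names. The `idcol` column plays the role of the "extra first column" Matthew suggests, so the tables end up in one data.table instead of a list.

```r
library(data.table)

# Hypothetical stand-ins for the harvested external data files
harvested <- list(
  fileA = data.table(a = 1:3,   b = 4:6,   c = 7:9),
  fileB = data.table(a = 10:12, b = 13:15, c = 16:18)
)

# Hypothetical per-table cleaning step, applied with lapply()
clean_table <- function(x) {
  x <- x[!is.na(a)]        # subsetting returns a fresh data.table...
  x[, total := a + b + c]  # ...so := can add a column to it by reference
  x
}
cleaned <- lapply(harvested, clean_table)

# Combine once at the end; idcol records which file each row came from
combined <- rbindlist(cleaned, idcol = "source")
combined
```

Because the cleaning step works on a freshly subsetted table, := never has to touch a data.table that R copied inside list().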
> Hi Matthew,
>
> Sorry for not filing earlier -- the behavior is not a major annoyance as my
> data.tables are rather small this time around.
>
> The reason I'm using data.tables in a list, though that might seem odd, is
> that I'm harvesting quantities of external data files that I eventually
> want to combine into one data.table, but before I can rbind() everything,
> I'm running lots of validation and cleaning tasks on the harvested files
> using lapply() and some indexing magic. The combination of data.table() and
> lapply() makes the syntax /really/ efficient.
>
> I'm afraid I can't provide further input into a possible workaround, as the
> alternatives you listed below all sound good to me! Hopefully others on the
> list can contribute.
>
> Best, --Mel.
>
>
> On 8/15/2012 4:30 AM, Matthew Dowle wrote:
>> Hi,
>>
>> That's interesting, thanks. I'm delighted the warning came up and that no
>> crash happened. This is just what .internal.selfref was designed to catch.
>>
>> list() itself appears to be copying its NAM(2)-ed inputs. If you run the
>> following, the pointer addresses should show that.
>>
>> X=data.table(a=1:3)
>> .Internal(inspect(X))
>> .Internal(inspect(list(X)))   # list() copies X
>>
>> The problem isn't just the copy, but that when R does that copy it
>> collapses the over-allocated vector of column vector pointers (that
>> data.table carefully created) down to just the columns used. That causes
>> := a problem if it's then asked to add a column by reference (there are no
>> free slots).
>>
>> Three possible dev solutions spring to mind:
>>
>> 1. Try again to return data.table as NAM(0) not NAM(2) [there's already an
>> FR for that]. This assumes that list() only copies NAM(2) inputs.
>>
>> 2. Add a new function to data.table (reflist()?) that doesn't copy
>> data.table inputs but works the same as base::list otherwise.
>>
>> 3. Get even more fancy inside [.data.table to inspect its caller. If
>> that's L[[i]], then update L's pointer to the (new) re-over-allocated
>> column pointer vector. The copy by list() would still happen, but at least
>> the column would be added. The next add column by reference after that
>> would then work without warning.
>>
>> Please file a bug report, with a link to this thread. That way you'll get
>> automatic updates when the status changes. Option 2 is most likely.
>>
>> Is a list() of data.tables really needed? Could it be one data.table with
>> an extra first column, or perhaps an environment of data.tables?
>>
>> The more significant problem is that a list column containing data.tables
>> is likely copying all those data.tables, regardless of whether or not :=
>> is then used to add a column by reference to those embedded tables.
>>
>> Matthew
>>
>>
>>> Hello,
>>>
>>> I just noticed an odd behavior with lists of data.tables:
>>>
>>> dt1 <- data.table(a=1:3, b=4:6, c=7:9)
>>> dt2 <- data.table(a=10:12, b=13:15, c=16:18)
>>>
>>> # Combine in a list
>>> myList <- list(dt1, dt2)
>>>
>>> # Adding a new column to first data.table -- this doesn't work
>>> myList[[1]][, d := 4:6]
>>> #    a b c d
>>> # 1: 1 4 7 4
>>> # 2: 2 5 8 5
>>> # 3: 3 6 9 6
>>> # Warning message:
>>> # In `[.data.table`(myList[[1]], , `:=`(d, 4:6)) :
>>> #   Invalid .internal.selfref detected and fixed by taking a copy of the
>>> #   whole table, so that := can add this new column by reference. At an
>>> #   earlier point, this data.table has been copied by R. Avoid key<-,
>>> #   names<- and attr<- which in R currently (and oddly) all copy the whole
>>> #   data.table. Use set* syntax instead to avoid copying: setkey(),
>>> #   setnames() and setattr(). If this message doesn't help, please report
>>> #   to datatable-help so the root cause can be fixed.
>>>
>>> myList[[1]]
>>> #    a b c
>>> # 1: 1 4 7
>>> # 2: 2 5 8
>>> # 3: 3 6 9
>>>
>>> # I need to reassign -- this works
>>> myList[[1]] <- myList[[1]][, d := 4:6]
>>>
>>> myList[[1]]
>>> #    a b c d
>>> # 1: 1 4 7 4
>>> # 2: 2 5 8 5
>>> # 3: 3 6 9 6
>>>
>>> # But on the other hand this works with no problem
>>> setcolorder(myList[[1]], 4:1)
>>> myList[[1]]
>>> #    d c b a
>>> # 1: 4 7 4 1
>>> # 2: 5 8 5 2
>>> # 3: 6 9 6 3
>>>
>>> Is this normal behavior? It seems a bit odd to me.
>>>
>>> Here is my session:
>>>
>>> > sessionInfo()
>>> R version 2.15.1 (2012-06-22)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  utils     datasets  grDevices methods   base
>>>
>>> other attached packages:
>>> [1] foreign_0.8-50      RJDBC_0.2-0         DBI_0.2-5
>>> [4] XLConnect_0.2-0     XLConnectJars_0.2-0 rJava_0.9-3
>>> [7] data.table_1.8.2    rj_1.1.0-4
>>>
>>> loaded via a namespace (and not attached):
>>> [1] rj.gd_1.1.0-1 tools_2.15.1
>>>
>>>
>>> Thanks very much for this fantastic package!
>>>
>>> --Mel.
>>>
>>> Melanie BACOU
>>> International Food Policy Research Institute
>>> Agricultural Economist, HarvestChoice
>>> E-mail [email protected]
>>> Visit harvestchoice.org <http://www.harvestchoice.org/>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>
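Matthew's other alternative, an environment of data.tables instead of a list, can be sketched as follows. This is a minimal sketch assuming a current data.table release; the environment `tables` and the column `total` are hypothetical. Looking a table up in an environment returns the stored object rather than a duplicate, so := should be able to add the column by reference without the .internal.selfref warning quoted above. When cleaning is done, mget() turns the environment back into a named list for rbindlist().

```r
library(data.table)

# Hypothetical environment holding the harvested tables (instead of a list)
tables <- new.env()
tables$dt1 <- data.table(a = 1:3,   b = 4:6,   c = 7:9)
tables$dt2 <- data.table(a = 10:12, b = 13:15, c = 16:18)

# Add a column to every table by reference; [[ on an environment does not
# duplicate the bound data.table, so := modifies the stored object in place
for (nm in ls(tables)) {
  tables[[nm]][, total := a + b + c]
}

# Collect the tables into a named list and combine them
# (idcol assumes data.table >= 1.9.6)
combined <- rbindlist(mget(ls(tables), envir = tables), idcol = "source")
```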
