Hi,

This isn't really a bug. It's more of a documentation issue, or perhaps a default that is too low.

When all the spare column slots are used up, there is no choice but to make a shallow copy and allocate a new, larger vector of column pointer slots. The address in RAM of that vector is what any variable names (symbols) bound to the table point to, so those symbols then have to be pointed at the new vector. data.table does a reasonable job of changing the symbol in the calling scope too, but with a function within a function it's tricky. In your function, x really is being updated by reference while spare slots remain. Once the spare slots are used up, the shallow copy happens in local scope, which is why dd in the caller stops at 100 columns.

By default :

datatable.alloccol = quote(max(100L,ncol(DT)+64L))

Some people just change this to be a much larger number. That's the easiest. Just over-allocate massively :

options(datatable.alloccol = 10000)

If you have under 50 tables, this won't matter a jot. If you have thousands of tables, then the spare space could become significant.

Assuming 64bit, 10000 slots * 8 bytes / 1024 = 78KB per table. Knowing this allows you to choose the appropriate amount of over-allocation for your case. 50 tables * 78KB = approx. 4MB = e.g. 0.01% of 32GB
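The same arithmetic as a small R sketch, in case you want to plug in your own numbers. The variable names are only illustrative, and the 8 bytes per slot assumes a 64bit build:

```r
# Rough cost of over-allocating column pointer slots.
# Assumption: one pointer per slot, 8 bytes per pointer (64bit).
slots             <- 10000   # value given to datatable.alloccol
bytes_per_pointer <- 8
n_tables          <- 50
ram_gb            <- 32

kb_per_table <- slots * bytes_per_pointer / 1024    # about 78 KB
total_mb     <- n_tables * kb_per_table / 1024      # about 3.8 MB
pct_of_ram   <- 100 * total_mb / (ram_gb * 1024)    # about 0.012 %

print(c(kb_per_table = kb_per_table,
        total_mb     = total_mb,
        pct_of_ram   = pct_of_ram))
```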

Or, if you know you are about to add a lot of columns by reference via a function, you can increase the over-allocation of one table using the alloc.col function :

alloc.col(DT, 200)
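A minimal sketch of that approach, assuming a small made-up table and a hypothetical helper add_cols() (neither is from the original example). The exact truelength() you see will depend on your data.table version and on the datatable.alloccol option, so no exact value is shown:

```r
library(data.table)

DT <- data.table(a = 1:3)
ncol(DT)         # columns actually in use
truelength(DT)   # column pointer slots allocated, including spare ones

# Make sure there is plenty of room before adding columns inside a function.
alloc.col(DT, 200)
truelength(DT)   # expected to be at least 200 now

# Hypothetical helper: adds one column per name, by reference.
add_cols <- function(x, new_names) {
  for (nm in new_names) x[, (nm) := 1]
  invisible(x)
}

add_cols(DT, paste0("b", 1:120))
ncol(DT)         # 121: the original column plus the 120 new ones
```

Because the slots were reserved up front, no shallow copy should be needed inside add_cols(), so the caller's DT sees all the new columns.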

In case the example was actually close to the real use case: you can add a lot of columns in one step, because the LHS of := can be an expression :

DT[, paste0('a', 1:101) := 1] # add 101 columns named "a1", "a2" ... "a101", all set to 1

set() may also be an easier alternative to := in this case, now that it can add columns (as from v1.8.11).
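A minimal sketch of the set() route, assuming a small made-up table and only a handful of new columns so that it stays well within the default over-allocation. The names in new_names are only illustrative:

```r
library(data.table)

DT <- data.table(a = 1:3)

# Illustrative column names; a few only, to stay inside the spare slots.
new_names <- paste0("a", 1:5)

# set() adds (or overwrites) one column per call, by reference.
for (nm in new_names) {
  set(DT, j = nm, value = 1)
}

ncol(DT)    # 6: the original column plus the 5 new ones
names(DT)
```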

If there is a real world example where this really does need to be wrapped in a function within a function, then I'd need to see it (or an example closer to reality) to be convinced (me at least) that we need to do better here.

HTH,
Matt



On 14/12/13 13:10, Arunkumar Srinivasan wrote:
Hi Huashan,
Great reproducible example! Would you mind filing a bug report here <https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975>?
Thank you,
Arun

On Saturday, December 14, 2013 at 2:30 AM, Huashan Chen wrote:

I just found out that when the column quota is reached, adding new columns
within a function will fail.

Below is the test code:

testF2=function(x){
  add_var<-function(varname){
    x[, `:=`(eval(substitute(varname)), 1), with=F]
  }
  sapply(paste0('a', 1:101), add_var)
}

dd=data.table(a=1:3)
truelength(dd)
testF2(dd)
dim(dd) # only 100 columns

dd[, new:=3]
dim(dd) # adding new column outside a function is OK.



--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html Sent from the datatable-help mailing list archive at Nabble.com <http://Nabble.com>.
_______________________________________________
datatable-help mailing list
[email protected] <mailto:[email protected]>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


