Hi,
This isn't really a bug; more a documentation issue, or perhaps a
default that is too low.
When all spare slots are used up, there is no choice but to make a
shallow copy, which creates a new vector of column pointer slots. That
vector is what any variable names (symbols) point to (an address in
RAM), so the new one lives at a different address. When this happens,
data.table does a reasonable job of updating the symbol in the calling
scope too, but within a function within a function it's tricky. In your
function, x is being updated by reference until the spare slots are
used up; after that the shallow copy is made in local scope only, so
columns added from then on don't reach the table in the calling scope.
By default :
datatable.alloccol = quote(max(100L,ncol(DT)+64L))
Some people just change this to be a much larger number. That's the
easiest. Just over-allocate massively :
options(datatable.alloccol = 10000)
If you have under 50 tables, this won't matter a jot. If you have
1000s of tables, then the spare space could become significant.
Assuming 64bit, 10000 * 8 bytes / 1024 = 78KB per table. Knowing this
allows you to choose the appropriate amount of over-allocation for your
case. 50 tables * 78KB = 4MB = e.g. 0.01% of 32GB
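As a quick check of that arithmetic, here is a small sketch in base R. It
assumes 8 bytes per column pointer (64bit) and reuses the illustrative
figures above (10000 slots, 50 tables, 32GB of RAM); the variable names
are made up for this example.

```r
slots     <- 10000        # over-allocated column pointer slots per table
ptr_bytes <- 8            # assumed pointer size on a 64bit build
n_tables  <- 50           # illustrative number of tables
ram_bytes <- 32 * 1024^3  # 32GB, illustrative

per_table_kb <- slots * ptr_bytes / 1024
total_mb     <- n_tables * per_table_kb / 1024
pct_of_ram   <- 100 * (n_tables * slots * ptr_bytes) / ram_bytes

round(per_table_kb)     # about 78 (KB per table)
round(total_mb, 1)      # about 3.8 (MB for 50 tables)
signif(pct_of_ram, 1)   # about 0.01 (% of 32GB)
```

Swap in your own number of tables and slots to see where the spare space
starts to matter.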
Or, if you know you are about to add a lot of columns by reference via
a function, you can increase the over-allocation of one table using the
alloc.col function :
alloc.col(DT, 200)
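To see the effect, you can compare length() and truelength() before and
after. This is only a sketch: the exact truelength() you get depends on
your data.table version and on the datatable.alloccol option, so no
particular numbers are shown for it. (Recent releases also document this
function under the name setalloccol().)

```r
library(data.table)

DT <- data.table(a = 1:3)
length(DT)       # column slots in use: 1
truelength(DT)   # column slots allocated; depends on datatable.alloccol

alloc.col(DT, 200)            # request more column pointer slots for this table only
truelength(DT)                # expected to be at least 200 now
truelength(DT) - length(DT)   # spare slots left before a shallow copy is needed
```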
In case the example was actually close to the real example, you can add
a lot of columns in one step and the LHS of := can be an expression :
DT[, paste0('a', 1:101) := 1]   # add 101 columns named "a1", "a2" ... "a101", all set to 1
Also, set() may be an easier alternative to := in this case, now that it
can add columns (as of v1.8.11).
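A minimal sketch of the set() route, assuming a data.table version where
set() can add columns (v1.8.11 or later). The alloc.col() call up front
is my own precaution so the loop never runs out of spare slots under a
low default; it isn't required if your over-allocation is already large
enough.

```r
library(data.table)

DT <- data.table(a = 1:3)
alloc.col(DT, 200)   # precaution: reserve enough spare column slots first

for (nm in paste0("a", 1:101)) {
  set(DT, j = nm, value = 1)   # adds column nm by reference, value recycled to all rows
}

dim(DT)   # 3 rows, 102 columns (a plus a1 ... a101)
```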
If there is a real-world case where this really needs to be wrapped in
a function within a function, then I'd need to see it (or an example
closer to reality) to be convinced that we need to do better here.
HTH,
Matt
On 14/12/13 13:10, Arunkumar Srinivasan wrote:
Hi Huashan,
Great reproducible example! Would you mind filing a bug report here
<https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975>?
Thank you,
Arun
On Saturday, December 14, 2013 at 2:30 AM, Huashan Chen wrote:
I just found out that when the column quota is reached, adding new
columns within a function will fail.
Below is the test code:
testF2=function(x){
  add_var<-function(varname){
    x[, `:=`(eval(substitute(varname)), 1), with=F]
  }
  sapply(paste0('a', 1:101), add_var)
}
dd=data.table(a=1:3)
truelength(dd)
testF2(dd)
dim(dd) # only 100 columns
dd[, new:=3]
dim(dd) # adding new column outside a function is OK.
--
View this message in context:
http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html
Sent from the datatable-help mailing list archive at Nabble.com
<http://Nabble.com>.
_______________________________________________
datatable-help mailing list
[email protected]
<mailto:[email protected]>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help