> Thanks for the quick and patient response, as it was indeed my own fault.
Hardly your fault. The 1000 over-allocation check hasn't come up before and
it isn't documented.

> In the interest of debugging I had set options(warn=3)

Ah, that makes sense. Although 3 is the same as 2 afaik; i.e., 2 or larger
means warnings are turned into errors.

> as well as
> options(datatable.verbose=TRUE); setting warn=0 does indeed allow my code
> to run to satisfactory completion.

Great, glad the workaround works. I'll still look at downgrading or removing
that warning.

>
> I have to stop ignoring the subtle (to me) messages R throws; in this
> case, "Error in ... (converted from warning)".
>
> So a followup general R question is, if I use options(warn=0), my .Rout
> contains a line like "There were 46 warnings (use warnings() to see
> them)". If instead I wrap the := statements with suppressWarnings(), I
> don't get that. Is there a way to suppress the "There were n warnings"
> message?

Oops, I meant oldwarn=options(warn=-1) (or any negative value according to
?options) before the loop, to ignore the warnings. Then after the loop, set
it back to the old value: options(oldwarn). (options() returns the old
settings as a list, so the list is passed straight back.)

>
> -----Original Message-----
> From: Matthew Dowle [mailto:[email protected]]
> Sent: Wednesday, August 08, 2012 5:45 AM
> To: Kaupas, George
> Cc: [email protected]
> Subject: Re: [datatable-help] dancing with alloc.col
>
>
> Oh and since you're looping the := or set(), then options(warn=0) before
> the loop is probably faster than repeated calls to suppressWarnings().
>
>>
>> :)
>>
>> When the column allocation is full, there's a formula to decide how
>> much to grow the allocation by. The check is there (iirc) to make sure
>> that's not growing the table too much. If you have 1 million columns,
>> you probably don't want to double that to 2 million, just to add 1
>> column. But if you do, then use alloc.col first. That was the
>> thinking. But that thinking is biting in your case.
>>
>> Simplest might be to downgrade the warning to a message when verbosity
>> is on, then.
>>
>> In the meantime, does wrapping with suppressWarnings() work around it
>> for now? Since in your case you know that over-allocating by more than
>> 1000 is appropriate.
>>
>> suppressWarnings(DT[,newcol:=])
>>
>> Thanks for reporting. Interesting use case.
>>
>> Matthew
>>
>>> I'm running into this "truelength is greater than 1000 items
>>> over-allocated" warning/error as I use := to add columns to a
>>> data.frame, e.g.:
>>>
>>> tl (1346) is greater than 1000 items over-allocated (ncol = 308). If
>>> you didn't set the datatable.alloccol option very large, please
>>> report this to datatable-help including the result of sessionInfo().
>>>
>>> The long preamble to this is a stackoverflow thread
>>> (http://stackoverflow.com/questions/10015544) in which I needed to
>>> update the contents of one data.table with the contents of another.
>>>
>>> The solution required the columns of both data.tables to match, hence
>>> my pre-processing loop to add columns to each data.table to satisfy
>>> the identical(names(dt1),names(dt2)) criteria. I may have to
>>> re-architect this depending on what is going on with this allocation
>>> business.
>>>
>>> If, for example, dt1 has 200 columns, and dt2 has 2000, and together
>>> they have 2100 unique columns, I'm going to add 1900 columns to dt1.
>>> If I set alloc.col to 2100 before my column-adding loop, I'll get
>>> slapped because 2100 is more than 1000 greater than the 200 columns
>>> present in dt1.
>>>
>>> So do I need to spoon-feed alloc.col? Every iteration through the
>>> loop set it to length(dt1)+1 before adding a column? That seems
>>> rather brutal.
>>> Alternatively checking for the delta between truelength and length,
>>> and how close that is to the magic 1000 number, and then only
>>> adjusting the setting seems fragile.
>>>
>>> I did try to make sense of the help for alloc.col.
>>> Regarding the bit about "if two or more variables are bound to the
>>> same data.table"; the column addition is within a function, and only
>>> one variable references the data.table, at least in the scope of the
>>> function. The function calling that function has a variable for the
>>> data.table too, so I don't know if that counts. Then there is mention
>>> of using copy (not sure how that helps, and BTW the hyperlink for copy
>>> goes to the page for setkey, which does mention copy, but suggests
>>> "See ?copy" which just conjures up the setkey page again), setting
>>> alloc.col, or changing datatable.alloccol (doesn't seem to help).
>>>
>>> The warning asked for sessionInfo; FWIW, here it is:
>>>
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods
>>> [7] base
>>>
>>> other attached packages:
>>> [1] data.table_1.8.2
>>>
>>> Thanks
>>> George
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
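
For the archive, here is a minimal base-R sketch of the save-and-restore idea from the thread: silence warnings around a loop with a negative warn value, then put the previous setting back. It is only an illustration under stated assumptions. noisy_step() and run_quietly() are hypothetical names, and noisy_step() merely stands in for a := or set() call that trips the over-allocation warning; no data.table code is involved. The sketch relies on options() returning the previous settings as a list, which can be passed straight back to options() to restore them.

```r
# Hypothetical stand-in for a loop body that warns on every pass,
# e.g. a := call that trips the over-allocation check.
noisy_step <- function(i) {
  warning("over-allocation check tripped on iteration ", i)
  i * 2L
}

# Run n noisy steps with warnings ignored, restoring the caller's
# warn setting afterwards even if a step fails.
run_quietly <- function(n) {
  old <- options(warn = -1)          # returns the previous value(s) as a list
  on.exit(options(old), add = TRUE)  # restore by passing that list back
  vapply(seq_len(n), noisy_step, integer(1))
}

warn_before <- getOption("warn")
res <- run_quietly(5L)
```

Wrapping the restore in on.exit() is a design choice of this sketch rather than something proposed in the thread; it keeps the old setting from being lost if the loop stops with an error.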
