Now fixed in v1.9.3 :

o The warning "internal TRUE value has been modified" with the recently released R 3.1, when grouping a table containing a logical column and where all groups are just 1 row, is now fixed and tests added. Thanks to James Sams for the reproducible example. The warning is issued by R and we have asked if it can be upgraded to an error.

Matt


On 01/05/14 16:29, Matt Dowle wrote:

Reproduced, thanks for the nice example. Not sure yet, but what R 3.1 now does is store length 1 logical vectors once only, globally, for efficiency, to avoid many new allocations for the common case of single TRUE or FALSE values passed around at C or R level (a nice and welcome change). Since data.table modifies vectors by reference, if that vector is length 1, a new data.table bug as from R 3.1 could be modifying R's internal value of TRUE or FALSE whenever length 1 logical vectors occur. Clearly a serious bug. The test suite immediately broke the day after the R-devel change was made (good) and was one reason data.table was in error state in CRAN checks for quite a while before R 3.1 shipped. It was typically tests of 1-row data.tables including a logical column and modifying that logical column that broke. We fixed that and put in checks to detect and warn if R's internal value has been modified, just in case. Those changes were in v1.9.2 on CRAN. I think I wasn't 100% confident in the detection test (false positives) so made it a warning instead of an error. Now that R 3.1 is out and we haven't had any false positives, it should be an error.
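
A minimal sketch of the kind of case described above, assuming data.table is installed. The table `DT` and its column names are made up for illustration and are not taken from the test suite. It updates a logical column of a 1-row table by reference and then checks that R's own TRUE and FALSE are still intact:

```r
library(data.table)

# Hypothetical 1-row table with a logical column (names are illustrative only).
DT <- data.table(id = 1L, flag = TRUE)

# Update the logical column by reference; only DT's own vector should change.
DT[, flag := FALSE]

# The column changed ...
stopifnot(identical(DT$flag, FALSE))

# ... and R's constants did not.
stopifnot(isTRUE(TRUE), identical(!FALSE, TRUE))
```

On a fixed data.table both checks should pass silently.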

The feature of this upc_table is that all the groups are size 1 :

> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 1

If we change the example so that one group has more than 1 row, it works ok :

> upc_table = data.table(upc=c(1:99998,1,1), upc_ver_uc=rep(c(1,2), times=50000), is_PL=rep(c(T, F, F, T), each=25000), product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 2
> upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc, upc_ver_uc)]

So it seems the problem is in the single allocation of working memory for the largest group when that's just 1 and contains a logical column. Odd, I would have sworn we caught that! Will fix.
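
For reference, a scaled-down sketch of a regression check for this case, assuming data.table is installed. The table `small` is a hypothetical stand-in for upc_table, not the test that was actually added. Every group has exactly 1 row and the grouped output includes a logical column:

```r
library(data.table)

# Small stand-in for upc_table: each (upc, upc_ver_uc) group has exactly 1 row.
small <- data.table(
  upc = 1:8,
  upc_ver_uc = rep(c(1, 2), times = 4),
  is_PL = rep(c(TRUE, FALSE, FALSE, TRUE), each = 2),
  product_module_code = rep(1:4, times = 2)
)
stopifnot(small[, .N, by = list(upc, upc_ver_uc)][, max(N)] == 1L)

# Turn any warning into an error so a regression fails loudly.
res <- withCallingHandlers(
  small[, list(is_PL, product_module_code), keyby = list(upc, upc_ver_uc)],
  warning = function(w) stop("unexpected warning: ", conditionMessage(w))
)

# The logical column should come through unchanged (upc is already sorted),
# and R's TRUE/FALSE should still behave normally.
stopifnot(identical(res$is_PL, small$is_PL))
stopifnot(isTRUE(TRUE), identical(FALSE, !TRUE))
```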

R-devel are planning to do more of this small-object-sharing for common single integer values e.g. 0-10, so we'll need to add more tests accordingly.

Thanks,
Matt



On 01/05/14 05:40, James Sams wrote:
I don't really know what this error message means. A quick example to show what I'm seeing:

> library(data.table)
data.table 1.9.3  For help type: help("data.table")
> upc_table = data.table(upc=1:100000, upc_ver_uc=rep(c(1,2), times=50000), is_PL=rep(c(T, F, F, T), each=25000), product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc, upc_ver_uc)]
Warning message:
In `[.data.table`(upc_table, , list(is_PL, product_module_code), :
  internal TRUE value has been modified

When I continue using R, I eventually start getting more errors, such as:

Error in gettext(domain, unlist(args)) : invalid 'string' value
Error during wrapup: invalid 'string' value

and then terminal input/output becomes corrupted. I only start getting these error messages once I start using data.table; but the messages don't necessarily occur only with data.table functions.

I don't know if the last statement above is executing correctly or not. I'm rather confused as to what is going on. I was using a somewhat stale (maybe a couple of weeks old) svn version of data.table; but I see the same behavior with the latest data.table (r1263). I'm using CRAN's R 3.1 package for Ubuntu on 13.10 and 14.04.



> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  stringr_0.6.2


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

