Now fixed in v1.9.3:
  o The warning "internal TRUE value has been modified" with the
    recently released R 3.1, when grouping a table containing a
    logical column where all groups are just 1 row, is now fixed
    and tests added. Thanks to James Sams for the reproducible
    example.
The warning is issued by R and we have asked if it can be upgraded
to an error.
Matt
On 01/05/14 16:29, Matt Dowle wrote:
Reproduced, thanks for nice example. Not sure yet but what R 3.1 now
does is store length 1 logical vectors once only, globally, for
efficiency to avoid many new allocations for the common case of single
TRUE or FALSE values passed around at C or R level (a nice and welcome
change). Since data.table modifies vectors by reference, a new
data.table bug as of R 3.1 could be that it modifies R's internal
value of TRUE or FALSE whenever a length 1 logical vector occurs.
Clearly a serious bug. The test suite
immediately broke the day after the R-devel change was made (good) and
was one reason data.table was in error state in CRAN checks for quite
a while before R 3.1 shipped. It was typically tests of 1-row
data.tables including a logical column and modifying that logical
column that broke. We fixed that and put in checks to detect and warn
if R's internal value has been modified, just in case. Those
changes were in v1.9.2 on CRAN. I think I wasn't 100% confident in
the detection test (false positives) so made it a warning instead of
an error. Now that R 3.1 is out and we haven't had any false
positives, it should be an error.
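For readers unfamiliar with the by-reference behaviour mentioned above,
here is a minimal sketch. It is not taken from the data.table sources
and does not reproduce the bug; the table and column names (DT, DT2,
DT3, id, flag) are made up for illustration. It only shows the
documented user-level semantics of := on a 1-row table with a logical
column, which is the kind of case the tests exercised:

```r
library(data.table)

# Hypothetical 1-row table with a logical column (names are illustrative only).
DT  <- data.table(id = 1L, flag = TRUE)
DT2 <- DT          # plain assignment binds a second name; no copy is made
DT3 <- copy(DT)    # copy() makes an independent copy

DT[, flag := FALSE]   # := updates the column by reference, in place

print(DT2$flag)    # the second name sees the change
print(DT3$flag)    # the explicit copy is unaffected

# Whatever happens to the table, R's own constants must stay intact.
stopifnot(isTRUE(TRUE), !isTRUE(FALSE))
```

Because nothing is copied unless copy() is called, any in-place write
at C level has to be sure the vector it writes to belongs to the table
and is not an object that R shares globally.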
The feature of this upc_table is that all the groups are size 1 :
> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 1
If we change the example so that one group has more than 1 row, it
works ok :
> upc_table = data.table(upc=c(1:99998,1,1), upc_ver_uc=rep(c(1,2),
times=50000), is_PL=rep(c(T, F, F, T), each=25000),
product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 2
> upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
upc_ver_uc)]
So it seems the problem is in the single allocation of working memory
for the largest group when that's just 1 and contains a logical
column. Odd, I would have sworn we caught that! Will fix.
R-devel are planning to do more of this small-object-sharing for
common single integer values e.g. 0-10, so we'll need to add more
tests accordingly.
Thanks,
Matt
On 01/05/14 05:40, James Sams wrote:
I don't really know what this error message means. A quick example to
show what I'm seeing:
> library(data.table)
data.table 1.9.3 For help type: help("data.table")
> upc_table = data.table(upc=1:100000, upc_ver_uc=rep(c(1,2),
times=50000), is_PL=rep(c(T, F, F, T), each=25000),
product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
upc_ver_uc)]
Warning message:
In `[.data.table`(upc_table, , list(is_PL, product_module_code), :
internal TRUE value has been modified
When I continue using R, I eventually start getting more errors, such
as:
Error in gettext(domain, unlist(args)) : invalid 'string' value
Error during wrapup: invalid 'string' value
and then terminal input/output becomes corrupted. I only start
getting these error messages once I start using data.table; but the
messages don't necessarily occur only with data.table functions.
I don't know if the last statement above is executing correctly or
not. I'm rather confused as to what is going on. I was using a
somewhat stale (maybe a couple of weeks old) svn version of
data.table; but I see the same behavior with the latest data.table
(r1263). I'm using CRAN's R 3.1 package for Ubuntu on 13.10 and 14.04.
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.3
loaded via a namespace (and not attached):
[1] plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help