Great. Yes, bug.report(package="data.table") please. R-Forge is back online now. Thanks.
> Good catch; I did indeed snag 1.8.2 from R-Forge because I needed
> something in that version but didn't see it on CRAN at the time; it never
> occurred to me the version would change.
>
> I uninstalled and installed the CRAN version. I get 717 from the tests.
> However, the merge behavior is the same: in one direction it succeeds but
> changes the column names; in the other direction it fails in setcolorder.
>
> So I should open bug reports, then, eh?
>
>> test.data.table()
> Running .../tests.Rraw
> x = 10,000 sample from 100 strings (quick test to save load on CRAN
> servers where tests run every day. In dev we increase n and m a lot for
> meaningful times.
> 0.001 : f=factor(x) [high up front cost, plus storage and maintenance of levels]
> 0.000 : sort.list(,'radix') on f
> 0.000 : u=unique(x)
> 0.000 : .Internal(order(u))
> 0.001 : sort.list(,'radix') on fsorted
> -vs-
> 0.000 : char group on x (ad hoc by) [slower than radix on f but without up front cost]
> 0.000 : char sort on x (setkey) [lower up front cost than factor(x)]
> 0.000 : char group on xsorted (keyed by) [faster than sort.list(,'radix') on fsorted, same result]
> All 717 tests in test.data.table() completed ok in 15.697sec
>
>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>> merge(DT1,DT2)
>    a Illegal.name.. b
> 1: a              1 6
> 2: b              2 6
> 3: c              3 6
> 4: d              4 6
> 5: e              5 6
>> merge(DT2,DT1)
> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>   neworder is length 4 but x has 3 columns.
>
> -----Original Message-----
> From: Matthew Dowle [mailto:[email protected]]
> Sent: Wednesday, August 08, 2012 6:17 PM
> To: Kaupas, George
> Cc: [email protected]
> Subject: RE: [datatable-help] can I count on data.table supporting
> syntactically invalid column names?
>
>
> Then somehow you don't have the CRAN version of v1.8.2 installed.
> By any chance did you install 1.8.2 from R-Forge in the few days a slightly
> earlier version of 1.8.2 existed on R-Forge? R-Forge also happened to be
> stale in that time window. The first submission of 1.8.2 to CRAN was
> reverted due to some difficulties, so it needed a 2nd attempt and took
> longer than usual.
>
> Please uninstall data.table and reinstall from any CRAN mirror (not
> R-Forge) to make sure. A difference between 714 and 717 indicates an
> installation problem of data.table, not R itself. test.data.table() v1.8.2
> must return 717 precisely.
>
> Another way would be to include the SVN rev number in the package version,
> but I haven't found a way to do that for packages yet. R itself does that,
> of course, but I don't know how for packages. Since all changes in
> data.table are accompanied by new tests, the current approach is to use the
> number of tests. And actually running all the tests on your hardware etc.
> is a stronger test that everything is working as intended.
>
>
>> The test.data.table() routine returns 714, not 717.
>>
>> I'm running data.table 1.8.2.
>>
>> The only thing not bleeding edge (I think) is R itself, which is at
>> 2.15.0.
>>
>> A search for "merge" on r-forge gets two hits, neither of which is related;
>> a search for setcolorder gets no hits. Should I file a bug report (or
>> two)?
>>
>> Here's my output from test.data.table() and sessionInfo():
>>
>>> test.data.table()
>> Running .../tests.Rraw
>> Loading required package: hexbin
>> Loading required package: grid
>> Loading required package: lattice
>> x = 10,000 sample from 100 strings (quick test to save load on CRAN
>> servers where tests run every day. In dev we increase n and m a lot
>> for meaningful times.
>> 0.002 : f=factor(x) [high up front cost, plus storage and maintenance of levels]
>> 0.000 : sort.list(,'radix') on f
>> 0.000 : u=unique(x)
>> 0.000 : .Internal(order(u))
>> 0.000 : sort.list(,'radix') on fsorted
>> -vs-
>> 0.000 : char group on x (ad hoc by) [slower than radix on f but without up front cost]
>> 0.000 : char sort on x (setkey) [lower up front cost than factor(x)]
>> 0.000 : char group on xsorted (keyed by) [faster than sort.list(,'radix') on fsorted, same result]
>> All 714 tests in test.data.table() completed ok in 15.272sec
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] hexbin_1.26.0    lattice_0.20-6   nlme_3.1-103     ggplot2_0.9.1
>> [5] reshape_0.8.4    plyr_1.7.1       data.table_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1
>> [5] MASS_7.3-17        memoise_0.1        munsell_0.3        proto_0.3-9.2
>> [9] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.1       stringr_0.6.1
>>
>> -----Original Message-----
>> From: Matthew Dowle [mailto:[email protected]]
>> Sent: Wednesday, August 08, 2012 4:49 AM
>> To: Kaupas, George
>> Cc: [email protected]
>> Subject: Re: [datatable-help] can I count on data.table supporting
>> syntactically invalid column names?
>>
>>
>> Meant to write 2nd paragraph as follows:
>>
>>>
>>> Hi. Yes, you should be able to rely on that. It's useful to have
>>> special characters in column names for LaTeX formatting, and spaces
>>> are allowed too. There are tests for these things.
>>> If you need to refer to such column names as variables, then it's up to
>>> you to wrap them in backticks (``); e.g., by=`Illegal(name%)`+1.
>>>
>>> So yes, if you find problems with special characters, please report them
>>> as bugs; suggestions for where the documentation needs improving would
>>> also be great.
>>>
>>> I seem to remember a bug fix in this regard, and in particular in
>>> merge (so my first thought is to ask whether you've recently upgraded
>>> to 1.8.2 and whether test.data.table() returns 717), but as you say
>>> R-Forge is currently down for maintenance...
>>>
>>> That neworder error looks familiar too. Are you sure you have 1.8.2
>>> running in memory? (Run test.data.table() to see if it returns 717.)
>>>
>>> Matthew
>>>
>>>> I'm taking advantage of a feature in data.table which lets me get
>>>> away with naming columns with characters that would not survive a
>>>> call to make.names(), e.g.:
>>>>
>>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>>> DT1
>>>>    a Illegal(name%)
>>>> 1: a              1
>>>> 2: b              2
>>>> 3: c              3
>>>> 4: d              4
>>>> 5: e              5
>>>>
>>>> (The dcast function from the reshape2 package will also create
>>>> columns named "illegally".)
>>>>
>>>> But when using merge.data.table, I get one of two side effects: either
>>>> the merge works but the column names appear to be run through
>>>> make.names(), or the merge fails in setcolorder():
>>>>
>>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>>>>
>>>>> merge(DT1,DT2)
>>>>    a Illegal.name.. b
>>>> 1: a              1 6
>>>> 2: b              2 6
>>>> 3: c              3 6
>>>> 4: d              4 6
>>>> 5: e              5 6
>>>>
>>>>> merge(DT2,DT1)
>>>> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>>>>   neworder is length 4 but x has 3 columns.
>>>>
>>>> I can't get to datatable.r-forge.r-project.org - getting a 504.
>>>>
>>>> So... should I NOT rely on being able to use special characters in
>>>> column names?
>>>>
>>>> Thanks
>>>> George
>>>>
>>>>> sessionInfo()
>>>> R version 2.15.0 (2012-03-30)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> [1] data.table_1.8.2

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
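The thread never shows the internals of merge.data.table, but the quoted error message is enough to sketch why the two call orders behave differently. The base-R snippet below does not load data.table. It assumes, and the thread does not confirm this, that merge() in 1.8.2 ran the result's column names through make.names() and then called setcolorder(dt, c(setdiff(names(dt), end), end)) with `end` holding the original, unmangled non-key column names of the second table. The variable names (mangled, end_cols, neworder, etc.) are illustrative only. The last lines show the backtick form Matthew recommends for referring to such a column as a variable.

```r
# Base-R sketch; data.table is NOT loaded.
# ASSUMPTION (not confirmed by the thread): merge() in data.table 1.8.2 ran the
# result's names through make.names(), then called
#   setcolorder(dt, c(setdiff(names(dt), end), end))
# with `end` = ORIGINAL non-key column names of the second table.

orig <- "Illegal(name%)"

# make.names() turns each character that is not valid in an R name into ".",
# giving the "Illegal.name.." column seen in merge(DT1, DT2).
mangled <- make.names(orig)

# merge(DT2, DT1): the second table is DT1, so `end` is the unmangled name.
dt_names <- c("a", "b", mangled)   # columns present after mangling
end_cols <- orig                   # hypothetical stand-in for `end`
neworder <- c(setdiff(dt_names, end_cols), end_cols)
# setdiff() removes nothing (orig no longer matches any column), so neworder
# lists 4 names for a 3-column table: "neworder is length 4 but x has 3 columns".

# merge(DT1, DT2): the second table is DT2, whose non-key column "b" is already
# a valid name, so the lengths agree and only the renaming is visible.
dt_names_ok <- c("a", mangled, "b")
end_cols_ok <- "b"
neworder_ok <- c(setdiff(dt_names_ok, end_cols_ok), end_cols_ok)

# Referring to a non-syntactic column as a variable needs backticks.
# check.names = FALSE stops data.frame() from applying make.names() itself.
df <- data.frame(a = letters[1:5], `Illegal(name%)` = 1:5, check.names = FALSE)
plus_one <- with(df, `Illegal(name%)` + 1)

print(mangled)              # [1] "Illegal.name.."
print(length(neworder))     # [1] 4
print(length(neworder_ok))  # [1] 3
print(plus_one)             # [1] 2 3 4 5 6
```

Under that assumption, the mismatch only appears when the second table's non-key column has a name that make.names() changes. That matches what George saw: merge(DT1,DT2) succeeds with a renamed column, while merge(DT2,DT1) fails in setcolorder().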
