[datatable-help] Fixed 'grp.1' annoyance when j returns a subset of .SD

Matthew Dowle Sat, 22 Jan 2011 08:56:32 -0800

All,

Now that 1.5.2 is on CRAN, I've made this change (.SD no longer includes
the 'by' columns), which may introduce incompatibilities for existing code.

This is in 1.5.3 on R-Forge to be tried out for a while, discussed, and
rolled back or done differently if need be. Please let us know if this
causes any difficulties.

Tests 104, 144 and 229 have been changed accordingly.

A side effect of this change is that it should be faster, for two
reasons: i) the grp columns were redundant in .SD (repeated values), so
taking them out leaves less for dogroups to do, and ii) there is less
need to take a subset of the columns of .SD merely to drop the grp
columns before operating on the others (code that did that may be up to
twice as fast once the subsetting of .SD is removed).
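
As a rough illustration of point ii), here is a minimal sketch of the kind of workaround that can now be dropped. The table and its column names (grp, v1, v2) are made up for the example, and it assumes a data.table version in which .SD excludes the 'by' columns (1.5.3 or later). No timings are shown; it only checks that both forms give the same answer.

```r
library(data.table)

# Hypothetical example data: one grouping column and two value columns
DT <- data.table(grp = c("a", "a", "b", "b"),
                 v1  = 1:4,
                 v2  = c(10, 20, 30, 40))

# Workaround style: explicitly pick the value columns out of .SD
# before summing, so that the grouping column is not touched
with_subset <- DT[, lapply(.SD[, list(v1, v2)], sum), by = grp]

# With the 'by' columns no longer in .SD, the extra subset is unnecessary
without_subset <- DT[, lapply(.SD, sum), by = grp]

# Both queries should return the same per-group sums
print(all.equal(with_subset, without_subset))
```

The second form skips one subsetting step per group, which is where any saving would come from.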

From NEWS:

  o    .SD no longer includes 'by' columns, FR#978. This resolves
        the long standing annoyance of duplicated 'by' columns
        when the j expression returns a subset of rows from .SD. 
        For example, the following query no longer contains
        a redundant 'colA.1' duplicate.
            DT[,.SD[2],by=colA] #  2nd row of each group
        Any existing code that uses .SD may require simple
        changes to remove workarounds. If columns have been
        referenced by name, as recommended where possible, then
        no changes to existing code should be required.
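
To try the query from the NEWS item, something like the following sketch should do. The data and the second column name (colB) are invented for illustration, and it assumes a data.table version that includes this change (1.5.3 or later).

```r
library(data.table)

# Hypothetical example data: colA is the grouping column
DT <- data.table(colA = c("x", "x", "y", "y", "y"),
                 colB = 1:5)

# 2nd row of each group; colA should appear once, with no 'colA.1'
res <- DT[, .SD[2], by = colA]

print(names(res))
print(res)
```

Code that refers to columns by name inside j (colB here) is unaffected, since only the 'by' column is dropped from .SD.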

Matthew


_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
