Hi,

You write: There was some discussion of an .EACHI facility for data.table. Not 
sure what happened about that but I have an example that might be useful: 
http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571

by=.EACHI was implemented to make the formerly implicit “by-without-by” behaviour 
during joins explicit. It was implemented quite some time back: check the first FR 
listed in the README, after which Matt also posted on the mailing list asking 
for feedback.
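
For anyone who has not tried it yet, here is a minimal sketch of what by=.EACHI does in a join. The table and its columns (id, val) are made up for illustration and are not from this thread, and it assumes a data.table version that already has by=.EACHI:

```r
library(data.table)

# Hypothetical toy data (not from the thread), keyed on id
DT <- data.table(id = c("a", "a", "b", "b", "b"), val = 1:5, key = "id")

# Join without by: j is evaluated once over all matching rows
total <- DT[.(c("a", "b")), sum(val)]
total
# [1] 15

# Join with by = .EACHI: j is evaluated once for each row of i
res <- DT[.(c("a", "b")), sum(val), by = .EACHI]
res$V1
# [1]  3 12
```

The first call gives a single total, while the second gives one result per row of i, which is what the old implicit behaviour used to return.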

You write: which shows the code where DT has columns v1, v2 and v3: DT[, 
split(v2, v1), by = names(DT)]

A small comment on this solution per se: it calls split for each row! I’d 
approach this a little differently:

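## 1.9.3
rbindlist(setDT(dd)[, {
    ans = list(v2)               # wrap this group's v2 values in a list
    setattr(ans, 'names', v1)    # name it after the group value
    list(list(ans))              # return it as a single list-column cell
}, by = list(v1 = as.character(v1))]$V1, fill = TRUE)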

#     a  b
# 1:  1 NA
# 2:  2 NA
# 3:  6 NA
# 4: NA  3
# 5: NA  4
# 6: NA  5

We can then add this back to dd by reference. Personally, I’ve never had to call 
split on a data.table.
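
A rough sketch of the "add back by reference" step follows. The contents of dd are my reconstruction from the output above (an assumption, with v1 as a character column), and the wide table is built with a .BY-based variant of the same idea rather than the exact code above. It also assumes the rows of dd are already ordered by group, so that the stacked result lines up row for row with dd:

```r
library(data.table)

# Assumed reconstruction of dd; rows are already grouped by v1
dd <- data.table(v1 = c("a", "a", "a", "b", "b", "b"),
                 v2 = c(1, 2, 6, 3, 4, 5))

# One named list per group, stored in a list column
pieces <- dd[, list(piece = list(setNames(list(v2), .BY$v1))), by = v1]$piece

# Stack the pieces; fill = TRUE pads the columns a piece lacks with NA
wide <- rbindlist(pieces, fill = TRUE)

# Add the new columns to dd by reference (no copy of dd is made)
cols <- names(wide)
dd[, (cols) := wide]
```

If dd were not ordered by group, the rows of wide would follow group order rather than the original row order, so dd would need to be sorted by v1 first.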

You write: It works well if the rows of DT are unique but if they are not then 
one must do something ugly like appending a uniquifying column of 1:nrow(DT), 
say, and then including that in by and then finally removing it again at the 
end.

This suggests two features:

1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output

Not sure I follow this entirely, but by= does accept expressions. So, you could 
do:

dd[, split(v2,v1), by=1:nrow(dd)]
#    nrow  a  b
# 1:    1  1 NA
# 2:    2  2 NA
# 3:    3  6 NA
# 4:    4 NA  3
# 5:    5 NA  4
# 6:    6 NA  5

You write: (By the way, is there an intention to move to the issue system on 
github for things like this?)

All the issues from R-Forge have already been moved to GitHub, including 
feature requests, and users have been filing new FRs/bugs there since. So, yes, 
you can file FRs directly, although in this case I think the feature already 
exists (IIUC)?
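
To make the duplicate-row case concrete, here is a small sketch. The data is made up (rows 1 and 2 are identical on purpose) and the helper column name "row" is my own choice. Grouping by all columns collapses duplicates into one group, grouping by an expression keeps every row separate, and the helper column can be dropped by reference afterwards:

```r
library(data.table)

# Hypothetical data with a duplicated row (rows 1 and 2)
dd <- data.table(v1 = c("a", "a", "b"), v2 = c(1, 1, 3))

# Grouping by all columns: the two identical rows form a single group
by_cols <- dd[, list(n = .N), by = c("v1", "v2")]
by_cols$n
# [1] 2 1

# Grouping by a row-number expression: one group per row
by_row <- dd[, list(n = .N), by = list(row = seq_len(nrow(dd)))]

# Drop the helper grouping column by reference
by_row[, row := NULL]
names(by_row)
# [1] "n"
```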



Arun

From: Gabor Grothendieck [email protected]
Reply: Gabor Grothendieck [email protected]
Date: June 29, 2014 at 10:59:22 PM
To: [email protected] 
[email protected]
Subject:  [datatable-help] by row  

There was some discussion of an .EACHI facility for data.table. Not  
sure what happened about that but I have an example that might be  
useful:  

http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571
  

which shows the code where DT has columns v1, v2 and v3:  

DT[, split(v2, v1), by = names(DT)]  

It works well if the rows of DT are unique but if they are not then  
one must do something ugly like appending a uniquifying column of  
1:nrow(DT), say, and then including that in by and then finally  
removing it again at the end.  

This suggests two features:  

1. The ability to tell it to do the by by row  
2. The ability to selectively omit by variables from the output  

For example, if one could use a pseudo column .I and if -.I meant do  
not include it in the output then one could write:  

DT[, split(v2, v1), by = c(names(DT), -.I)]  

Other syntaxes may be thought of too and the main suggestion here is  
the possible need for these features rather than the specific syntax.  

(By the way, is there an intention to move to the issue system on  
github for things like this?)  

--  
Statistics & Software Consulting  
GKX Group, GKX Associates Inc.  
tel: 1-877-GKX-GROUP  
email: ggrothendieck at gmail.com  
_______________________________________________  
datatable-help mailing list  
[email protected]  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help  