Hi,
You write: There was some discussion of an .EACHI facility for data.table. Not
sure what happened about that but I have an example that might be useful:
http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571
by=.EACHI was implemented to remove the implicit “by-without-by” feature during
joins. It was implemented quite some time back: check the first FR implemented
in the README. After that, Matt also posted on the mailing list asking for
feedback.
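To illustrate what by=.EACHI changes, here is a minimal sketch. The table DT, its columns id and val, and the variable names total and per_i are made up for this example, and it assumes a data.table version that has by=.EACHI (1.9.3 or later). Without by=.EACHI, j is evaluated once over all the rows matched by the join; with it, j is evaluated once per row of i.

```r
library(data.table)

# Hypothetical data, not from the original thread
DT <- data.table(id = c("a", "a", "b", "b", "c"), val = 1:5)
setkey(DT, id)

# Plain join: j runs once over all rows matching "a" or "b"
total <- DT[.(c("a", "b")), sum(val)]

# by=.EACHI: j runs separately for each row of i ("a", then "b")
per_i <- DT[.(c("a", "b")), sum(val), by = .EACHI]
```

Here total is a single number, while per_i is a data.table with one row per value of i.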
You write: which shows the code where DT has columns v1, v2 and v3:
DT[, split(v2, v1), by = names(DT)]
A small comment on this solution per se: it calls split for each row! I’d
approach this a little differently:
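## 1.9.3
rbindlist(
    setDT(dd)[, {
        ans = list(v2)              # wrap this group's v2 values in a list
        setattr(ans, 'names', v1)   # name the element after the group value
        list(list(ans))             # return it as a single list-column cell
    }, by = list(v1 = as.character(v1))]$V1,
    fill = TRUE)                    # bind the per-group lists, filling gaps with NA
#     a  b
# 1:  1 NA
# 2:  2 NA
# 3:  6 NA
# 4: NA  3
# 5: NA  4
# 6: NA  5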
We can then add this back to dd by reference. Personally I’ve never had to call
split on a data.table.
You write: It works well if the rows of DT are unique but if they are not then
one must do something ugly like appending a uniquifying column of 1:nrow(DT),
say, and then including that in by and then finally removing it again at the
end.
This suggests two features:
1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output
Not sure I follow this entirely, but by= does accept expressions. So, you could
do:
dd[, split(v2,v1), by=1:nrow(dd)]
#    nrow  a  b
# 1:    1  1 NA
# 2:    2  2 NA
# 3:    3  6 NA
# 4:    4 NA  3
# 5:    5 NA  4
# 6:    6 NA  5
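If the grouping column is not wanted in the result, it can be dropped by reference afterwards. The sketch below is only an illustration: the contents of dd are reconstructed from the printed output (with v1 assumed to be a factor), and the column name rowid is made up so that the helper column has a predictable name. data.table may warn about zero-length items being filled with NA, since split returns an empty vector for the factor level not present in each row.

```r
library(data.table)

# Hypothetical data, reconstructed from the printed output above
dd <- data.table(v1 = factor(c("a", "a", "a", "b", "b", "b")),
                 v2 = c(1, 2, 6, 3, 4, 5))

# Group by an explicitly named row id so it is easy to drop afterwards;
# this also works when dd contains duplicate rows
res <- dd[, split(v2, v1), by = list(rowid = seq_len(nrow(dd)))]

# Remove the helper grouping column by reference
res[, rowid := NULL]
```

After this, res has only the columns a and b, one row per row of dd.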
You write: (By the way, is there an intention to move to the issue system on
github for things like this?)
All the issues from R-Forge, including feature requests, have already been
moved to GitHub, and users have since filed new FRs/bugs there. So, yes, you
can file FRs directly, although in this case, I think the feature already
exists (IIUC)?
Arun
From: Gabor Grothendieck [email protected]
Reply: Gabor Grothendieck [email protected]
Date: June 29, 2014 at 10:59:22 PM
To: [email protected]
[email protected]
Subject: [datatable-help] by row
There was some discussion of an .EACHI facility for data.table. Not
sure what happened about that but I have an example that might be
useful:
http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571
which shows the code where DT has columns v1, v2 and v3:
DT[, split(v2, v1), by = names(DT)]
It works well if the rows of DT are unique but if they are not then
one must do something ugly like appending a uniquifying column of
1:nrow(DT), say, and then including that in by and then finally
removing it again at the end.
This suggests two features:
1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output
For example, if one could use a pseudo column .I and if -.I meant do
not include it in the output then one could write:
DT[, split(v2, v1), by = c(names(DT), -.I)]
Other syntaxes may be thought of too and the main suggestion here is
the possible need for these features rather than the specific syntax.
(By the way, is there an intention to move to the issue system on
github for things like this?)
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help