Re: [datatable-help] variable column names

Matthew Dowle Fri, 26 Apr 2013 09:00:31 -0700

dt[, sum(behavior) > 0, by=user]

   user    V1
1:    3  TRUE
2:    4 FALSE

dt[, any(behavior), by=user]     # same

   user    V1
1:    3  TRUE
2:    4 FALSE

dt[, list(behavior = any(behavior)), by=user] # how to do the same without setnames afterwards

   user behavior
1:    3     TRUE
2:    4    FALSE

fields <- c("country","language")
dt[, list(behavior = any(behavior)), by=c("user",fields)] # by may be a character vector of column names

   user country language behavior
1:    3       2        5     TRUE
2:    3       2        6     TRUE
3:    4       1        6    FALSE
4:    4       2        6    FALSE


HTH
Matthew



On 26.04.2013 16:45, Sam Steingold wrote:

I am still missing something:

--8<---------------cut here---------------start------------->8---

dt <- data.table(user=c(rep(4, 5),rep(3, 5)),behavior=c(rep(FALSE,5),rep(TRUE,5)),
                 country=c(rep(1,4),rep(2,6)),language=c(rep(6,6),rep(5,4)),
                 event=1:10, key=c("user","country","language"))

dt

    user behavior country language event
 1:    3     TRUE       2        5     7
 2:    3     TRUE       2        5     8
 3:    3     TRUE       2        5     9
 4:    3     TRUE       2        5    10
 5:    3     TRUE       2        6     6
 6:    4    FALSE       1        6     1
 7:    4    FALSE       1        6     2
 8:    4    FALSE       1        6     3
 9:    4    FALSE       1        6     4
10:    4    FALSE       2        6     5

  users <- dt[, sum(behavior) > 0, by=user]

Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
Detected that j uses these columns: behavior
Optimization is on but j left unchanged as 'sum(behavior) > 0'
Starting dogroups ... done dogroups in 0 secs

users

   user    V1
1:    3  TRUE
2:    4 FALSE

setnames(users, "V1", "behavior")

--8<---------------cut here---------------end--------------->8---

Now I want to do the same thing as in

http://stackoverflow.com/questions/16200815/summarize-a-data-table-with-unreliable-data
for both fields

fields <- c("country","language")


here is what I tried so far:

--8<---------------cut here---------------start------------->8---
dt[, .N, .SDcols=fields, by=eval(list("user",fields))]
Error in `[.data.table`(dt, , .N, .SDcols = fields, by = eval(list("user",  :
  The items in the 'by' or 'keyby' list are length (1,2). Each must be same length as rows in x or number of rows returned by i (10).
Calls: [ -> [.data.table
--8<---------------cut here---------------end--------------->8---
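
The error above arises because list("user", fields) is a list holding two character vectors (of length 1 and 2), and a list given to by is treated as a set of grouping vectors that must each have one element per row. A minimal sketch of one way around it, assuming the example dt from this message, is to pass a single flat character vector of column names instead. The name counts is only illustrative, and .SDcols is left out here because .N does not use it.

```r
library(data.table)

# Same example table as in the message above.
dt <- data.table(user = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event = 1:10, key = c("user", "country", "language"))

fields <- c("country", "language")

# c() flattens "user" and fields into one character vector of column names,
# so by sees three column names rather than a list of two vectors.
counts <- dt[, .N, by = c("user", fields)]
print(counts)
```

Each row of counts is then one observed (user, country, language) combination together with its number of events.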

the idea is to do something like

--8<---------------cut here---------------start------------->8---

dt.out <- dt[, .N, by=list(user,country)][,list(country[which.max(N)], max(N)/sum(N)), by=user]
setnames(dt.out, c("V1", "V2"), paste0("country",c(".name",".support")))
users <- users[dt.out]

   user behavior country.name country.support
1:    3     TRUE            2             1.0
2:    4    FALSE            1             0.8
--8<---------------cut here---------------end--------------->8---

except that I do not want to have the literal "country" and "language"

and that I am sure there is a way to avoid copying users in

users <- users[dt.out]

by a ":=" trick.
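
One possible way to do both things, sketched here under some assumptions: the helper summarize_field and the temporary column name value are hypothetical names introduced for illustration, and the update join relies on a data.table version that supports the on= argument. The helper computes the most frequent value and its share for one field at a time, and the loop then adds the resulting columns to users by reference with := rather than rebuilding users.

```r
library(data.table)

# Same example table as in the message above.
dt <- data.table(user = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event = 1:10, key = c("user", "country", "language"))

fields <- c("country", "language")

# Hypothetical helper: for column f, find each user's most frequent value
# and the share of that user's rows that carry it.
summarize_field <- function(dt, f) {
  counts <- dt[, .N, by = c("user", f)]
  setnames(counts, f, "value")  # fixed temporary name, so j needs no literal field name
  out <- counts[, list(name = value[which.max(N)],
                       support = max(N) / sum(N)), by = user]
  setnames(out, c("name", "support"), paste0(f, c(".name", ".support")))
  out
}

users <- dt[, list(behavior = any(behavior)), by = user]

for (f in fields) {
  s <- summarize_field(dt, f)
  new_cols <- setdiff(names(s), "user")
  # Update join: match rows on user and add the columns of s to users by reference.
  users[s, (new_cols) := mget(paste0("i.", new_cols)), on = "user"]
}
print(users)
```

For the country field this should reproduce the country.name and country.support values shown above, and the same loop adds language.name and language.support without any field name being spelled out in j.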

Thanks.

* Matthew Dowle <[email protected]> [2013-04-24 21:54:17+0100]:
where ... is eval(myid)
iigc
Or:
DT[,lapply(.SD,sum),by=...,.SDcols=myvars]
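
The lapply(.SD, ...) form quoted above can be sketched as follows. The values given to myid and myvars are assumptions (the quoted fragment does not show them), and the example dt from this thread stands in for DT.

```r
library(data.table)

# Example table from this thread, standing in for DT.
dt <- data.table(user = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event = 1:10, key = c("user", "country", "language"))

myid   <- "user"                    # assumed grouping column
myvars <- c("behavior", "event")    # assumed columns to summarise

# .SDcols restricts .SD to myvars; lapply() applies sum() to each of them per group.
sums <- dt[, lapply(.SD, sum), by = c(myid), .SDcols = myvars]
print(sums)
```

The result keeps the original column names, so no setnames call is needed afterwards.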


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
