The shuffling can form a different number of groups, can't it?

table(c(1,1,2,2), c(3,3,4,4))  # 2 groups
table(c(2,2,1,1), c(3,3,4,4))  # 2 groups
table(c(2,1,2,1), c(3,3,4,4))  # 4 groups

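A minimal sketch of the same point in data.table terms, assuming a four-row toy table in place of SPFdt (the names DT and wd_shuffled are made up for illustration). Grouping with by= returns one row per combination that actually occurs, so a shuffle can change the row count, whereas table() keeps the zero cells and its dimensions stay fixed. The last lines sketch one possible way to collect permuted counts for an interval; this is not necessarily how the real task should be done.

```r
library(data.table)

# Toy stand-in for SPFdt (hypothetical data, not the poster's)
DT <- data.table(wdpaint = c(1, 1, 2, 2), pnvid = c(3, 3, 4, 4))

# by= returns one row per combination that actually occurs
observed <- DT[, .N, by = list(wdpaint, pnvid)]

# A fixed "shuffle" of wdpaint, stored as a column so the result is reproducible
DT[, wd_shuffled := c(2, 1, 2, 1)]
shuffled <- DT[, .N, by = list(wd_shuffled, pnvid)]

nrow(observed)  # 2
nrow(shuffled)  # 4

# table() keeps the zero cells, so its dimensions do not change under shuffling
dim(table(DT$wdpaint, DT$pnvid))      # 2 2
dim(table(DT$wd_shuffled, DT$pnvid))  # 2 2

# Permutation sketch: one column of cell counts per shuffle of wdpaint
set.seed(42)
perm <- replicate(200, as.vector(table(sample(DT$wdpaint), DT$pnvid)))

# Rough 95% interval per cell, taken from the permuted counts
ci <- apply(perm, 1, quantile, probs = c(0.025, 0.975))
```

Because every shuffle is a permutation of the same values, table() always produces the same set of cells, which is what makes the columns of perm comparable across permutations.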
> Thanks Matthew
>
> I am not sure I understand the code (actually, I am sure I do not :-( ).
> More specifically, I would expect the two expressions below to yield
> tables of the same dimension (basically all combinations of wdpaint and
> pnvid):
>
> aa <- SPFdt[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
> dim(aa)  # 254 3
> bb <- SPFdt[, .N, by=list(wdpaint,pnvid)]
> dim(bb)  # 170 3
>
> What I am looking for is a cross table of pnvid and wdpaint, i.e., the
> frequency or number of occurrences of each combination of pnvid and
> wdpaint. Shuffling wdpaint should in that case give a different
> frequency distribution, like in the example below:
>
> table(c(1,1,2,2), c(3,3,4,4))
> table(c(2,2,1,1), c(3,3,4,4))
>
> Basically what I want to do is run X permutations on a data set, which I
> will then use to create a confidence interval on the frequency
> distribution of sample points over wdpaint and pnvid.
>
> Cheers,
>
> Paulo
>
> On Tue, Jun 19, 2012 at 3:30 PM, Matthew Dowle <[email protected]> wrote:
>
>> Hi,
>>
>> Welcome to the list.
>>
>> Rather than picking a column and calling length() on it, .N is a little
>> more convenient (and faster if that column isn't otherwise used, as in
>> this example). Search ?data.table for the string ".N" to find out more.
>>
>> And to group by expressions of column names, wrap with list(). So,
>>
>> SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
>>
>> But that won't calculate any different statistics, just return the
>> groups in a different order. Seems like just an example, rather than
>> the real task, iiuc, which is fine of course.
>>
>> Matthew
>>
>> > Hi, I am new to this package and not sure how to implement the
>> > sample() function with data.table.
>> >
>> > I have a data frame SPF with three columns cat, pnvid and wdpaint. The
>> > pnvid variable has values 1:3, the wdpaint has values 1:10.
>> > I am interested in the count of all combinations of wdpaint and pnvid
>> > in my data set, which can be calculated using table or tapply (I use
>> > the latter in the example code below).
>> >
>> > Normally I would use something like:
>> >
>> > c <- tapply(SPF$cat, list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),
>> >             function(x) length(x))
>> >
>> > If I understand correctly, I would use the below when working with
>> > data tables:
>> >
>> > f <- SPF[,length(cat),by="wdpaint,pnvid"]
>> >
>> > But what if I want to reshuffle the column wdpaint first? When using
>> > tapply, it would be something along the lines of:
>> >
>> > a <- list(as.factor(SPF$pnvid), as.factor(sample(SPF$wdpaint,
>> >           replace=F)))
>> > c <- tapply(SPF$cat, a, function(x) length(x))
>> >
>> > But how to do this with data.table?
>> >
>> > Paulo
>> > _______________________________________________
>> > datatable-help mailing list
>> > [email protected]
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
