Re: [R] bootstrapping quantile regression
A possible solution might be to use the survey package. You could specify that the data are clustered using the svydesign function, then specify the replicate weights using the as.svrepdesign function, and finally use the withReplicates function to bootstrap over the clusters. A copy of Thomas Lumley's book, Complex Surveys, would probably help with this. Hope this helps -- View this message in context: http://r.789695.n4.nabble.com/bootstrapping-quantile-regression-tp4647948p4648032.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
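As a sketch of how those pieces might fit together (the data frame `dat` and the column names `psu`, `y`, and `x` are placeholders invented here, and the quantile regression itself is assumed to come from quantreg's `rq`):

```r
# Sketch only - dat, psu, y, and x are made-up names, not from the original post
library(survey)    # svydesign, as.svrepdesign, withReplicates
library(quantreg)  # rq for the quantile regression

set.seed(1)
dat <- data.frame(psu = rep(1:20, each = 5), x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

des  <- svydesign(ids = ~psu, data = dat)                  # declare the clustering
rdes <- as.svrepdesign(des, type = "bootstrap",
                       replicates = 100)                   # bootstrap replicate weights
# withReplicates refits the model once per set of replicate weights
res  <- withReplicates(rdes, function(w, data)
          coef(rq(y ~ x, tau = 0.5, weights = w, data = data)))
res  # coefficients with replication-based standard errors
```

The key idea is that withReplicates resamples whole clusters (via the replicate weights), so the standard errors respect the clustering.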
Re: [R] weighted mean by week
The plyr package is very helpful for this:

library(plyr)
ddply(x, .(myweek), summarize,
      m1 = weighted.mean(var1, myweight),
      m2 = weighted.mean(var2, myweight))

-- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636816.html
Re: [R] weighted mean by week
If there are many variables, I'd then suggest the data.table package:

library(data.table)
dt = data.table(x)
dt[, lapply(.SD, function(x) weighted.mean(x, myweight)),
   keyby = c('group', 'myweek')]

The '.SD' is an abbreviation for all the variables in the data table (excluding the grouping variables). There's an .SDcols = 'variables of interest' option if you want to limit the dozens of variables to only some of them. Or, in the data.table(x) statement, you could limit the created data table to only the variables you're interested in. As an added benefit, the data.table approach is amazingly fast, particularly when there are numerous grouping categories. -- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636825.html
Re: [R] weighted mean by week
Honestly, I wasn't sure what you wanted to do with 'group'. Here it is with the 'group' variable deleted:

library(data.table)
dt = data.table(x[, -1])
dt[, lapply(.SD, function(x) weighted.mean(x, myweight)), keyby = 'myweek']

-- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636828.html
Re: [R] Amelia error
I've had a little experience using the Amelia package. Are you sure that your nominal variables (race, south, etc.) are in your ad04 data frame? David Freedman -- View this message in context: http://r.789695.n4.nabble.com/Amelia-error-tp4568455p4568600.html
Re: [R] Trojan in R 2.13.0?
2.13.0 looks fine with VIPRE. david freedman atlanta -- View this message in context: http://r.789695.n4.nabble.com/Trojan-in-R-2-13-0-tp3450084p3450784.html
Re: [R] for loop to merge .csvs
you might try merge_recurse(list(DF1,DF2,DF3,DF4,DF17)) in the reshape package -- View this message in context: http://r.789695.n4.nabble.com/for-loop-to-merge-csvs-tp3298549p3298565.html
Re: [R] Using summaryBy with weighted data
You might use the plyr package to get group-wise weighted means:

library(plyr)
ddply(mydata, ~group, summarise,
      b = mean(weights),
      c = weighted.mean(response, weights))

hth david freedman -- View this message in context: http://r.789695.n4.nabble.com/Using-summaryBy-with-weighted-data-tp3220761p3221212.html
Re: [R] How to this SAS transport file in R?
I've had good success with the read.xport function in the SASxport (not foreign) package. But couldn't you just use something like

mydata <- read.xport('http/demo_c.xpt')

The transport file looks like it's one of the NHANES demographic files. -- View this message in context: http://r.789695.n4.nabble.com/How-to-this-SAS-transport-file-in-R-tp3073969p3074009.html
[R] data.frame and formula classes of aggregate
Hi - I apologize for the 2nd post, but I think my question from a few weeks ago may have been overlooked on a Friday afternoon. I might be missing something very obvious, but is it widely known that the aggregate function handles missing values differently depending on whether a data frame or a formula is the first argument? For example,

(d <- data.frame(sex = rep(0:1, each = 3),
                 wt = c(100, 110, 120, 200, 210, NA),
                 ht = c(10, 20, NA, 30, 40, 50)))
x1 <- aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] <- c('mean.dfcl.wt', 'mean.dfcl.ht')
x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] <- c('mean.formcl.wt', 'mean.formcl.ht')
cbind(x1, x2)[, c(2, 3, 6, 4, 7)]

The output from the data.frame class has an NA if there are missing values in the group for the variable with missing values. But the formula class output seems to delete the entire row (missing and non-missing values) if there are any NAs. Wouldn't one expect the 2 forms (data frame vs formula) of aggregate to give the same result? thanks very much david freedman, atlanta -- View this message in context: http://r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3063668.html
Re: [R] data.frame and formula classes of aggregate
Thanks for the information. There was a discussion of different results obtained with the formula and data.frame methods for a paired t-test -- there are many threads, but one is at http://r.789695.n4.nabble.com/Paired-t-tests-td2325956.html#a2326291 david freedman -- View this message in context: http://r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3064177.html
[R] aggregate with missing values, data.frame vs formula
It seems that the formula and data.frame forms of aggregate handle missing values differently. For example,

(d = data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50)))
x1 = aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] = c('list.wt', 'list.ht')
x2 = aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] = c('form.wt', 'form.ht')
cbind(x1, x2)

  Group.1 sex list.wt list.ht sex form.wt form.ht
1       0   0     110      NA   0     105      15
2       1   1      NA      40   1     205      35

So, the data.frame form gives an NA if there are missing values in the group for the variable with missing values. But the formula form deletes the entire row (missing and non-missing values) if any of the values are missing. Is this what was intended, or the best option? thanks, david freedman -- View this message in context: http://r.789695.n4.nabble.com/aggregate-with-missing-values-data-frame-vs-formula-tp3041198p3041198.html
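For what it's worth, one way to make the formula form keep the incomplete rows and drop NAs variable-by-variable (matching the per-variable means) is to combine na.action = na.pass with an na.rm mean - a sketch using the same toy data:

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Default formula behavior: listwise deletion of any row with an NA
aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# Keep incomplete rows; remove NAs within each variable instead
x3 <- aggregate(cbind(wt, ht) ~ sex, data = d,
                FUN = function(v) mean(v, na.rm = TRUE),
                na.action = na.pass)
x3  # wt: 110 and 205; ht: 15 and 40
```

With na.pass, model.frame keeps the rows containing NAs, and the na.rm mean then handles each variable separately, reproducing the data.frame-method means without the NAs.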
Re: [R] Svy function doesn't work nested in user-defined function
I'm not sure that this is the problem, but are you certain that the variable 'con' is in audit? Your check outside the function really just tells you that X and SEX are in audit. hth, david freedman (good to see another CDCer using R) -- View this message in context: http://r.789695.n4.nabble.com/Svy-function-doesn-t-work-nested-in-user-defined-function-tp2224732p2224843.html
Re: [R] percent by subset
how about:

d = data.frame(ht = rnorm(20, 60, 20), sex = sample(0:1, 20, rep = TRUE)); d
with(d, by(ht, sex, mean))
with(d, by(ht, sex, function(x) mean(x >= 60)))

hth, david freedman -- View this message in context: http://r.789695.n4.nabble.com/percent-by-subset-tp2123057p2123079.html
Re: [R] binary logistic regression taking account clustering
The bootcov function for models fit using lrm in the rms package might also be an option. hth -- View this message in context: http://r.789695.n4.nabble.com/binary-logistic-regression-taking-account-clustering-tp2122255p2122311.html
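A sketch of what that might look like (the data frame `mydata` and the column names `y`, `x`, and `id` are all made-up placeholders):

```r
# Sketch only - mydata, y, x, and id are invented names for illustration
library(rms)

set.seed(1)
mydata <- data.frame(id = rep(1:50, each = 4), x = rnorm(200))
mydata$y <- rbinom(200, 1, plogis(mydata$x))

# x=TRUE, y=TRUE store the design matrix so bootcov can refit the model
fit  <- lrm(y ~ x, data = mydata, x = TRUE, y = TRUE)
# cluster= makes bootcov resample whole clusters, not individual rows
bfit <- bootcov(fit, cluster = mydata$id, B = 200)
bfit  # same coefficients, cluster-bootstrap covariance matrix
```

The point estimates are unchanged; only the variance-covariance matrix (and hence the standard errors and Wald tests) reflects the clustering.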
Re: [R] help wiht model.matrix -- how to keep missing values?
I'm not sure what to do with model.matrix, but you might look at some of the code for ols in the rms package. The rms package (from Frank Harrell) has several options for keeping the NAs in the output from regression models so that residuals are correctly aligned. hth david freedman -- View this message in context: http://r.789695.n4.nabble.com/help-wiht-model-matrix-how-to-keep-missing-values-tp2072248p2073573.html
[R] get means of elements of 5 matrices in a list
I've got a list of 5 matrices that are each 5 x 6. I'd like to end up with a 5 x 6 matrix that contains the mean value of the 5 original matrices. I can do this by brute force, but there must be a better way than making each matrix into a vector and then remaking a matrix. thanks very much for any help david freedman

ll = list(structure(c(9.7, 17.6, 20.8, 24.1, 33.8, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 53.3, 16.7, 26.1, 29.5, 32.7, 42.6, 26.2, 34.3, 37, 39.8, 47.1, 31.9, 40.3, 43.3, 46.2, 54.1), .Dim = 5:6),
          structure(c(9.4, 17.7, 20.7, 24.1, 33.7, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.7, 37.5, 40.9, 53.1, 16, 26, 29.5, 32.7, 42.7, 25.6, 34.1, 37, 39.8, 47.1, 31.9, 40.3, 43.3, 46.1, 54.1), .Dim = 5:6),
          structure(c(9.6, 17.7, 20.8, 24.4, 34.3, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 53.2, 16.7, 26.1, 29.5, 32.8, 42.8, 26.2, 34.2, 36.8, 39.9, 47.1, 31.9, 40.3, 43.3, 46.1, 54.8), .Dim = 5:6),
          structure(c(9.7, 17.6, 20.7, 24.1, 33.8, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 41.1, 52.6, 16.7, 26.1, 29.5, 32.8, 42.8, 26.1, 34.3, 37, 40, 47.1, 31.9, 40.3, 43.3, 46.2, 54.9), .Dim = 5:6),
          structure(c(8.6, 17.6, 20.7, 24.1, 33.8, 14.5, 25.6, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 52.5, 16, 26, 29.4, 32.8, 42.8, 25.6, 34.2, 37, 40.1, 47.1, 31.9, 40.3, 43.1, 46.1, 54.1), .Dim = 5:6))
ll

x = rbind(as.vector(ll[[1]]), as.vector(ll[[2]]), as.vector(ll[[3]]),
          as.vector(ll[[4]]), as.vector(ll[[5]])); x
x2 = apply(x, 2, mean); matrix(x2, byrow = FALSE, nrow = 5)

-- View this message in context: http://r.789695.n4.nabble.com/get-means-of-elements-of-5-matrices-in-a-list-tp2067722p2067722.html
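One base-R alternative to the vector round-trip is to add the matrices element-wise with Reduce and divide by the list length; a sketch (with a small stand-in list, since any list of equal-dimension matrices works the same way):

```r
# Toy stand-in for the ll above: two 5 x 6 matrices
ll <- list(matrix(1:30, nrow = 5), matrix(31:60, nrow = 5))

# Element-wise sum of all matrices, divided by how many there are
m <- Reduce(`+`, ll) / length(ll)   # still a 5 x 6 matrix

# Equivalent via a 3-d array - handy if you want medians etc. instead of means
a  <- array(unlist(ll), dim = c(dim(ll[[1]]), length(ll)))
m2 <- apply(a, 1:2, mean)
all.equal(m, m2)  # TRUE
```

Reduce never flattens the matrices, so the dimensions (and any dimnames) survive automatically.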
Re: [R] get means of elements of 5 matrices in a list
thanks very much for the help - all of these suggestions were much better than what I was doing -- View this message in context: http://r.789695.n4.nabble.com/get-means-of-elements-of-5-matrices-in-a-list-tp2067722p2068329.html
Re: [R] Ellipse that Contains 95% of the Observed Data
for a picture of the bagplot, try going to http://www.statmethods.net/graphs/boxplot.html -- View this message in context: http://n4.nabble.com/Ellipse-that-Contains-95-of-the-Observed-Data-tp1694538p1695236.html
Re: [R] tapply syntax
sorry - I use many abbreviations and I try to remove them before I post questions/answers - 'set' is my abbreviation for subset. david

On 3/28/2010 8:27 PM, Jeff Brown [via R] wrote: What is the function set()? Is that a typo? When I type ?set I get nothing, and when I try to evaluate that code R tells me it can't find the function.

-- View this message in context: http://n4.nabble.com/tapply-syntax-tp1692503p1694626.html
Re: [R] tapply syntax
how about:

d1 = data.frame(pat = c(rep('a', 3), 'b', 'c', rep('d', 2), rep('e', 2), 'f'),
                var = c(1, 2, 3, 1, 2, 2, 3, 2, 4, 4))
ds = set(d1, var %in% c(2, 3))
with(ds, tapply(var, pat, FUN = length))

hth, David Freedman, CDC, Atlanta -- View this message in context: http://n4.nabble.com/tapply-syntax-tp1692503p1692553.html
Re: [R] for() loop
Hi - do you want sum(datjan$V4) OR sum(unique(datjan$V4))? there's also a cumsum function that might be useful hth, David Freedman, Atlanta

Schmidt Martin wrote: Hello I've been working with R for a few months and still have many trivial questions - I guess! Here is what I want: I have this matrix:

dim(datjan)
[1] 899   4

The first 10 rows look as follows:

datjan[1:10,]
     V1 V2 V3 V4
1  1961  1  1 24
2  1961  1  2 24
3  1961  1  3 24
4  1961  1  4 24
5  1961  1  5 24
6  1961  1  6 27
7  1961  1  7 27
8  1961  1  8 27
9  1961  1  9 27
10 1961  1 10 27

I tried now to create a for() loop, which gives me the sum of the 30 different classes (1:30!) in [,4].

for(i in 1:30){
  sum(datjan[,4]==i)
}

R is then actually calculating the sum of i which certainly doesn't exist and results in a 0 value

t1 <- sum(datjan[,4]==1)
t2 <- sum(datjan[,4]==2)
... until '30'

This way it's working, but I won't find an end by doing all this by hand, because there are many other matrices waiting. So, how can I make that loop work?? thanks for helping me

-- View this message in context: http://n4.nabble.com/for-loop-tp1592713p1593235.html
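To get the per-class counts without writing t1 through t30 by hand, a couple of sketches (with made-up data standing in for datjan; only column V4 matters here):

```r
# Stand-in for datjan - replace with the real matrix/data frame
datjan <- data.frame(V4 = c(24, 24, 24, 27, 27, 1))

# sapply collects the count for each class, which the bare for() loop discarded
counts <- sapply(1:30, function(i) sum(datjan$V4 == i))

# table() does the same in one call; factor(levels = 1:30) keeps empty classes as 0
counts2 <- table(factor(datjan$V4, levels = 1:30))
```

The original loop computed each sum but never stored or printed it; wrapping the computation in sapply (or using table) keeps all 30 results.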
Re: [R] ordering columns in a data frame
I'm not sure what's incorrect about your result, but the following works:

d = data.frame(a = sample(letters[1:5], 10, rep = TRUE), b = rnorm(10), c = sample(1:10, 10)); d
d[order(d$a, d$c), ]

or, you can use orderBy:

library(doBy)
orderBy(~a + b, data = d)  # use a minus sign to sort in descending sequence

Did you leave off the tilde in your orderBy example? hth David Freedman, CDC Atlanta -- View this message in context: http://n4.nabble.com/ordering-columns-in-a-data-frame-tp1587294p1587318.html
Re: [R] Help with Hmisc, cut2, split and quantile
try as.numeric(read_data$DEC) - this should turn it into a numeric variable that you can work with. hth David Freedman CDC, Atlanta

Guy Green wrote: Hi Peter and others, Thanks (Peter) - that gets me really close to what I was hoping for. The one problem I have is that the cut approach breaks the data into intervals based on the absolute value of the Target data, rather than their frequency. In other words, if the data ranged from 0 to 50, the data would be separated into 0-5, 5-10 and so on, regardless of the frequency within those categories. However I want to get the data into deciles. The code that does this (incorporating Peter's) is:

read_data = read.table("C:/Sample table.txt", head = T)
read_data$DEC <- with(read_data, cut(Target, breaks = 10, labels = 1:10))
L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$'10', which comes out tidy, but only containing 2 data items (the sample has 63 rows, so each decile should have 6+ data items):

   Actual    Target DEC
9   0.572 0.3778386  10
31  0.299 0.3546606  10

If I try to adjust this to get deciles using cut2(), I can break the data into deciles as follows:

read_data = read.table("C:/Sample table.txt", head = T)
read_data$DEC <- with(read_data, cut2(read_data$Target, g = 10), labels = 1:10)
L <- split(read_data, read_data$DEC)

However this time, while the data is broken into even data frames, the labels for the separate data frames are unusable, e.g.:

$`[ 0.26477, 0.37784]`
   Actual    Target                 DEC
6   0.243 0.2650960 [ 0.26477, 0.37784]
9   0.572 0.3778386 [ 0.26477, 0.37784]
10 -0.049 0.3212681 [ 0.26477, 0.37784]
15  0.780 0.2778518 [ 0.26477, 0.37784]
31  0.299 0.3546606 [ 0.26477, 0.37784]
33  0.105 0.2647676 [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels useable again? Sample data is reattached: http://n4.nabble.com/file/n1585427/Sample_table.txt
Thanks, Guy

Peter Ehlers wrote: On 2010-03-08 8:47, Guy Green wrote: Hello, I have a set of data with two columns: Target and Actual. A sample table is attached ( http://n4.nabble.com/file/n1584647/Sample_table.txt ) but the data looks like this:

 Actual       Target
-0.125  0.016124906
 0.135  0.120799865
   ...          ...

I want to be able to break the data into tables based on quantiles in the Target column. I can see (using cut2, and also quantile) how to get the barrier points between the different quantiles, and I can see how I would achieve this if I was just looking to split up a vector. However I am trying to break up the whole table based on those quantiles, not just the vector. I would like to be able to break the table into ten separate tables, each with both Actual and Target data, based on the Target data deciles:

top_decile = ...(top decile of read_data, based on Target data)
next_decile = ...and so on...
bottom_decile = ...

I would just add a factor variable indicating to which decile a particular observation belongs:

dat$DEC <- with(dat, cut(Target, breaks = 10, labels = 1:10))

If you really want to have separate data frames you can then split on the decile:

L <- split(dat, dat$DEC)

-Peter Ehlers -- Peter Ehlers University of Calgary -- View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585503.html
Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable
there's a recode function in the Hmisc package, but it's difficult (at least for me) to find documentation for it:

library(Hmisc)
week <- c('SAT', 'SUN', 'MON', 'FRI')
recode(week, c('SAT', 'SUN', 'MON', 'FRI'), 1:4)

HTH -- View this message in context: http://n4.nabble.com/How-to-add-a-variable-to-a-dataframe-whose-values-are-conditional-upon-the-values-of-an-existing-vare-tp1571214p1572261.html
Re: [R] Plot of odds ratios obtained from a logistic model
You might want to look at the plot.Predict function in the rms package - it allows you to plot the logits or probabilities vs the predictor variable at specified levels of other covariates (if any) in the model. There are many examples in http://cran.r-project.org/web/packages/rms/rms.pdf David Freedman -- View this message in context: http://n4.nabble.com/Plot-of-odds-ratios-obtained-from-a-logistic-model-tp1471496p1471566.html
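A sketch of the basic pattern (the data and the variable names `age` and `y` are simulated here, purely for illustration):

```r
# Sketch with simulated data - in a real analysis, replace d, y, and age
library(rms)

set.seed(1)
d   <- data.frame(age = rnorm(200, 50, 10))
d$y <- rbinom(200, 1, plogis((d$age - 50) / 10))

dd <- datadist(d); options(datadist = "dd")  # rms needs a datadist for Predict
fit <- lrm(y ~ rcs(age, 4), data = d)

plot(Predict(fit, age))                 # log odds vs age, with confidence band
plot(Predict(fit, age, fun = plogis))   # same curve on the probability scale
```

The fun= argument is what switches between the logit and probability scales; additional covariates would be held at the datadist reference values unless you name them in the Predict call.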
[R] replace a for loop with lapply or relative
Dear helpers. I often need to make dichotomous variables out of continuous ones (yes, I realize the problems with throwing away much of the information), but I then like to check the min and max of each category. I have the following simple code to do this that cuts each variable (x1, x2, x3) at the 90th percentile, and then prints the min and max of each category:

d = data.frame(x1 = rnorm(100), x2 = runif(100)); d = transform(d, x3 = x1 - x2)
d[, 4:6] = data.frame(sapply(d, function(v) as.numeric(v >= quantile(v, 0.9))))
names(d)[4:6] = c('x1high', 'x2high', 'x3high')
head(d)
for (i in 1:3) {print(do.call(rbind, by(d[, i], d[, i + 3], function(x) c(min(x), max(x)))))}

Is there a way to replace the ugly for loop in the last line with some type of apply function that would know that my continuous and indicator variables are 3 columns apart in the data frame? Thanks very much David Freedman -- View this message in context: http://n4.nabble.com/replace-a-for-loop-with-lapply-or-relative-tp1469453p1469453.html
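One possibility (a sketch, not the only way): since the continuous and indicator columns are paired positionally, Map can walk the two sets of columns together, pairing column i with column i + 3:

```r
# Rebuild a small version of d so the sketch is self-contained
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = runif(100))
d <- transform(d, x3 = x1 - x2)
d[, 4:6] <- data.frame(sapply(d, function(v) as.numeric(v >= quantile(v, 0.9))))
names(d)[4:6] <- c('x1high', 'x2high', 'x3high')

# Map pairs d[[1]] with d[[4]], d[[2]] with d[[5]], etc.
# range(x) is c(min(x), max(x)), so each result is a 2 x 2 matrix
res <- Map(function(v, ind) do.call(rbind, by(v, ind, range)),
           d[1:3], d[4:6])
res$x1  # min and max within the low (0) and high (1) groups of x1
```

Map returns a named list (names taken from d[1:3]), so the results stay labeled by variable instead of scrolling past in a loop.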
Re: [R] tapply for function taking of 1 argument?
also,

library(plyr)
ddply(d, ~grp, function(df) weighted.mean(df$x, df$w))

-- View this message in context: http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1461428.html
Re: [R] Quartiles and Inter-Quartile Range
please do some reading - I *think* the main difference is that the x and y axes are reversed, but I really don't know what SAS prints out. there are many 'defaults' that are rather arbitrary - sometimes SAS uses one, while R uses another.

??qqnorm brings up a list of functions, including stats::qqnorm
?stats::qqnorm brings up the help page for the function

On Mon, Jan 25, 2010 at 12:41 PM, eeramalho [via R] wrote: Thank you David. I got it now. Maybe you can help again. I am typing the following code to draw a QQ plot but the line does not look right?

qqnorm(cbiomass)
qqline(cbiomass)

the graph looks very different from the graph generated by SAS. Thank you again. Emiliano.

-- Natalia and/or David -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1289653.html
Re: [R] Quartiles and Inter-Quartile Range
and SAS gives one a choice of 5 options, and I'm fairly sure that its default differs from R's (although one of R's quantile types corresponds to the SAS default) - see pctldef on http://www.technion.ac.il/docs/sas/proc/z0146803.htm my simple brain thinks of the problem as 'how does one calculate the median of 4 values?' david freedman

Girish A.R. [via R] wrote: Interestingly, Hmisc::describe() and summary() seem to be using one Type, and stats::fivenum() seems to be using another Type.

fivenum(cbiomass)
[1]  910.0 1039.0 1088.5 1156.5 1415.0

summary(cbiomass)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    910    1048    1088    1104    1139    1415

describe(cbiomass)$counts
      n missing  unique    Mean     .05     .10     .25     .50     .75
     12       0      12    1104   920.5   938.3  1047.5  1088.5  1138.8
    .90     .95
 1248.7  1327.0

cheers, -Girish -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1264043.html
Re: [R] Quartiles and Inter-Quartile Range
It looks like you think that type=2 are the 'true' quantiles, but the default method in R is type=7. You might want to look at ?stats::quantile hth david freedman -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1213199.html
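A tiny illustration of the difference (type = 7 is R's default; I believe type = 2 corresponds to SAS's default PCTLDEF=5, but check ?quantile to confirm):

```r
x <- c(1, 2, 3, 4)

quantile(x, 0.25, type = 7)  # 1.75: linear interpolation, R's default
quantile(x, 0.25, type = 2)  # 1.5: averaging at discontinuities
quantile(x, 0.50, type = 7)  # 2.5: the median agrees across types here
quantile(x, 0.50, type = 2)  # 2.5
```

So an "IQR" computed under the two definitions can differ even on the same data; the choice of type (1 through 9 in R) is the whole story behind SAS-vs-R discrepancies like this.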
Re: [R] Mutliple sets of data in one dataset....Need a loop?
You'll probably want to look at the 'by' function:

d = data.frame(sex = rep(1:2, 50), x = rnorm(100))
d$y = d$x + rnorm(100)
head(d)
cor(d)
by(d[, -1], d['sex'], function(df) cor(df))

You might also want to look at the doBy package. -- View this message in context: http://n4.nabble.com/Mutliple-sets-of-data-in-one-dataset-Need-a-loop-tp1018503p1018616.html
Re: [R] SPLUS Seqtrial vs. R Packages for sequential clinical trials designs
You'll probably want to take a look at the CRAN Task View, http://cran.r-project.org/web/views/ClinicalTrials.html david freedman Paul Miller wrote: Hello Everyone,  Iâm a SAS user who has recently become interested in sequential clinical trials designs. Iâve discovered that the SAS based approaches for these designs are either too costly or are âexperimental.â So now Iâm looking for alternative software. Two programs that seem promising are SPLUS Seqtrial and R.  I recently obtained a 30 day trial for the SPLUS Seqtrial add-on and have worked my way through most of the examples in the manual. Iâve also gotten access to R, installed a package called gsDesign, and worked through most of the examples in its documentation.  Although I donât yet have a good understanding of the various approaches to sequential clinical trials designs, I thought that the gsDesign package seemed very impressive. I also understand that there are several other R packages that relate to sequential clinical trials designs, such as AGSDest , GroupSeq, ldbounds, MChtest, PwrGSD, and Seqmon. Some of these seem fairly comprehensive while others seem to focus on a single approach.  My questions center on the adequacy of SPLUS Seqtrial and the R Packages. I was wondering if there is anyone out there who would be familiar enough with these to comment on their relative merits. Will SPLUS Seqtrial or the R packages allow me to do all the designs Iâm ever likely to need? If I pay for SPLUS Seqtrial, will I get anything that I canât get using the various R packages? Are any of the R packages comprehensive? Or would it at least be possible to cover all the types of designs that are commonly used by employing a variety of R packages? What kind of validation work generally goes into an R package and how would this likely compare to the sort of validation work that has gone into Seqtrial?  There may be other questions that I should be asking but havenât thought of.  
At any rate, if some of you would be willing to share some advice or insights I'd greatly appreciate it. Thanks, Paul [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://n4.nabble.com/SPLUS-Seqtrial-vs-R-Packages-for-sequential-clinical-trials-designs-tp966071p966780.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Partial correlations and p-values
you might look at partial.r in the psych package dadrivr wrote: I'm trying to write code to calculate partial correlations (along with p-values). I'm new to R, and I don't know how to do this. I have searched and come across different functions, but I haven't been able to get any of them to work (for example, pcor and pcor.test from the ggm package). In the following example, I am trying to compute the correlation between x and y, while controlling for z (partial correlation): x <- c(1,20,14,7,9) y <- c(5,6,7,9,10) z <- c(13,27,16,5,4) What function can I append to this to find this partial correlation? Many thanks! -- View this message in context: http://n4.nabble.com/Partial-correlations-and-p-values-tp908641p949283.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
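For reference, the partial correlation can also be checked in base R by correlating the residuals from regressing each variable on z (a sketch, not from the original reply; psych::partial.r remains the packaged route):

```r
# Partial correlation of x and y controlling for z, via residuals
x <- c(1, 20, 14, 7, 9)
y <- c(5, 6, 7, 9, 10)
z <- c(13, 27, 16, 5, 4)
rxy.z <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
rxy.z
```

A p-value would then come from cor.test() on those residuals, bearing in mind the loss of one degree of freedom for the controlled variable.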
Re: [R] How to duplicate each row in a data.frame?
I *think* this is from 'StatsRUs' - how about as.data.frame(lapply(df,function(x)rep(x,each=n))) hth, david freedman pengyu.ut wrote: I want to duplicate each line in 'df' 3 times. But I'm confused why 'z' is a 6 by 4 matrix. Could somebody let me know what the correct way is to duplicate each row of a data.frame? df=expand.grid(x1=c('a','b'),x2=c('u','v')) n=3 z=apply(df,1 ,function(x){ result=do.call(rbind,rep(list(x),n)) result } ) z __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://n4.nabble.com/How-to-duplicate-each-row-in-a-data-frame-tp948830p948839.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
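A plain indexing alternative (not from the thread) keeps each row's copies adjacent and preserves the factor columns:

```r
# Duplicate each row of a data frame n times with row indexing
df <- expand.grid(x1 = c('a', 'b'), x2 = c('u', 'v'))
n <- 3
z <- df[rep(seq_len(nrow(df)), each = n), ]
nrow(z)  # 4 rows * 3 copies = 12
```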
Re: [R] Weighted descriptives by levels of another variables
In addition to using the survey package (and the svyby function), I've found that many of the 'weighted' functions, such as wtd.mean, work well with the plyr package. For example, wtdmean <- function(df) wtd.mean(df$obese, df$sampwt); ddply(mydata, ~cut2(age,c(2,6,12,16)), 'wtdmean') hth, david freedman Andrew Miles-2 wrote: I've noticed that R has a number of very useful functions for obtaining descriptive statistics on groups of variables, including summary {stats}, describe {Hmisc}, and describe {psych}, but none that I have found is able to provide weighted descriptives of subsets of a data set (ex. descriptives for both males and females for age, where accurate results require use of sampling weights). Does anybody know of a function that does this? What I've looked at already: I have looked at describe.by {psych}, which will give descriptives by levels of another variable (e.g. mean ages of males and females), but does not accept sample weights. I have also looked at describe {Hmisc}, which allows for weights, but has no functionality for subdivision. I tried using a by() function with describe {Hmisc}: by(cbind(my, variables, here), division.variable, describe, weights=weight.variable) but found that this returns an error message stating that the variables to be described and the weights variable are not the same length: Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing, : length of weights must equal length of x In addition: Warning message: In present <- !is.na(weights) : longer object length is not a multiple of shorter object length This comes because the by() function passes down a subset of the variables to be described to describe(), but not a subset of the weights variable. describe() then searches whatever data set is attached in order to find the weights variable, but this is in its original (i.e. not subsetted) form. Here is an example using the ChickWeight dataset that comes in the datasets package.
data(ChickWeight) attach(ChickWeight) library(Hmisc) #this gives descriptive data on the variables Time and Chick by levels of Diet by(cbind(Time, Chick), Diet, describe) #trying to add weights, however, does not work for reasons described above wgt=rnorm(length(Chick), 12, 1) by(cbind(Time, Chick), Diet, describe, weights=wgt) Again, my question is, does anybody know of a function that combines the ability to provide weighted descriptives with the ability to subdivide by the levels of some other variable? Andrew Miles Department of Sociology Duke University [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://old.nabble.com/Weighted-descriptives-by-levels-of-another-variables-tp26354665p26355885.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using something like the by command, but on rows instead of columns
Some variation of the following might be what you want: df=data.frame(sex=sample(1:2,100,replace=T),snp.1=rnorm(100),snp.15=runif(100)) df$snp.1[df$snp.1 > 1.0] <- NA #put some missing values into the data x=grep('^snp',names(df)); x #which columns begin with 'snp' apply(df[,x],2,summary) #or apply(df[,x],2,FUN=function(x)mean(x,na.rm=TRUE)) hth, david Josh B-3 wrote: Hello R Forum users, I was hoping someone could help me with the following problem. Consider the following toy dataset:
Accession SNP_CRY2 SNP_FLC Phenotype
1         NA       A       0.783143079
2         BQ       A       0.881714811
3         BQ       A       0.886619488
4         AQ       B       0.416893034
5         AQ       B       0.621392903
6         AS       B       0.031719125
7         AS       NA      0.652375037
Accession = individual plants, arbitrarily identified by unique numbers SNP_ = individual genes. SNP_CRY2 = the CRY2 gene. The plants either have the BQ, AQ, or AS genotype at the CRY2 gene. NA = missing data. SNP_FLC = the FLC gene. The plants either have the A or B genotype at the FLC gene. NA = missing data. Phenotype = a continuous variable of interest. I have a much larger number of columns corresponding to genes (i.e., more columns with the SNP_ prefix) in my real dataset. For each gene in turn (i.e., each SNP_ column), I would like to find the phenotypic variance for all of the plants with non-missing data. Note that the plants with missing genotype data (NA) differ for each gene (each SNP_ column). Would one of you be able to offer some specific code that could do this operation? Please rest assured that I am not a student trying to elicit help with a homework assignment. I am a post-doc with limited R skills, working with a large genetic dataset. Thanks very much in advance to a wonderful online community. Sincerely, Josh [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
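As a sketch of the variance calculation Josh actually asked for (the data frame and column names below just mirror his toy example): for each SNP_ column, take the variance of Phenotype over the plants whose genotype is not missing.

```r
geno <- data.frame(
  SNP_CRY2  = c(NA, 'BQ', 'BQ', 'AQ', 'AQ', 'AS', 'AS'),
  SNP_FLC   = c('A', 'A', 'A', 'B', 'B', 'B', NA),
  Phenotype = c(0.783, 0.882, 0.887, 0.417, 0.621, 0.032, 0.652))
snp.cols <- grep('^SNP_', names(geno))
# variance of Phenotype among plants with non-missing genotype, per gene
v <- sapply(geno[snp.cols], function(g) var(geno$Phenotype[!is.na(g)]))
v
```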
-- View this message in context: http://old.nabble.com/Using-something-like-the-%22by%22-command%2C-but-on-rows-instead-of-columns-tp26273840p26274373.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Within-group correlation confidence intervals
you should save your variables (plus sex) into a new *dataframe* d <- mydata[, c('iq','education','achievement','sex')] and then the command would be by(d, d$sex, function(df) cor.test(df$educ, df$achiev)) but you could also just use by(mydata, mydata$sex, function(df) cor.test(df$educ, df$achiev)) david freedman jlwoodard wrote: I'm trying to obtain within-group correlations on a subset of variables. I first selected my variables using the following command: mydata$x <- mydata[c('iq','education','achievement')] I'd like to look at correlations among those variables separately for men and women. My gender variable in mydata is coded 1 (women) and 0 (men). I have successfully used the following to get within group correlations and p values: by(x,gender,function(x) rcorr(as.matrix(x))) However, I'm also interested in getting confidence intervals for the correlations as well, using cor.test. I tried the following without success. by(x,gender,function(x) cor.test(as.matrix(x))) Even if I just use 2 variables (e.g., IQ and education), I get exactly the same output for men and women with this command: by(x,gender,function(x) cor.test(iq,education)) I'm still in the learning stages with the by and cor.test functions, so I assume I'm using it incorrectly. Is it possible to get the correlation confidence intervals for each group using this approach? Many thanks in advance! John -- View this message in context: http://www.nabble.com/Within-group-correlation-confidence-intervals-tp25509629p25530117.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
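A self-contained sketch (simulated data; the variable names are invented) showing how by() with cor.test() yields a separate confidence interval per group:

```r
set.seed(1)
mydata <- data.frame(sex = rep(0:1, each = 50), educ = rnorm(100))
mydata$achiev <- mydata$educ + rnorm(100)
# one cor.test (with its conf.int) per level of sex
res <- by(mydata, mydata$sex, function(df) cor.test(df$educ, df$achiev)$conf.int)
res
```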
Re: [R] could not find function Varcov after upgrade of R?
I've had the same problem with predict.Design, and have sent an email to the maintainer of the Design package at Vanderbilt University. I wasn't even able to run the examples given on the help page of predict.Design - I received the same error about Varcov that you did. I *think* it's a problem with the package, rather than R 2.9.2, and I hope the problem will soon be fixed. I was able to use predict.Design with 2.9.2 until I updated the Design package a few days ago. david freedman zhu yao wrote: I use the Design library. Take this example: library(Design) n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- 'Age' sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) dt <- -log(runif(n))/h label(dt) <- 'Follow-up Time' e <- ifelse(dt <= cens, 1, 0) dt <- pmin(dt, cens) units(dt) <- 'Year' dd <- datadist(age, sex) options(datadist='dd') Srv <- Surv(dt,e) f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE) cox.zph(f, 'rank') # tests of PH anova(f) # Error in anova.Design(f) : could not find function Varcov Yao Zhu Department of Urology Fudan University Shanghai Cancer Center No. 270 Dongan Road, Shanghai, China 2009/9/12 Ronggui Huang ronggui.hu...@gmail.com I cannot reproduce the problem you mentioned.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2,10,20, labels=c('Ctl','Trt')) weight <- c(ctl, trt) anova(lm.D9 <- lm(weight ~ group)) sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base 2009/9/12 zhu yao mailzhu...@gmail.com: After upgrading R to 2.9.2, I can't use the anova() function. It says could not find function Varcov. What's wrong with my computer? Help needed, thanks! Yao Zhu Department of Urology Fudan University Shanghai Cancer Center No. 270 Dongan Road, Shanghai, China [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- HUANG Ronggui, Wincent Doctoral Candidate Dept of Public and Social Administration City University of Hong Kong Home page: http://asrr.r-forge.r-project.org/rghuang.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/could-not-find-function-%22Varcov%22-after-upgrade-of-R--tp25412881p25414017.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] could not find function Varcov after upgrade of R?
Thanks for the reminder - actually I think I've become sloppy about the 'T' (and the order of loading the Hmisc and Design packages), but that doesn't seem to be the problem. Also, all of my packages have been updated david Carlos Alzola wrote: Did you type library(Hmisc,T) before loading Design? Carlos -- From: David Freedman 3.14da...@gmail.com Sent: Saturday, September 12, 2009 8:26 AM To: r-help@r-project.org Subject: Re: [R] could not find function Varcov after upgrade of R?
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/could-not-find-function-%22Varcov%22-after-upgrade-of-R--tp25412881p25416444.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] (no subject)
A better subject for your question might have been helpful. There are many options for hist, and for truehist in the MASS package, but this might help: x=rnorm(100, mean=5, sd=3000) hist(x, prob=T) x2=density(x) lines(x2$x,x2$y) KABELI MEFANE wrote: Dear All I hope you can help me with this small problem. I want to draw a normal distribution line to this data: p <- rnorm(100, mean=5, sd=3000) hist(p) Kabeli [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/%28no-subject%29-tp25418417p25418513.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
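If the goal is the fitted *normal* curve itself rather than a kernel density estimate, dnorm() with the sample mean and sd can be overlaid instead (a sketch along the same lines):

```r
p <- rnorm(100, mean = 5, sd = 3000)
hist(p, prob = TRUE)                                   # density scale, not counts
curve(dnorm(x, mean = mean(p), sd = sd(p)), add = TRUE)
```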
Re: [R] Calculate weighted mean for each group
After you fix your data frame, and if you don't mind using 2 packages, you might try something like: library(plyr) #for 'by' processing library(Hmisc) # for its wtd.mean function d=data.frame(x=c(15,12,3,10,10),g=c(1,1,2,2,3),w=c(2,1,5,2,5)) ; d ddply(d,~g,function(df) wtd.mean(df$x,df$w)) milton ruser wrote: try your first reproducible line first :-) On Thu, Jul 23, 2009 at 5:18 PM, Alexis Maluendas avmaluend...@gmail.comwrote: Hi R experts, I need know how calculate a weighted mean by group in a data frame. I have tried with aggregate() function: data.frame(x=c(15,12,3,10,10),g=c(1,1,1,2,2,3,3),w=c(2,3,1,5,5,2,5)) -> d aggregate(d$x,by=list(d$g),weighted.mean,w=d$w) Generating the following error: Error en FUN(X[[1L]], ...) : 'x' and 'w' must have the same length Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Calculate-weighted-mean-for-each-group-tp24634873p24635837.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
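For completeness, the same grouped weighted means can be had in base R alone, using split() on the corrected data frame:

```r
# weighted mean of x within each level of g, no extra packages
d <- data.frame(x = c(15, 12, 3, 10, 10), g = c(1, 1, 2, 2, 3), w = c(2, 1, 5, 2, 5))
wm <- sapply(split(d, d$g), function(s) weighted.mean(s$x, s$w))
wm
#  1  2  3
# 14  5 10
```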
Re: [R] Can I use mcnemar.test for 3*3 tables (or is there a bug in the command?)
There is a function mh_test in the coin package. library(coin) mh_test(tt) The documentation states, "The null hypothesis of independence of row and column totals is tested. The corresponding test for binary factors x and y is known as McNemar test. For larger tables, Stuart's W0 statistic (Stuart, 1955, Agresti, 2002, page 422, also known as Stuart-Maxwell test) is computed." hth, david freedman Tal Galili wrote: Hello all, I wish to perform a mcnemar test for a 3 by 3 matrix. By running the standard R command I am getting a result but I am not sure I am getting the correct one. Here is an example code: (tt <- as.table(t(matrix(c(1,4,1, 0,5,5, 3,1,5), ncol = 3)))) mcnemar.test(tt, correct=T) #And I get: McNemar's Chi-squared test data: tt McNemar's chi-squared = 7.6667, df = 3, p-value = *0.05343* Now I was wondering if the test I just performed is the correct one. From looking at the Wikipedia article on mcnemar (http://en.wikipedia.org/wiki/McNemar's_test), it is said that: The Stuart-Maxwell test http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm is a different generalization of the McNemar test, used for testing marginal homogeneity in a square table with more than two rows/columns. From searching for a Stuart-Maxwell test in google, I found an algorithm here: http://www.m-hikari.com/ams/ams-password-2009/ams-password9-12-2009/abbasiAMS9-12-2009.pdf From running this algorithm I am getting a different P value; here is the (somewhat ugly) code I produced for this: get.d <- function(xx) { length1 <- dim(xx)[1] ret1 <- margin.table(xx,1) - margin.table(xx,2) return(ret1) } get.s <- function(xx) { the.s <- xx for( i in 1:dim(xx)[1]) { for(j in 1:dim(xx)[2]) { if(i == j) { the.s[i,j] <- margin.table(xx,1)[i] + margin.table(xx,2)[i] - 2*xx[i,i] } else { the.s[i,j] <- -(xx[i,j] + xx[j,i]) } } } return(the.s) } chi.statistic <- t(get.d(tt)[-3]) %*% solve(get.s(tt)[-3,-3]) %*% get.d(tt)[-3] paste('the P value:', pchisq(chi.statistic, 2)) #and the result was: the P value: 0.268384371053358 So to summarize my questions: 1) can I use mcnemar.test for 3*3 (or more) tables? 2) if so, what test is being performed (Stuart-Maxwell)? 3) Do you have a recommended link to an explanation of the algorithm employed? Thanks, Tal -- -- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Can-I-use-%22mcnemar.test%22-for-3*3-tables-%28or-is-there-a-bug-in-the-command-%29-tp24556414p24556693.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help With Fleiss Kappa
If you apply the function to a simple dataframe or show your code, you might be able to get more accurate help. I've used the irr package in the past and haven't noticed any problems (maybe I overlooked them?) david freedman mehdi ebrahim wrote: Hi All, I am using Fleiss kappa for inter rater agreement. Are there any known issues with the Fleiss kappa calculation in R? Even when I supply mock data with total agreement among the raters I do not get a kappa value of 1; instead I am getting negative values. I am using the irr package version 0.70 Any help is much appreciated. Thanks and Regards M [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Help-With-Fleiss-Kappa-tp24456963p24461821.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
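A quick check on packaged data (a sketch; assumes the irr package and its bundled 'diagnoses' data set are installed): kappam.fleiss expects subjects in rows and raters in columns, and a transposed layout is a common source of odd kappa values.

```r
library(irr)
data(diagnoses)             # 30 subjects rated by 6 raters
kappam.fleiss(diagnoses)    # rows = subjects, columns = raters
```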
Re: [R] Stratified data summaries
You might also want to look at the doBy package - one function is summaryBy: summaryBy(var1 + var2 ~ patient_type, data=d, FUN=summary) david freedman Hayes, Rachel M wrote: Hi All, I'm trying to automate a data summary using summary or describe from the Hmisc package. I want to stratify my data set by patient_type. I was hoping to do something like: describe(myDataFrame ~ patient_type) I can create data subsets and run the describe function one at a time, but there's got to be a better way. Any suggestions? Rachel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Stratified-data-summaries-tp24419184p24449565.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiplication of data frame with vector
Would the scale function work for this? Something like new=scale(df, center=T) HTH, david freedman cir p wrote: Dear group: sorry for my beginners question, but I'm rather new to R and was searching high and low without success: I have a data frame (df) with variables in the rows and observations in the columns like (the actual data frame has 15 columns and 1789 rows):
          early1 early2 early3 early4 early5
M386T1000  57056  55372  58012  55546  57309
M336T90    11063  10312  10674  10840  11208
M427T91    12064  11956  12692  12340  11924
M429T91     4594   3890   4096   4019   4204
M447T90    26553  27647  26889  26751  26929
Now I'm trying to transform each value column-wise to make all columns have the same mean with: df * mean(mean(df)) / mean(df). I just can't get my head around this: mean(df) gives me the correct column means vector, and mean(mean(df)) gives me the correct total mean. The above operation works correctly for individual rows, i.e. if I do df[1,]*mean(mean(df))/mean(df) Can someone tell me what I am doing wrong?? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Multiplication-of-data-frame-with-vector-tp24368878p24372764.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
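The one-liner fails because dividing a data frame by a vector recycles down the rows, not across the columns. A column-wise version with sweep() (a sketch with made-up numbers) gives every column the same mean:

```r
df <- data.frame(early1 = c(57056, 11063, 12064, 4594, 26553),
                 early2 = c(55372, 10312, 11956, 3890, 27647))
grand <- mean(as.matrix(df))              # overall mean of all values
out <- sweep(df, 2, colMeans(df), '/') * grand
colMeans(out)                             # each column now equals the grand mean
```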
Re: [R] SAS-like method of recoding variables?
Frank, would you feel comfortable giving us the reference to the NEJM article with the 'missing vs <' error? I'm sure that things like this happen fairly often, and I'd like to use this example in teaching thanks, david freedman Frank E Harrell Jr wrote: Dieter Menne wrote: P.Dalgaard wrote: IF TYPE='TRUCK' and count=12 THEN VEHICLES=TRUCK+((CAR+BIKE)/2.2); vehicles <- ifelse(TYPE=='TRUCK' & count==12, TRUCK+((CAR+BIKE)/2.2), NA) Read both versions to an audience, and you will have to admit that this is one of the cases where SAS is superior. Here's a case where SAS is clearly not superior: IF type='TRUCK' AND count<12 THEN vehicles=truck+(car+bike)/2.2; If count is missing, the statement is considered TRUE and the THEN is executed. This is because SAS considers a missing as less than any number. This resulted in a significant error, never corrected, in a widely cited New England Journal of Medicine paper. Frank Dieter -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- View this message in context: http://www.nabble.com/SAS-like-method-of-recoding-variables--tp24152845p24165879.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
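The contrast with R is easy to demonstrate: an NA in the same comparison propagates instead of silently passing the test (a small made-up vector):

```r
# In SAS a missing count sorts below any number, so 'count < 12' is TRUE;
# in R the comparison returns NA and ifelse() carries the NA through.
count <- c(5, 20, NA)
count < 12                         # TRUE FALSE NA
ifelse(count < 12, "low", "ok")    # "low" "ok" NA
```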
[R] long format - find age when another variable is first 'high'
Dear R, I've got a data frame with children examined multiple times and at various ages. I'm trying to find the first age at which another variable (LDL-Cholesterol) is >= 130 mg/dL; for some children, this may never happen. I can do this with transformBy and ddply, but with 10,000 different children, these functions take some time on my PCs - is there a faster way to do this in R? My code on a small dataset follows. Thanks very much, David Freedman d <- data.frame(id=c(rep(1,3),rep(2,2),3), age=c(5,10,15,4,7,12), ldlc=c(132,120,125,105,142,160)) d$high.ldlc <- ifelse(d$ldlc >= 130, 1, 0) d library(plyr) d2 <- ddply(d, ~id, transform, plyr.minage=min(age[high.ldlc==1])); library(doBy) d2 <- transformBy(~id, data=d2, doby.minage=min(age[high.ldlc==1])); d2 -- View this message in context: http://www.nabble.com/long-format---find-age-when-another-variable-is-first-%27high%27-tp23706393p23706393.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
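A base-R alternative (no plyr or doBy) is to subset to the qualifying rows first and then take the per-id minimum with aggregate(); children who never reach 130 mg/dL simply drop out of the result:

```r
# First age at which ldlc >= 130, one row per id that ever qualifies.
d <- data.frame(id   = c(rep(1, 3), rep(2, 2), 3),
                age  = c(5, 10, 15, 4, 7, 12),
                ldlc = c(132, 120, 125, 105, 142, 160))
minage <- aggregate(age ~ id, data = d[d$ldlc >= 130, ], FUN = min)
minage   # id 1 -> age 5, id 2 -> age 7, id 3 -> age 12
```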
Re: [R] data summary and some automated t.tests.
You might want to try using a non-parametric test, such as wilcox.test. How about some modification of the following: d=data.frame(grp=rep(1:2,e=5),replicate(10,rnorm(100))); head(d) lapply(d[,-1],function(.column)wilcox.test(.column~grp,data=d)) David Freedman stephen sefick wrote: Up and down are the treatments. These are replicates within date for percent cover of habitat. This is habitat data for a stream restoration - up is the unrestored and dn is the restored. I have looked at the density plots and they do not look gaussian - you are absolutely right. Even log(n+1) transformed they do not look Gaussian. Is there some other way that I would test for a difference that you can think of? My thoughts were to run a Permutation t.test, but I am very new to permutations, and don't know if this applies. The other thing that I was thinking was to use a npmanova (adonis in vegan) to test if the centroids of the habitat classifications were different. I am in the process of working up my thesis data for publication in a journal (there are other very interesting pieces to the data set that I am working with, and this is one of the last things that I need to wrap up before I can start editing/rewriting my masters work). Any thoughts would be greatly appreciated. thanks, Stephen Sefick 2009/5/16 Uwe Ligges lig...@statistik.tu-dortmund.de: stephen sefick wrote: I would like to perform a t.test on each of the measured variables (sand.silt etc.) I am a big fan of applying t.test()s, but in this case: Are you really sure? The integers and particularly boxplot(x) do not indicate very well that the variables are somehow close to Gaussian ... with a mean and sd for each of the treatments And what is the treatment??? Best, Uwe Ligges (up or down), and output this as a table I am having a hard time starting- maybe it is too close to lunch. Any suggestions would be greatly appreciated. Stephen Sefick x <- (structure(list(sample. 
= structure(c(1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 27L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 38L), .Label = c(0805-r1, 0805-r10, 0805-r11, 0805-r12, 0805-r13, 0805-r14, 0805-r2, 0805-r3, 0805-r4, 0805-r5, 0805-r6, 0805-r7, 0805-r8, 0805-r9, 0805-u1, 0805-u10, 0805-u2, 0805-u3, 0805-u4, 0805-u5, 0805-u6, 0805-u7, 0805-u8, 0805-u9, 1005-r1, 1005-r10, 1005-r11, 1005-r2, 1005-r3, 1005-r4, 1005-r5, 1005-r6, 1005-r7, 1005-r8, 1005-r9, 1005-u1, 1005-u10, 1005-u11, 1005-u2, 1005-u3, 1005-u4, 1005-u5, 1005-u6, 1005-u7, 1005-u8, 1005-u9 ), class = factor), date = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(10/1/05, 8/29/05), class = factor), Replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(dn, up ), class = factor), sand.silt = c(20L, 45L, 90L, 21L, 80L, 77L, 30L, 80L, 36L, 9L, 62L, 71L, 20L, 65L, 10L, 70L, 50L, 80L, 90L, 97L, 94L, 82L, 30L, 10L, 65L, 80L, 90L, 70L, 10L, 50L, 60L, 40L, 10L, 45L, 10L, 10L, 15L, 10L, 8L, 35L, 10L, 40L, 10L, 10L, 28L, 5L, 45L, 35L, 2L, 10L, 40L, 2L, 70L, 40L, 20L, 30L, 50L, 60L, 10L, 100L, 98L, 98L, 90L, 87L, 87L, 40L, 97L, 92L, 70L, 50L, 81L, 35L, 70L, 89L, 28L, 28L, 82L, 81L, 33L, 80L, 40L, 40L, 60L, 30L, 5L, 50L, 70L, 75L, 85L, 95L, 93L, 80L, 80L, 60L, 82L, 60L, 5L, 70L, 80L, 40L), gravel = c(8L, 45L, 7L, 5L, 10L, 5L, 35L, 7L, 45L, 60L, 0L, 0L, 5L, 8L, 25L, 0L, 45L, 15L, 0L, 1L, 2L, 5L, 6L, 15L, 10L, 5L, 3L, 10L
Re: [R] Help with loops
I'm not quite sure what you want to do, but this might help: d=data.frame(replicate(40, rnorm(20))) d$sample=rep(c('a','b','c','d'),each=5) library(doBy) summaryBy(.~sample,data=d) David Freedman Amit Patel-7 wrote: Hi I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns zz[,2:41] plus row headings in zz[,1] I am trying to average each set of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). below is my script but it seems to be stuck in an endless loop Any suggestions?? for (i in 1:length(average_value[,1])) { average_value[i] <- i^100; print(average_value[i]) #calculates Means #Sample A average_value[i,2] <- rowMeans(zz[i,2:3]) average_value[i,3] <- rowMeans(zz[i,4:5]) average_value[i,4] <- rowMeans(zz[i,6:7]) average_value[i,5] <- rowMeans(zz[i,8:9]) average_value[i,6] <- rowMeans(zz[i,10:11]) #Sample B average_value[i,7] <- rowMeans(zz[i,12:13]) average_value[i,8] <- rowMeans(zz[i,14:15]) average_value[i,9] <- rowMeans(zz[i,16:17]) average_value[i,10] <- rowMeans(zz[i,18:19]) average_value[i,11] <- rowMeans(zz[i,20:21]) #Sample C average_value[i,12] <- rowMeans(zz[i,22:23]) average_value[i,13] <- rowMeans(zz[i,24:25]) average_value[i,14] <- rowMeans(zz[i,26:27]) average_value[i,15] <- rowMeans(zz[i,28:29]) average_value[i,16] <- rowMeans(zz[i,30:31]) #Sample D average_value[i,17] <- rowMeans(zz[i,32:33]) average_value[i,18] <- rowMeans(zz[i,34:35]) average_value[i,19] <- rowMeans(zz[i,36:37]) average_value[i,20] <- rowMeans(zz[i,38:39]) average_value[i,21] <- rowMeans(zz[i,40:41]) } thanks -- View this message in context: http://www.nabble.com/Help-with-loops-tp23558647p23560599.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
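The twenty hand-written rowMeans() lines in the question collapse to a single sapply() over the starting column of each replicate pair (mock data standing in for the poster's zz, which has IDs in column 1 and 40 data columns):

```r
# Average adjacent column pairs (2:3, 4:5, ..., 40:41) in one pass.
zz <- data.frame(id = letters[1:5], matrix(1:200, nrow = 5))   # mock data
pair.starts <- seq(2, 40, by = 2)
average_value <- data.frame(id = zz[, 1],
                            sapply(pair.starts,
                                   function(j) rowMeans(zz[, j:(j + 1)])))
dim(average_value)   # 5 rows: 1 id column + 20 averaged columns = 21 columns
```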
Re: [R] by-group processing
sorry about the mistake - the -data$Type doesn't work: the '-' sign isn't valid for factors. I *thought* I had checked this before submitting a response ! HufferD wrote: On Thursday, May 07, 2009 7:45 PM, David Freedman wrote: ...how about: d=data[order(data$ID,-data$Type),] d[!duplicated(d$ID),] Does the -data$Type argument to the order function work? -- David - David Huffer, Ph.D. Senior Statistician CSOSA/Washington, DC david.huf...@csosa.gov __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/by-group-processing-tp23417208p23453688.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] by-group processing
how about: d=data[order(data$ID,-data$Type),] d[!duplicated(d$ID),] Max Webber wrote: Given a dataframe like data ID Type N 1 45900 A 1 2 45900 B 2 3 45900 C 3 4 45900 D 4 5 45900 E 5 6 45900 F 6 7 45900 I 7 8 49270 A 1 9 49270 B 2 10 49270 E 3 18 46550 A 1 19 46550 B 2 20 46550 C 3 21 46550 D 4 22 46550 E 5 23 46550 F 6 24 46550 I 7 containing an identifier (ID), a variable type code (Type), and a running count of the number of records per ID (N), how can I return a dataframe of only those records with the maximum value of N for each ID? For instance, data ID Type N 7 45900 I 7 10 49270 E 3 24 46550 I 7 I know that I can use tapply(data$N, data$ID, max) 45900 46550 49270 7 7 3 to get the values of the maximum N for each ID, but how is it that I can find the index of these values to subsequently use to subscript data? -- maxine-webber -- View this message in context: http://www.nabble.com/by-group-processing-tp23417208p23437592.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
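ave() gives a one-line answer that avoids the sort entirely: keep the rows where N equals the per-ID maximum (a fragment of the posted data):

```r
# ave() repeats each ID's maximum N alongside every row of that ID,
# so a simple logical comparison selects the winning rows.
data <- data.frame(ID   = c(45900, 45900, 45900, 49270, 49270),
                   Type = c("A", "B", "I", "A", "E"),
                   N    = c(1, 2, 7, 1, 3))
data[data$N == ave(data$N, data$ID, FUN = max), ]
```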
Re: [R] Problems producing a simple plot
you might also look at matplot d=data.frame(x=1:10,y1=sort(rnorm(10)),y2=sort(rnorm(10))) matplot(d[,1],d[,2:3],type='l',lty=1:2) David Freedman Steve Murray-3 wrote: Dear R Users, I have a data frame of the nature: head(aggregate_1986) Latitude Mean Annual Simulated Runoff per 1° Latitudinal Band 1 -55 574.09287 2 -54 247.23078 3 -53 103.40756 4 -52 86.1 5 -51 45.21980 6 -50 55.92928 Mean Annual Observed Runoff per 1° Latitudinal Band 1 491.9525 2 319.6592 3 237.8222 4 179.5391 5 105.2968 6 136.6124 I am hoping to plot columns 2 and 3 against Latitude. I understand that you have to do this by plotting one column at a time, so I have been starting by attempting the following, but receiving errors: plot(aggregate_1986[1] ~ aggregate_1986[2], type="l") Error in model.frame.default(formula = aggregate_1986[1] ~ aggregate_1986[2], : invalid type (list) for variable 'aggregate_1986[1]' Or... plot(aggregate_1986[1],aggregate_1986[2], type="l") Error in stripchart.default(x1, ...) : invalid plotting method I'm obviously doing something fundamentally wrong (!). I'd be grateful for any suggestions as to how I should go about this. Many thanks, Steve -- View this message in context: http://www.nabble.com/Problems-producing-a-simple-plot-tp23347296p23348181.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sequence number for 'long format'
Dear R-help list, I've got a data set in long format - each subject can have several (varying in number) measurements, with each record representing one measurement. I want to assign a sequence number to each measurement, starting at 1 for a person's first measurement. I can do this with the by function, but there must be an easier way. Here's my code - id is id number, age is the age of the person, and seq is the sequence variable that I've created. Thanks very much for the help. david freedman, atlanta ds=data.frame(list(id = c(1L, 1L, 1L, 1L, 8L, 8L, 16L, 16L, 16L, 16L, 16L, 19L, 32L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 79L, 79L, 80L, 80L, 80L, 80L, 85L, 86L, 96L, 96L, 96L, 103L, 103L, 106L, 106L, 106L, 106L, 106L, 106L, 106L, 140L, 140L, 144L, 144L, 144L, 144L, 144L, 144L, 144L, 146L, 146L, 146L, 146L, 160L, 160L, 160L, 160L, 160L, 160L, 164L, 164L, 176L, 176L, 176L, 176L, 176L, 176L, 176L, 176L, 181L, 190L, 192L, 192L, 192L, 192L, 192L, 192L, 197L, 197L, 197L, 224L, 224L, 224L, 229L, 232L, 232L, 232L, 232L, 232L, 232L, 232L, 249L, 249L), age = c(6.6054794521, 9.301369863, 22.638356164, 31.961670089, 17.15890411, 25.106091718, 8.197260274, 11.295890411, 14.191780822, 22.43394935, 28.6, 6.6794520548, 10.824657534, 10.479452055, 13.432876712, 15.408219178, 17.643835616, 19.268493151, 22.624657534, 26.139726027, 35.493497604, 37.6, 15.895890411, 23.351129363, 13.810958904, 16.783561644, 17.95890411, 22.430136986, 12.021902806, 14.904859685, 7.4219178082, 10.060273973, 15.802739726, 17.328767123, 31.028062971, 8.3945205479, 10.350684932, 13.783561644, 17.843835616, 21.816438356, 27.901437372, 34.3, 10.517808219, 18.18630137, 11.378082192, 14.794520548, 16.77260274, 23.101369863, 27.912328767, 34.316221766, 40.2, 8.6054794521, 11.561643836, 14.863013699, 17.835616438, 8.0219178082, 9, 9.9726027397, 10.690410959, 13.032876712, 30.138261465, 7.0602739726, 10.438356164, 8.9232876712, 9.9589041096, 10.915068493, 12.263013699, 14.257534247, 17.326027397, 
18.454794521, 21.334246575, 45.190965092, 8.5643835616, 12.197260274, 15.405479452, 17.106849315, 27.843835616, 34.417522245, 39.9, 6.7890410959, 10.21369863, 15.857534247, 10.147945205, 13.473972603, 36.06844627, 17.331506849, 14.980821918, 15.939726027, 16.939726027, 17.619178082, 18.698630137, 37.084188912, 43.3, 7.7068493151, 10.726027397))) head(ds,10) x=with(ds,by(ds,list(id),FUN=function(dc)1:length(dc$age))); x[1:20]; ds$seq=unlist(x); head(ds,20) -- View this message in context: http://www.nabble.com/sequence-number-for-%27long-format%27-tp23338043p23338043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
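ave() with seq_along builds the within-id counter directly and skips the by()/unlist() round trip (assuming, as in the posted data, that rows are already ordered by age within id); a sketch on a few of the posted rows:

```r
# Sequence number restarts at 1 for each id.
ds <- data.frame(id  = c(1, 1, 1, 8, 8, 16),
                 age = c(6.6, 9.3, 22.6, 17.2, 25.1, 8.2))
ds$seq <- ave(ds$age, ds$id, FUN = seq_along)
ds$seq   # 1 2 3 1 2 1
```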
[R] piecewise linear regression
Hi - I'd like to construct and plot the percents by year in a small data set (d) that has values between 1988 and 2007. I'd like to have a breakpoint (but no discontinuity) at 1996. Is there a better way to do this than in the code below? d year percent se 1 1988 30.6 0.32 2 1989 31.5 0.31 3 1990 30.9 0.30 4 1991 30.6 0.28 5 1992 29.3 0.25 6 1994 30.3 0.26 7 1996 29.9 0.24 8 1998 28.4 0.22 9 2000 27.8 0.22 10 2001 26.1 0.20 11 2002 25.1 0.19 12 2003 24.4 0.19 13 2004 23.7 0.19 14 2005 25.1 0.18 15 2006 23.9 0.20 dput(d) structure(list(year = c(1988L, 1989L, 1990L, 1991L, 1992L, 1994L, 1996L, 1998L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L), percent = c(30.6, 31.5, 30.9, 30.6, 29.3, 30.3, 29.9, 28.4, 27.8, 26.1, 25.1, 24.4, 23.7, 25.1, 23.9, 23.9), se = c(0.32, 0.31, 0.3, 0.28, 0.25, 0.26, 0.24, 0.22, 0.22, 0.2, 0.19, 0.19, 0.19, 0.18, 0.2, 0.18)), .Names = c("year", "percent", "se"), class = "data.frame", row.names = c(NA, -16L)) with(d,plot(year,percent,pch=16,xlim=c(1988,2007))) m=lm(percent ~ year + I(year-1996):I(year >= 1996), weights=1/se, subset=year >= 1988, data=d); points(d$year, predict(m, data.frame(year=d$year)), type='l', lwd=2, col='red') thanks very much David Freedman -- View this message in context: http://www.nabble.com/piecewise-linear-regression-tp22388118p22388118.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
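An equivalent, arguably more readable way to code the 1996 breakpoint is a hinge term: pmax(year - 1996, 0) is zero before 1996 and rises linearly after it, so the slope can change with no jump. This sketch uses a subset of the posted values and inverse-variance weights (1/se^2, rather than the 1/se in the post):

```r
# Piecewise-linear (broken-stick) fit with a slope change at 1996.
d <- data.frame(year    = c(1988, 1990, 1992, 1996, 2000, 2004, 2006),
                percent = c(30.6, 30.9, 29.3, 29.9, 27.8, 23.7, 23.9),
                se      = c(0.32, 0.30, 0.25, 0.24, 0.22, 0.19, 0.20))
m <- lm(percent ~ year + pmax(year - 1996, 0), weights = 1/se^2, data = d)
coef(m)   # intercept, pre-1996 slope, and the post-1996 change in slope
```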
Re: [R] beginner's question: group of regressors by name vector?
The predictors and outcomes in lm can be matrices, so you could use something like the following: x.mat=cbind(x1=rnorm(20),x2=rnorm(20)) y.mat=cbind(y1=rnorm(20),y2=rnorm(20)) lm(y.mat~x.mat) David Freedman ivowel wrote: dear r-experts: there is probably a very easy way to do it, but it eludes me right now. I have a large data frame with, say, 26 columns named a through z. I would like to define sets of regressors from this data frame. something like myregressors=c("b", "j", "x") lm( l ~ myregressors, data=... ) is the best way to create new data frames that contain all the variables I want, then use ., and then destroy them again? or am I overlooking something obvious? sincerely, /iaw -- View this message in context: http://www.nabble.com/beginner%27s-question%3A-group-of-regressors-by-name-vector--tp21979180p21979495.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R equivalent of SAS Cochran-Mantel-Haenszel tests?
How about 'cmh_test' in the coin package? From the PDF: The null hypothesis of the independence of y and x is tested; block defines an optional factor for stratification. chisq_test implements Pearson's chi-squared test, cmh_test the Cochran-Mantel-Haenszel test and lbl_test the linear-by-linear association test for ordered data. David Freedman Michael Friendly wrote: In SAS, for a two-way (or 3-way, stratified) table, the CMH option in SAS PROC FREQ gives 3 tests that take ordinality of the factors into account, for both variables, just the column variable or neither. Is there an equivalent in R? The mantelhaen.test in stats gives something quite different (a test of conditional independence for *nominal* factors in a 3-way table). e.g. I'd like to reproduce: *-- CMH tests; proc freq data=sexfun order=data; weight count; tables husband * wife / cmh chisq nocol norow; run; The FREQ Procedure Summary Statistics for Husband by Wife Cochran-Mantel-Haenszel Statistics (Based on Table Scores) Statistic Alternative Hypothesis DF Value Prob 1 Nonzero Correlation 1 10.0142 0.0016 2 Row Mean Scores Differ 3 12.5681 0.0057 3 General Association 9 16.7689 0.0525 -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA -- View this message in context: http://www.nabble.com/R-equivalent-of-SAS-Cochran-Mantel-Haenszel-tests--tp21914012p21918306.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Roc curves confidence intervals
Would you be interested in the cross-validation that's on pp 3 and 4 of the ROCR package PDF? plot(perf, lwd=3, avg="vertical", spread.estimate="boxplot", add=TRUE) there are various options for the 'spread.estimate' David Freedman marc bernard-2 wrote: Dear all, I am looking for an R package that allows me to calculate and plot the confidence intervals for the roc curve using for example some bootstrapping. I tried ROCR, which seems to do such work, but I couldn't find the right option in it. Many thanks Bests Marc -- View this message in context: http://www.nabble.com/Roc-curves-confidence-intervals-tp21850816p21851991.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] overlay plot question
You could change the second 'plot' to 'points' David Freedman David Kaplan-2 wrote: Greetings all, I have two logistic plots coming from two calls to plogis. The code is .x <- seq(-7.6, 7.6, length=100) plot(.x, plogis(.x, location=0, scale=1), xlab="x", ylab="Density", main="Logistic Distribution: location = 0, scale = 1", type="l") abline(h=0, col="gray") .y <- seq(-7.6, 7.6, length=100) plot(.x, plogis(.x, location=2, scale=4), xlab="x", ylab="Density", main="Logistic Distribution: location = 2, scale = 4", type="l") abline(h=0, col="gray") remove(.x) remove(.y) I would like to overlay these on one plot. Notice here the y-axis is different. But I would like the axis to be 0 to 1 as in the first plot. Any suggestions would be greatly appreciated. Thanks in advance, David -- === David Kaplan, Ph.D. Professor Department of Educational Psychology University of Wisconsin - Madison Educational Sciences, Room 1061 1025 W. Johnson Street Madison, WI 53706 email: dkap...@education.wisc.edu homepage: http://www.education.wisc.edu/edpsych/default.aspx?content=kaplan.html Phone: 608-262-0836 -- View this message in context: http://www.nabble.com/overlay-plot-question-tp21841060p21843564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with subsetting in compating one column with a vector
Do you really want to use '==' ? How about '%in%', as in findings1 <- subset(findings, SUBJECTSID %in% SUBJECTS1$SUBJECTSID, select=c(SUBJECTSID,ORGNUMRES)) David Freedman dvkirankumar wrote: Hi all, I got one problem with the subset() function. Here is what I executed: findings1 <- subset(findings, SUBJECTSID==SUBJECTS1$SUBJECTSID, select=c(SUBJECTSID,ORGNUMRES)) Here the SUBJECTS1$SUBJECTSID vector contains nearly 65 values. The problem is that after comparing and subsetting it's not giving all the values related to that; instead it's giving random values and giving the warning: Warning message: In SUBJECTSID == SUBJECTS1$SUBJECTSID : longer object length is not a multiple of shorter object length can any one suggest what I can do to retrieve all the related data of the subset and store it in one object thanks in advance regards; kiran -- View this message in context: http://www.nabble.com/problem-with-subsetting-in-compating-one-column-with-a-vector-tp21805448p21808625.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lapply and aggregate function
You might want to look at the doBy package. For (1), you could use summaryBy(value~Light+Feed, data=myD, FUN=mean) and for (2), the transformBy function would be helpful David Freedman Patrick Hausmann wrote: Dear list, I have two things I am struggling... # First set.seed(123) myD <- data.frame( Light = sample(LETTERS[1:2], 10, replace=T), Feed = sample(letters[1:5], 20, replace=T), value=rnorm(20) ) # Mean for Light myD$meanLight <- unlist( lapply( myD$Light, function(x) mean( myD$value[myD$Light == x]) ) ) # Mean for Feed myD$meanFeed <- unlist( lapply( myD$Feed, function(x) mean( myD$value[myD$Feed == x]) ) ) myD # I would like to get a new Var meanLightFeed # holding the Group-Mean for each combination (eg. A:a = 0.821581) # by(myD$value, list(myD$Light, myD$Feed), mean)[[1]] # Second set.seed(321) myD <- data.frame( Light = sample(LETTERS[1:2], 10, replace=T), value=rnorm(20) ) w1 <- tapply(myD$value, myD$Light, mean) w1 # w1 # A B # 0.4753412 -0.2108387 myfun <- function(x) (myD$value > w1[x] & myD$value < w1[x] * 1.5) I would like to have a TRUE/FALSE-Variable depending on the constraint in myfun for each level in Light... As always - thanks for any help!! Patrick -- View this message in context: http://www.nabble.com/lapply-and-aggregate-function-tp21811834p21812057.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
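The combination mean the poster asks for (meanLightFeed) can also be attached with ave() over interaction(Light, Feed), with no lapply at all; a self-contained sketch (a 20-row version of the posted frame):

```r
# Per-(Light, Feed) group mean attached back to every row.
set.seed(123)
myD <- data.frame(Light = sample(LETTERS[1:2], 20, replace = TRUE),
                  Feed  = sample(letters[1:5], 20, replace = TRUE),
                  value = rnorm(20))
myD$meanLightFeed <- ave(myD$value, interaction(myD$Light, myD$Feed), FUN = mean)
```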
Re: [R] how to prevent duplications of data within a loop
or how about lm(myData$response~as.matrix(myData[2:4])) hth, david Juliet Hannah wrote: Hi All, I had posted a question on a similar topic, but I think it was not focused. I am posting a modification that I think better accomplishes this. I hope this is ok, and I apologize if it is not. :) I am looping through variables and running several regressions. I have reason to believe that the data is being duplicated because I have been monitoring the memory use on unix. How can I avoid this? Here is an example of how I am going about this. For lm, I also tried model=FALSE, but this did not seem to do the job. Any ideas? Thanks for your time. Regards, Juliet # create data set.seed(1) response <- rnorm(50) var1 <- rnorm(50) var2 <- rnorm(50) var3 <- rnorm(50) myData <- data.frame(response,var1,var2,var3) var.names <- names(myData)[2:4] numVars <- length(var.names) betas <- rep(-1,numVars) names(betas) <- var.names #run regression on var1 through var3. for (Var_num in 1:numVars) { col.name <- var.names[Var_num] mylm <- lm(response ~ get(col.name),data=myData,model=FALSE) betas[Var_num] <- coef(mylm)[2] } -- View this message in context: http://www.nabble.com/how-to-prevent-duplications-of-data-within-a-loop-tp21637431p21642232.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there any function can be used to compare two probit models made from same data?
Hi - wouldn't it be possible to bootstrap the difference between the fit of the 2 models? For example, if one had a *linear* regression problem, the following script could be used (although I'm sure that it could be improved): library(MASS); library(boot) #create intercorrelated data Sigma <- matrix(c(1,.5,.4, .5,1,.8, .4,.8,1),3,3) Sigma dframe <- as.data.frame(mvrnorm(n <- 200, rep(0, 3), Sigma)) names(dframe) <- c('disease','age','ht') #age and ht are predictors of 'disease' head(dframe); cor(dframe) #bootstrap the difference between models containing the 2 predictors model.fun <- function(data, indices) { dsub <- dframe[indices,] m1se <- summary(lm(disease~age,data=dsub))$sigma; m2se <- summary(lm(disease~ht,data=dsub))$sigma; diff <- m1se - m2se; #diff is the difference in the SEs of the 2 models } eye <- boot(dframe, model.fun, R=200); class(eye); names(eye); des(an(eye$t)) boot.ci(eye,conf=c(.95,.99),type=c('norm')) Ben Bolker wrote: jingjiang yan jingjiangyan at gmail.com writes: hi, people How can we compare two probit models brought out from the same data? Let me use the example used in An Introduction to R. Consider a small, artificial example, from Silvey (1970). On the Aegean island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded. The data is shown below: Age: 20 35 45 55 70 No. tested: 50 50 50 50 50 No. blind: 6 17 26 37 44 now, we can use the age and the blind percentage to produce a probit model and get their coefficients by using glm function as was did in An Introduction to R My question is, let say there is another potential factor instead of age affected the blindness percentage. for example, the height of these males. Using their height, and their relevant blindness we can introduce another probit model. 
If I want to determine which is significantly better, which function can I use to compare both models? And, in addition, how do I compare each model with the null hypothesis (i.e., the same blindness for all ages/heights) to show that the model is effective? You can use a likelihood ratio test (i.e., anova(model0, model1)) to compare either model to the null model (blindness is independent of both age and height). The age model and height model are non-nested, and of equal complexity. You can tell which one is *better* by comparing log-likelihoods/deviances, but you cannot test a null hypothesis of significance. Most (but not all) statisticians would say you can compare non-nested models by using AIC, but you don't get a hypothesis test/p-value this way. Ben Bolker
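Ben Bolker's advice can be sketched with the Kalythos data from the question; note that the 'ht' column below is a made-up height variable, invented purely for illustration:

```r
# Probit fits to the Kalythos blindness data; m0 is the null model,
# m.age and m.ht are the two competing non-nested single-predictor models.
kalythos <- data.frame(age   = c(20, 35, 45, 55, 70),
                       n     = c(50, 50, 50, 50, 50),
                       blind = c(6, 17, 26, 37, 44),
                       ht    = c(1.60, 1.68, 1.65, 1.72, 1.75))  # hypothetical heights

m0    <- glm(cbind(blind, n - blind) ~ 1,   family = binomial(link = "probit"), data = kalythos)
m.age <- glm(cbind(blind, n - blind) ~ age, family = binomial(link = "probit"), data = kalythos)
m.ht  <- glm(cbind(blind, n - blind) ~ ht,  family = binomial(link = "probit"), data = kalythos)

anova(m0, m.age, test = "Chisq")  # nested: a likelihood-ratio test vs. the null is valid
AIC(m.age, m.ht)                  # non-nested, equal complexity: compare AICs, no p-value
```

With the real age data, the age model improves hugely on the null; the AIC table for the two non-nested models only ranks them, as the reply says.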
Re: [R] should I use rbind in my example?
Are you sure that you want to loop over variables for each subset, and do this within each subset? A simple way to generate summary statistics (across variables and within categories) is the summaryBy function in the doBy package:

d <- data.frame(wt = rnorm(100, 100, 10), ht = rnorm(100, 2, 1), sex = gl(2, 1, 100))
library(doBy)
summaryBy(wt + ht ~ sex, data = d, FUN = c(mean, sd))

David Freedman

SNN wrote: Hi, This is just for printout so it looks nice. What I have is a loop that loops over my variables and calculates the mean and the sd for these variables. Then I need to rbind them so I can stack them up in one file. The problem is that only the first header appears and the rest do not; this is because I used rbind. I am new to R - can you tell me how to put them in a 'list'? Thanks

jholtman wrote: What do you want to do with it? Is this just for printing out? What other types of transformations are you intending to do? Why not just put them in a 'list' and then write your own specialized print routine. On Wed, Jan 21, 2009 at 10:30 AM, SNN s.nan...@yahoo.com wrote: Hi, I need to rbind two data frames. Each one has a header. After the rbind I would like to keep the header for each and have the two data frames separated by a line. Is this possible to do in R? For example:

  weight_mean weight_sd.dev
F        14.3      4.932883
M        34.7     10.692677

  hight_mean hight_sd.dev
F       35.0     7.071068
M       34.7    10.692677

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
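The 'list' suggestion can be sketched in base R (the helper name and simulated data below are my own, for illustration): one mean/sd table per variable, stored in a named list so each table keeps its own header when printed.

```r
# Build a per-variable mean/sd table by sex and collect them in a named list;
# printing the list shows each table under its own name, as the poster wanted.
set.seed(2)
d <- data.frame(wt = rnorm(100, 100, 10), ht = rnorm(100, 2, 1),
                sex = gl(2, 1, 100, labels = c("F", "M")))

stats.by.sex <- function(v) {
  data.frame(mean   = tapply(d[[v]], d$sex, mean),
             sd.dev = tapply(d[[v]], d$sex, sd))
}
out <- lapply(c("wt", "ht"), stats.by.sex)
names(out) <- c("wt", "ht")
out
```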
Re: [R] conditional weighted quintiles
You might want to look at the 'quantreg' package, written by Roger Koenker, on CRAN. The associated vignette has many examples.

Abuzer Bakis wrote: Dear All, I am an economist working on poverty / income inequality. I need descriptive statistics like the ratio of education expenditures between different income quintiles, where each household has a different weight. After a bit of Google searching I found the 'Hmisc' and 'quantreg' libraries for weighted quantiles. The problem is that these packages give me only weighted quintiles; what I need is conditional weighted quintiles. The example below illustrates what I mean:

x <- data.frame(id = c(1:5), income = c(10, 10, 20, 30, 50),
                education = c(0, 5, 5, 0, 0), weight = c(3, 2, 3, 1, 1))
x
library(Hmisc)
wtd.quantile(x$income, weights = x$weight)
wtd.quantile(x$education, weights = x$weight)

I would like to see the expenditure of each quintile conditional on income, i.e. the education expenditures of the 5th quintile equal to zero. Thanks in advance, -- ozan bakis

- David Freedman Atlanta
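One possible route to the 'conditional' part in base R (a sketch; the helper name and the cumulative-weight quantile definition below are my own, not from the thread or from Hmisc): compute weighted quintile cutpoints for income, then summarize education within each income group.

```r
# Weighted quantiles from the cumulative weight shares (an illustrative
# definition; Hmisc::wtd.quantile uses a related approach).
wtd.q <- function(v, w, probs) {
  o  <- order(v)
  cw <- cumsum(w[o]) / sum(w)  # cumulative weight share
  approx(cw, v[o], xout = probs, rule = 2, ties = "ordered")$y
}

x <- data.frame(id = 1:5, income = c(10, 10, 20, 30, 50),
                education = c(0, 5, 5, 0, 0), weight = c(3, 2, 3, 1, 1))

brks <- wtd.q(x$income, x$weight, seq(0, 1, .2))
x$inc.q <- cut(x$income, breaks = unique(brks), include.lowest = TRUE)

# weighted mean of education within each income quintile group
res <- tapply(seq_len(nrow(x)), x$inc.q,
              function(i) weighted.mean(x$education[i], x$weight[i]))
res
```

With these toy data the heavy weights collapse several quintile boundaries, so unique() is needed before cut(); the top income group's education expenditure comes out as zero, as the poster expects.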
Re: [R] tapply within a data.frame: a simpler alternative?
You might take a look at the transformBy function in the doBy package. For example:

new.df <- transformBy(~group, data = my.df, new = y / max(y))

David Freedman

baptiste auguie-2 wrote: Dear list, I have a data.frame with x, y values and a 3-level factor, group, say. I want to create a new column in this data.frame with the values of y scaled to 1 by group. Perhaps the example below describes it best:

x <- seq(0, 10, len = 100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3 * sin(x), 2 * cos(x), cos(2 * x)),  # note how the y values have a different maximum depending on the group
                    group = factor(rep(c("sin", "cos", "cos2"), each = 100)))
library(reshape)
df.melt <- melt(my.df, id = c("x", "group"))  # make a long format
df.melt <- df.melt[order(df.melt$group), ]  # order the data.frame by the group factor
df.melt$norm <- do.call(c, tapply(df.melt$value, df.melt$group,
                                  function(.v) .v / max(.v)))  # calculate the normalised value per group and assign it to a new column
library(lattice)
xyplot(norm + value ~ x, groups = group, data = df.melt, auto.key = TRUE)  # check that it worked

This procedure works, but it feels like I'm reinventing the wheel using hammer and saw. I tried to use aggregate, by, and ddply (plyr package), but I couldn't find anything straightforward. I'll appreciate any input, Baptiste

_ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag

- David Freedman Atlanta
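A base-R alternative worth noting (an aside, not from the thread): ave() applies a function within groups and returns a vector aligned with the original rows, so no melting or reordering is needed.

```r
# Scale y to a maximum of 1 within each group, directly in the original frame.
x <- seq(0, 10, len = 100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3 * sin(x), 2 * cos(x), cos(2 * x)),
                    group = factor(rep(c("sin", "cos", "cos2"), each = 100)))
my.df$norm <- ave(my.df$y, my.df$group, FUN = function(v) v / max(v))
tapply(my.df$norm, my.df$group, max)  # each group now peaks at 1
```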
Re: [R] partial correlation
Not sure which library 'pcor' is in, but why don't you just use the ranks of the variables and then perform the correlation on the ranks:

x <- sample(1:10, 10, rep = TRUE)
y <- x + sample(1:10, 10, rep = TRUE)
cor(x, y)
cor(rank(x), rank(y))

HTH, david freedman

Jürg Brendan Logue wrote: Hi! I have the following problem: I would like to do partial correlations on non-parametric data. I checked pcor (computes the partial correlation between two variables given a set of other variables), but I do not know how to change to a Spearman rank correlation method [pcor(c(BCDNA,ImProd,A365),var(PCor))]. Here's a glimpse of my data (raw data):

 A436  A365     Chla   ImProd    BCDNA
0.001 0.003 0.624889 11.73023 0.776919
0.138 0.126 0.624889 27.29432 0.357468
0.075 0.056 0.624889 105.3115 0.429785
0.009 0.008 0.312444  55.2929 0.547752
0.005 0.002 0.624889  26.9638 0.738775
0.018 0.006 0.312444 31.14836 0.705814
0.02  0.018 2.22E-16 11.90303 0.755003
0.002 0.003 0.624889 7.829781 0.712091
0.047 0.044 1.523167 1.423823 0.710939
0.084 0.056 13.7085  1.533703 0.280171

I'm really grateful for any help since I only recently started using R. Best regards, JBL

- David Freedman Atlanta
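The rank idea extends to the partial-correlation case. Below is a hedged sketch of a Spearman-type partial correlation (the function name is mine, not from any package): rank-transform everything, regress the two variables of interest on the ranked controls, and correlate the residuals.

```r
# Spearman-style partial correlation of a and b, controlling for 'controls'.
spearman.pcor <- function(a, b, controls) {
  rc <- apply(as.matrix(controls), 2, rank)        # ranked control variables
  cor(resid(lm(rank(a) ~ rc)), resid(lm(rank(b) ~ rc)))
}

set.seed(3)
z <- rnorm(50)        # a shared 'control' variable
a <- z + rnorm(50)
b <- z + rnorm(50)
cor(rank(a), rank(b))  # inflated by the shared z
spearman.pcor(a, b, z) # shrinks once z is partialled out
```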
Re: [R] unlist dataframes
Try:

newdata <- do.call(rbind, l)

David Freedman, Atlanta

Naira wrote: Dear all, I would like to know whether it is possible to unlist elements and keep the original format of the data. To make it more clear, let me give an example: I have a list l of data frames that I created with apply, but which looks like this:

x1 <- data.frame(Name = LETTERS[1:2], Age = 1:2)
x2 <- data.frame(Name = LETTERS[3:4], Age = 3:4)
l <- list(x1, x2)
l
[[1]]
  Name Age
1    A   1
2    B   2

[[2]]
  Name Age
1    C   3
2    D   4

I would like to unlist l to create a data frame with 2 columns and 4 rows, but keeping the format of Name (character) and Age (numeric). Now when I unlist l, I obtain:

unlist(l)
Name1 Name2  Age1  Age2 Name1 Name2  Age1  Age2
    1     2     1     2     1     2     3     4

Is there a way to at least obtain something like

A 1
B 2
C 3
D 4

as the result from unlist? Thanks a lot for your replies, Naira

- David Freedman Atlanta
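A quick sketch contrasting the two approaches with the example data: do.call(rbind, l) stacks the data frames row-wise and keeps each column's type, whereas unlist() flattens everything into one vector.

```r
# Stack the list of data frames; columns keep their original types.
x1 <- data.frame(Name = LETTERS[1:2], Age = 1:2)
x2 <- data.frame(Name = LETTERS[3:4], Age = 3:4)
l <- list(x1, x2)
newdata <- do.call(rbind, l)
newdata                # 4 rows, columns Name and Age
sapply(newdata, class) # Age stays numeric
```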
Re: [R] ggplot2 help
You might want to take a look at http://had.co.nz/ggplot2/ There are many examples of how to use this package. And, there's a book coming out soon. hope this helps david freedman

Edna Bell wrote: Hi yet again! Thank you for being patient with me. Is there a how-to for ggplot2, please? I would like to look at it, but have no idea where to start. Thanks, Edna Bell

- David Freedman Atlanta
[R] fill in area between 2 lines with a color
Hi - I'd like to have the area between 2 lines on an x-y plot filled with grey, but I haven't had any luck using polygon or rect. (In reality, I'd like to do this twice - once for a low group and once for a high group - and then I'd like to plot a set of data points for a 'normal' group together with these 2 grey areas.) Here's a simple example of the 2 lines:

age <- 1:10
y.low <- rnorm(length(age), 150, 25) + 10 * age
y.high <- rnorm(length(age), 250, 25) + 10 * age
plot(age, y.high, type = 'n', ylim = c(100, 400), ylab = 'Y Range', xlab = 'Age (years)')
lines(age, y.low, col = 'grey')
lines(age, y.high, col = 'grey')

Is it possible to fill the area between the 2 lines with, for example, 'grey30'? Thanks very much in advance, David Freedman

- David Freedman Atlanta
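One way to do this (a sketch, with a seed added for reproducibility): trace the upper curve left to right and the lower curve back right to left, so a single polygon() call encloses the band between the two lines.

```r
# Fill the region between y.low and y.high with grey30 using polygon().
set.seed(4)  # seed is my addition, so the sketch is reproducible
age <- 1:10
y.low  <- rnorm(length(age), 150, 25) + 10 * age
y.high <- rnorm(length(age), 250, 25) + 10 * age

xx <- c(age, rev(age))      # out along the top, back along the bottom
yy <- c(y.high, rev(y.low))

pdf(NULL)  # throwaway device so this runs non-interactively
plot(age, y.high, type = 'n', ylim = c(100, 400),
     ylab = 'Y Range', xlab = 'Age (years)')
polygon(xx, yy, col = 'grey30', border = NA)  # the filled band
lines(age, y.low); lines(age, y.high)
dev.off()
```

Drawing the polygon before the lines keeps the line colour visible on top of the fill.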