Re: [R] subset arg in subset(). was: converting result of substitute to 'ordinary' expression
Here is another one that works:

  do.call(subset, list(dat, subsetexp))
      x  y
  6   6  6
  7   7  7
  8   8  8
  9   9  9
  10 10 10

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vadim Ogranovich
Sent: Saturday, 26 June 2010 11:13 AM
To: 'r-help@r-project.org'
Subject: [R] subset arg in subset(). was: converting result of substitute to 'ordinary' expression

Dear R users,

Please disregard my previous post "converting result of substitute to 'ordinary' expression". The problem I have has nothing to do with substitute. Consider:

  dat <- data.frame(x=1:10, y=1:10)
  subsetexp <- expression(5 < x)

  ## this does work
  subset(dat, eval(subsetexp))
      x  y
  6   6  6
  7   7  7
  8   8  8
  9   9  9
  10 10 10

  ## and so does this
  subset(dat, 5 < x)
      x  y
  6   6  6
  7   7  7
  8   8  8
  9   9  9
  10 10 10

  ## but this doesn't work
  subset(dat, subsetexp)
  Error in subset.data.frame(dat, subsetexp) : 'subset' must evaluate to logical

Why did the last expression fail, and why did it work with eval()?

Thank you very much for your help,
Vadim
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
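A brief sketch of what is going on, as I read subset.data.frame's non-standard evaluation (variable names taken from the post above):

```r
dat <- data.frame(x = 1:10, y = 1:10)
subsetexp <- expression(5 < x)

## subset() captures its 'subset' argument with substitute() and evaluates
## the captured expression with the data frame as the environment.
## Passed bare, the captured expression is just the symbol 'subsetexp',
## which evaluates to an expression object -- not a logical vector --
## hence the "'subset' must evaluate to logical" error.

## Wrapped in eval(), the captured call eval(subsetexp) is itself evaluated
## inside the data frame, so 5 < x is computed on dat$x:
subset(dat, eval(subsetexp))

## do.call() builds the call with the expression object already spliced in
## as the argument, so substitute() captures the expression itself and
## evaluating it inside the data frame yields the logical vector:
do.call(subset, list(dat, subsetexp))
```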
Re: [R] predict newdata question
Thanks Bill, that worked great!!

You ask:

  # How can I use predict here, 'newdata' crashes
  predict(m1, newdata=wolf$predicted); wolf  # it doesn't work

To use predict() you need to give a fitted model object (here m1) and a *data frame* to specify the values of the predictors for which you want predictions. Here wolf$predicted is not a data frame, it is a vector. What I think you want is

  pv <- predict(m1, newdata = wolf)

That will get you linear predictors. To get probabilities you need to say so, as in

  probs <- predict(m1, newdata = wolf, type = "response")

You can put these back into the data frame if you wish, e.g.

  wolf <- within(wolf, {
    lpreds <- predict(m1, wolf)
    probs  <- predict(m1, wolf, type = "response")
  })

Now if you look at head(wolf) you will see two extra columns.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Felipe Carrillo
Sent: Saturday, 26 June 2010 10:35 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] predict newdata question

Hi: I am using a subset of the below dataset to predict PRED_SUIT for the whole dataset but I am having trouble with 'newdata'. The model was created with 153 records and I want to predict for 208 records.

[lots of stuff omitted]

  wolf$prob99 <- exp(wolf$predicted)/(1 + exp(wolf$predicted))
  head(wolf); dim(wolf)
  # How can I use predict here, 'newdata' crashes
  predict(m1, newdata=wolf$predicted); wolf  # it doesn't work

Thanks for any hints

Felipe D.
Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
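A self-contained illustration of the pattern above, using a small simulated data set in place of the wolf data (all names and numbers here are made up for the example; only the record counts mirror the post):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(208), x2 = rnorm(208))
d$y <- rbinom(208, 1, plogis(0.5 * d$x1 - d$x2))

## fit on a subset, as in the original post (153 of 208 records)
m1 <- glm(y ~ x1 + x2, data = d[1:153, ], family = binomial)

## predict for the whole data frame: newdata must be a data frame
## containing the predictor columns, not a vector
d$lpred <- predict(m1, newdata = d)                    # linear predictors
d$prob  <- predict(m1, newdata = d, type = "response") # probabilities

## type = "response" is the same as applying the inverse logit by hand
all.equal(d$prob, plogis(d$lpred))
```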
Re: [R] Export Results
See
  ?pdf
  ?png
  ?sink

There is also R2wd (about which I wrote here:
http://www.r-statistics.com/2010/05/exporting-r-output-to-ms-word-with-r2wd-an-example-session/ )

And there are also the brew and Sweave packages (as Henrique mentioned).

Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, Jun 25, 2010 at 6:58 PM, Pedro Mota Veiga <motave...@net.sapo.pt> wrote:

Hi R users,

How can I automatically export results and graphs to a file?

Thanks in advance

Pedro Mota Veiga
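A minimal sketch of the base-R route mentioned above; the file names are arbitrary examples:

```r
## text output: divert console output to a file with sink()
sink("results.txt")
summary(lm(dist ~ speed, data = cars))
sink()                       # restore output to the console

## graphs: open a graphics device, plot, then close it with dev.off()
png("scatter.png", width = 600, height = 400)
plot(dist ~ speed, data = cars)
dev.off()

pdf("scatter.pdf")
plot(dist ~ speed, data = cars)
dev.off()
```

Forgetting the closing sink() or dev.off() is the classic pitfall: the file stays open and empty (or truncated) until the device is closed.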
Re: [R] create group markers in original data frame, i.e. continued... how to calculate sth for groups defined between points in one variable (string), separating/splitting the variable into groups
Dear useRs and expeRts,

Thanks. I have found an idea for how to add to the original data a column with markers, so that for every row it is known in which period of c2 it lies. Simply, in the code I could add:

  stacked_idx <- stack(idx)
  merge(stacked_idx, C.df, by.x = 'values', by.y = 'c0', all = TRUE)

Thanks for the suggestions,
Kaluza

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Eugeniusz Kaluza
Sent: Fri 2010-06-25 14:48
To: c
Subject: Re: [R] create group markers in original data frame, i.e. continued... how to calculate sth for groups defined between points in one variable (string), separating/splitting the variable into groups, i.e. between A-B, B-C, C-D, from: A, NA, NA, B, NA, NA, C, NA, NA, NA, D

Dear useRs,

At the beginning: Joris Meys, thank you for explaining how to obtain a calculation result for the groups between string marks in one variable of a data frame, as in the example below (between START and STOP), which I would like to complete at the end by asking how each observation in the original data set can be marked.

# START ... working example of the solution proposed by Joris Meys [jorism...@gmail.com]
# Same trick:

  c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
  c0
  c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
  c1
  c2 <- rbind(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
  c2
  C.df <- data.frame(c0, c1, c2)   # (implied by the code below)
  pos <- which(!is.na(C.df$c2))
  idx <- sapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
  names(idx) <- sapply(2:length(pos),
                       function(i) paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]]))
  out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
  out

# STOP ... below from: Sent: Thu 2010-06-24 18:02: Joris Meys [jorism...@gmail.com]
# Thank you, it is done and works very well

# Now I try to finish my question: to add a grouping symbol to the whole set, marking
# each observation with the name of the interval in which that observation is placed,
# to tell the observer that a given observation lies between A and B, to enable sorting
# and simple access using match:

  in_sub_starting_from <- rbind(NA,"A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C")
  in_sub_finished_by   <- rbind(NA,"B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D")
  in_sub_limited_by    <- rbind(NA,"A-B","A-B","A-B","B-C","B-C","B-C","B-C","B-C","B-C","B-C","C-D","C-D","C-D","C-D","C-D","C-D")
  C.df <- data.frame(c0, c1, c2, in_sub_starting_from, in_sub_finished_by, in_sub_limited_by)
  C.df

# Therefore one more question: how can these vectors be created automatically,
# having C.df$c2 (and of course also C.df$c0 and C.df$c1):
#   C.df$in_sub_starting_from
#   C.df$in_sub_finished_by
#   C.df$in_sub_limited_by
# For example, to make this kind of access to the data possible:
# take the 7th observation:

  C.df$c0[7]
  C.df$c1[C.df$c0 == 7]

# and, in the same row, find which mark the observation is preceded by ...
  C.df$in_sub_starting_from[C.df$c0 == 7]
# ... which mark it comes before ...
  C.df$in_sub_finished_by[C.df$c0 == 7]
# ... and which interval it lies within:
  C.df$in_sub_limited_by[C.df$c0 == 7]

# Thanks for advice, and maybe also an answer to this;
# looking impatiently for time with possible access to the internet...
# Sincerely, Kaluza

And the beginning of this story:

-----Original Message-----
From: Eugeniusz Kaluza
Sent: Thu 2010-06-24 17:12
To: r-help@r-project.org
Subject: PD: [R] how to calculate sth for groups defined between points in one variable (string), separating/splitting the variable into groups, i.e. between start, NA, NA, stop1, start2, NA, stop2

Dear useRs,

Thanks for the advice from Joris Meys. Now I will try to make it work for a less specific case, to make the problem more general. Then the result should be displayed for every group between non-empty strings in c2, i.e.
not only the result for:

  # mean:
  #   c1         c3      c4     c5
  #   20         Start1  Stop1  Start1-Stop1
  #   25.48585   Start2  Stop2  Start2-Stop2

but also for every group created by the space between the two closest strings in c2, i.e. the stretches that contain only series of NA, NA, NA, separated from time to time by one string:

  # mean:
  #   c1         c3      c4      c5
  #   20         Start1  Stop1   Start1-Stop1
  #   ..         Stop1   Start2  Stop1-Start2
  #   25.48585   Start2  Stop2   Start2-Stop2

i.e. to rewrite this, maybe as another, simpler version of the command, so that it also covers every group between two consecutive non-empty strings in c2.
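One way to build those marker columns automatically from c2 alone. This is a sketch of what I believe is being asked for, not from the original thread; it reproduces the hand-written vectors above by mapping each row to the interval between the marks on either side of it:

```r
c0 <- 1:17
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
C.df <- data.frame(c0, c1, c2, stringsAsFactors = FALSE)

pos   <- which(!is.na(C.df$c2))   # positions of the marks A, B, C, D
marks <- C.df$c2[pos]
n     <- nrow(C.df)

## interval index per row: 0 before the first mark; the final mark's row
## is capped so it belongs to the last interval (as in the hand-made vectors)
grp <- pmin(findInterval(seq_len(n), pos), length(pos) - 1)

from <- marks[pmax(grp, 1)];     from[grp == 0] <- NA  # opening mark
to   <- marks[pmax(grp, 1) + 1]; to[grp == 0]   <- NA  # closing mark

C.df$in_sub_starting_from <- from
C.df$in_sub_finished_by   <- to
C.df$in_sub_limited_by    <- ifelse(is.na(from), NA, paste(from, to, sep = "-"))
C.df
```

Row 7, for instance, gets "B", "C" and "B-C", matching the manually built columns.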
[R] become a member of R user community
How do I become a member of the R user community?

Albert Lee, Ph.D.
Statistician
Re: [R] Wilcoxon signed rank test and its requirements
Greg Snow kirjoitti 25.6.2010 kello 21.55:

Let me see if I understand. You actually have the data for the whole population (the entire piece), but you have some pre-defined sections that you want to see if they differ from the population, or more meaningfully, whether they differ from a randomly selected set of measures. Is that correct?

Exactly.

If so, since you have the entire population of interest, you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

Thank you! I will do this. Is this kind of Monte Carlo evaluation often used in statistics? If it is, do you know any reference for it?

Atte

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen
Sent: Thursday, June 24, 2010 11:04 PM
To: David Winsemius
Cc: R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process: the musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances to the reference set through a musical piece then results in a more readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values.

Then I want to pick only some regions from the piece and compare the values of those regions, asking whether they are higher than the mean of all values.

Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n is about 250-300.

I do not understand why there should be so many ties. You have not described the measurement process or units. (... although you offer a glimpse without much background later.)

I would like to test whether the mean of the sample differs significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

The histogram of the population looks like the attached histogram; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.

--
David.

Atte

On 06/24/2010 12:40 PM, David Winsemius wrote:

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is: how do you test that the data is symmetric enough? If it is not, is it OK to use some data transformation? It is said:

"The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as GraphPad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0], i != j.]

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
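A sketch of the resampling procedure Greg Snow describes. POPUL and SAMPLE are placeholders (here filled with simulated values) for the full set of 2418 distance values and a predefined region of about 280 of them:

```r
set.seed(42)
POPUL  <- c(rexp(2138), rexp(280, rate = 0.5))  # stand-in for the distances
SAMPLE <- POPUL[1001:1280]                      # stand-in for a chosen region

## sampling distribution of the mean for samples of this size,
## drawn without replacement from the whole "population"
nsim  <- 10000
means <- replicate(nsim, mean(sample(POPUL, length(SAMPLE))))

## two-sided Monte Carlo p-value: how extreme is the region's mean
## relative to randomly placed samples of the same size?
obs <- mean(SAMPLE)
p   <- mean(abs(means - mean(POPUL)) >= abs(obs - mean(POPUL)))
p
```

Note that this treats the region as if it were a random set of observations; for contiguous regions of a time-ordered piece, serial correlation would make sampling contiguous blocks a more faithful null model.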
[R] Different standard errors from R and other software
Hi all,

Sorry to bother you. I'm estimating a discrete choice model in R using the maxBFGS command. Since I wrote the log-likelihood myself, in order to double check, I ran the same model in Limdep. It turns out that the coefficient estimates are quite close; however, the standard errors are very different. I also computed the hessian and the outer product of the gradients in R using the numDeriv package, but the results are still very different from those in Limdep. Is it the routine used to compute the inverse hessian that causes the difference?

Thank you very much!

Best wishes,
Min

--
Min Chen
Ph.D. Candidate
Department of Agricultural, Food, and Resource Economics
125 Cook Hall
Michigan State University
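For reference, the usual recipe for standard errors from a hand-coded likelihood is the square root of the diagonal of the inverse of the hessian of the *negative* log-likelihood at the optimum. A minimal sketch with a toy normal-mean likelihood (the model is made up purely for illustration, not the poster's choice model):

```r
## toy data and negative log-likelihood for a normal mean with known sd = 1
set.seed(1)
x   <- rnorm(100, mean = 2)
nll <- function(mu) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

## minimize the negative log-likelihood and request the hessian
fit <- optim(0, nll, method = "BFGS", hessian = TRUE)

## observed-information standard errors
se <- sqrt(diag(solve(fit$hessian)))
se   # for this model, exactly 1/sqrt(100) = 0.1 up to numerical error

## a higher-accuracy hessian via numDeriv gives the same answer:
## sqrt(diag(solve(numDeriv::hessian(nll, fit$par))))
```

Discrepancies between packages often come down to which hessian is inverted (numerical vs analytic, observed information vs OPG/BHHH vs the sandwich of the two), so it is worth checking which estimator Limdep reports.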
[R] Accessing matrix elements within a list
Hi there,

I cannot seem to figure out how to access the elements of a list if those elements are matrices. For example, I have the following list

  df.list <- vector("list", 3)

and I have made each of the elements a matrix as follows

  for(i in 1:3){
    assign(paste("s", i, sep=""),
           matrix(0, nrow = 20, ncol = 5, byrow = FALSE, dimnames = NULL))
  }

  # and then insert them with a loop like this
  # put the matrices' names in a vector
  matrices <- c("s1","s2","s3")
  # insert
  for(i in 1:3){
    df.list[[i]] <- matrices[i]
  }

My question is that I cannot access the first row of a matrix within the list. The following does not work:

  df.list[[1]][1,]

Thanks for your help!
[R] Passing the parameter (file name) to png()
I am fitting a 3-parameter model to my response matrix and want to generate an item characteristic curve. I want to specify the file name for saving the item characteristic curve by passing it as an external parameter to the R batch script. The following is the code I have written for this.

*R Script:*

  library(ltm)
  cmd_args <- commandArgs()
  for (arg in cmd_args)
    cat(" ", arg, "\n", sep="")
  respmat <- read.table("C:\\rphp\\responsedata.txt")
  fit3pl <- tpm(respmat)
  cat(" ", arg, "\n", sep="")
  b <- c("C:\\rphp\\", arg)
  png(file=b, bg="transparent")
  plot(fit3pl, items=c, lwd=3)
  dev.off()
  rm(respmat, fit3pl, b)
  q()

Could you please help me in doing so? I get an error message when R executes png().

Thanks and Regards,
Maulik
Re: [R] Wilcoxon signed rank test and its requirements
Greg Snow kirjoitti 25.6.2010 kello 21.55:

Let me see if I understand. You actually have the data for the whole population (the entire piece), but you have some pre-defined sections that you want to see if they differ from the population, or more meaningfully, whether they differ from a randomly selected set of measures. Is that correct?

If so, since you have the entire population of interest, you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

I check: so you mean doing it this way?

  t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
         mu = mean(SAMPLE), alt = "less")

Atte
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
I am using EDMA for comparing the interlandmark distances of two forms. There is, of course, a software package called EDMA developed by Lele and Richtsmeier, but the main point is that I am trying to solve a quite different problem on the same data set using R and EDMA, and the data entry format of the EDMA software is different from the R format, so for every trial (for every different data set) I have to set the data entry format according to EDMA. Now I am checking Julien Claude's book "Morphometrics with R" (http://www.springer.com/statistics/life+sciences,+medicine+%26+health/book/978-0-387-77789-4); there is a section about EDMA, and hopefully I can reach the same results as with the EDMA software...
Re: [R] Wilcoxon signed rank test and its requirements
Atte Tenkanen kirjoitti 26.6.2010 kello 5.15:

I check, so you mean doing it this way:

  t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
         mu = mean(SAMPLE), alt = "less")

NO, this way:

  t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)],
         mu = mean(SAMPLE), alt = "less")

Atte
Re: [R] become a member of R user community
On 25-Jun-10 21:46:13, Albert Lee, Ph.D. wrote:

How do I become a member of the R user community?

Albert Lee, Ph.D.
Statistician

1. By using R.
2. By subscribing to the R-help mailing list and keeping in touch with the rest of us!

To subscribe your email address to the list, visit the R-help info page at:
https://stat.ethz.ch/mailman/listinfo/r-help
and follow the instructions under "Subscribing to R-help".

Welcome!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 26-Jun-10 Time: 10:52:00
-- XFMail --
[R] Problem: RWinEdt and Windows 7
Hi,

I can install RWinEdt if I start R with administrator rights, but it does not paste my code to the console. I found advice in the link below on how to manage the problem, but it did not work. Any other ideas?

http://yusung.blogspot.com/2009/01/rwinedt-and-windows-vistawindow-7.html

Thanks a lot,
Johannes

From: Uwe Ligges ligges_at_statistik.tu-dortmund.de
Date: Sun, 08 Nov 2009 16:23:34 +0100

Aha, what is that blog post and what does not work for you? I haven't got any report so far and do not have Windows 7 easily available yet.

Best,
Uwe Ligges

Peter Flom wrote:

Good morning ( http://tolstoy.newcastle.edu.au/R/e8/help/09/11/4040.html#4042qlink1 )

I just got a new computer with Windows 7. R works fine, but the editor I am used to using, RWinEdt, does not. I did find one blog post on how to get RWinEdt to work in Windows 7, but I could not get those instructions to work either. Is there a patch for RWinEdt? If not, is there another good R editor that works under Windows 7? I tried RSiteSearch with various combinations of "Windows 7" and "Editor" and so on, but found nothing. I also tried googling these terms.

Thanks
Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom

Dr. Johannes Reichl
Abteilung Energiewirtschaft
Energieinstitut an der Johannes Kepler Universität Linz
Altenberger Straße 69
A-4040 Linz
Tel.: +43-732-2468-5652
Fax: +43-732-2468-5651
Email: rei...@energieinstitut-linz.at
Web: www.energieinstitut-linz.at
www.energyefficiency.at
Re: [R] Passing the parameter (file name) to png()
b <- paste("C:\\rphp\\", arg, sep='') On Sat, Jun 26, 2010 at 12:55 AM, Maulik Shah maulik.shah2...@gmail.com wrote: I am fitting a 3-parameter model to my response matrix and want to generate an item characteristic curve. I want to specify the file name for saving the item characteristic curve by passing it as an external parameter to the R batch script. The following is the code I have written for this. *R Script:* library(ltm) cmd_args <- commandArgs(); for (arg in cmd_args) cat("", arg, "\n", sep="") respmat <- read.table("C:\\rphp\\responsedata.txt") fit3pl <- tpm(respmat) cat("", arg, "\n", sep="") b <- c("C:\\rphp\\", arg) png(file=b, bg="transparent") plot(fit3pl, items=c, lwd=3) dev.off() rm(respmat, fit3pl, b) q() Could you please help me in doing so? I get an error message when R executes png(). Thanks and Regards, Maulik -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
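The reason Jim's one-liner fixes the error can be shown in isolation: `png(file=)` expects a single character string, but `c("C:\\rphp\\", arg)` builds a two-element character vector. A minimal sketch, with a hypothetical argument value standing in for the one picked up from `commandArgs()`:

```r
## Hypothetical argument value; in the batch script this would come
## from commandArgs().
arg <- "icc.png"

## c() concatenates into a length-2 character vector -- not a filename.
bad <- c("C:\\rphp\\", arg)

## paste() joins the pieces into one string, which png(file=) accepts.
good <- paste("C:\\rphp\\", arg, sep = "")
```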
Re: [R] Accessing matrix elements within a list
first of all take a look at the object you created: df.list <- vector("list", 3) for(i in 1:3){ + assign(paste("s", i, sep=""), matrix(0, nrow = 20, ncol = 5, byrow + = FALSE, dimnames = NULL)) + } # and then insert them with a loop like this # put the matrices' names in a vector matrices <- c("s1", "s2", "s3") # insert for(i in 1:3){ + df.list[[i]] <- matrices[i] + } str(df.list) List of 3 $ : chr "s1" $ : chr "s2" $ : chr "s3" you will see that it is a list of characters, since that is what is in 'matrices'. What you need to do is to use 'get': df.list <- vector("list", 3) for(i in 1:3){ + assign(paste("s", i, sep=""), matrix(0, nrow = 20, ncol = 5, byrow + = FALSE, dimnames = NULL)) + } # and then insert them with a loop like this # put the matrices' names in a vector matrices <- c("s1", "s2", "s3") # insert for(i in 1:3){ + df.list[[i]] <- get(matrices[i]) + } str(df.list) List of 3 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... df.list[[1]][1,] [1] 0 0 0 0 0 or, even better, just put them in the list from the start: df.list <- vector("list", 3) for(i in 1:3){ + df.list[[i]] <- matrix(0, nrow = 20, ncol = 5, byrow + = FALSE, dimnames = NULL) + } str(df.list) List of 3 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ... df.list[[1]][1,] [1] 0 0 0 0 0 On Fri, Jun 25, 2010 at 5:29 PM, Maria P Petrova mpetr...@u.washington.edu wrote: Hi there, I cannot seem to figure out how to access the elements of a list if those elements are a matrix. 
For example, I have the following list df.list <- vector("list", 3) and I have made each of the elements a matrix as follows for(i in 1:3){ assign(paste("s", i, sep=""), matrix(0, nrow = 20, ncol = 5, byrow = FALSE, dimnames = NULL)) } # and then insert them with a loop like this # put the matrices' names in a vector matrices <- c("s1", "s2", "s3") # insert for(i in 1:3){ df.list[[i]] <- matrices[i] } My question is that I cannot access the first row of a matrix within the list. The following does not work: df.list[[1]][1,] Thanks for your help! -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
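Jim's last variant can be made even more compact. This is not from the thread, just an equivalent idiom: build the list of matrices directly with lapply(), which skips the assign()/get() detour entirely:

```r
## Build a list of three 20 x 5 zero matrices in one expression,
## instead of assign()ing s1, s2, s3 and fetching them back with get().
df.list <- lapply(1:3, function(i) matrix(0, nrow = 20, ncol = 5))

## Index as list[[element]][row, col]:
df.list[[1]][1, ]   # first row of the first matrix: five zeros
```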
[R] Recursive indexing failed at level 2
Dear fellow R users, I am replacing elements of a list like so: pulse_subset[[1:20]] = unlist(pulse[i])[1:20] where pulse is a list of lists, and pulse[i] has 20 values. This gives the error "recursive indexing failed at level 2". But, interestingly, this instruction is part of a loop which has gone through about 200,000 iterations before giving this error. Actual code: pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] Error in pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] : recursive indexing failed at level 2 If anyone could shed some light I'd be rather grateful. Regards, Jim Hargreaves
Re: [R] Recursive indexing failed at level 2
On 26/06/2010 7:53 AM, Jim Hargreaves wrote: Dear fellow R users, I am replacing elements of a list like so: pulse_subset[[1:20]] = unlist(pulse[i])[1:20] If pulse is a list, then pulse[i] is also a list, with one element. I think you want pulse[[i]], which extracts element i. If pulse_subset is a list, then pulse_subset[[1:20]] is equivalent to pulse_subset[[1]][[2]][[3]][[4]] ... [[20]], i.e. the syntax implies that it is a list containing a list etc., nested 20 levels deep. The error message is telling you that it's not. I'm not sure what your intention is in this case. Duncan Murdoch
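Duncan's two points can be checked directly on toy data (the list below is invented for illustration, not the poster's data): `pulse[i]` is a one-element list while `pulse[[i]]` is the vector stored inside it, and replacing a run of elements in a vector uses single brackets, not `[[ ]]`:

```r
## Toy stand-in for the poster's structure: a list of numeric vectors.
pulse <- list(a = 1:30, b = 31:60)
i <- 1

length(pulse[i])    # 1: a list containing one element
length(pulse[[i]])  # 30: the vector inside that element

## Replacing a range of elements: single brackets on the target.
pulse_subset <- numeric(20)
pulse_subset[1:20] <- pulse[[i]][1:20]
```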
Re: [R] Recursive indexing failed at level 2
On 06/26/2010 01:20 PM, Duncan Murdoch wrote: If pulse is a list, then pulse[i] is also a list, with one element. I think you want pulse[[i]], which extracts element i. Ahh, I specified that pulse[i] has 20 values in my original mail. Basically, pulse is a list 1000 elements long, with each element of pulse having between 1000 and 2000 elements of its own. pulse is a list of lists. Also, as far as I am aware, [[ ]] should only be used when assigning values to elements of a list/vector. unlist(pulse[1]) gives x1, x2, x3, x4, x5 etc. 
[R] Recoding dates to session id in a longitudinal dataset
Hi, I'm fairly new to R, but I have a large dataset (30 obs) containing patient material. Some patients came 2-9 times during the three-year observation period. The patients are identified by a unique idnr; the sessions can be distinguished using the session date. How can I recode the date of the session to a session id (1-9)? This would be necessary to obtain information and do some analysis on the first occurrence of a specific patient, or to look for trends. Thanks JP Bogers University of Antwerp
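No answer to this question appears in this excerpt, but one common idiom fits it well: sort by patient and date, then number each patient's visits with ave(). A minimal sketch, assuming hypothetical column names idnr and date:

```r
## Toy data standing in for the patient dataset (idnr/date names assumed).
dat <- data.frame(
  idnr = c(101, 102, 101, 101, 102),
  date = as.Date(c("2008-01-10", "2008-02-01", "2008-06-05",
                   "2009-03-12", "2009-07-30"))
)

## Order by patient then date, then number each patient's sessions 1, 2, ...
dat <- dat[order(dat$idnr, dat$date), ]
dat$session <- ave(seq_along(dat$idnr), dat$idnr, FUN = seq_along)

## First occurrence of each patient:
first <- dat[dat$session == 1, ]
```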
Re: [R] Recursive indexing failed at level 2
On 26/06/2010 8:29 AM, Jim Hargreaves wrote: Ahh, I specified that pulse[i] has 20 values in my original mail. But that could not be correct. Take a look at length(pulse[i]). Assuming that i is a scalar value, length(pulse[i]) will be 1. You really do want pulse[[i]]. You used unlist(pulse[i]), which is sometimes the same as pulse[[i]], but it really depends on what pulse[[i]] is. unlist() is a very crude tool, and you should avoid it unless you really need it. Basically, pulse is a list 1000 elements long, with each element of pulse having between 1000 and 2000 elements of its own. pulse is a list of lists. Also, as far as I am aware, [[ ]] should only be used when assigning values to elements of a list/vector. Whoever told you that was mistaken. Duncan Murdoch
Re: [R] Recursive indexing failed at level 2
Hi Duncan, list, Thanks for the advice, but unfortunately that wasn't what was causing my problem. I'm still getting the "recursive indexing failed at level 2" message even after replacing my unlist(pulse[i]) with pulse[[i]]. Error: pulse_subset[[1:as.numeric(length(pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])] Error in pulse_subset[[1:as.numeric(length(pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])] : recursive indexing failed at level 2 It's almost as if the length of pulse[[i]] is too small, but its length is 1001, and peak_start[i] and peak_end[i] are 192 and 208 respectively. Also, why would the problem crop up only after 200,000 runs? Bizarre! Regards, Jim Hargreaves
Re: [R] Recursive indexing failed at level 2
Jim Hargreaves wrote: Thanks for the advice, but unfortunately that wasn't what was causing my problem. I'm still getting the "recursive indexing failed at level 2" message even after replacing my unlist(pulse[i]) with pulse[[i]]. Read the second part of my first message, which explains the error. You had two errors in the original expression, and have only fixed one. Duncan Murdoch
Re: [R] Recursive indexing failed at level 2
On 06/26/2010 02:07 PM, Duncan Murdoch wrote: Read the second part of my first message, which explains the error. You had two errors in the original expression, and have only fixed one. Doh! Working as intended now, thanks very much for your help! Regards, Jim Hargreaves
[R] several common sub-axes within multiple plot area
Dear List, I'd really appreciate tips or code demonstrating how I can achieve some common axis labels integrated into a multiple plot. In my example (below), I'm trying to achieve: - a single "Results 1 (Int)" centered between row 1 and row 2; - a single "Results 2 (Int)" centered between row 2 and row 3; and - a single "Results 3 (Int)" centered at the bottom, i.e., below row 3. I played with mtext() and par(oma=...) per this post: https://stat.ethz.ch/pipermail/r-help/2004-October/059453.html But I have so far failed to achieve my goal. Can I succeed with something combined with the 'high level' plot() function? Or do I need to get specific with some low-level commands (help!)? With big thanks in advance for any suggestions/examples. cheers, Karl #my example: dev.new() plot.new() par(mfrow=c(3,2)) #Graph 1: plot(rnorm(20), rnorm(20), xlab = "Results 1 (Int)", ylab = "Variable A", main = "Factor X") #Graph 2: plot(rnorm(20), rnorm(20), xlab = "Results 1 (Int)", ylab = "Variable A", main = "Factor Y") #Graph 3: plot(rnorm(20), rnorm(20), xlab = "Results 2 (Int)", ylab = "Variable B") #Graph 4: plot(rnorm(20), rnorm(20), xlab = "Results 2 (Int)", ylab = "Variable B") #Graph 5: plot(rnorm(20), rnorm(20), xlab = "Results 3 (Int)", ylab = "Variable C") #Graph 6: plot(rnorm(20), rnorm(20), xlab = "Results 3 (Int)", ylab = "Variable C") -- Karl Brand Department of Genetics Erasmus MC Dr Molewaterplein 50 3015 GE Rotterdam T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268
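One approach that stays with the high-level plot() function, sketched below: reserve a bottom outer margin with par(oma=), suppress the per-panel xlab, and write the shared labels with mtext(outer = TRUE). Placing a label *between* rows uses a negative `line` value to push the text inward from the outer bottom margin; the -28 and -14 values here are starting guesses that must be tuned by eye for the device size, not a general solution:

```r
## Write to a temporary PDF so the sketch runs headless as well.
f <- tempfile(fileext = ".pdf")
pdf(f, width = 7, height = 9)

## 3 x 2 layout, with 3 outer-margin lines reserved at the bottom.
par(mfrow = c(3, 2), oma = c(3, 0, 0, 0), mar = c(4, 4, 2, 1))
for (k in 1:3) for (j in 1:2)
  plot(rnorm(20), rnorm(20), xlab = "", ylab = paste("Variable", LETTERS[k]))

## Shared x-labels: the `line` values are guesses to adjust by eye.
mtext("Results 1 (Int)", side = 1, outer = TRUE, line = -28)
mtext("Results 2 (Int)", side = 1, outer = TRUE, line = -14)
mtext("Results 3 (Int)", side = 1, outer = TRUE, line = 1)
dev.off()
```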
Re: [R] integration of two normal density
I sent you a double integration solution a couple of days ago that answered your question. You did not have the courtesy to acknowledge that. Now you are asking a different question that is incorrectly formulated. What you are doing is not multivariate integration. You are just integrating a univariate function, which, as Prof. Venables pointed out, is not even a density. Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu - Original Message - From: Carrie Li carrieands...@gmail.com Date: Friday, June 25, 2010 11:29 pm Subject: [R] integration of two normal density To: r-help R-help@r-project.org Hello everyone, I have a question about the integration of two density functions. Intuitively, I think the value after integration should be 1, but it is not. Am I missing something here? t = function(y){dnorm(y, mean=3)*dnorm(y/2, mean=1.5)} integrate(t, -Inf, Inf) 0.3568248 with absolute error 4.9e-06 Also, is there any R function or package that can do multivariate integration? Thanks for any suggestions! Carrie
Re: [R] integration of two normal density
On Fri, 2010-06-25 at 23:28 -0400, Carrie Li wrote: Hello everyone, I have a question about the integration of two density functions. Intuitively, I think the value after integration should be 1, but it is not. Am I missing something here? t = function(y){dnorm(y, mean=3)*dnorm(y/2, mean=1.5)} integrate(t, -Inf, Inf) 0.3568248 with absolute error 4.9e-06 You've demonstrated (numerically) that the product of two normal density functions, with means 3 and 1.5 respectively and variance 1, doesn't result in a pdf. However, you could make a numerically normalized pdf by multiplying by 1/0.3568248: K <- integrate(t, -Inf, Inf)$value Kt <- function(y) 1/K * dnorm(y, 3) * dnorm(y/2, 1.5) integrate(Kt, -Inf, Inf) 1 with absolute error 1.4e-05 Hence, the quantity you computed (K) is the normalization constant, with some small error. Note that this strategy _may_ not always work. Here's a good homework question: can the product of two pdfs with identical support always be normalized to form a new pdf? As for empirical multivariate integration, it's tough, especially if you want to enumerate the area under the surface, which is exactly the strategy of functions like 'integrate' (search Wikipedia for "numerical integration"). This problem becomes increasingly difficult in additional dimensions: the dreaded curse of dimensionality. On the bright side, Bayesian statistical methods have to deal with this all the time, and we have some good methods to compute numerical integrals. Check out Monte Carlo integration and Markov chain Monte Carlo methods. -Matt -- Matthew S. 
Shotwell Graduate Student Division of Biostatistics and Epidemiology Medical University of South Carolina http://biostatmatt.com
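The Monte Carlo integration Matt mentions fits in a few lines for this very integrand. A sketch (not from the thread): since the integrand is a product of two factors and one of them is the N(3, 1) density, draw from that density and average the remaining factor (importance sampling), then compare against the quadrature value:

```r
## The integrand from the thread.
t_fun <- function(y) dnorm(y, mean = 3) * dnorm(y/2, mean = 1.5)

## Monte Carlo: E_{y ~ N(3,1)}[ dnorm(y/2, 1.5) ] equals the integral.
set.seed(1)
y <- rnorm(1e5, mean = 3)             # sample from the first factor
K_mc <- mean(dnorm(y/2, mean = 1.5))  # average the remaining factor

## Deterministic quadrature, as in the thread (0.3568248).
K_quad <- integrate(t_fun, -Inf, Inf)$value
```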
Re: [R] Forcing scalar multiplication.
t(e$values * t(e$vectors)) Uwe Ligges On 25.06.2010 20:42, rkevinbur...@charter.net wrote: I am trying to check the results from an eigen decomposition, and I need to force a scalar multiplication. The fundamental equation is: Ax = lx, where 'l' is the eigenvalue and x is the eigenvector corresponding to that eigenvalue. R returns the eigenvalues as a vector (e <- eigen(A); e$values). So in order to 'check' the result I would multiply the eigenvalues ('l') by the eigenvectors. But unless I do it one by one (say e$values[1] * e$vectors[,1]), R tries a matrix multiplication, and that is not what I want. I would like a matrix that is formed by the SCALAR multiplication of each of the values by the corresponding eigenvector. How can I force such a multiplication? Thank you. Kevin
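Why Uwe's one-liner works: element-wise multiplication recycles e$values down the columns of t(e$vectors), so after transposing back, column j of the eigenvector matrix has been scaled by the j-th eigenvalue. That is exactly the right-hand side of A V = V diag(lambda), which is the check Kevin wants. A small sketch on an invented 2 x 2 matrix:

```r
## Invented symmetric test matrix.
A <- matrix(c(2, 1, 1, 3), 2, 2)
e <- eigen(A)

## A V should equal each eigenvector column scaled by its eigenvalue.
lhs <- A %*% e$vectors
rhs <- t(e$values * t(e$vectors))   # column-wise scalar multiplication
```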
[R] boot with strata: strata argument ignored?
Hello All. I must be missing the really obvious here: mm <- function(d, i) median(d[i]) b1 <- boot(gravity$g, mm, R = 1000) b1 b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series) b2 Both b1 and b2 seem to have done (almost) the same thing, but it looks like the strata argument in b2 has been ignored. However, str(b1) vs str(b2) does show that the strata have been noted correctly. But b2$t is a 1000 x 1 array, not a 1000 x 8 array (gravity$series is a factor with 8 levels). There is a more complex example in ?boot using the same data set that gives a result that seems to make sense (2 levels in the factor, so $t has 2 columns). I either misunderstand the expected behavior or I've missed some punctuation or syntax detail. TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA sessionInfo() R version 2.11.0 (2010-04-22) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] datasets tools grid graphics grDevices utils stats [8] methods base other attached packages: [1] boot_1.2-42 brew_1.0-3 faraway_1.0.4 [4] GGally_0.2 xtable_1.5-6 mvbutils_2.5.1 [7] ggplot2_0.8.7 digest_0.4.2 reshape_0.8.3 [10] proto_0.3-8 ChemoSpec_1.43 R.utils_1.4.0 [13] R.oo_1.7.2 R.methodsS3_1.2.0 rgl_0.91 [16] lattice_0.18-5 mvoutlier_1.4 plyr_0.1.9 [19] RColorBrewer_1.0-2 chemometrics_0.8 som_0.3-5 [22] robustbase_0.5-0-1 rpart_3.1-46 pls_2.1-0 [25] pcaPP_1.8-1 mvtnorm_0.9-9 nnet_7.3-1 [28] mclust_3.4.4 MASS_7.3-5 lars_0.9-7 [31] e1071_1.5-23 class_7.3-2
Re: [R] Calculating Summaries for each level of a Categorical variable
Did you try tapply? ?tapply tapply(RT$A, RT$R, FUN = WA) or something like that - Corey Sparks, PhD Assistant Professor Department of Demography and Organization Studies University of Texas at San Antonio 501 West Durango Blvd Monterey Building 2.270C San Antonio, TX 78207 210-458-3166 corey.sparks 'at' utsa.edu https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
I think the hardest thing about true EDMA (meaning the Richtsmeier and Lele version) is the bootstrapping to get significance. Have you tried their software? http://www.getahead.psu.edu/resource_new.html - Corey Sparks, PhD Assistant Professor Department of Demography and Organization Studies University of Texas at San Antonio 501 West Durango Blvd Monterey Building 2.270C San Antonio, TX 78207 210-458-3166 corey.sparks 'at' utsa.edu https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
Re: [R] strange behaviour of CairoPNG
Thank you Henrik for your answer. I hope now I am in line with the posting guide and perhaps I will get an answer, thank you. sessionInfo() R version 2.9.0 alpha (2009-03-23 r48200) i386-pc-mingw32 locale: LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Cairo_1.4-4 2010/6/5 Henrik Bengtsson h...@stat.berkeley.edu: FYI, follow the information in the email footer: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and make sure at a minimum to report your sessionInfo(). That increases your chances of getting a response. /Henrik On Sat, Jun 5, 2010 at 11:42 AM, Thomas Steiner finbref.2...@gmail.com wrote: OK, no reply. :-( I'll be more offensive: this is a bug! The font parameter of the text() function does not work properly with the Cairo package. Thomas 2010/6/4 Thomas Steiner finbref.2...@gmail.com: Hi, could it be that the text() function gives different output for normal png() and CairoPNG()? See the following example and the attached images: the font=2 and font=3 outputs seem to be exchanged! Thanks for help, Thomas CairoPNG("Test-cairo.png", width=750, height=690) #png("Test-normal.png", width=750, height=690) plot(1, 1, type="n", main="normal") text(1, 1, "normal", adj=c(1,1)) text(1, 1, "bold", font=2, adj=c(-1,-1)) text(1, 1, "italic", font=3, adj=c(1,-1)) text(1, 1, "italicbold", font=4, adj=c(-1,1)) dev.off()
[R] Calculating Summaries for each level of a Categorical variable
Hi, I have a dataset which has a categorical variable R, a count variable C (integer), and 4 or more numeric variables (A, T, W, H - integers) containing measures for R. I would like to summarize each level of the variable R by the average of A, T, W and H. I have written a function to calculate weighted averages using C as the weight and this is given below. The function works perfectly, but how do I add the additional dimension I require to this function? Dataset RT:

R   A   T   W   H
R1  10  20  20  10
R2  60  20  50  10
R3  45  10  20  50
R4  68  50  20  10
R1  73  20  40  46
R3  25  30  10  54
R3  36  90  20  10
R2  29  10  30  30

# FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
WA <- function(A, C) {
  sp_A <- c(A %*% C)
  sum_C <- sum(C)
  WA <- sp_A / sum_C
  return(WA)
}

I am trying to incorporate the additional step of calculating the weighted average of A, T, W and H for each level of R. Need help with this. Thanks in advance! Raoul
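One way to apply a weighted-average function per level of R is to split the data frame on R and apply the function column by column. A hedged sketch follows; note the C (count/weight) column is not shown in the posted data, so the values below are placeholders for illustration only.

```r
# Sketch: apply a WA()-style weighted average to each level of R.
WA <- function(A, C) c(A %*% C) / sum(C)   # weighted average of A with weights C

RT <- data.frame(
  R = c("R1","R2","R3","R4","R1","R3","R3","R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30),
  C = c(2, 1, 3, 1, 2, 1, 2, 1)            # hypothetical weights; C is not in the post
)

# One column of results per level of R, one row per measure (A, T, W, H)
sapply(split(RT, RT$R), function(d)
  sapply(d[c("A", "T", "W", "H")], WA, C = d$C))
```

split() handles the "additional dimension": the inner sapply() loops over the measure columns within one level, and the outer sapply() loops over the levels of R.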
Re: [R] Popularity of R, SAS, SPSS, Stata...
-Original Message- From: Joris Meys [mailto:jorism...@gmail.com] Sent: Friday, June 25, 2010 10:10 PM To: Muenchen, Robert A (Bob) Cc: Dario Solari; r-help@r-project.org Subject: Re: [R] Popularity of R, SAS, SPSS, Stata... I had taken the opposite tack with Google Trends by subtracting keywords like: SAS -shoes -airlines -sonar... but never got as good results as that beautiful "X code for" search. When you see the end-of-semester panic bumps in traffic, you know you're nailing it! I have to eat those words already. The "R code for" search that showed a peak every December did not have quotes around it, so it was searching for those three words, not the complete phrase. When you add the quotes, the peaks vanish. Don't swallow! You're looking through search terms, not through web pages. "R code for regression", "regression code R" etc. are all valid searches, no quotation marks needed. I wondered why those clear peaks had vanished when I added quotes. Here's one that combines the search terms without the quotes. It shows several March/April and October/November peaks: http://www.google.com/insights/search/#q=r%20code%20for%2Br%20manual%2Br%20tutorial%2Br%20graph%2Csas%20code%20for%2Bsas%20manual%2Bsas%20tutorial%2Bsas%20graph%2Cspss%20code%20for%2Bspss%20manual%2Bspss%20tutorial%2Bspss%20graph%2Cstata%20code%20for%2Bstata%20manual%2Bstata%20tutorial%2Bstata%20graph%2Cs-plus%20code%20for%2Bs-plus%20manual%2Bs-plus%20tutorial%2Bs-plus%20graphcmpt=q I've been trying to make sense of Google Scholar searches. I'm obviously missing something basic. Here are two searches on www.google.com: sas - gets 68M hits sas OR spss - gets 74.3M hits. A bigger number, as OR would imply. But when I do the same searches on scholar.google.com, here's what I get: sas - gets 4.6M hits sas OR spss - gets 1.65M hits How on earth can an OR get you less?? 
Thanks, Bob http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlabcmpt=q This one is nice too. You can see that the bump in the autumn semester for R is replacing the one for Matlab. Then in the spring semester Matlab stays high but R drops. And both the US and India always have a very large search index, whereas the rest of the world is essentially worthless. Which leads me to the conclusion that: 1) the results are probably coming from google.com, excluding local versions, and 2) in the US (and India), statistics is mainly taught in the autumn semester. Given the fact that daylight has a beneficial effect on emotional well-being, the unpopularity of statistics is likely caused by unfortunate scheduling. Forget Excel. Google rocks! ;-) Cheers Joris Once you go the phrase route, you gain precision but end up with zero counts on various phrases. I avoided that by combining them with + to get enough to plot. The resulting graph shows SAS dominant until mid-2006, when SPSS takes the top position, followed by R, SAS, Stata in order: http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2B%22s-plus%20tutorial%22%2B%22s-plus%20graph%22cmpt=q This might be a good one to add to http://r4stats.com/popularity Bob I see that there's a car, the "R Code Mustang", that adding "for" gets rid of. Thanks for getting me back on a topic that I had given up on! 
Bob -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys Sent: Thursday, June 24, 2010 7:56 PM To: Dario Solari Cc: r-help@r-project.org Subject: Re: [R] Popularity of R, SAS, SPSS, Stata... Nice idea, but quite sensitive to search terms, if you compare your result on "... code" with "... code for": http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote: First: excuse my English. My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/# Every person using software like R, SAS, or SPSS needs first to learn it. So probably he makes a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. This kind of result can be useful to determine trends of popularity. Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B
Re: [R] Popularity of R, SAS, SPSS, Stata...
On 26/06/10 16:07, Muenchen, Robert A (Bob) wrote: I've been trying to make sense of Google Scholar searches. I'm obviously missing something basic. Here are two searches on www.google.com: sas - gets 68M hits sas OR spss - gets 74.3M hits. A bigger number, as OR would imply. But when I do the same searches on scholar.google.com, here's what I get: sas - gets 4.6M hits sas OR spss - gets 1.65M hits How on earth can an OR get you less?? Because the search for SAS alone stems the words, so you get hits on SA alone (SAS obviously (!) being the plural of SA), as you will see from the first few hits (hint: the matched word is highlighted in bold). With the OR you don't stem (weird but true). Put quotes around the single search term to avoid (some of) the stemming: SAS - 4.62M "SAS" - 1.62M "SPSS" - 0.635M "SAS" OR "SPSS" - 1.52M It is obviously still not right, but closer. Happy reading of the articles by D. Sas, S.A.S. Eddington, etc. Any follow-ups probably belong on a different mailing list - I think there are forums for Google search. Allan
Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22cmpt=q Example 2: R software, SAS software, SPSS software http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20
Re: [R] Popularity of R, SAS, SPSS, Stata...
Bob, I'm confused. Did you try the search with Google Scholar or with Google Insights for Search? --- Useful references for Google Insights for Search: * matching terms: http://www.google.com/support/insights/bin/answer.py?hl=enanswer=94777 * interpreting search volumes: http://www.google.com/support/insights/bin/answer.py?hl=enanswer=92769 --- Useful references for Google Scholar: http://scholar.google.com/intl/en/scholar/refinesearch.html --- It seems that the OR option in Google Scholar doesn't work. Try contacting the Google Scholar support centre: http://www.google.com/support/scholar/bin/request.py?contact_type=general
Re: [R] boot with strata: strata argument ignored?
On Sat, 26 Jun 2010, Bryan Hanson wrote: Hello All. I must be missing the really obvious here: mm <- function(d, i) median(d[i]) b1 <- boot(gravity$g, mm, R = 1000) b1 b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series) b2 Both b1 and b2 seem to have done (almost) the same thing, but it looks like the strata argument in b2 has been ignored. However, str(b1) vs str(b2) does show that the strata have been noted correctly. But b2$t is a 1000 x 1 array, not a 1000 x 8 array (gravity$series is a factor with 8 levels). There is a more complex example in ?boot using the same data set that gives a result that seems to make sense (2 levels in the factor, so $t has 2 columns). I either misunderstand the expected behavior or I've missed some punctuation or syntax detail. Your punctuation and syntax are OK. Note: SISWR <- function(x) sample(x, length(x), replace = TRUE) # no strata var(replicate(1000, median(SISWR(gravity$g)))) [1] 0.4588338 # now stratify on series gsplit <- split(gravity$g, gravity$series) var(replicate(1000, median(unlist(lapply(gsplit, SISWR))))) [1] 0.3882272 sqrt(.45) # this agrees with b1 [1] 0.6708204 sqrt(.39) # this agrees with b2 [1] 0.6244998 The effect of stratification depends on the relative amount of variation within vs between strata. This suggests there is not a lot: aov(g ~ series, gravity) Call: aov(formula = g ~ series, data = gravity) Terms: series Residuals Sum of Squares 2818.624 8239.376 Deg. 
of Freedom 7 73 Residual standard error: 10.62394 Estimated effects may be unbalanced HTH, Chuck TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA sessionInfo() R version 2.11.0 (2010-04-22) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] datasets tools grid graphics grDevices utils stats [8] methods base other attached packages: [1] boot_1.2-42 brew_1.0-3 faraway_1.0.4 [4] GGally_0.2 xtable_1.5-6 mvbutils_2.5.1 [7] ggplot2_0.8.7 digest_0.4.2 reshape_0.8.3 [10] proto_0.3-8 ChemoSpec_1.43 R.utils_1.4.0 [13] R.oo_1.7.2 R.methodsS3_1.2.0 rgl_0.91 [16] lattice_0.18-5 mvoutlier_1.4 plyr_0.1.9 [19] RColorBrewer_1.0-2 chemometrics_0.8 som_0.3-5 [22] robustbase_0.5-0-1 rpart_3.1-46 pls_2.1-0 [25] pcaPP_1.8-1 mvtnorm_0.9-9 nnet_7.3-1 [28] mclust_3.4.4 MASS_7.3-5 lars_0.9-7 [31] e1071_1.5-23 class_7.3-2 Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E-mail: cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
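The dimension question in the original post can be demonstrated directly: boot() applies the statistic once per replicate, and strata only constrain how the resampled indices are drawn. A small self-contained sketch (simulated data, not the gravity set):

```r
# Sketch: strata change the resampling scheme, not the shape of the result.
# For a scalar statistic, b$t is always R x 1, regardless of how many strata
# there are -- stratification is visible in the spread of t, not its columns.
library(boot)
set.seed(1)
d <- data.frame(g = rnorm(80), series = gl(8, 10))  # 8 strata of 10 obs each
mm <- function(x, i) median(x[i])
b2 <- boot(d$g, mm, R = 200, strata = d$series)
dim(b2$t)   # 200 x 1, even with 8 strata
```

The ?boot example that produces a 2-column $t does so because its statistic function returns a vector of length 2, not because the factor has 2 levels.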
Re: [R] predict newdata question
Thanks Bill, that worked great!! You ask: # How can I use predict here, 'newdata' crashes predict(m1, newdata = wolf$predicted); wolf # it doesn't work To use predict() you need to give a fitted model object (here m1) and a *data frame* to specify the values of the predictors for which you want predictions. Here wolf$predicted is not a data frame, it is a vector. What I think you want is pv <- predict(m1, newdata = wolf) That will get you linear predictors. To get probabilities you need to say so, as probs <- predict(m1, newdata = wolf, type = "response") You can put these back into the data frame if you wish, e.g. wolf <- within(wolf, { lpreds <- predict(m1, wolf) probs <- predict(m1, wolf, type = "response") }) Now if you look at head(wolf) you will see two extra columns. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Felipe Carrillo Sent: Saturday, 26 June 2010 10:35 AM To: r-h...@stat.math.ethz.ch Subject: [R] predict newdata question Hi: I am using a subset of the below dataset to predict PRED_SUIT for the whole dataset but I am having trouble with 'newdata'. The model was created with 153 records and I want to predict for 208 records. 
[lots of stuff omitted] wolf$prob99 <- exp(wolf$predicted) / (1 + exp(wolf$predicted)) head(wolf); dim(wolf) # How can I use predict here, 'newdata' crashes predict(m1, newdata = wolf$predicted); wolf # it doesn't work Thanks for any hints Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA
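Bill's pattern can be tried end to end on simulated data. A hypothetical sketch (not Felipe's wolf data): fit a logistic model on the first 153 rows, then predict probabilities for all 208.

```r
# Sketch of the fit-on-subset / predict-on-all pattern discussed above.
set.seed(1)
dat <- data.frame(x = rnorm(208))
dat$y <- rbinom(208, 1, plogis(dat$x))                      # simulated 0/1 response

fit <- glm(y ~ x, data = dat[1:153, ], family = binomial)   # model built on 153 records
dat$prob <- predict(fit, newdata = dat, type = "response")  # probabilities for all 208
head(dat)
```

Note that newdata is the whole data frame, so predict() matches the predictor columns by name; passing a bare vector is what triggers the error in the original post. The type = "response" argument also replaces the manual exp(p)/(1 + exp(p)) transformation.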
Re: [R] Popularity of R, SAS, SPSS, Stata...
On 26 Giu, 17:19, Allan Engelhardt all...@cybaea.com wrote: On 26/06/10 16:07, Muenchen, Robert A (Bob) wrote: I've been trying to make sense of Google Scholar searches. I'm obviously missing something basic. Here are two searches on www.google.com: sas - gets 68M hits sas OR spss - gets 74.3M hits. A bigger number, as OR would imply. But when I do the same searches on scholar.google.com, here's what I get: sas - gets 4.6M hits sas OR spss - gets 1.65M hits How on earth can an OR get you less?? Try using these search terms (in Google Scholar): SAS Institute, SPSS Inc, r project org... On 26 Giu, 17:19, Allan Engelhardt all...@cybaea.com wrote: ... It is obviously still not right, but closer. Happy reading of the articles by D. Sas, S.A.S. Eddington, etc. This way you can avoid the happy reading of the articles by D. Sas, S.A.S. Eddington, etc.: SAS -author:sas
Re: [R] become a member of R user community
On 2010-06-26 3:52, (Ted Harding) wrote: On 25-Jun-10 21:46:13, Albert Lee, Ph.D. wrote: How do I become a member of R user community? Albert Lee, Ph.D. statistician 1. By using R 2. By subscribing to the R-help mailing list and keeping in touch with the rest of us! To subscribe your email address to the list, visit the R-help info page at: https://stat.ethz.ch/mailman/listinfo/r-help and follow the instructions under Subscribing to R-help. Welcome! Ted. Let me just add that, for very little money, you can also become a supporting member of the R Foundation. See the homepage 'Foundation' link or go directly to http://www.r-project.org/foundation/membership.html Peter Ehlers
Re: [R] how to group a large list of strings into categories based on string similarity?
Hi Martin, Thanks a lot for your advice. I tried the process you suggested as below; it worked, but in a different way than I planned. library(Biostrings) x <- c("ACTCCCGCCGTTCGCGCGCAGCATGATCCTG", "ACTCCCGCCGTTCGCGCGC", "CAGGATCATGCTGCGCGCGAACGGCGGGAGT", "CAGGATCATGCTGCGCGCGAANN", "NCAGGATCATGCTGCGCGCGAAN", "CAGGATCATGCTGCGCGCG", "NNNCAGGATCATGCTGCGCGCGAANNN") names(x) <- seq_along(x) dna <- DNAStringSet(x) while (!all(width(dna) == width(dna <- trimLRPatterns("N", "N", dna)))) {} names(dna)[order(dna)[rank(dna, ties.method="min")]] The output is 1 2 3 4 4 6 4; this is the right answer after trimming N's, i.e. without considering N, which strings are the same. But actually, the match I planned is a position-to-position match, i.e. the 1st and 2nd strings are the same except for the N's. So the expected output is 1 1 2 2 3 2 4. Please advise. Thanks! --gang On Wed, Jun 23, 2010 at 7:55 PM, Martin Morgan mtmor...@fhcrc.org wrote: On 06/23/2010 07:46 PM, Martin Morgan wrote: On 06/23/2010 06:55 PM, G FANG wrote: Hi, I want to group a large list (20 million) of strings into categories based on string similarity. The specific problem is: given a list of DNA sequences as below ACTCCCGCCGTTCGCGCGCAGCATGATCCTG ACTCCCGCCGTTCGCGCGC CAGGATCATGCTGCGCGCGAACGGCGGGAGT CAGGATCATGCTGCGCGCGAANN CAGGATCATGCTGCGCGCG .. . NNNCCGTTCGCGCGCAGCATGATCCTG CGCGCGCAGCATGATCCTG GCGCGCGAACGGCGGGAGT NNCGCGCAGCATGATCCTG NNNTGCGCGCGAACGGCGGGAGT NNTTCGCGCGCAGCATGATCCTG 'N' is the missing letter. It can be seen that some strings are the same except for those N's (i.e. N can match with any base). Given this list of strings, I want to have 1) a vector corresponding to each row (string); for each string assign an id, such that similar strings (those that only differ at N's) have the same id 2) also get a mapping list from unique strings ('unique' in terms of the same similarity defined above) to the ids. I am a Matlab user shifting to R. Please advise on efficient ways to do this. 
The Bioconductor Biostrings package has many tools for this sort of operation. See http://bioconductor.org/packages/release/Software.html Maybe a one-time install source('http://bioconductor.org/biocLite.R') biocLite('Biostrings') then library(Biostrings) x <- c("ACTCCCGCCGTTCGCGCGCAGCATGATCCTG", "ACTCCCGCCGTTCGCGCGC", "CAGGATCATGCTGCGCGCGAACGGCGGGAGT", "CAGGATCATGCTGCGCGCGAANN", "NCAGGATCATGCTGCGCGCGAAN", "CAGGATCATGCTGCGCGCG", "NNNCAGGATCATGCTGCGCGCGAANNN") names(x) <- seq_along(x) dna <- DNAStringSet(x) while (!all(width(dna) == width(dna <- trimLRPatterns("N", "N", dna)))) {} names(dna)[rank(dna)] oops, maybe closer to names(dna)[order(dna)[rank(dna, ties.method="min")]] although there might be a faster way (e.g., match 8, 4, 2, 1 N's). Also, your sequences likely come from a fasta file (Biostrings::readFASTA) or a text file with a column of sequences (ShortRead::readXStringColumns) or from alignment software (ShortRead::readAligned / ShortRead::readFastq). If you go this route you'll want to address questions to the Bioconductor mailing list http://bioconductor.org/docs/mailList.html Martin Thanks! Gang -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
Re: [R] Export Results
On Sat, Jun 26, 2010 at 7:42 AM, Tal Galili tal.gal...@gmail.com wrote: And there are also the brew and Sweave packages (as Henrique mentioned). Also, odfWeave and Sweave via LyX. I believe that this is FAQed. Liviu
Re: [R] boot with strata: strata argument ignored?
Thanks Chuck, I understand much better what is going on with your example. But I'm still uncertain why the b2$t array does not have the dimensions of R x no. of strata. Any further insight would be appreciated. Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA

On 6/26/10 12:43 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote: On Sat, 26 Jun 2010, Bryan Hanson wrote: Hello All. I must be missing the really obvious here:

mm <- function(d, i) median(d[i])
b1 <- boot(gravity$g, mm, R = 1000)
b1
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
b2

Both b1 and b2 seem to have done (almost) the same thing, but it looks like the strata argument in b2 has been ignored. However, str(b1) vs str(b2) does show that the strata have been noted correctly. But b2$t is a 1000 x 1 array, not a 1000 x 8 array (gravity$series is a factor with 8 levels). There is a more complex example in ?boot using the same data set that gives a result that seems to make sense (2 levels in the factor, so $t has 2 columns). I either misunderstand the expected behavior or I've missed some punctuation or syntax detail.

Your punctuation and syntax is OK. Note:

SISWR <- function(x) sample(x, length(x), replace = TRUE)
# no strata
var(replicate(1000, median(SISWR(gravity$g))))
[1] 0.4588338
# now stratify on series
gsplit <- split(gravity$g, gravity$series)
var(replicate(1000, median(unlist(lapply(gsplit, SISWR)))))
[1] 0.3882272
sqrt(.45) # this agrees with b1
[1] 0.6708204
sqrt(.39) # this agrees with b2
[1] 0.6244998

The effect of stratification depends on the relative amount of variation within vs between strata. This suggests there is not a lot:

aov(g ~ series, gravity)
Call: aov(formula = g ~ series, data = gravity)
Terms:
                  series Residuals
Sum of Squares  2818.624  8239.376
Deg. of Freedom        7        73
Residual standard error: 10.62394
Estimated effects may be unbalanced

HTH, Chuck

TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA

sessionInfo()
R version 2.11.0 (2010-04-22) x86_64-apple-darwin9.8.0
locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] datasets tools grid graphics grDevices utils stats [8] methods base
other attached packages: [1] boot_1.2-42 brew_1.0-3 faraway_1.0.4 [4] GGally_0.2 xtable_1.5-6 mvbutils_2.5.1 [7] ggplot2_0.8.7 digest_0.4.2 reshape_0.8.3 [10] proto_0.3-8 ChemoSpec_1.43 R.utils_1.4.0 [13] R.oo_1.7.2 R.methodsS3_1.2.0 rgl_0.91 [16] lattice_0.18-5 mvoutlier_1.4 plyr_0.1.9 [19] RColorBrewer_1.0-2 chemometrics_0.8 som_0.3-5 [22] robustbase_0.5-0-1 rpart_3.1-46 pls_2.1-0 [25] pcaPP_1.8-1 mvtnorm_0.9-9 nnet_7.3-1 [28] mclust_3.4.4 MASS_7.3-5 lars_0.9-7 [31] e1071_1.5-23 class_7.3-2

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
Re: [R] Calculating Summaries for each level of a Categorical variable
Look at the summary.formula function inside package Hmisc. Christos

Date: Sat, 26 Jun 2010 05:17:34 -0700 From: raoul.t.dso...@gmail.com To: r-help@r-project.org Subject: [R] Calculating Summaries for each level of a Categorical variable

Hi, I have a dataset which has a categorical variable R, a count variable C (integer) and 4 or more numeric variables (A, T, W, H - integers) containing measures for R. I would like to summarize each level of the variable R by the averages of A, T, W and H. I have written a function to calculate weighted averages using C as the weight, and this is given below. The function works perfectly, but how do I add the additional dimension I require to this function? Dataset:

RT =
R   A  T  W  H
R1 10 20 20 10
R2 60 20 50 10
R3 45 10 20 50
R4 68 50 20 10
R1 73 20 40 46
R3 25 30 10 54
R3 36 90 20 10
R2 29 10 30 30

# FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A, WEIGHTED BY C
WA <- function(A, C) {
  sp_A <- c(A %*% C)
  sum_C <- sum(C)
  WA <- sp_A / sum_C
  return(WA)
}

I am trying to incorporate the additional step of calculating the weighted average of A, T, W and H for each level of R. Need help with this. Thanks in advance! Raoul
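For the per-level step the poster asks about, one common base-R pattern is split() plus sapply(). A minimal sketch, assuming a hypothetical count column C (the posted data shows R, A, T, W, H but omits the C values, so the weights below are invented for illustration):

```r
# Weighted means of A, T, W, H within each level of R, weighted by C.
RT <- data.frame(
  R = c("R1","R2","R3","R4","R1","R3","R3","R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30),
  C = c(1, 2, 1, 3, 2, 1, 2, 1))    # hypothetical counts
by_level <- sapply(split(RT, RT$R), function(g)
  sapply(g[c("A", "T", "W", "H")], weighted.mean, w = g$C))
round(by_level, 1)   # 4 x 4 matrix: rows A,T,W,H; one column per level of R
```

weighted.mean() replaces the hand-rolled WA() here, but WA(g$A, g$C) would slot into the same loop.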
[R] dynamic panelmodel pgmm
Hi, I want to estimate a dynamic panel data model with the following code, but unfortunately I received the error message below.

form <- PB ~ Activity + Solvency + Cap_Int
dynpanel <- pgmm(dynformula(form, list(1,1,1,1)), data = panel[1:2185, 1:37],
                 effect = "twoways", model = "onestep", index = c("Aktie", "Datum"),
                 gmm.inst = ~PB, lag.gmm = list(c(2,12)), transformation = "ld")
Error in FUN(X[[1L]], ...) : subscript out of bounds

dim(panel)
[1] 3408637

Best regards, Marco
[R] optim() not finding optimal values
I am trying to use optim() to minimize a sum-of-squared deviations function based upon four parameters. The basic function is defined as ...

SPsse <- function(par, B, CPE, SSE.only = TRUE) {
  n <- length(B)       # get number of years of data
  B0 <- par["B0"]      # isolate B0 parameter
  K  <- par["K"]       # isolate K parameter
  q  <- par["q"]       # isolate q parameter
  r  <- par["r"]       # isolate r parameter
  predB <- numeric(n)
  predB[1] <- B0
  for (i in 2:n) predB[i] <- predB[i-1] + r*predB[i-1]*(1 - predB[i-1]/K) - B[i-1]
  predCPE <- q * predB
  sse <- sum((CPE - predCPE)^2)
  if (SSE.only) sse else list(sse = sse, predB = predB, predCPE = predCPE)
}

My call to optim() looks like this

# the data
d <- data.frame(
  catch = c(9, 113300, 155860, 181128, 198584, 198395, 139040, 109969, 71896,
            59314, 62300, 65343, 76990, 88606, 118016, 108250, 108674),
  cpe   = c(109.1, 112.4, 110.5, 99.1, 84.5, 95.7, 74.1, 70.2, 63.1, 66.4,
            60.5, 89.9, 117.0, 93.0, 116.6, 90.0, 105.1))
pars <- c(80, 100, 0.0001, 0.17)       # put all parameters into one vector
names(pars) <- c("B0", "K", "q", "r")  # name the parameters
( SPoptim <- optim(pars, SPsse, B = d$catch, CPE = d$cpe) )  # run optim()

This produces parameter estimates, however, that are not at the minimum value of the SPsse function. For example, these parameter estimates produce a smaller SPsse:

parsbox <- c(732506, 1160771, 0.0001484, 0.4049)
names(parsbox) <- c("B0", "K", "q", "r")
( res2 <- SPsse(parsbox, d$catch, d$cpe, SSE.only = FALSE) )

Setting the starting values near the parameters shown in parsbox even resulted in a movement away from (to a larger SSE) those parameter values.

( SPoptim2 <- optim(parsbox, SPsse, B = d$catch, CPE = d$cpe) )  # run optim()

This issue most likely has to do with my lack of understanding of optimization routines, but I'm thinking that it may have to do with the optimization method used, tolerance levels in the optim algorithm, or the shape of the surface being minimized. Ultimately I was hoping to provide an alternative method to fisheries biologists who use Excel's solver routine.
If anyone can offer any help or insight into my problem here I would be greatly appreciative. Thank you in advance.
Re: [R] Wilcoxon signed rank test and its requirements
No, I mean something like this, assuming that the iris dataset contains the full population and we want to see if setosa has a different mean than the population (the null would be that there is no difference in sepal width between species, or that species tells nothing about sepal width):

out1 <- replicate(10, mean(sample(iris$Sepal.Width, 50)))
obs1 <- mean(iris$Sepal.Width[1:50])
hist(out1, xlim = range(out1, obs1))
abline(v = obs1)
mean(out1 > obs1)

I don't have a reference (other than a text book that defines sampling distributions). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

From: Atte Tenkanen [mailto:atte...@utu.fi] Sent: Friday, June 25, 2010 10:08 PM To: Atte Tenkanen Cc: Greg Snow; David Winsemius; R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

Atte Tenkanen kirjoitti 26.6.2010 kello 5.15: Greg Snow kirjoitti 25.6.2010 kello 21.55: Let me see if I understand. You actually have the data for the whole population (the entire piece) but you have some pre-defined sections that you want to see if they differ from the population, or more meaningfully whether they are different from a randomly selected set of measures. Is that correct? If so, since you have the entire population of interest you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

I check, so you mean doing it this way:

t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE), alt = "less")

NO, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)], mu = mean(SAMPLE), alt = "less")

Atte

Atte -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen Sent: Thursday, June 24, 2010 11:04 PM To: David Winsemius Cc: R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments' and these segments are compared with one reference set with a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances according to the reference set through a musical piece results in a more readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use original values. Then, I want to pick only some regions from the piece and compare the values of those regions, whether they are higher than the mean of all values. Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n ≈ 250-300. I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.) I would like to test whether the mean of the sample differs significantly from the population mean. Why? What is the purpose of this investigation? Why should the mean of a sample be that important? The histogram of the population looks like the attached histogram; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png That picture does not offer much insight into the features of that measurement.
It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population. -- David. Atte On 06/24/2010 12:40 PM, David Winsemius wrote: On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote: Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said: The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value. You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed reviewing Nonparametric
Re: [R] boot with strata: strata argument ignored?
On Sat, 26 Jun 2010, Bryan Hanson wrote: Thanks Chuck, I understand much better what is going on with your example. But I'm still uncertain why the b2$t array does not have the dimensions of R x no. of strata.

Because the test statistic returned by mm() is a scalar. It has nothing to do with the use or number of strata. Look at what the first case in example(boot) is doing:

ncol(boot(grav1, diff.means, R = 999, stype = "f")$t)
[1] 2
ncol(boot(grav1, diff.means, R = 999, stype = "f", strata = grav1[,1])$t)
[1] 2
diff.means(grav1, 1:nrow(grav1))
[1] -4.100549 14.722902

Chuck

Any further insight would be appreciated. Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA

On 6/26/10 12:43 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote: On Sat, 26 Jun 2010, Bryan Hanson wrote: Hello All. I must be missing the really obvious here:

mm <- function(d, i) median(d[i])
b1 <- boot(gravity$g, mm, R = 1000)
b1
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
b2

Both b1 and b2 seem to have done (almost) the same thing, but it looks like the strata argument in b2 has been ignored. However, str(b1) vs str(b2) does show that the strata have been noted correctly. But b2$t is a 1000 x 1 array, not a 1000 x 8 array (gravity$series is a factor with 8 levels). There is a more complex example in ?boot using the same data set that gives a result that seems to make sense (2 levels in the factor, so $t has 2 columns). I either misunderstand the expected behavior or I've missed some punctuation or syntax detail.

Your punctuation and syntax is OK.
Note:

SISWR <- function(x) sample(x, length(x), replace = TRUE)
# no strata
var(replicate(1000, median(SISWR(gravity$g))))
[1] 0.4588338
# now stratify on series
gsplit <- split(gravity$g, gravity$series)
var(replicate(1000, median(unlist(lapply(gsplit, SISWR)))))
[1] 0.3882272
sqrt(.45) # this agrees with b1
[1] 0.6708204
sqrt(.39) # this agrees with b2
[1] 0.6244998

The effect of stratification depends on the relative amount of variation within vs between strata. This suggests there is not a lot:

aov(g ~ series, gravity)
Call: aov(formula = g ~ series, data = gravity)
Terms:
                  series Residuals
Sum of Squares  2818.624  8239.376
Deg. of Freedom        7        73
Residual standard error: 10.62394
Estimated effects may be unbalanced

HTH, Chuck

TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA

sessionInfo()
R version 2.11.0 (2010-04-22) x86_64-apple-darwin9.8.0
locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] datasets tools grid graphics grDevices utils stats [8] methods base
other attached packages: [1] boot_1.2-42 brew_1.0-3 faraway_1.0.4 [4] GGally_0.2 xtable_1.5-6 mvbutils_2.5.1 [7] ggplot2_0.8.7 digest_0.4.2 reshape_0.8.3 [10] proto_0.3-8 ChemoSpec_1.43 R.utils_1.4.0 [13] R.oo_1.7.2 R.methodsS3_1.2.0 rgl_0.91 [16] lattice_0.18-5 mvoutlier_1.4 plyr_0.1.9 [19] RColorBrewer_1.0-2 chemometrics_0.8 som_0.3-5 [22] robustbase_0.5-0-1 rpart_3.1-46 pls_2.1-0 [25] pcaPP_1.8-1 mvtnorm_0.9-9 nnet_7.3-1 [28] mclust_3.4.4 MASS_7.3-5 lars_0.9-7 [31] e1071_1.5-23 class_7.3-2

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
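As a footnote to Chuck's point that $t has one column per element of the statistic: a statistic that returns one value per stratum does produce an R x 8 matrix. A hedged sketch (mm8 is a made-up name; it relies on strata resampling keeping each resampled index inside its own stratum, so the series labels still line up):

```r
library(boot)   # 'gravity' ships with the boot package
# Return one median per level of series instead of a single overall median.
mm8 <- function(d, i) tapply(d[i], gravity$series, median)
b3 <- boot(gravity$g, mm8, R = 200, strata = gravity$series)
dim(b3$t)       # 200 x 8: one column per stratum, because length(mm8(...)) is 8
```

So the shape of $t is entirely a property of the statistic, exactly as the reply says.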
Re: [R] optim() not finding optimal values
Your function is very irregular, so optim is likely to return a local minimum rather than the global minimum. Try different methods (SANN, CG, BFGS) and see if you get the result you need. As with all numerical optimisation, I would check the sensitivity of the results to starting values.

Nikhil Kaza Asst. Professor, City and Regional Planning University of North Carolina nikhil.l...@gmail.com

On Jun 26, 2010, at 4:27 PM, Derek Ogle wrote: I am trying to use optim() to minimize a sum-of-squared deviations function based upon four parameters. The basic function is defined as ...

SPsse <- function(par, B, CPE, SSE.only = TRUE) {
  n <- length(B)       # get number of years of data
  B0 <- par["B0"]      # isolate B0 parameter
  K  <- par["K"]       # isolate K parameter
  q  <- par["q"]       # isolate q parameter
  r  <- par["r"]       # isolate r parameter
  predB <- numeric(n)
  predB[1] <- B0
  for (i in 2:n) predB[i] <- predB[i-1] + r*predB[i-1]*(1 - predB[i-1]/K) - B[i-1]
  predCPE <- q * predB
  sse <- sum((CPE - predCPE)^2)
  if (SSE.only) sse else list(sse = sse, predB = predB, predCPE = predCPE)
}

My call to optim() looks like this

# the data
d <- data.frame(
  catch = c(9, 113300, 155860, 181128, 198584, 198395, 139040, 109969, 71896,
            59314, 62300, 65343, 76990, 88606, 118016, 108250, 108674),
  cpe   = c(109.1, 112.4, 110.5, 99.1, 84.5, 95.7, 74.1, 70.2, 63.1, 66.4,
            60.5, 89.9, 117.0, 93.0, 116.6, 90.0, 105.1))
pars <- c(80, 100, 0.0001, 0.17)       # put all parameters into one vector
names(pars) <- c("B0", "K", "q", "r")  # name the parameters
( SPoptim <- optim(pars, SPsse, B = d$catch, CPE = d$cpe) )  # run optim()

This produces parameter estimates, however, that are not at the minimum value of the SPsse function. For example, these parameter estimates produce a smaller SPsse:

parsbox <- c(732506, 1160771, 0.0001484, 0.4049)
names(parsbox) <- c("B0", "K", "q", "r")
( res2 <- SPsse(parsbox, d$catch, d$cpe, SSE.only = FALSE) )

Setting the starting values near the parameters shown in parsbox even resulted in a movement away from (to a larger SSE) those parameter values.
( SPoptim2 <- optim(parsbox, SPsse, B = d$catch, CPE = d$cpe) )  # run optim()

This issue most likely has to do with my lack of understanding of optimization routines, but I'm thinking that it may have to do with the optimization method used, tolerance levels in the optim algorithm, or the shape of the surface being minimized. Ultimately I was hoping to provide an alternative method to fisheries biologists who use Excel's solver routine. If anyone can offer any help or insight into my problem here I would be greatly appreciative. Thank you in advance.
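One concrete adjustment along the lines Nikhil suggests: the four parameters span roughly ten orders of magnitude, and Nelder-Mead's default step sizes cope badly with that. A sketch of one remedy is to pass control = list(parscale = ...) and allow more iterations; the code below reuses a condensed version of the poster's SPsse() and data, with starting values near parsbox (treat the exact settings as assumptions, not a definitive fix):

```r
# Condensed objective: surplus-production SSE as in the original post.
SPsse <- function(par, B, CPE) {
  n <- length(B)
  predB <- numeric(n)
  predB[1] <- par["B0"]
  for (i in 2:n)
    predB[i] <- predB[i-1] + par["r"]*predB[i-1]*(1 - predB[i-1]/par["K"]) - B[i-1]
  sum((CPE - par["q"] * predB)^2)   # sum of squared CPE deviations
}
d <- data.frame(
  catch = c(9, 113300, 155860, 181128, 198584, 198395, 139040, 109969, 71896,
            59314, 62300, 65343, 76990, 88606, 118016, 108250, 108674),
  cpe   = c(109.1, 112.4, 110.5, 99.1, 84.5, 95.7, 74.1, 70.2, 63.1, 66.4,
            60.5, 89.9, 117.0, 93.0, 116.6, 90.0, 105.1))
start <- c(B0 = 7e5, K = 1.2e6, q = 1e-4, r = 0.4)   # rough guesses
fit <- optim(start, SPsse, B = d$catch, CPE = d$cpe,
             control = list(parscale = abs(start),   # put parameters on one scale
                            maxit = 5000, reltol = 1e-12))
fit$value   # no worse than SPsse at the starting values
```

Rescaled this way, the search moves each parameter in proportion to its magnitude instead of taking the same absolute step for q (~1e-4) as for K (~1e6).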
Re: [R] Wilcoxon signed rank test and its requirements
Thanks! The results were similar to the t.test p-values shown (I have four samples). Thank you also for using that replicate function, which I didn't know. Till now I have just used for-loops, which are not so beautiful... I don't know about the speed. Have to test that. Atte

Greg Snow kirjoitti 26.6.2010 kello 23.30: No, I mean something like this, assuming that the iris dataset contains the full population and we want to see if setosa has a different mean than the population (the null would be that there is no difference in sepal width between species, or that species tells nothing about sepal width):

out1 <- replicate(10, mean(sample(iris$Sepal.Width, 50)))
obs1 <- mean(iris$Sepal.Width[1:50])
hist(out1, xlim = range(out1, obs1))
abline(v = obs1)
mean(out1 > obs1)

I don't have a reference (other than a text book that defines sampling distributions). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

From: Atte Tenkanen [mailto:atte...@utu.fi] Sent: Friday, June 25, 2010 10:08 PM To: Atte Tenkanen Cc: Greg Snow; David Winsemius; R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

Atte Tenkanen kirjoitti 26.6.2010 kello 5.15: Greg Snow kirjoitti 25.6.2010 kello 21.55: Let me see if I understand. You actually have the data for the whole population (the entire piece) but you have some pre-defined sections that you want to see if they differ from the population, or more meaningfully whether they are different from a randomly selected set of measures. Is that correct? If so, since you have the entire population of interest you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples.
Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

I check, so you mean doing it this way:

t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE), alt = "less")

NO, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)], mu = mean(SAMPLE), alt = "less")

Atte

Atte -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen Sent: Thursday, June 24, 2010 11:04 PM To: David Winsemius Cc: R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments' and these segments are compared with one reference set with a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances according to the reference set through a musical piece results in a more readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use original values. Then, I want to pick only some regions from the piece and compare the values of those regions, whether they are higher than the mean of all values. Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n ≈ 250-300. I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.) I would like to test whether the mean of the sample differs significantly from the population mean. Why? What is the purpose of this investigation?
Why should the mean of a sample be that important? The histogram of the population looks like the attached histogram; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population. -- David. Atte On 06/24/2010 12:40 PM, David Winsemius wrote: On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote: Thanks. What I have had to ask is how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said: The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are
Re: [R] subset arg in subset(). was: converting result of substitute to 'ordidnary' expression
It does work, thank you, but the literal 5 < x now needs to be quoted by expression():

do.call(subset, list(dat, expression(5 < x)))
   x  y
6  6  6
7  7  7
8  8  8
9  9  9
10 10 10

This is OK, but the standard subset(dat, 5 < x) looks more readable. Anyway, thank you for your help; it's a nice paradigm. Vadim

-Original Message- From: bill.venab...@csiro.au [mailto:bill.venab...@csiro.au] Sent: Saturday, June 26, 2010 1:08 AM To: Vadim Ogranovich; r-help@r-project.org Subject: RE: [R] subset arg in subset(). was: converting result of substitute to 'ordinary' expression

Here is another one that works:

do.call(subset, list(dat, subsetexp))
   x  y
6  6  6
7  7  7
8  8  8
9  9  9
10 10 10

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vadim Ogranovich Sent: Saturday, 26 June 2010 11:13 AM To: 'r-help@r-project.org' Subject: [R] subset arg in subset(). was: converting result of substitute to 'ordinary' expression

Dear R users, Please disregard my previous post "converting result of substitute to 'ordinary' expression". The problem I have has nothing to do with substitute. Consider:

dat <- data.frame(x = 1:10, y = 1:10)
subsetexp <- expression(5 < x)
## this does work
subset(dat, eval(subsetexp))
   x  y
6  6  6
7  7  7
8  8  8
9  9  9
10 10 10
## and so does this
subset(dat, 5 < x)
   x  y
6  6  6
7  7  7
8  8  8
9  9  9
10 10 10
## but this doesn't work
subset(dat, subsetexp)
Error in subset.data.frame(dat, subsetexp) : 'subset' must evaluate to logical

Why did the last expression fail, and why did it work with eval()? Thank you very much for your help, Vadim

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments.
Email transmission cannot be guaranteed to be secure or error-free. Jump Trading, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
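The two working idioms from this thread differ only in when the stored expression gets spliced into the call; a compact side-by-side sketch, using the same dat and subsetexp as above:

```r
dat <- data.frame(x = 1:10, y = 1:10)
subsetexp <- expression(5 < x)
# eval() works because subset() evaluates its 'subset' argument inside the
# data frame, and eval(subsetexp) there evaluates the stored comparison:
out1 <- subset(dat, eval(subsetexp))
# do.call() splices the expression object into the call before subset()
# sees it, so substitute() inside subset() recovers something evaluable:
out2 <- do.call(subset, list(dat, subsetexp))
identical(out1, out2)   # both keep rows 6:10, like subset(dat, 5 < x)
```

The bare subset(dat, subsetexp) fails because substitute() captures the symbol subsetexp itself, which evaluates to an expression object, not to a logical vector.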
[R] package(pls) - extracting explained Y-variance
Dear R-help users, I'd like to use the R package pls and want to extract the explained Y-variance to identify the important (PLS) principal components in my model, related to the y-data. For explained X-variance there is a function: explvar(). If I understand it right, the summary() function gives an overview where the Y-variance is shown, but I can't extract it for plotting. How can I do it, without pencil and paper? Thank you very much for your help, Christian
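One possible route, sketched from memory of the pls package's API (so treat the argument names and structure of the returned object as assumptions to verify against ?R2): the R2() generic reports cumulative explained Y-variance per component, and differencing it gives a per-component share analogous to what explvar() does for X.

```r
library(pls)
data(gasoline)                      # example data shipped with pls
fit <- plsr(octane ~ NIR, ncomp = 5, data = gasoline)
# cumulative training-set R^2 per component, dropping the intercept column
r2 <- drop(R2(fit, estimate = "train", intercept = FALSE)$val)
yvar <- 100 * diff(c(0, r2))        # % Y-variance per component, plottable
barplot(yvar, names.arg = seq_along(yvar),
        ylab = "% Y variance explained")
```

If the summary() layout ever changes, this stays extractable because it comes from R2() directly rather than from parsing printed output.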
Re: [R] Different standard errors from R and other software
If I understand correctly from their website, discrete choice models are mostly generalized linear models with the common link functions for discrete data? Apart from a few names I didn't recognize, all analyses seem quite standard to me. So I wonder why you would write the log-likelihood yourself for techniques that are implemented in R. Unless I missed something pretty important, or you want to do a specific analysis that wasn't clear to me, you should take a closer look at the possibilities in R for generalized linear (mixed) modelling and so on. Binary choice translates to a simple glm with a logit link. Multinomial choice can be done with e.g. multinom() from nnet. Ordered choice can be done with polr() from the MASS package. A nice one to look at is the package mgcv, or gamm4 in case of big datasets. They offer very flexible models that can include random terms, specific variance-covariance structures and non-linear relations in the form of splines. Apologies if this is all obvious and known to you. In that case you might want to specify what exactly it is you are comparing and how exactly you calculated it yourself. Cheers Joris

On Fri, Jun 25, 2010 at 11:47 PM, Min Chen chenmin0...@gmail.com wrote: Hi all, Sorry to bother you. I'm estimating a discrete choice model in R using the maxBFGS command. Since I wrote the log-likelihood myself, in order to double check, I ran the same model in Limdep. It turns out that the coefficient estimates are quite close; however, the standard errors are very different. I also computed the hessian and outer product of the gradients in R using the numDeriv package, but the results are still very different from those in Limdep. Is it the routine to compute the inverse hessian that causes the difference? Thank you very much! Best wishes. Min -- Min Chen Ph.D.
Candidate, Department of Agricultural, Food, and Resource Economics, 125 Cook Hall, Michigan State University -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
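The standard fits Joris lists can be sketched as follows; the data frame `dat` and its columns are hypothetical placeholders, not from the thread:

```r
library(MASS)   # polr()
library(nnet)   # multinom()

# Binary choice: logistic regression via glm()
fit.bin  <- glm(chose ~ price + income,
                family = binomial(link = "logit"), data = dat)

# Multinomial choice: multinom() from nnet
fit.mult <- multinom(alternative ~ price + income, data = dat)

# Ordered choice: polr() from MASS (response must be an ordered factor)
fit.ord  <- polr(rating ~ price + income, data = dat)
```

summary() on each fit reports coefficients and standard errors that can be compared against a hand-rolled likelihood.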
Re: [R] use a data frame whose name is stored as a string variable?
Thanks! Works like a charm. -Seth
Re: [R] Recursive indexing failed at level 2
Why do you use *double* square brackets on the left side of the replacement? From the help page for [[: The most important distinction between [, [[ and $ is that [ can select more than one element whereas the other two select a single element. You seem to be selecting 20 elements. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Hargreaves Sent: Saturday, 26 June 2010 9:54 PM To: r-help@r-project.org Subject: [R] Recursive indexing failed at level 2 Dear fellow R users, I am replacing elements of a list like so: pulse_subset[[1:20]] <- unlist(pulse[i])[1:20] where pulse is a list of lists, and pulse[i] has 20 values. This gives the error "recursive indexing failed at level 2". Interestingly, this instruction is part of a loop that went through about 200,000 iterations before giving this error. Actual code: pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] Error in pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] : recursive indexing failed at level 2 If anyone could shed some light I'd be rather grateful. Regards, Jim Hargreaves
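A minimal reconstruction of the fix implied above (list and values are hypothetical stand-ins): use single brackets on the left-hand side so a range of list elements is replaced.

```r
pulse_subset <- as.list(1:30)   # hypothetical stand-in for the real list
vals <- 101:120                 # 20 replacement values

## pulse_subset[[1:20]] <- vals  # fails: [[ selects a *single* element
pulse_subset[1:20] <- vals       # works: [ selects (and replaces) 20 elements
stopifnot(identical(unlist(pulse_subset[1:20]), 101:120))
```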
Re: [R] Wilcoxon signed rank test and its requirements
Atte, note the similarity between what Greg described and a bootstrap. The difference from a true bootstrap is that in Greg's version you subsample the population (or, in other instances, the data). This is known as the subsampling bootstrap and is discussed in Politis, Romano, and Wolf (1999). HTH, Daniel
[R] Ways to work with R and Postgres
Hi, I post this message to the general r-help list hoping anyone within a wider range has suggestions. There are three ways to integrate R and PostgreSQL, especially on the 64-bit Microsoft Windows platform: 1. via the RODBC package, which has 32-bit and 64-bit versions for Windows; 2. via the RPostgres interface, which currently only has a 32-bit version; 3. via plr for Greenplum, which only supports a few kinds of functionality and only specific versions of R. Do you have any idea about the advantages and disadvantages of each, and the differences among them? Yours sincerely, Xiaobo.Gu
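A hedged sketch of option 1 (RODBC); the DSN name, credentials and table name are hypothetical:

```r
library(RODBC)

# Connect through an ODBC data source configured for PostgreSQL
ch  <- odbcConnect("pg_dsn", uid = "user", pwd = "secret")
dat <- sqlQuery(ch, "SELECT * FROM measurements LIMIT 10")
odbcClose(ch)
```

RODBC works identically for the 32-bit and 64-bit builds as long as a matching-bitness ODBC driver is installed.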
Re: [R] optim() not finding optimal values
Derek, The problem is that your function is poorly scaled. You can see that the parameters vary over 10 orders of magnitude (from 1e-04 to 1e+06). You can get good convergence once you properly scale your function. Here is how you do it:

par.scale <- c(1e+06, 1e+06, 1e-06, 1.0)
SPoptim <- optim(pars, SPsse, B=d$catch, CPE=d$cpe,
                 control=list(maxit=1500, parscale=par.scale))
SPoptim
$par
          B0            K            q            r
7.329553e+05 1.160097e+06 1.484375e-04 4.050476e-01

$value
[1] 1619.487

$counts
function gradient
    1401       NA

$convergence
[1] 0

$message
NULL

Hope this helps, Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology, School of Medicine, Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu - Original Message - From: Derek Ogle do...@northland.edu Date: Saturday, June 26, 2010 4:28 pm Subject: [R] optim() not finding optimal values To: R (r-help@R-project.org) r-help@r-project.org I am trying to use optim() to minimize a sum-of-squared deviations function based upon four parameters. The basic function is defined as ... 
SPsse <- function(par, B, CPE, SSE.only=TRUE) {
  n  <- length(B)    # get number of years of data
  B0 <- par["B0"]    # isolate B0 parameter
  K  <- par["K"]     # isolate K parameter
  q  <- par["q"]     # isolate q parameter
  r  <- par["r"]     # isolate r parameter
  predB <- numeric(n)
  predB[1] <- B0
  for (i in 2:n) predB[i] <- predB[i-1] + r*predB[i-1]*(1 - predB[i-1]/K) - B[i-1]
  predCPE <- q*predB
  sse <- sum((CPE - predCPE)^2)
  if (SSE.only) sse else list(sse=sse, predB=predB, predCPE=predCPE)
}

My call to optim() looks like this:

# the data
d <- data.frame(
  catch = c(9,113300,155860,181128,198584,198395,139040,109969,71896,59314,62300,65343,76990,88606,118016,108250,108674),
  cpe   = c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60.5,89.9,117.0,93.0,116.6,90.0,105.1))
pars <- c(80,100,0.0001,0.17)            # put all parameters into one vector
names(pars) <- c("B0","K","q","r")       # name the parameters
( SPoptim <- optim(pars, SPsse, B=d$catch, CPE=d$cpe) )  # run optim()

This produces parameter estimates, however, that are not at the minimum value of the SPsse function. For example, these parameter estimates produce a smaller SPsse:

parsbox <- c(732506,1160771,0.0001484,0.4049)
names(parsbox) <- c("B0","K","q","r")
( res2 <- SPsse(parsbox, d$catch, d$cpe, SSE.only=FALSE) )

Setting the starting values near the parameters shown in parsbox even resulted in a movement away from (to a larger SSE than) those parameter values:

( SPoptim2 <- optim(parsbox, SPsse, B=d$catch, CPE=d$cpe) )  # run optim()

This issue most likely has to do with my lack of understanding of optimization routines, but I'm thinking that it may have to do with the optimization method used, the tolerance levels in the optim algorithm, or the shape of the surface being minimized. Ultimately I was hoping to provide an alternative method to fisheries biologists who use Excel's solver routine. If anyone can offer any help or insight into my problem here I would be greatly appreciative. Thank you in advance. 
Re: [R] Ways to work with R and Postgres
2010/6/27 顾小波 guxiaobo1...@gmail.com: Hi, I post this message to the general r-help list hoping anyone within a wider range has suggestions. There are three ways to integrate R and PostgreSQL, especially on the 64-bit Microsoft Windows platform: 1. via the RODBC package, which has 32-bit and 64-bit versions for Windows; 2. via the RPostgres interface, which currently only has a 32-bit version; 3. via plr for Greenplum, which only supports a few kinds of functionality and only specific versions of R. Do you have any idea about the advantages and disadvantages of each, and the differences among them?

There is also the RpgSQL package. In addition, the sqldf package can use RpgSQL: sqldf uses SQLite by default, but if the RpgSQL package is loaded it defaults to PostgreSQL. Here BOD is a built-in R data frame:

library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron
library(RpgSQL)
Loading required package: RJDBC
BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
sqldf('select regr_slope(demand, Time) slope,
+        regr_intercept(demand, Time) intercept,
+        corr(demand, Time) corr from BOD')
Loading required package: tcltk
Loading Tcl/Tk interface ... done
     slope intercept      corr
1 1.721429  8.521429 0.8030693
coef(lm(demand ~ Time, BOD)); cor(BOD$Time, BOD$demand)
(Intercept)        Time
   8.521429    1.721429
[1] 0.8030693
Re: [R] optim() not finding optimal values
A slightly better scaling is the following:

par.scale <- c(1e+06, 1e+06, 1e-05, 1)   # q is scaled differently
SPoptim <- optim(pars, SPsse, B=d$catch, CPE=d$cpe,
                 control=list(maxit=1500, parscale=par.scale))
SPoptim
$par
          B0            K            q            r
7.320899e+05 1.159939e+06 1.485560e-04 4.051735e-01

$value
[1] 1619.482

$counts
function gradient
     585       NA

$convergence
[1] 0

$message
NULL

Note that Nelder-Mead now converges in fewer than half the function evaluations (585 vs. 1401) compared to the previous scaling. Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology, School of Medicine, Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu - Original Message - From: Ravi Varadhan rvarad...@jhmi.edu Date: Sunday, June 27, 2010 0:42 am Subject: Re: [R] optim() not finding optimal values To: Derek Ogle do...@northland.edu Cc: R (r-help@R-project.org) r-help@r-project.org Derek, The problem is that your function is poorly scaled. ...
Re: [R] Recoding dates to session id in a longitudinal dataset
-- Forwarded message -- From: John-Paul Bogers john-paul.bog...@ua.ac.be Date: Sat, Jun 26, 2010 at 10:14 PM Subject: Re: [R] Recoding dates to session id in a longitudinal dataset To: jim holtman jholt...@gmail.com Dear Jim, The data concerns HPV screening data and looks as follows:

pat1 sampledate1 HPV16 0.3
pat2 sampledate2 HPV16 0
pat3 sampledate3 HPV16 0.5
pat1 sampledate4 HPV16 0.6
pat4 sampledate5 HPV16 0
pat2 sampledate6 HPV16 0
pat1 sampledate7 HPV16 0

What I would like is:

pat1 1 HPV16 0.3
pat2 1 HPV16 0
pat3 1 HPV16 0.5
pat1 2 HPV16 0.6
pat4 1 HPV16 0
pat2 2 HPV16 0
pat1 3 HPV16 0

That is, I would like to recode sampledate (a real date, in Date format) to a session sequence (first sample of this patient, second sample of this patient, ...). I hope this makes it clear. Thanks, JP PS: I answered this as a reply to your private mail; how do I get this onto the mailing list? On Sat, Jun 26, 2010 at 7:59 PM, jim holtman jholt...@gmail.com wrote: It would be useful if you could provide an example of what the data looks like now and what you would like it to look like; otherwise it is impossible to help. On Sat, Jun 26, 2010 at 8:37 AM, John-Paul Bogers john-paul.bog...@ua.ac.be wrote: Hi, I'm fairly new to R, but I have a large dataset (30 obs) containing patient material. Some patients came 2-9 times during the three-year observation period. The patients are identified by a unique idnr; the sessions can be distinguished using the session date. How can I recode the date of the session to a session id (1-9)? This would be necessary to obtain information and do some analysis on the first occurrence of a specific patient, or to look for trends. Thanks, JP Bogers University of Antwerp 
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
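One hedged way to do the recoding JP asks for, using base R's ave() (not code from the thread; column names mirror the example data, with dates invented for illustration):

```r
dat <- data.frame(
  id    = c("pat1","pat2","pat3","pat1","pat4","pat2","pat1"),
  date  = as.Date("2010-01-01") + 0:6,
  hpv16 = c(0.3, 0, 0.5, 0.6, 0, 0, 0)
)

# Rank of the sample date within each patient = session id (1 = first visit)
dat$session <- ave(as.numeric(dat$date), dat$id, FUN = rank)
dat[dat$id == "pat1", "session"]   # 1 2 3
```

Selecting the first occurrence per patient is then just `subset(dat, session == 1)`.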