Re: [R] mean
It would be easier to diagnose the problem if you included an example illustrating exactly what you did. I'll guess:

  > a <- list(3, 4, 5)
  > mean(a)
  [1] NA
  Warning message:
  In mean.default(a) : argument is not numeric or logical: returning NA
  > mean(as.numeric(a))
  [1] 4

But that's just a guess, as I don't know the actual contents of your list!

albyn

On Fri, Aug 30, 2013 at 5:18 AM, agnes69 fes...@gredeg.cnrs.fr wrote:

  When I try to apply mean to a list, I get the answer: "argument is not numeric or logical: returning NA". Could you help me? (I am a very beginner)

  --
  View this message in context: http://r.789695.n4.nabble.com/mean-tp4674999.html
  Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
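A variant on Albyn's guess (my addition, assuming the list holds only numbers): unlist() flattens a list before averaging, and sapply() gives one mean per element when the list holds vectors.

```r
a <- list(3, 4, 5)
mean(unlist(a))     # flatten the list to a numeric vector, then average: 4
b <- list(x = c(1, 2), y = c(3, 4))
sapply(b, mean)     # a named vector of per-element means
```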
Re: [R] sample(c(0, 1)...) vs. rbinom
After a bit of playing around, I discovered that sample() does something similar in other situations:

  > set.seed(105021)
  > sample(1:5, 1, prob=c(1,1,1,1,1))
  [1] 3
  > set.seed(105021)
  > sample(1:5, 1)
  [1] 2
  > set.seed(105021)
  > sample(1:5, 5, prob=c(1,1,1,1,1))
  [1] 3 4 2 1 5
  > set.seed(105021)
  > sample(1:5, 5)
  [1] 2 5 1 4 3

albyn

On 2013-05-22 22:24, peter dalgaard wrote:

  On May 23, 2013, at 07:01 , Jeff Newmiller wrote:

    You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise.

  Nope. It's switching 0 and 1:

    > set.seed(1); sample(0:1, 10, replace=TRUE, prob=c(1-pp, pp)); set.seed(1); rbinom(10, 1, pp)
    [1] 1 1 0 0 1 0 0 0 0 1
    [1] 0 0 1 1 0 1 1 1 1 0

  which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample.

    > set.seed(1); sample(0:1, 10, replace=TRUE); set.seed(1); rbinom(10, 1, pp)
    [1] 0 0 1 1 0 1 1 1 1 0
    [1] 0 0 1 1 0 1 1 1 1 0

  However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity.
Appendix A: some R code that exhibits the problem
=================================================

ppp <- seq(0, 1, by = 0.01)
result <- do.call(rbind, lapply(ppp, function(p) {
    set.seed(1)
    sampleRes <- sample(c(0, 1), size = 1, replace = TRUE, prob = c(1-p, p))
    set.seed(1)
    rbinomRes <- rbinom(1, size = 1, prob = p)
    data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
}))
result

Appendix B: the output from the R code
======================================

    prob equivalent
1   0.00       TRUE
2   0.01       TRUE
3   0.02       TRUE
4   0.03       TRUE
5   0.04       TRUE
6   0.05       TRUE
7   0.06       TRUE
8   0.07       TRUE
9   0.08       TRUE
10  0.09       TRUE
11  0.10       TRUE
12  0.11       TRUE
13  0.12       TRUE
14  0.13       TRUE
15  0.14       TRUE
16  0.15       TRUE
17  0.16       TRUE
18  0.17       TRUE
19  0.18       TRUE
20  0.19       TRUE
21  0.20       TRUE
22  0.21       TRUE
23  0.22       TRUE
24  0.23       TRUE
25  0.24       TRUE
26  0.25       TRUE
27  0.26       TRUE
28  0.27       TRUE
29  0.28       TRUE
30  0.29       TRUE
31  0.30       TRUE
32  0.31       TRUE
33  0.32       TRUE
34  0.33       TRUE
35  0.34       TRUE
36  0.35       TRUE
37  0.36       TRUE
38  0.37       TRUE
39  0.38       TRUE
40  0.39       TRUE
41  0.40       TRUE
42  0.41       TRUE
43  0.42       TRUE
44  0.43       TRUE
45  0.44       TRUE
46  0.45       TRUE
47  0.46       TRUE
48  0.47       TRUE
49  0.48       TRUE
50  0.49       TRUE
51  0.50      FALSE
52  0.51       TRUE
53  0.52       TRUE
54  0.53       TRUE
55  0.54       TRUE
56  0.55       TRUE
57  0.56       TRUE
58  0.57       TRUE
59  0.58       TRUE
60  0.59       TRUE
61  0.60       TRUE
62  0.61       TRUE
63  0.62       TRUE
64  0.63       TRUE
65  0.64       TRUE
66  0.65       TRUE
67  0.66       TRUE
68  0.67       TRUE
69  0.68       TRUE
70  0.69       TRUE
71  0.70       TRUE
72  0.71       TRUE
73  0.72       TRUE
74  0.73       TRUE
75  0.74       TRUE
76  0.75       TRUE
77  0.76       TRUE
78  0.77       TRUE
79  0.78       TRUE
80  0.79       TRUE
81  0.80       TRUE
82  0.81       TRUE
83  0.82       TRUE
84  0.83       TRUE
85  0.84       TRUE
86  0.85       TRUE
87  0.86       TRUE
88  0.87       TRUE
89  0.88       TRUE
90  0.89       TRUE
91  0.90       TRUE
92  0.91       TRUE
93  0.92       TRUE
94  0.93       TRUE
95  0.94       TRUE
96  0.95       TRUE
97  0.96       TRUE
98  0.97       TRUE
99  0.98       TRUE
100 0.99       TRUE
101 1.00       TRUE

Appendix C: Session information
===============================

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] sample(c(0, 1)...) vs. rbinom
The "something similar" is: returning a different result in two situations where one might expect the same result, i.e. when a probability vector with equal probabilities is supplied versus the default of equal probabilities. And, assuming that by "concerns me" you mean "worries me", I have no clue why you think it does! It is a curiosity.

albyn

On Thu, May 23, 2013 at 04:38:18PM +, Nordlund, Dan (DSHS/RDA) wrote:

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Albyn Jones
  Sent: Thursday, May 23, 2013 8:30 AM
  To: r-help@r-project.org
  Subject: Re: [R] sample(c(0, 1)...) vs. rbinom

    After a bit of playing around, I discovered that sample() does something similar in other situations:

      > set.seed(105021)
      > sample(1:5, 1, prob=c(1,1,1,1,1))
      [1] 3
      > set.seed(105021)
      > sample(1:5, 1)
      [1] 2
      > set.seed(105021)
      > sample(1:5, 5, prob=c(1,1,1,1,1))
      [1] 3 4 2 1 5
      > set.seed(105021)
      > sample(1:5, 5)
      [1] 2 5 1 4 3

    albyn

  What is the "something similar" you are referring to? And I guess I still don't understand what it is that concerns you about the sample function.

  Dan

  Daniel J. Nordlund
  Washington State Department of Social and Health Services
  Planning, Performance, and Accountability
  Research and Data Analysis Division
  Olympia, WA 98504-5204
Re: [R] the joy of spreadsheets (off-topic)
I once had a discussion with an economist who told me, in almost these exact words: "I don't care what the data say, the theory is so clear."

albyn

On 2013-04-26 9:30, William Dunlap wrote:

  > The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ...

  The following comment on economic research is from a 2010 article in the Atlantic reviewing John Ioannidis' work.
  http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/308269/

    "Medical research is not especially plagued with wrongness. Other meta-research experts have confirmed that similar issues distort research in all fields of science, from physics to economics (where the highly regarded economists J. Bradford DeLong and Kevin Lang once showed how a remarkably consistent paucity of strong evidence in published economics studies made it unlikely that any of them were right)."

  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of S Ellison
  Sent: Friday, April 26, 2013 9:08 AM
  To: Thomas Adams; peter dalgaard
  Cc: r-help
  Subject: Re: [R] the joy of spreadsheets (off-topic)

    One might wonder if the Excel error was indeed THAT or perhaps a way to get the desired results, given the other issues in their analysis?

  The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ...

  S

  ***
  This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] prop.test vs hand calculated confidence interval
It might help to look at the documentation: typing ?prop.test at the command line would reveal various items of interest, including a section labeled "Details". A close reading of that section turns up the explanation: "The confidence interval is computed by inverting the score test." There are also journal references given.

albyn

On Wed, Apr 03, 2013 at 06:17:56PM -0400, Sarah Goslee wrote:

  One of the joys of R is that it's open source: you can read the code for prop.test yourself and see what's happening. In this case, simply typing prop.test at the command line will provide it, although without comments.

  Sarah

  On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net wrote:

    Hi,

    This code:

      n=40
      x=17
      phat=x/n
      SE=sqrt(phat*(1-phat)/n)
      zstar=qnorm(0.995)
      E=zstar*SE
      phat+c(-E,E)

    gives this result:

      [1] 0.2236668 0.6263332

    The TI graphing calculator gives the same result. Whereas this test:

      prop.test(x, n, conf.level=0.99, correct=FALSE)

    gives this result:

      0.2489036 0.6224374

    I'm wondering why there is a difference.

    D.

  --
  Sarah Goslee
  http://www.functionaldiversity.org

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
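The two answers can be reconciled directly: inverting the score test yields the Wilson interval. A sketch of that formula (the standard textbook form, not lifted from prop.test's source) reproduces the prop.test endpoints from David's numbers:

```r
n <- 40; x <- 17
phat <- x / n
z <- qnorm(0.995)                                   # 99% two-sided critical value
center <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)    # Wilson midpoint, shrunk toward 1/2
half <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
center + c(-1, 1) * half    # approx 0.2489 0.6224, matching prop.test(correct=FALSE)
```

The Wald interval David computed by hand uses SE evaluated at phat; the Wilson interval solves |phat - p| = z*sqrt(p(1-p)/n) for p, which is why the endpoints differ.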
Re: [R] prop.test correct true and false gives same answer
?prop.test is helpful: the continuity correction is used only if it does not exceed the difference between the sample and null proportions in absolute value. Here phat = 118/236 = 0.5 is exactly the null proportion, so that difference is zero, the correction drops out, and both calls print the same (uncorrected) test.

albyn

On Wed, Mar 27, 2013 at 02:04:51PM -0700, David Arnold wrote:

  All,

  How come both of these are the same? Both say "1-sample proportions test without continuity correction". I would suspect one would say "without" and one would say "with".

    > prop.test(118, 236, .5, correct=FALSE, conf.level=0.95)

      1-sample proportions test without continuity correction

    data:  118 out of 236, null probability 0.5
    X-squared = 0, df = 1, p-value = 1
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4367215 0.5632785
    sample estimates:
      p
    0.5

    > prop.test(118, 236, .5, correct=TRUE, conf.level=0.95)

      1-sample proportions test without continuity correction

    data:  118 out of 236, null probability 0.5
    X-squared = 0, df = 1, p-value = 1
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4367215 0.5632785
    sample estimates:
      p
    0.5

  --
  View this message in context: http://r.789695.n4.nabble.com/prop-test-correct-true-and-false-gives-same-answer-tp4662659.html
  Sent from the R help mailing list archive at Nabble.com.

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
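To see the correction actually engage, shift one success so that phat no longer equals the null value; then correct=TRUE and correct=FALSE give different intervals (illustrative counts of my choosing; outputs not shown, run both to compare):

```r
prop.test(117, 236, p = 0.5, correct = FALSE)$conf.int
prop.test(117, 236, p = 0.5, correct = TRUE)$conf.int   # correction is nonzero here, so the interval changes
```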
Re: [R] order statistic of multivariate normal
R^n for n > 1 is not an ordered set.

albyn

On Thu, Mar 21, 2013 at 02:32:44PM -0700, David Winsemius wrote:

  On Mar 21, 2013, at 1:44 PM, li li wrote:

    Hi all,
    Is there an R function that computes the probability or quantiles of order statistics of multivariate normal? Thank you.

  There is an mvtnorm package. I don't know what you mean by "quantiles of order statistics of multivariate normal", though.

  --
  David Winsemius
  Alameda, CA, USA

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
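If the poster means the sorted components within a single multivariate normal draw, there is no standard closed form, but Monte Carlo with the mvtnorm package David mentioned gets quantiles easily. A sketch (my addition; the covariance matrix is illustrative, and mvtnorm must be installed):

```r
library(mvtnorm)
S <- matrix(0.5, 3, 3); diag(S) <- 1            # illustrative equicorrelated covariance
X <- rmvnorm(1e5, mean = rep(0, 3), sigma = S)  # 100000 trivariate normal draws
M <- apply(X, 1, max)                           # largest order statistic of each draw
quantile(M, c(0.5, 0.9, 0.95))                  # Monte Carlo quantile estimates
```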
Re: [R] Exponentiate very large numbers
I stayed out of this one thinking it was probably a homework exercise. After others have responded, I'll go ahead with my gloss on Bill's function...

The specific problem is really of the form

  exp(a) - exp(a+eps) = exp(a)*(1 - exp(eps))

So even though we can't compute exp(1347), we can compute 1 - exp(eps) for eps of the desired size. The given eps was

  > 1351 + log(.1) - 1347
  [1] 1.697415

So the answer is exp(1347)*(1 - exp(1.697415)), and as Bill noted, you can compute the log of the absolute value of that quantity...

  > 1347 + log(abs(1 - exp(1.697415)))
  [1] 1348.495

albyn

On 2013-02-05 10:01, William Dunlap wrote:

  > Are there any tricks I can use to get a real result for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or exponential form?

  log.a <- 1347
  log.b <- 1351
  f0 <- function(log.a, log.b) exp(log.a) - exp(log(0.1) + log.b)

  will not work because f0(1347, 1351) is too big to represent as a double precision number (abs(result) > 10^308). If you are satisfied with computing log(result) you can do it with some helper functions:

  addOnLogScale <- function (e1, e2) {
      # log(exp(e1) + exp(e2))
      small <- pmin(e1, e2)
      big <- pmax(e1, e2)
      log1p(exp(small - big)) + big
  }
  subtractOnLogScale <- function (e1, e2) {
      # log(abs(exp(e1) - exp(e2)))
      small <- pmin(e1, e2)
      big <- pmax(e1, e2)
      structure(log1p(-exp(small - big)) + big, isPositive = e1 > e2)
  }

  as

  f1 <- function(log.a, log.b) {
      # log(abs(exp(log.a) - exp( log(0.1) + log.b))), avoiding overflow
      subtractOnLogScale( log.a, log(0.1) + log.b )
  }

  E.g.,

    > f0(7, 3)
    [1] 1094.625
    > exp(f1(7, 3))
    [1] 1094.625
    attr(,"isPositive")
    [1] TRUE

  With your log.a and log.b we get

    > f1(1347, 1351)
    [1] 1348.495
    attr(,"isPositive")
    [1] FALSE

  You can use the Rmpfr package to compute the results to any desired precision.
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of francesca casalino
  Sent: Monday, February 04, 2013 8:00 AM
  To: r-help@r-project.org
  Subject: Re: [R] Exponentiate very large numbers

    I am sorry I have confused you, the logs are all base e:

      ln(a) = 1347
      ln(b) = 1351

    And I am trying to solve this expression:

      exp( ln(a) ) - exp( ln(0.1) + ln(b) )

    Thank you.

    2013/2/4 francesca casalino francy.casal...@gmail.com:

      Dear R experts,

      I have the logarithms of 2 values:

        log(a) = 1347
        log(b) = 1351

      And I am trying to solve this expression:

        exp( ln(a) ) - exp( ln(0.1) + ln(b) )

      But of course every time I try to exponentiate the log(a) or log(b) values I get Inf. Are there any tricks I can use to get a real result, either in logarithm or exponential form?

      Thank you very much for the help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
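As a cross-check of f1(1347, 1351), here is a sketch using the Rmpfr package Bill mentioned (my addition; assumes Rmpfr is installed, and uses exp(1351)/10 in place of exp(log(0.1) + 1351), which is the same quantity exactly):

```r
library(Rmpfr)
a <- exp(mpfr(1347, precBits = 4096))   # exp(1347) to high precision
b <- exp(mpfr(1351, precBits = 4096)) / 10
log(abs(a - b))                         # should agree with f1's 1348.495, with a - b negative
```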
Re: [R] Comparing linear regression coefficients to a slope of 1
Is this homework?

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

albyn

On Sat, Nov 24, 2012 at 07:27:25PM -0500, Catriona Hendry wrote:

  Hi!

  I have a question that is probably very basic, but I cannot figure out how to do it. I simply need to compare the significance of a regression slope against a slope of 1, instead of the default of zero.

  I know this topic has been posted before, and I have tried to use the advice given to others to fix my problem. I tried the offset command based on one of these advice threads as follows:

    Regression <- lm(y ~ x + offset(1*x))

  but this resulted in a regression line that was plotted perpendicular to the data when added with the abline function.

  I would be extremely grateful for your help!! Thanks!!

  Cat

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Comparing linear regression coefficients to a slope of 1
Dear Cat

My apologies for presuming... Here's a primitive solution: compute a t-statistic or CI.

  t = (beta-hat - 1)/SE(beta-hat), compared to qt(.975, res.df)

Or better, compute the 95% confidence interval:

  beta-hat + c(-1,1)*qt(.975, res.df)*SE(beta-hat)

albyn

On 2012-11-24 18:05, Catriona Hendry wrote:

  Hi,

  @Albyn, David: No, it's not homework. It's basic groundwork for testing allometric relationships for a graduate project I am working on. I read the guide before posting; I spent half the day trying to understand how I am going wrong based on the advice given to others.

  @Bert, David: I apologise for the lack of code, I wasn't sure how to explain my problem and I guess I went about it the wrong way.

  I do think this is what I need to be doing: I am testing allometric relationships of body size against a predicted isometric (1:1) relationship. So I would like to know if the relationship between my variables deviates from that. Hopefully the information below will be what is needed.

  Here is the part of the code relevant to the regression and plot:

    plot(Contrast_log_MTL_ALL, Contrast_log_FTL_ALL)
    Regression_PhyloContrasts_ALL <- lm(Contrast_log_FTL_ALL ~ Contrast_log_MTL_ALL, offset=1*Contrast_log_MTL_ALL)
    abline(Regression_PhyloContrasts_ALL)

  The plot that resulted is attached as an image file. Below are the vectors of my variables. They are converted from other values imported and indexed from a csv file, so unfortunately I don't have a matrix set up for them.
      Contrast_log_FTL_ALL Contrast_Log_MTL_ALL
  83     0.226593    0.284521
  84     0.165517    0.084462
  85     -0.1902     -0.0055
  86     0.585176    0.639916
  87     -0.01078    0.118011
  88     0.161142    0.073762
  89     -0.08566    -0.04788
  90     -0.13818    -0.0524
  91     -0.02504    -0.21099
  92     -0.05027    -0.07594
  93     -0.11399    -0.07251
  94     -0.07299    -0.08247
  95     -0.09507    -0.04817
  96     0.207591    0.151695
  97     -0.14224    -0.05097
  98     0.06375     -0.0229
  99     0.04607     0.06246
  100    0.257389    0.190531
  101    -0.0612     -0.10902
  102    -0.1981     -0.24698
  103    -0.12328    -0.36942
  104    0.269877    0.341989
  105    0.125377    0.227183
  106    0.087038    -0.05962
  107    0.114929    0.096112
  108    0.252807    0.305583
  109    -0.0895     -0.08586
  110    -0.38483    -0.20671
  111    -0.72506    -0.63785
  112    -0.37212    -0.21458
  113    0.010348    0.117577
  114    -0.09625    -0.0059
  115    -0.26291    -0.25986
  116    0.056922    0.064041
  117    0.051472    -0.09747
  118    -0.05691    0.075005
  119    0.117095    -0.15497
  120    -0.01329    -0.12473
  121    0.098725    0.020522
  122    -0.0019     -0.01998
  123    -0.12446    -0.02312
  124    0.019234    0.031391
  125    0.385366    0.391766
  126    0.495518    0.468946
  127    -0.09251    -0.08045
  128    0.147965    0.139117
  129    -0.03143    -0.02319
  130    -0.19801    -0.14924
  131    0.014104    -0.01917
  132    0.031872    -0.01381
  133    -0.01412    -0.04381
  134    -0.12864    -0.08527
  135    -0.07179    -0.03525
  136    0.31003     0.29553
  137    -0.09347    -0.11903
  138    -0.10706    -0.16654
  139    0.078655    0.065509
  140    0.08279     -0.00766
  141    0.181885    0.001414
  142    0.345818    0.496323
  143    0.235044    0.095073
  144    -0.03022    0.039918
  145    0.042577    0.136586
  146    0.064208    0.001379
  147    -0.02237    -0.03009
  148    -3.55E-05   0.040197
  149    0.011168    0.087116
  150    0.019964    0.071822
  151    -0.04602    -0.06616
  152    0.083087    0.038592
  153    0.032078    0.107237
  154    -0.21108    -0.22347
  155    0.122959    0.297917
  156    -0.05898    0.012547
  157    -0.07584    -0.21588
  158    -0.00929    -0.06864
  159    -0.01211    -0.04559
  160    0.090948    0.136582
  161    0.016974    0.018259
  162    -0.04083    0.016245
  163    -0.20328    -0.31678

  On Sat, Nov 24, 2012 at 8:22 PM, Bert Gunter gunter.ber...@gene.com wrote:

    1. The model is correct: lm(y ~ x + offset(x)) (AFAICS)

    2. Read the posting guide, please: Code?
    I do not know what you mean by: "this resulted in a regression line that was plotted perpendicular to the data when added with the abline function." Of course, maybe someone else will groc this.

    3. I wonder if you really want to do what you are doing, anyway. For example, in comparing two assays to see whether they give similar results, you would **not** do what you are doing. If you care to follow up on this, I suggest you post complete context to a statistical mailing list, not here, like stats.stackexchange.com. Also, feel free to ignore me, of course. I'm just guessing.

    Cheers,
    Bert

    On Sat, Nov 24, 2012 at 4:27 PM, Catriona Hendry hen...@gwmail.gwu.edu wrote:

      Hi!

      I have a question that is probably very basic, but I cannot figure out how to do it. I simply need to compare the significance of a regression slope against a slope of 1, instead of the default of zero.

      I know this topic has been posted before, and I have tried to use the advice given to others to fix my problem. I tried the offset command based on one of these advice threads as follows:

        Regression <- lm(y ~ x + offset(1*x))

      but this resulted in a regression line that was plotted perpendicular to the data when added with the abline function.

      I would be extremely grateful for your help!! Thanks!!

      Cat
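Albyn's recipe in runnable form (my sketch, on simulated data; the poster would substitute her contrast vectors for x and y):

```r
set.seed(1)
x <- rnorm(50)
y <- 1.2 * x + rnorm(50)                       # true slope 1.2, purely for illustration
fit <- lm(y ~ x)                               # plain regression; no offset needed for the test
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
tstat <- (est - 1) / se                        # t-statistic for H0: slope = 1
pval  <- 2 * pt(-abs(tstat), df.residual(fit)) # two-sided p-value
ci <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se  # 95% CI: does it cover 1?
```

The offset trick in the original post changes the fitted coefficient rather than the test; testing against 1 needs only the ordinary fit plus the shifted t-statistic above.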
Re: [R] How to do an infinite sum in R
Perhaps it would help to think before you compute.

albyn

On Fri, Nov 16, 2012 at 09:30:32AM -0800, Simon Bolivar wrote:

  I'm having trouble doing an infinite sum in R. I want to do the infinite sum of 1/(1+n). How would I do this in R?

  Thank You

  --
  View this message in context: http://r.789695.n4.nabble.com/How-to-do-an-infinite-sum-in-R-tp4649770.html
  Sent from the R help mailing list archive at Nabble.com.

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
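The point behind the hint: sum over n of 1/(1+n) is the harmonic series minus its first term, which diverges, so no computation can return a finite answer. A sketch (my addition) showing the partial sums growing without bound, roughly like log(N):

```r
partial <- function(N) sum(1 / (1 + seq_len(N)))   # sum of the first N terms
sapply(10^(2:6), partial)                          # keeps climbing as N grows: divergence
```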
Re: [R] how to generate a set of random numbers that sum to 1 with uniform distribution of elements
What uniform distribution do you want for the columns? The average value of X_k given \sum X_i = 1 must be 1/n. If you start with X_i ~ U(0,1), the result must be skewed. If you generate X_i uniformly distributed on (0, 2/n), the conditional distribution given that the sum is 1 will be less skewed.

  U <- matrix(runif(1000*100)/50, nrow=1000)
  s <- apply(U, 1, sum)
  V <- U/s
  qqplot(U[,1], V[,1])

albyn

On Wed, Nov 07, 2012 at 05:02:10AM -0800, Bärbel wrote:

  Hi,

  I am looking for a way to generate a matrix of random numbers in a way that each row of the matrix would sum to 1 and that the numbers in the columns of the matrix would have a uniform distribution. So far I have found several ways to create random numbers that would sum to 1, but then the distribution of the individual elements is more or less skewed - there are much more small numbers than larger ones.

  Any help and suggestions?

  - Bärbel

  --
  View this message in context: http://r.789695.n4.nabble.com/how-to-generate-a-set-of-random-numbers-that-sum-to-1-with-uniform-distribution-of-elements-tp4648695.html
  Sent from the R help mailing list archive at Nabble.com.

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
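A related standard trick (my addition, not from Albyn's reply): normalizing i.i.d. Exp(1) draws gives a Dirichlet(1, ..., 1) vector, i.e. the uniform distribution on the simplex of nonnegative rows summing to 1. Note the column marginals are then Beta(1, n-1), not U(0,1), which is the constraint Albyn is pointing at.

```r
set.seed(1)
n <- 100
E <- matrix(rexp(1000 * n), nrow = 1000)   # i.i.d. exponentials
W <- E / rowSums(E)                        # each row ~ Dirichlet(1,...,1): uniform on the simplex
range(rowSums(W))                          # every row sums to 1 (up to rounding)
```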
Re: [R] peer-reviewed (or not) publications on R
More links on reproducible research:

  Opinion: Open and Free: Software and Scientific Reproducibility.
  Seismological Research Letters, Volume 83, Number 5, September/October 2012.

  Reproducible Research in Computational Science. Roger D. Peng.
  Science, 2 December 2011: 1226-1227.

albyn

On 2012-10-30 9:05, R. Michael Weylandt wrote:

  On Tue, Oct 30, 2012 at 2:22 PM, Paul Artes paul_h_ar...@yahoo.co.uk wrote:

    Dear Friends,

    I'm contributing to a paper on a new R package for a clinical (medicine, ophthalmology) audience, and part of the mission is to encourage people who might be occasional users of Excel or SPSS to become more familiar with R. I'd really appreciate any pointers to recent papers that describe R, its growth (statistics on user base, number of packages, volume of help-list traffic) and application in many diverse fields. Published peer-reviewed papers of course would be best, but I'd appreciate any pointers to other resources and compilations that might float around somewhere. Is there anything bibliometric (number of citations)? I will happily send something back to the list...

    Best wishes

    Paul

  Two possible starting points would be the Journal of Statistical Software or the R Journal. There's also this interesting paper -- http://arxiv.org/abs/1210.0530 -- which doesn't touch R to the best of my memory, but explains why FOSS + Science is a good idea and sketches (one group's ideas of) best practices.

  Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mac Text editors
Have you looked at Aquamacs (Emacs for the Mac)? It's at aquamacs.org.

albyn

On 2012-09-26 17:48, Steven Wolf wrote:

  Hi everyone,

  I've recently moved from using a Windows machine to a Mac (some might call it an upgrade, others not… I'll let you be the judge). Once I started using Notepad++ on my Windows machine, I really began to like it. Unfortunately, I'm not sure what the free text editor options are for the Mac (Notepad++ is Windows only).

  I've dabbled with Linux before and used Emacs/ESS there. However, I seem to remember fighting pretty hard to make that work, and the OS X file structure isn't that intuitive to me yet. (For example, where is my .emacs file?)

  What text editors are best for the Mac, keeping in mind that I'm probably going to use them via the command line interface (e.g. X11 or Terminal)?

  Thanks!
  -Steve

  ___
  Steven Wolf
  Research Associate
  CREATE for STEM Institute
  115 H Erickson Hall
  Michigan State University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Course
Dear Pedro

In your R session, enter the commands

  license()
  RShowDoc("COPYING")

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Those imply no restriction on charging a fee for presenting courses in R.

albyn

On Tue, Aug 28, 2012 at 03:42:54PM +0200, Pedro Henrique Lamarão Souza wrote:

  Hello,

  I am a student of Materials Engineering and I want to give an introductory course on R at the university where I study here in Brazil. I know R is free software, but I just want to know if I need any special authorization for doing it. The course will be totally free and I will not receive any money for doing it. The idea is just to show the program.

  --
  Atenciosamente,
  Pedro Lamarão
  ITEC/UFPA/PPGEM/GPEMAT

--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova of lme objects (model1, model2) gives different results depending on order of models
No, both yield the same result: reject the null hypothesis, which always corresponds to the restricted (smaller) model.

albyn

On Thu, May 31, 2012 at 12:47:30PM +0100, Chris Beeley wrote:
Hello- I understand that it's conventional, when comparing two models using anova(model1, model2), to put the more complicated (for want of a better word) model as the second model. However, I'm using lme in the nlme package and I've found that the order of the models actually gives opposite results. I'm not sure if this is supposed to be the case or if I have missed something important, and I can't find anything in the Pinheiro and Bates book or in ?anova, or in Google for that matter, which unfortunately only returns results about ANOVA, which isn't much help. I'm using the latest versions of R and nlme, just checked both. Here is the code and output:

PHQmodel1 = lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
PHQmodel2 = lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
               correlation=corAR1(form=~Date|Case))
anova(PHQmodel1, PHQmodel2) # accept model 2
          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
PHQmodel1     1  8 48784.57 48840.43 -24384.28
PHQmodel2     2  9 48284.68 48347.51 -24133.34 1 vs 2 501.8926  <.0001

PHQmodel1 = lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit,
               correlation=corAR1(form=~Date|Case))
PHQmodel2 = lme(PHQ~Age+Gender+Date*Treatment, data=compfinal, random=~1|Case, na.action=na.omit)
anova(PHQmodel1, PHQmodel2) # accept model 2
          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
PHQmodel1     1  9 48284.68 48347.51 -24133.34
PHQmodel2     2  8 48784.57 48840.43 -24384.28 1 vs 2 501.8926  <.0001

In both cases I am led to accept model 2 even though they are opposite models. Is it really just that you have to put them in the right order? It just seems like, if there were say four models, you wouldn't necessarily be able to determine the correct order.
Many thanks, Chris Beeley, Institute of Mental Health, UK

...session info follows

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] gridExtra_0.9      RColorBrewer_1.0-5 car_2.0-12    nnet_7.3-1    MASS_7.3-17
 [6] xtable_1.7-0       psych_1.2.4        languageR_1.4 nlme_3.1-104  ggplot2_0.9.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1 dichromat_1.2-4  digest_0.5.2   labeling_0.1   lattice_0.20-6 memoise_0.1
 [7] munsell_0.3      plyr_1.7.1       proto_0.3-9.2  reshape2_1.2.1 scales_0.2.1   stringr_0.6
[13] tools_2.15.0

packageDescription("nlme")
Package: nlme
Version: 3.1-104
Date: 2012-05-21
Priority: recommended
Title: Linear and Nonlinear Mixed Effects Models
Authors@R: c(person("Jose", "Pinheiro", comment = "S version"), person("Douglas", "Bates", comment = "up to 2007"), person("Saikat", "DebRoy", comment = "up to 2002"), person("Deepayan", "Sarkar", comment = "up to 2005"), person("R-core", email = "r-c...@r-project.org", role = c("aut", "cre")))
Author: Jose Pinheiro (S version), Douglas Bates (up to 2007), Saikat DebRoy (up to 2002), Deepayan Sarkar (up to 2005), the R Core team.
Maintainer: R-core r-c...@r-project.org
Description: Fit and compare Gaussian linear and nonlinear mixed-effects models.
Depends: graphics, stats, R (>= 2.13)
Imports: lattice
Suggests: Hmisc, MASS
LazyLoad: yes
LazyData: yes
License: GPL (>= 2)
BugReports: http://bugs.r-project.org
Packaged: 2012-05-23 07:28:59 UTC; ripley
Repository: CRAN
Date/Publication: 2012-05-23 07:37:45
Built: R 2.15.0; x86_64-pc-mingw32; 2012-05-29 12:36:01 UTC; windows

-- Albyn Jones Reed College jo...@reed.edu
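The order-independence of the test itself is easy to check with ordinary lm fits (a stand-in for the lme models above, using the built-in cars data; the comparison logic of anova() on a pair of nested models is the same):

```r
fit1 <- lm(dist ~ speed, data = cars)           # smaller model
fit2 <- lm(dist ~ poly(speed, 2), data = cars)  # larger model
a12 <- anova(fit1, fit2)
a21 <- anova(fit2, fit1)
# swapping the order flips the rows and the sign of Df,
# but the F statistic and p-value are unchanged
c(a12[2, "Pr(>F)"], a21[2, "Pr(>F)"])
```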
Re: [R] Study design question; MLB; pay and performance.
There is an ASA section on statistics in sports; you might start looking there: http://www.amstat.org/sections/sis/

albyn

On Thu, Apr 19, 2012 at 10:05:39AM -0500, N. S. Miceli, Ph.D. wrote:
Dear List Members, I am in the process of designing a study examining pay and performance in Major League Baseball across several seasons. Before I get too deeply into it, I'd like to ask whether the group members think that performance across seasons is independent, or whether it needs to be treated as a time-series variable so that the lack of independence can be controlled for. Any ideas or considerations that need to be taken into account would be appreciated. Regards, Nick Miceli

-- Albyn Jones Reed College jo...@reed.edu
Re: [R] Creating a loop with an indefinite end term
Here are a couple of constructions that work.

albyn

===
num <- rep(0,10)
for (i in 2:10) {
    num[i] <- num[i-1] + 5
    if (num[i] > 20) break
}
num
 [1]  0  5 10 15 20 25  0  0  0  0

or

num <- rep(0,10)
done <- FALSE
i <- 2
while (!done) {
    num[i] <- num[i-1] + 5
    if (num[i] > 20) done <- TRUE
    i <- i + 1
}
num
 [1]  0  5 10 15 20 25  0  0  0  0

On Tue, Apr 10, 2012 at 10:48:34AM -0400, Steve Lavrenz wrote:
Everyone, I'm very new to R, especially when it comes to loops and functions, so please bear with me if this is an elementary question. I cannot seem to figure out how to construct a loop which runs a function until a certain value is computed. For example, say I have the following:

num = numeric(10)
num[1] = 0
for (i in 2:10) {
    num[i] = num[i-1] + 5
}

This adds 5 to the preceding spot of a vector of length 10 to get the value in the current spot. However, say I don't just want to run this for 10 spots; rather, I want to run it until a certain value (say, 100) is computed. How do I construct my loop to do this? Thanks!

-- Albyn Jones Reed College jo...@reed.edu
Re: [R] Creating a loop with an indefinite end term
What's wrong with

num <- rep(0,10)
done <- FALSE
i <- 2
while (!done) {
    num[i] <- num[i-1] + 5
    if (num[i] > 20) done <- TRUE
    i <- i + 1
}
num <- num[1:(i-1)]

You can delete the unused tail when you finish. Iteratively reassigning to a vector of undefined length is possible too, but that is much slower:

x <- numeric()
system.time(for (i in 1:10^5) { x[i] <- i })
   user  system elapsed
 29.641  27.509  57.163

y <- numeric(10^5)
system.time(for (i in 1:10^5) { y[i] <- i })
   user  system elapsed
  0.537   0.000   0.532

albyn

On Tue, Apr 10, 2012 at 11:53:08AM -0400, Steve Lavrenz wrote:
Albyn, thanks for your help. This, however, still assumes that I have to define an array of length 10. Is there a way I can construct this so that my array is exactly as long as the number of spots I need to reach my threshold value? Thanks, -Steve
-- Albyn Jones Reed College jo...@reed.edu
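Since each step adds the constant 5, the loop in this thread can also be avoided entirely; a minimal sketch of the vectorized equivalent of the original question (run until the value 100 is reached):

```r
num <- seq(0, 100, by = 5)  # all the values the loop would produce
length(num)                 # 21 values: 0, 5, 10, ..., 100
```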
Re: [R] Prediciting sports team scores
Robin Lock at St. Lawrence has done this for hockey; see http://it.stlawu.edu/~chodr/faq.html As I recall, he has a Poisson regression model with parameters for offense and defense, and perhaps home-"field" advantage. I confess I am skeptical that this is the right approach for football: teams adjust their strategy and tactics as a function of the opponent and the current match score. Teams are trying to maximize the probability of getting a result, not the probability of scoring goals. The Poisson model corresponds to a constant scoring rate.

albyn

Quoting kerry1912 kerry1...@hotmail.com:
I am working on predicting the scores for a day's worth of matches of team sports. I have already collected data for the teams for the season we are concentrating on. I have been fitting Poisson models for football games and have worked out which model is best and which predictor variables are most important. We would now like to predict the probability distribution of the scores for each team, e.g., what is the probability of Manchester United vs. Chelsea ending 1-1?
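For what it's worth, a minimal sketch of the independent-Poisson scoring model described above, with made-up teams and simulated goal counts (team names, rates, and variable names are all invented for illustration):

```r
set.seed(1)
teams <- c("A", "B", "C", "D")
games <- expand.grid(home = teams, away = teams)
games <- games[games$home != games$away, ]       # a toy round-robin schedule
games$goals <- rpois(nrow(games), lambda = 1.4)  # simulated goals for the home side
# Poisson regression with offense (home) and defense (away) effects
fit <- glm(goals ~ home + away, family = poisson, data = games)
# predicted mean score for a hypothetical A vs B game, and P(exactly 1 goal)
mu <- predict(fit, newdata = data.frame(home = "A", away = "B"), type = "response")
p1 <- dpois(1, mu)
```

With real data one would add a home-advantage term and model both teams' goal counts; as noted above, the constant-rate assumption is the model's main weakness.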
Re: [R] calculate quantiles of a custom function
FWIW, the integral of a mixture density is the same mixture of the CDFs, so you can use the pbeta functions:

pcustom <- function(x) (pbeta(x,2,6) + pbeta(x,6,2))/2

albyn

Quoting Gerhard felds...@gmx.net:
On Tuesday, 3 January 2012, at 19:51:36, Prof. Dr. Matthias Kohl wrote:
D <- AbscontDistribution(d = function(x) dbeta(x, 2, 6) + dbeta(x,6,2), low = 0, up = 1, withStand = TRUE)

Dear all, thank you all again for your help. So, summing up (in case this might be useful to other beginners like me), this is how it can be done:

library(distr)
dcustom <- function(x) {
    (dbeta(x,2,6) + dbeta(x,6,2))/2  # divide by 2 so the density integrates to 1
}
pcustom <- function(x) {
    integrate(dcustom,0,x)$value
}
D <- AbscontDistribution(d = dcustom, low = 0, up = 1, withStand = TRUE)
qcustom <- function(x) {
    q(D)(x)
}

Best, Gerhard
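With the closed-form mixture CDF, the quantile function can also be computed with uniroot alone, no extra package needed (pmix and qmix are illustrative names):

```r
pmix <- function(x) (pbeta(x, 2, 6) + pbeta(x, 6, 2)) / 2    # mixture CDF
qmix <- function(p) uniroot(function(x) pmix(x) - p, c(0, 1))$root
qmix(0.5)  # the mixture is symmetric about 0.5, so the median is 0.5
```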
Re: [R] calculate quantiles of a custom function
What do quantiles mean here? If you have a mixture density, say

myf <- function(x,p0) p0*dbeta(x,2,6) + (1-p0)*dbeta(x,6,2)

then I know what quantiles mean. To find the Pth quantile, use uniroot to solve for the x such that myf(x,p0) - P = 0.

albyn

Quoting VictorDelgado victor.m...@fjp.mg.gov.br:

Gerhard wrote:
Suppose I create a custom function, consisting of two beta distributions:

myfunction <- function(x) {
    dbeta(x,2,6) + dbeta(x,6,2)
}

How can I calculate the quantiles of myfunction? Thank you in advance, Gerhard

Gerhard, if you want to know the quantiles of the new distribution created by myfunction, maybe you can also do:

x <- seq(0,1,.01)  # insert your 'x'
q <- myfunction(x)
quantile(q)
      0%      25%      50%      75%     100%
0.000000 1.476177 2.045389 2.581226 2.817425

This gives the sample quantiles. You can also look to simulations (as Bert Gunter suggested) to better know the properties of the distribution's quantiles obtained from 'myfunction'. - Victor Delgado, cedeplar.ufmg.br Ph.D. student, www.fjp.mg.gov.br researcher
Re: [R] calculate quantiles of a custom function
Right: replace the dbetas with pbetas.

albyn

Quoting Duncan Murdoch murdoch.dun...@gmail.com:
On 03/01/2012 1:33 PM, Albyn Jones wrote:
What do quantiles mean here? If you have a mixture density, say

myf <- function(x,p0) p0*dbeta(x,2,6) + (1-p0)*dbeta(x,6,2)

then I know what quantiles mean. To find the Pth quantile, use uniroot to solve for the x such that myf(x,p0) - P = 0.

You would want to integrate first...

Duncan Murdoch
Re: [R] Finding all triangles in a graph
Taral,

The general problem of finding subgraphs with a given structure (motifs) is hard (i.e., computationally expensive). There is some literature... Have you looked at the igraph package, function graph.motifs()?

albyn

Quoting Taral tar...@gmail.com:
I have the adjacency matrix of a graph. I'm trying to find all triangles (embeddings of C_3). This doesn't work:

index = function(l) seq(l)[l]
pairs = do.call(rbind, lapply(seq(nrow(adj)), function(x) cbind(x, index(adj[x,]))))
triangles = do.call(rbind, apply(pairs, 1, function(x) cbind(x, index(adj[x[1],] & adj[x[2],]))))

I'm absolutely certain I've gone down the wrong path here. O great R gurus, your guidance would be most welcome. -- Taral tar...@gmail.com "Please let me know if there's any further trouble I can give you." -- Unknown
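As a quick cross-check for any triangle search: in an undirected simple graph with adjacency matrix A, the number of triangles is trace(A^3)/6, since each triangle yields six closed walks of length 3. A minimal sketch:

```r
# K4 (complete graph on 4 vertices) contains choose(4,3) = 4 triangles
A <- matrix(1, 4, 4)
diag(A) <- 0
ntri <- sum(diag(A %*% A %*% A)) / 6
ntri  # 4
```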
Re: [R] Statistical tests and measures for cone-like distributions?
Have you tried plotting log(y) vs. log(x)?

albyn

On Wed, Dec 21, 2011 at 02:18:56PM +0000, Matthew Gwynne wrote:
Hi, are there any special statistical tests, or functions in R, I could use to measure cone-like distributions? I have several data sets, parts of which I've been plotting as 2D plots, where I get a cone-like distribution of the data points. That is, the data appear to be bounded by two non-parallel lines starting at the origin, giving rise to a cone-like appearance. I guess I could work out its bounding lines fairly easily, but this seems a fairly naive thing to do, and I was wondering if there are any standard tests. I guess one might consider the angle of the cone, how much of the cone is filled out, the density of points within the cone, and so on. I don't even know if thinking of a cone makes any sense, what it would mean, or whether I should simply be thinking of a linear regression line with some non-constant variance. An example plot is at http://cs.swan.ac.uk/~csmg/aes/plots/canon_1_3_4_r1_v_cfs.pdf if anyone is interested. Thanks in advance! Matthew Gwynne http://cs.swan.ac.uk/~csmg/

-- Albyn Jones Reed College jo...@reed.edu
Re: [R] map at fips level using multiple variables
Since each county gets a single color, you would need to combine two color schemes. I am skeptical that this will work well, but you could try using rgb(): for example, code one variable as the red component and the other as the blue component.

albyn

On Wed, Dec 07, 2011 at 11:14:27PM -0500, bby2...@columbia.edu wrote:
Hi David, sorry it sounds vague. Here is my current code, which gives the distribution of family size at the US county level. You will see on a US map the family size distribution represented by different colors. Now if I have another variable, income, which has 3 categories (<50k, 50k-80k, >80k), how do I show 5x3 categories on the map? I guess I could always create a third variable to do this. Just wondering, maybe there is a function to do this readily? Thank you! Bonnie Yuan

y = data1$size
x11()
hist(y, nclass=15)  # histogram of y
fivenum(y)          # specify the cut-off points of y
y.colorBuckets = as.numeric(cut(y, c(1,2,3,6)))
# legend showing the cut-off points.
legend.txt = c("0-1","1-2","2-3","3-6",">6")
colorsmatched = y.colorBuckets[match(county.fips$fips, fips[,1])]
x11()
map("county", col = colors[colorsmatched], fill = TRUE, resolution = 0)
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2)
title("Family Size")
legend("bottom", legend.txt, horiz = TRUE, fill = colors, cex=0.7)

Quoting David Winsemius dwinsem...@comcast.net:
On Dec 7, 2011, at 6:12 PM, bby2...@columbia.edu wrote:
Hi, I just started playing with the county FIPS feature in the maps package, which allows geospatial visualization of variables at the US county level. Pretty cool.

Got code? I did some searching but couldn't find an answer to this question: how can I map more than 2 variables on a US map?

"2 variables" is a bit on the vague side for programming purposes.

For example, you can map by the breakdown of income or family size. How do you further break down based on the values of both variables and show them at the county FIPS level?

"Breakdown" suggests a factor construct.
If so, then: ?interaction. But the "show" part of the question remains very vague. Can't you be a bit more specific? What DO you want? -- David Winsemius, MD, West Hartford, CT

-- Albyn Jones Reed College jo...@reed.edu
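A minimal sketch of the rgb() idea, mapping one hypothetical county variable to the red channel and a second to the blue channel (the data here are invented):

```r
famsize <- c(1.8, 2.4, 3.1, 4.0, 5.5)  # made-up county-level family sizes
income  <- c(42, 55, 61, 78, 95)       # made-up incomes, in $1000s
scale01 <- function(v) (v - min(v)) / (max(v) - min(v))
# one hex color per county, encoding both variables at once
cols <- rgb(red = scale01(famsize), green = 0, blue = scale01(income))
```

Each county's blended color could then be passed to map(..., col = cols[...], fill = TRUE) in place of a single-variable palette.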
Re: [R] Discrepancy with p.value from t.test
The print method is the issue:

t.out <- t.test(b1, b2, var.equal=TRUE)
t.out$p.value
[1] 4.108001e-38
t.out$statistic
        t
-15.93656

albyn

On Tue, Nov 01, 2011 at 02:40:15PM -0500, Jonathan Thayn wrote:
Sometimes the p.value returned by t.test() is the same as the one I calculate using pt(), and sometimes it's not. I don't understand the difference. I'm sure there is a simple explanation, but I haven't been able to find it, even after looking at the code for t.test.default. I apologize if this is a basic and obvious question. For example:

data(sleep)
t.test(extra~group, data=sleep, var.equal=TRUE)  # the p.value returned is 0.07939
2*pt(-1.8608, 18)  # using the t statistic and the df returned above
[1] 0.0791887

These p.values are the same. However, they are different when I use a different dataset:

data(beavers)
b1 <- beaver1$temp
b2 <- beaver2$temp
t.test(b1, b2, var.equal=TRUE)  # the p.value returned is < 2.2e-16
2*pt(-15.9366, 212)  # using the t statistic and the df returned above
[1] 4.10686e-38

Jonathan B. Thayn, Ph.D., Illinois State University, Department of Geography and Geology, 200A Felmley Hall, Normal, Illinois 61790, (309) 438-8112, jth...@ilstu.edu, my.ilstu.edu/~jthayn

-- Albyn Jones Reed College jo...@reed.edu
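The truncation happens in the print method, which formats the p-value with format.pval() and displays anything below .Machine$double.eps (about 2.2e-16) as "< 2.2e-16", while the stored value is exact. A sketch with simulated data in place of the beaver temperatures:

```r
set.seed(1)
x <- rnorm(50)
y <- rnorm(50, mean = 3)
tt <- t.test(x, y, var.equal = TRUE)
tt$p.value               # the exact, very small p-value stored in the object
format.pval(tt$p.value)  # the truncated form used when the result is printed
```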
Re: [R] cumVar and cumSkew
Wow, this takes me back a ways :-) As I recall, BMDP used updating algorithms to compute sample means, variances, and covariances in one pass through the dataset. You need the updating algorithm for means:

\bar{X}_{n} = \frac{n-1}{n}\bar{X}_{n-1} + \frac{X_n}{n}

You can work out the algorithm for variances using the same idea, or consult:

Tony F. Chan, Gene H. Golub, and Randall J. LeVeque, "Algorithms for Computing the Sample Variance: Analysis and Recommendations", The American Statistician, Vol. 37, No. 3 (Aug., 1983), pp. 242-247.

I think the updating and other algorithms are described therein.

albyn

On Thu, Sep 15, 2011 at 02:57:40PM -0700, D_Tomas wrote:
Hi there, I need to do the same thing as cumsum, but with the variance and skewness. I have tried a for loop like this:

var.value <- vector(mode = "numeric", length = length(daily))
for (i in 1:length(daily)) {
    var.value[i] <- var(daily[1:i])
}

But because my dataset is so huge, I run out of memory. Any ideas?!?! Much appreciated!

-- Albyn Jones Reed College jo...@reed.edu
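A one-pass running variance along the lines of the updating algorithms above (a Welford-style sketch; cumvar is an invented name), which avoids recomputing var(daily[1:i]) from scratch at every step:

```r
cumvar <- function(x) {
    m <- 0; M2 <- 0
    out <- rep(NA_real_, length(x))
    for (n in seq_along(x)) {
        delta <- x[n] - m
        m <- m + delta / n             # updated running mean
        M2 <- M2 + delta * (x[n] - m)  # running sum of squared deviations
        if (n > 1) out[n] <- M2 / (n - 1)
    }
    out
}
daily <- c(3, 1, 4, 1, 5, 9, 2, 6)
all.equal(cumvar(daily)[8], var(daily))  # TRUE: matches the two-pass variance
```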
Re: [R] Density function: Area under density plot is not equal to 1. Why?
Look at

area <- sum(a$y)*(a$x[1]-a$y[2])

The problem appears to be a$x[1]-a$y[2]; that is not the length of the base of an approximating rectangle, whatever it is :-)

albyn

On Thu, Sep 08, 2011 at 11:36:23AM -0400, Gonçalo Ferraz wrote:
Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands:

data <- runif(58)
a <- density(data, from=0, to=1)
plot(a, type="l", lwd=3)

But then, when I try to approximate the area under the plotted curve with the command

area <- sum(a$y)*(a$x[1]-a$y[2])

I get an area that is clearly smaller than 1. Strangely, if I don't bound the density function with 'from=0, to=1' (which is against my purpose, because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well but using the density function improperly. Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution. Thank you! Gonçalo

-- Albyn Jones Reed College jo...@reed.edu
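The base of each rectangle should be the grid spacing, diff(a$x)[1]. With that correction the Riemann sum behaves sensibly; for a density estimate truncated to [0,1] it will still come out somewhat below 1, which is the boundary-leakage issue raised in the question:

```r
set.seed(1)
data <- runif(58)
a <- density(data, from = 0, to = 1)
area <- sum(a$y) * diff(a$x)[1]  # Riemann sum with the correct base
area  # a bit less than 1: mass smoothed beyond [0,1] is cut off
```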
Re: [R] Calculating p-value for 1-tailed test in a linear model
For H_0: beta >= 0, the correct p-value is pt(tvalue, df) regardless of the sign of tvalue. Negative t-values of large magnitude will yield small p-values.

albyn

On Mon, Aug 22, 2011 at 05:22:06PM +0000, Ben Bolker wrote:
Campomizzi, Andrew J (acampomizzi at neo.tamu.edu) writes:
On 20/08/11 10:20, Andrew Campomizzi wrote:
Hello, I'm having trouble figuring out how to calculate a p-value for a 1-tailed test of beta_1 in a linear model fit using the command lm. My model has only 1 continuous predictor variable. I want to test the null hypothesis beta_1 >= 0. I can calculate the p-value for a 2-tailed test using the code 2*pt(-abs(t.value), df=degrees.freedom), where t.value and degrees.freedom are values provided in the summary of the lm. The resulting p-value is the same as provided by the summary of the lm for beta_1. I'm unsure how to change my calculation of the p-value for a 1-tailed test.

Isn't it just pt(tvalue, df=degrees.freedom, lower.tail=FALSE) if the value is positive (and expected to be positive), or pt(tvalue, df=degrees.freedom) if the value is negative (and expected to be negative)? In fact, if the value is in the expected direction, I think you can just leave out the multiplication by 2 and get the right answer ...

-- Albyn Jones Reed College jo...@reed.edu
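A sketch with simulated data (names invented), showing the two-sided p-value from summary() and the corresponding one-sided version for H_0: beta_1 >= 0:

```r
set.seed(2)
x <- rnorm(40)
y <- 1 - 0.5 * x + rnorm(40)
fit <- lm(y ~ x)
tv <- coef(summary(fit))["x", "t value"]
df <- fit$df.residual
p.two <- 2 * pt(-abs(tv), df)  # reproduces Pr(>|t|) from summary(fit)
p.one <- pt(tv, df)            # H_0: beta_1 >= 0 vs H_A: beta_1 < 0
```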
Re: [R] generate two sets of random numbers that are correlated
Here is a simple method:

z1 <- rpois(n, mu1)
z2 <- rpois(n, mu1)
z3 <- rpois(n, mu2)
Y <- z1 + z3
X <- z2 + z3

Cov(X,Y) = mu2, so Cor(X,Y) = mu2/(mu1+mu2).

albyn

On Thu, Aug 11, 2011 at 09:01:50AM -0700, Kathie wrote:
I almost forgot: in fact, I want to generate correlated Poisson random vectors. Thank you anyway.

-- Albyn Jones Reed College jo...@reed.edu
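The construction is easy to check by simulation (the values of n, mu1, and mu2 are chosen arbitrarily here):

```r
set.seed(1)
n <- 1e5; mu1 <- 2; mu2 <- 3
z1 <- rpois(n, mu1); z2 <- rpois(n, mu1); z3 <- rpois(n, mu2)
Y <- z1 + z3
X <- z2 + z3
cor(X, Y)  # close to mu2 / (mu1 + mu2) = 0.6
```

One limitation worth noting: because the shared component z3 is nonnegative, this trick only produces nonnegative correlations.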
Re: [R] useR! 2011 T-shirt competition
Me too, or three!

albyn

On Fri, May 27, 2011 at 12:47:45PM -0400, Carl Witthoft wrote:
More important question: can we non-attendees buy one? I made myself a useR t-shirt a couple of years ago and wouldn't mind adding to my collection.

-- Albyn Jones Reed College jo...@reed.edu
Re: [R] Help with 2-D plot of k-mean clustering analysis
One idea: pick the three largest clusters; their centers determine a plane. Project your data into that plane.

albyn

On Wed, May 18, 2011 at 06:55:39PM +0200, Claudia Beleites wrote:
Hi Meng,

I would like to use R to perform k-means clustering on my data, which includes 33 samples measured with ~1000 variables. I have already used kmeans for this analysis, and showed that there are 4 clusters in my data. However, it's really difficult to plot the clusters in 2-D, given the huge number of variables. One possible way is to project the multidimensional space onto a 2-D platform, but I could not find any good way to do that. Any suggestions or comments would be really helpful!

For suggestions it would be extremely helpful to tell us what kind of variables your 1000 variables are. Parallel coordinate plots plot values over (many) variables. Whether this is useful depends very much on your variables: e.g., I have spectral channels; they have an intrinsic order and the values have physically the same meaning (and almost the same range), so the parallel coordinate plot comes naturally (it produces, in fact, the spectra).

Claudia

Thanks, Meng

-- Claudia Beleites, Spectroscopy/Imaging, Institute of Photonic Technology, Albert-Einstein-Str. 9, 07745 Jena, Germany, email: claudia.belei...@ipht-jena.de, phone: +49 3641 206-133, fax: +49 2641 206-399
-- Albyn Jones Reed College jo...@reed.edu
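The projection suggested above can be sketched in a few lines of base R. This is a hypothetical illustration, not code from the thread: `X` stands in for the 33 x ~1000 data matrix, and the plane is the one spanned by the centers of the three largest clusters.

```r
# Project high-dimensional data onto the plane through the centers of the
# three largest k-means clusters (X is a hypothetical stand-in data matrix).
set.seed(1)
X <- matrix(rnorm(33 * 100), nrow = 33)          # 33 samples, 100 variables
km <- kmeans(X, centers = 4, nstart = 10)
big3 <- order(km$size, decreasing = TRUE)[1:3]   # three largest clusters
ctr <- km$centers[big3, ]
# two direction vectors spanning the plane through the three centers
B <- rbind(ctr[2, ] - ctr[1, ], ctr[3, ] - ctr[1, ])
Q <- qr.Q(qr(t(B)))                              # orthonormal basis, p x 2
coords <- sweep(X, 2, ctr[1, ]) %*% Q            # 2-D coordinates
plot(coords, col = km$cluster, pch = 19)
```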
Re: [R] Power Analysis
First, note that you are doing two separate power calculations, one with n=2 and sd = 1.19, the other with n=3 and sd = 4.35. I will assume this was on purpose. Now... power.t.test(n = 2, delta = 13.5, sd = 1.19, sig.level = 0.05) Two-sample t test power calculation n = 2 delta = 13.5 sd = 1.19 sig.level = 0.05 power = 0.9982097 alternative = two.sided Now, with n=2, the power is already .99. With n=1, there are zero df. So, what n corresponds to a power of .8? power.t.test(n = 1.6305, delta = 13.5, sd = 1.19, sig.level = 0.05) Two-sample t test power calculation n = 1.6305 delta = 13.5 sd = 1.19 sig.level = 0.05 power = 0.8003734 alternative = two.sided It looks like 1.63 subjects will do the job :-) Finally, look at the power.t.test function; there is a line that explains your error message: else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2, 1e+07))$root power.t.test() is making the sensible assumption that we only care about sample sizes of at least n = 2. albyn On Mon, Apr 18, 2011 at 02:31:19PM -0700, Schatzi wrote: I am trying to do a power analysis to get the number of replicates per treatment. If I try to get the power it works just fine: setn=c(2,3) sdx=c(1.19,4.35) power.t.test(n = setn, delta = 13.5, sd = sdx, sig.level = 0.05, power = NULL) If I go the other way to obtain the n I have problems. sdx=c(1.19,4.35) pow=c(.8,.8) power.t.test(n = NULL, delta = 13.5, sd = sdx, sig.level = 0.05, power = 0.8) Is there any way to do this? Thank you. -- View this message in context: http://r.789695.n4.nabble.com/Power-Analysis-tp3458786p3458786.html Sent from the R help mailing list archive at Nabble.com.
-- Albyn Jones Reed College jo...@reed.edu
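The search interval c(2, 1e+07) quoted above is exactly why power.t.test() fails when the solution falls below n = 2. A hand-rolled uniroot() over a wider interval recovers the (admittedly degenerate) answer; this is a sketch, not part of the original exchange:

```r
# Solve power = 0.8 for n directly, allowing n below power.t.test's
# built-in lower bound of 2 (degenerate, but it matches the hand check).
f <- function(n) {
  power.t.test(n = n, delta = 13.5, sd = 1.19, sig.level = 0.05)$power - 0.8
}
n80 <- uniroot(f, c(1.1, 10))$root
n80   # about 1.63
```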
Re: [R] Power Analysis
Yes, Richard Savage used to call this inter-ocular data: the answer should leap up and strike you right between the eyes... albyn On Mon, Apr 18, 2011 at 05:23:05PM -0500, David Cross wrote: It seems to me, with deltas this large (relative to the SD), that a significance test is a moot point! David Cross d.cr...@tcu.edu www.davidcross.us On Apr 18, 2011, at 5:14 PM, Albyn Jones wrote: First, note that you are doing two separate power calculations, one with n=2 and sd = 1.19, the other with n=3 and sd = 4.35. I will assume this was on purpose. Now... power.t.test(n = 2, delta = 13.5, sd = 1.19, sig.level = 0.05) Two-sample t test power calculation n = 2 delta = 13.5 sd = 1.19 sig.level = 0.05 power = 0.9982097 alternative = two.sided Now, with n=2, the power is already .99. With n=1, there are zero df. So, what n corresponds to a power of .8? power.t.test(n = 1.6305, delta = 13.5, sd = 1.19, sig.level = 0.05) Two-sample t test power calculation n = 1.6305 delta = 13.5 sd = 1.19 sig.level = 0.05 power = 0.8003734 alternative = two.sided It looks like 1.63 subjects will do the job :-) Finally, look at the power.t.test function; there is a line that explains your error message: else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2, 1e+07))$root power.t.test() is making the sensible assumption that we only care about sample sizes of at least n = 2. albyn On Mon, Apr 18, 2011 at 02:31:19PM -0700, Schatzi wrote: I am trying to do a power analysis to get the number of replicates per treatment. If I try to get the power it works just fine: setn=c(2,3) sdx=c(1.19,4.35) power.t.test(n = setn, delta = 13.5, sd = sdx, sig.level = 0.05, power = NULL) If I go the other way to obtain the n I have problems. sdx=c(1.19,4.35) pow=c(.8,.8) power.t.test(n = NULL, delta = 13.5, sd = sdx, sig.level = 0.05, power = 0.8) Is there any way to do this? Thank you.
-- View this message in context: http://r.789695.n4.nabble.com/Power-Analysis-tp3458786p3458786.html Sent from the R help mailing list archive at Nabble.com. -- Albyn Jones Reed College jo...@reed.edu
Re: [R] MLE where loglikelihood function is a function of numerical solutions
Hi Kristian The obvious approach is to treat it like any other MLE problem: evaluation of the log-likelihood is done as often as necessary for the optimizer you are using, e.g. a call to optim(psi,LL,...) where LL(psi) evaluates the log-likelihood at psi. There may be computational shortcuts that would work if you knew that LL(psi+eps) were well approximated by LL(psi) for the values of eps used to evaluate numerical derivatives of LL. Of course, then you might need to write your own custom optimizer. albyn Quoting Kristian Lind kristian.langgaard.l...@gmail.com: Hi there, I'm trying to solve an ML problem where the likelihood function is a function of two numerical procedures, and I'm having some problems figuring out how to do this. The log-likelihood function is of the form L(c,psi) = 1/T sum [log f(c, psi) - log g(c,psi)], where c is a 2xT matrix of data and psi is the parameter vector. f(c, psi) is the transition density, which can be approximated. The problem is that in order to approximate this we need to first numerically solve 3 ODEs, and second, numerically solve 2 non-linear equations in two unknowns wrt the data. The g(c,psi) function is known, but dependent on the numerical solutions. I have solved the ODEs using the deSolve package and the 2 non-linear equations using the BB package, but the results are dependent on the parameters. How can I write a program that will maximise this log-likelihood function, taking into account that the numerical procedures need to be updated for each iteration in the maximization procedure? Any help will be much appreciated. Kristian
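The structure of that advice can be sketched as follows. solve_inner() is a hypothetical stand-in for the real inner machinery (the deSolve ODE solves and the BB root-finding); the point is only that it is called inside the objective, so the inner numerics are re-run at every parameter value the optimizer visits.

```r
# Hypothetical stand-in for the ODE / nonlinear-equation steps; here it
# just maps psi[2] to a positive scale parameter.
solve_inner <- function(psi) list(sigma = exp(psi[2]))

negLL <- function(psi, cdata) {
  aux <- solve_inner(psi)                  # inner numerics updated each call
  -sum(dnorm(cdata, mean = psi[1], sd = aux$sigma, log = TRUE))
}

set.seed(1)
cdata <- rnorm(50, mean = 2, sd = 1)
fit <- optim(c(0, 0), negLL, cdata = cdata)
fit$par    # roughly c(sample mean, log of the ML sd estimate)
```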
Re: [R] MLE where loglikelihood function is a function of numerical solutions
To clarify: by "if you knew that LL(psi+eps) were well approximated by LL(psi) for the values of eps used to evaluate numerical derivatives of LL" I mean that the derivatives of LL(psi+eps) are close to the derivatives of LL(psi), and perhaps you would want the hessian to be close as well. albyn Quoting Albyn Jones jo...@reed.edu: Hi Kristian The obvious approach is to treat it like any other MLE problem: evaluation of the log-likelihood is done as often as necessary for the optimizer you are using, e.g. a call to optim(psi,LL,...) where LL(psi) evaluates the log-likelihood at psi. There may be computational shortcuts that would work if you knew that LL(psi+eps) were well approximated by LL(psi) for the values of eps used to evaluate numerical derivatives of LL. Of course, then you might need to write your own custom optimizer. albyn Quoting Kristian Lind kristian.langgaard.l...@gmail.com: Hi there, I'm trying to solve an ML problem where the likelihood function is a function of two numerical procedures, and I'm having some problems figuring out how to do this. The log-likelihood function is of the form L(c,psi) = 1/T sum [log f(c, psi) - log g(c,psi)], where c is a 2xT matrix of data and psi is the parameter vector. f(c, psi) is the transition density, which can be approximated. The problem is that in order to approximate this we need to first numerically solve 3 ODEs, and second, numerically solve 2 non-linear equations in two unknowns wrt the data. The g(c,psi) function is known, but dependent on the numerical solutions. I have solved the ODEs using the deSolve package and the 2 non-linear equations using the BB package, but the results are dependent on the parameters. How can I write a program that will maximise this log-likelihood function, taking into account that the numerical procedures need to be updated for each iteration in the maximization procedure? Any help will be much appreciated.
Kristian
Re: [R] Is there an exact binomial k-sample test of equivalence?
Presumably the null hypothesis is that at least one of the differences is larger in absolute magnitude than the chosen epsilon. I expect that your procedure would be conservative: if it rejects the null hypothesis, then you are ok, but presumably what you really want would be based on a joint confidence region for all the proportions. albyn Quoting Łukasz Ręcławowicz lukasz.reclawow...@gmail.com: Hi, I've got one silly question for the evening. I don't know if this is reasonable, but could a test using the two most extreme proportions from the samples be good enough evidence for testing equivalence, or should I look for something else...? -- Miłego dnia [Have a nice day]
Re: [R] Test for equivalence
Reading the original post it was clear to me that the poster was looking for a test of equivalence, but obviously there was room for interpretation! albyn On Mon, Feb 14, 2011 at 09:46:13AM -0700, Greg Snow wrote: Reading the original post it is fairly clear that the original poster's question does not match the traditional test of equivalence, but rather is trying to determine distinguishable or indistinguishable. If the test in my suggestion is statistically significant (and note I did not suggest only testing the interaction) then that meets one possible interpretation of distinguishable; a non-significant result could mean either equivalence or low power, the combination of which could be an interpretation of indistinguishable. I phrased my response as a question in hopes that the original poster would think through what they really wanted to test and get back to us with further details. It could very well be that my response is very different from what they were thinking, but explaining how it does not fit will better help us understand the real problem. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Albyn Jones [mailto:jo...@reed.edu] Sent: Sunday, February 13, 2011 9:53 PM To: Greg Snow Cc: syrvn; r-help@r-project.org Subject: Re: [R] Test for equivalence Testing the null hypothesis of no interaction is not the same as a test of equivalence for the two differences. There is a literature on tests of equivalence. First you must develop a definition of equivalence, for example: the difference is in the interval (-a,a). Then, for example, you test the null hypothesis that the difference is in [a,inf) or (-inf,-a] (a TOST, or two one-sided tests). One simple procedure: check to see if the 90% CI for the difference (difference of the differences, or the interaction effect) is contained in the interval (-a,a).
albyn Quoting Greg Snow greg.s...@imail.org: Does it make sense for you to combine the 2 data sets and do a 2-way anova with treatment vs. control as one factor and experiment number as the other factor? Then you could test the interaction and treatment number factor to see if they make a difference. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of syrvn Sent: Saturday, February 12, 2011 7:30 AM To: r-help@r-project.org Subject: [R] Test for equivalence Hi! Is there a way in R to check whether the outcomes of two different experiments are statistically distinguishable or indistinguishable? More precisely, I used the Wilcoxon test to determine the differences between controls and treated subjects for two different experiments. Now I would like to check whether the two lists of analytes obtained are statistically distinguishable or indistinguishable. I tried to use an equivalence test from the 'equivalence' package in R but it seems that this test is not applicable to my problem. The test in the 'equivalence' package just determines similarity between two conditions but I need to compare the outcome of two different experiments. My experiments are constructed as follows: Exp1: 8 control samples, 8 treated samples - determine significant changes (List A) Exp2: 8 control samples, 8 treated samples - determine significant changes (List B) Now I would like to check whether List A and List B are distinguishable or indistinguishable. Any advice is very much appreciated! Best, beginner -- View this message in context: http://r.789695.n4.nabble.com/Test-for-equivalence-tp3302739p3302739.html Sent from the R help mailing list archive at Nabble.com.
-- Albyn Jones Reed College jo...@reed.edu
Re: [R] Test for equivalence
Testing the null hypothesis of no interaction is not the same as a test of equivalence for the two differences. There is a literature on tests of equivalence. First you must develop a definition of equivalence, for example: the difference is in the interval (-a,a). Then, for example, you test the null hypothesis that the difference is in [a,inf) or (-inf,-a] (a TOST, or two one-sided tests). One simple procedure: check to see if the 90% CI for the difference (difference of the differences, or the interaction effect) is contained in the interval (-a,a). albyn Quoting Greg Snow greg.s...@imail.org: Does it make sense for you to combine the 2 data sets and do a 2-way anova with treatment vs. control as one factor and experiment number as the other factor? Then you could test the interaction and treatment number factor to see if they make a difference. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of syrvn Sent: Saturday, February 12, 2011 7:30 AM To: r-help@r-project.org Subject: [R] Test for equivalence Hi! Is there a way in R to check whether the outcomes of two different experiments are statistically distinguishable or indistinguishable? More precisely, I used the Wilcoxon test to determine the differences between controls and treated subjects for two different experiments. Now I would like to check whether the two lists of analytes obtained are statistically distinguishable or indistinguishable. I tried to use an equivalence test from the 'equivalence' package in R but it seems that this test is not applicable to my problem. The test in the 'equivalence' package just determines similarity between two conditions but I need to compare the outcome of two different experiments.
My experiments are constructed as follows: Exp1: 8 control samples, 8 treated samples - determine significant changes (List A) Exp2: 8 control samples, 8 treated samples - determine significant changes (List B) Now I would like to check whether List A and List B are distinguishable or indistinguishable. Any advice is very much appreciated! Best, beginner -- View this message in context: http://r.789695.n4.nabble.com/Test-for-equivalence-tp3302739p3302739.html Sent from the R help mailing list archive at Nabble.com.
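The 90%-CI procedure described in this thread can be sketched as follows; the data and the equivalence margin a are hypothetical stand-ins:

```r
# TOST via a 90% confidence interval: declare the two sets of differences
# equivalent if the CI for their difference lies entirely inside (-a, a).
set.seed(42)
diffA <- rnorm(8, mean = 0.1)    # stand-in differences from experiment 1
diffB <- rnorm(8, mean = 0.2)    # stand-in differences from experiment 2
a <- 1                           # equivalence margin, fixed in advance
ci <- t.test(diffA, diffB, conf.level = 0.90)$conf.int
equivalent <- (ci[1] > -a) && (ci[2] < a)
c(lower = ci[1], upper = ci[2])
```

The margin a must be chosen on subject-matter grounds before looking at the data; the CI check is algebraically the same as running the two one-sided tests at level 0.05 each.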
Re: [R] Integral of PDF
To really understand it you will have to look at the Fortran code underlying integrate. I tracked it back through a couple of layers (dqagi, dqagie, ... just use google, these are old netlib subroutines), then decided I ought to get back to grading papers :-) It looks like the integral is split into the sum of two integrals, over (-Inf,0] and [0,Inf). integrate(function(x) dnorm(x, 350,50), 0, Inf) 1 with absolute error 1.5e-05 integrate(function(x) dnorm(x, 400,50), 0, Inf) 1.068444e-05 with absolute error 2.1e-05 integrate(function(x) dnorm(x, 500,50), 0, Inf) 8.410947e-11 with absolute error 1.6e-10 integrate(function(x) dnorm(x, 500,50), 0, 1000) 1 with absolute error 7.4e-05 The integral from 0 to Inf is the lim_{x -> Inf} of the integral over [0,x]. I suspect that the algorithm is picking an interval [0,x], evaluating the integral over that interval, and then performing some test to decide whether to expand the interval. When the initial interval doesn't contain much, expanding a little won't add enough to trigger another iteration. albyn On Thu, Dec 02, 2010 at 04:21:34PM -0500, Doran, Harold wrote: The integral of any probability density from -Inf to Inf should equal 1, correct? I don't understand the last result below.
integrate(function(x) dnorm(x, 0,1), -Inf, Inf) 1 with absolute error 9.4e-05 integrate(function(x) dnorm(x, 100,10), -Inf, Inf) 1 with absolute error 0.00012 integrate(function(x) dnorm(x, 500,50), -Inf, Inf) 8.410947e-11 with absolute error 1.6e-10 all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0) [1] TRUE sessionInfo() R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] statmod_1.4.6 mlmRev_0.99875-1 lme4_0.999375-35 Matrix_0.999375-33 lattice_0.17-26 loaded via a namespace (and not attached): [1] grid_2.10.1 nlme_3.1-96 stats4_2.10.1 tools_2.10.1 -- Albyn Jones Reed College jo...@reed.edu
Re: [R] Integral of PDF
On Thu, Dec 02, 2010 at 06:23:45PM -0500, Ravi Varadhan wrote: A simple solution is to locate the mode of the integrand, which should be quite easy to do, and then do a coordinate shift to that point and then integrate the mean-shifted integrand using `integrate'. Ravi. Translation: think before you compute! albyn
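Ravi's suggestion in code, for the dnorm(x, 500, 50) example from this thread: shifting the integrand so its mode sits at the origin puts the mass in the region integrate() examines first.

```r
# Unshifted: integrate() splits the doubly infinite range at 0 and misses
# the probability mass concentrated near 500.
integrate(function(x) dnorm(x, 500, 50), -Inf, Inf)
# Shifted so the mode is at 0: the same integral, computed correctly.
good <- integrate(function(x) dnorm(x + 500, 500, 50), -Inf, Inf)$value
good   # 1 to within the reported tolerance
```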
Re: [R] Consistency of Logistic Regression
Do you have factors (categorical variables) in the model? It could be just a parameterization difference. albyn On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote: Dear R developers, I have noticed a discrepancy between the coefficients returned by R's glm() for logistic regression and SAS's PROC LOGISTIC. I am using dist = binomial and link = logit for both R and SAS. I believe R uses IRLS whereas SAS uses Fisher's scoring, but the difference is something like 100 SE on the intercept. What accounts for such a huge difference? Thank you for your time. Ben Godlove -- Albyn Jones Reed College jo...@reed.edu
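One concrete way such a parameterization difference shows up, sketched with simulated data (not from the thread): packages can differ on which response level is treated as the event, and recoding the response negates every coefficient, which can look like a huge discrepancy if overlooked.

```r
# Fitting the same data with the response recoded flips all coefficient
# signs: modelling P(y = 1) versus P(y = 0).
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(1 + 2 * x))
f1 <- glm(y ~ x, family = binomial)          # models P(y = 1)
f2 <- glm(I(1 - y) ~ x, family = binomial)   # models P(y = 0)
cbind(coef(f1), coef(f2))                    # same magnitudes, opposite signs
```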
Re: [R] How to detect if a vector is FP constant?
How about all.equal(x, rep(mean(x), length(x))) or all.equal(x, rep(mean(x), length(x)), tolerance = ...)? albyn On Mon, Nov 08, 2010 at 06:45:00PM -0600, Hadley Wickham wrote: Hi all, What's the equivalent to length(unique(x)) == 1 if I want to ignore small floating point differences? Should I look at diff(range(x)) or sd(x) or something else? What cut-off should I use? If it helps to be explicit, I'm interested in detecting when a vector is constant for the purpose of visual display. In other words, if I rescale x to [0, 1], do I have enough precision to get at least 100 unique values? Thanks! Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ -- Albyn Jones Reed College jo...@reed.edu
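The suggestion above wrapped up as a predicate (a sketch; isTRUE() is needed because all.equal() returns a descriptive string, not FALSE, when the values differ):

```r
# TRUE when all elements of x are equal up to a relative tolerance.
is_fp_constant <- function(x, tol = sqrt(.Machine$double.eps)) {
  isTRUE(all.equal(x, rep(mean(x), length(x)), tolerance = tol))
}
is_fp_constant(c(1, 1 + 1e-12, 1))   # TRUE: differences below tolerance
is_fp_constant(c(1, 1, 1.01))        # FALSE
```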
Re: [R] Clustering
I have worked with seismic data measured at 100 Hz, and had no trouble locating events in long records (several times the size of your dataset). Is 20 minutes high frequency? What kind of waves are these? What is the wavelength? Some details would help. albyn On Thu, Oct 28, 2010 at 05:00:10AM -0700, dpender wrote: I am looking to use R in order to determine the number of extreme events for a high frequency (20 minute) dataset of wave heights that spans 25 years (657,432 data points). I require the number, spacing and duration of the extreme events as an output. I have briefly used the clusters function in the evd package. Can anyone suggest a more appropriate package to use for such a large dataset? Thanks, Doug -- View this message in context: http://r.789695.n4.nabble.com/Clustering-tp3017056p3017056.html Sent from the R help mailing list archive at Nabble.com. -- Albyn Jones Reed College jo...@reed.edu
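For simple threshold-exceedance events, base R already gives the number, duration, and spacing via rle(); a sketch with a simulated stand-in series (the data and threshold are hypothetical, not from the thread):

```r
# Group threshold exceedances into runs: one run of TRUEs = one event.
set.seed(7)
height <- rnorm(657432, mean = 1, sd = 0.5)   # stand-in wave-height series
r <- rle(height > 2.5)                        # runs above a chosen threshold
n_events  <- sum(r$values)                    # number of events
durations <- r$lengths[r$values]              # event durations, in time steps
gaps      <- r$lengths[!r$values]             # spacing between events
c(events = n_events, mean_duration = mean(durations))
```

A declustering scheme (as in the evd package) would additionally merge events separated by short gaps, e.g. by flagging gaps below some run length before counting.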
Re: [R] [OT] (slightly) - OpenOffice Calc and text files
How about emacs? albyn On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab, Wilhelm K wrote: Hello all, ... Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24-hour time and 4-digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. Bill -- Albyn Jones Reed College jo...@reed.edu
Re: [R] The future of R - Ross Ihaka stirs discussions around the web
There was an award session at the Vancouver JSM where both Ross Ihaka and Robert Gentleman spoke. The simply-start-over post sounds very much like Ross's JSM talk. Robert's response was much more positive, coming from the perspective of developing BioConductor, I think. I don't recall the details, but the cartoon version is something like "R isn't really broken." albyn On Mon, Sep 13, 2010 at 09:51:39PM +0200, Tal Galili wrote: Hello all, There are currently (very!) lively discussions happening around the web, surrounding the following topics: 1) Is R efficient? (scripting-wise, and performance-wise) 2) Should R be written from scratch? 3) What should be the license of R (if it were made anew)? Very serious people have taken part in the debates so far. I hope to let you know of the places I came by, so you might be able to follow/participate in these (IMHO) important discussions. The discussions started in the responses to the following post on Xi'An's blog: http://xianblog.wordpress.com/2010/09/06/insane/ Followed by the (short) response post by Ross Ihaka: http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/ Other discussions started to appear on Andrew Gelman's blog: http://www.stat.columbia.edu/~cook/movabletype/archives/2010/09/ross_ihaka_to_r.html And (many) more responses started to appear on the Hacker News website: http://news.ycombinator.com/item?id=1687054 I hope these discussions will have fruitful results for our community, Tal Contact Details: Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
-- Albyn Jones Reed College jo...@reed.edu
Re: [R] Mathematica and R
Take a look at Sage, an open source alternative. It already integrates R (http://www.sagemath.org). albyn Quoting David Bickel davidbickel.com+rh...@gmail.com: What are some effective ways to leverage the strengths of R and Mathematica for the analysis of a single data set? More specifically, are there any functions that can assist with any of the following? 1. Calling an R function from Mathematica. 2. Calling a Mathematica function from R. 3. Using XML or another reliable data format to pass vectors, matrices, and/or lists from one environment to the other. Any advice would be appreciated. David -- David R. Bickel, PhD Associate Professor Ottawa Institute of Systems Biology Biochem., Micro. and I. Department Mathematics and Statistics Department University of Ottawa 451 Smyth Road Ottawa, Ontario K1H 8M5 http://www.statomics.com Office Tel: (613) 562-5800 ext. 8670 Office Fax: (613) 562-5185
Re: [R] Entropy of a Markov chain
The transition matrix is a collection of conditional distributions. It would seem natural to compute the entropy using the stationary distribution. albyn Quoting Wilson, Andrew a.wil...@lancaster.ac.uk: Does anyone have any R code for computing the entropy of a simple first or second order Markov chain, given a transition matrix something like the following (or the symbol vector from which it is computed)? AGRe ARIe CSRe DIRe DSCe eos HRMe SPTe TOBe AGRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 ARIe 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 CSRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 DIRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 DSCe 0.167 0.167 0.000 0.000 0.167 0.000 0.167 0.167 0.167 eos 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 HRMe 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 NMSe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 TOBe 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 [The second order matrix would have column names incorporating both prior states, e.g. SPTe.TOBe.] I looked around at the various simple entropy functions but couldn't find anything for this specific problem; they seem mostly to assume a single numerical vector as input. Many thanks in advance for any help, Andrew Wilson
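One way to code that idea (a sketch, not from the thread): take the stationary distribution as the left eigenvector of P for eigenvalue 1, and weight the row (conditional) entropies by it, H = -sum_i pi_i sum_j P_ij log2(P_ij), with the convention 0 log 0 = 0. This is the standard entropy rate of a first-order chain; it assumes P has a unique stationary distribution.

```r
# Entropy rate of a first-order chain with transition matrix P (rows sum to 1).
entropy_rate <- function(P) {
  ev <- eigen(t(P))
  s <- Re(ev$vectors[, which.min(abs(ev$values - 1))])  # stationary dist.
  s <- s / sum(s)
  rowH <- -rowSums(ifelse(P > 0, P * log2(P), 0))       # 0 log 0 taken as 0
  sum(s * rowH)                                         # bits per step
}
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), 2, 2, byrow = TRUE)
entropy_rate(P)
```

For a second-order chain, the same function applies to the first-order representation on pairs of states (column names like SPTe.TOBe).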
Re: [R] Very OT: World Cup Statistics
Paul The FIFA database doesn't have times that goals are scored either. The best I have found is at http://www.worldcup-history.com/, but you have to check individual match reports for the times that goals are scored. albyn On Mon, Jun 07, 2010 at 09:50:34PM +0100, Paul wrote: Hello, Sorry for the very off-topic post, but I need some data on past football (soccer) world cups. I'd like to find (or calculate) the time to the first goal of each match (where a goal was scored). I've looked at the uefa website and can't find what I want; maybe someone here can help? Thanks Paul. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random walk
Sums of correlated increments have the same correlation as the original variables...

library(mvtnorm)
X <- matrix(0, nrow=1000, ncol=2)
for(i in 1:1000){
  Y <- rmvnorm(1000, mean=mu, sigma=S)
  X[i,] <- apply(Y, 2, sum)
}
cor(Y)
          [,1]      [,2]
[1,] 1.000     0.4909281
[2,] 0.4909281 1.000

So, unless you meant that you want the _sample_ correlation to be pre-specified, you are all set. albyn On Sun, May 09, 2010 at 09:20:25PM -0400, Sergio Andrés Estay Cabrera wrote: Hi everybody, I am trying to generate two random walks with a specific correlation, for example, two random walks of 200 time steps with a correlation of 0.7. I built the random walks with: x <- cumsum(rnorm(200, mean=0, sd=1)) and y <- cumsum(rnorm(200, mean=0, sd=1)), but I don't know how to fix the correlation between them. With white noise it is easy to fix the correlation using the function rmvnorm in the package mvtnorm. I searched the searchable mail archives on the R web site but no references appear. If you have any advice for solving this problem I would be very thankful. Thanks in advance. Sergio A. Estay *CASEB * Departamento de Ecología Universidad Catolica de Chile -- "Discipline has no merit under ideal circumstances." – Habor Mallow __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
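To make the point concrete for the original question: generate increments with the desired population correlation and take cumulative sums. The increments (and hence the sums) have correlation 0.7 in expectation, though any single pair of 200-step paths will show a sample correlation that wanders around that value. A sketch using MASS::mvrnorm (mvtnorm::rmvnorm works the same way):

```r
library(MASS)

set.seed(1)
rho   <- 0.7
Sigma <- matrix(c(1, rho,
                  rho, 1), 2, 2)
Z <- mvrnorm(200, mu = c(0, 0), Sigma = Sigma)  # correlated increments
x <- cumsum(Z[, 1])                             # first random walk
y <- cumsum(Z[, 2])                             # second random walk
cor(Z)                                          # off-diagonal near 0.7
```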
Re: [R] cluster with mahalanobis distance
Note: this procedure assumes that all clusters have the same covariance matrix. albyn On Wed, Mar 03, 2010 at 01:23:37PM -0800, Phil Spector wrote: The Manhattan distance and the Mahalanobis distance are quite different. One of the main differences is that a covariance matrix is necessary to calculate the Mahalanobis distance, so it's not easily accommodated by dist. There is a function in base R which does calculate the Mahalanobis distance -- mahalanobis(). So if you pass a distance matrix calculated with mahalanobis() to the clustering function, you'll get what you want. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Wed, 3 Mar 2010, Tal Galili wrote: when you create the distance object to put into hclust, use: dist(x, method = "manhattan") Tal Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Wed, Mar 3, 2010 at 9:14 PM, naama nw...@technion.ac.il wrote: How can I perform cluster analysis using the Mahalanobis distance instead of the Euclidean distance? thank you Naama Wolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
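One caveat worth adding: mahalanobis() returns squared distances from a set of points to a single center, not a pairwise distance matrix. An equivalent route to a pairwise Mahalanobis distance object for hclust() is to whiten the data with the Cholesky factor of the covariance, after which ordinary Euclidean dist() equals Mahalanobis distance in the original coordinates. A sketch (the helper name `maha_dist` is mine; it assumes cov(x) is nonsingular):

```r
# Pairwise Mahalanobis distances via whitening: with S = t(R) %*% R (chol),
# Euclidean distance on x %*% solve(R) equals Mahalanobis distance on x.
maha_dist <- function(x) {
  S  <- cov(x)
  R  <- chol(S)                # upper triangular, S = t(R) %*% R
  xw <- x %*% solve(R)         # whitened coordinates
  dist(xw)                     # a "dist" object, ready for hclust()
}

x  <- as.matrix(iris[, 1:4])
hc <- hclust(maha_dist(x), method = "average")
```

As noted above, this uses one pooled covariance for all points, which implicitly assumes a common covariance across clusters.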
Re: [R] optimization challenge
The key idea is that you are building a matrix that contains the solutions to smaller problems which are sub-problems of the big problem. The first row of the matrix SSQ contains the solution for no splits, ie SSQ[1,j] is just the sum of squares about the overall mean for reading chapters 1 through j in one day. The iteration then uses row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j chapters in m-1 days) is part of the overall optimal solution, you have already computed it, and so don't ever need to recompute it. TS = SSQ[m-1,j]+(SSQ1[j+1]) computes the vector of possible solutions for SSQ[m,n] (n chapters in m days), breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum over choices of j, ie SSQ[m,n]. At the end, SSQ[128,239] is the optimal value for reading all 239 chapters in 128 days. That's just the objective function, so the rest involves constructing the list of optimal cuts, ie which chapters are grouped together for each day's reading. That code uses the same idea... constructing a list of lists of cutpoints. Statisticians should study a bit of data structures and algorithms! albyn On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote: WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). Can you describe a little what the algorithm is doing? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Albyn Jones [mailto:jo...@reed.edu] Sent: Tuesday, January 12, 2010 5:31 PM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] optimization challenge Greg Nice problem: I wasted my whole day on it :-) I was explaining my plan for a solution to a colleague who is a computer scientist; he pointed out that I was trying to re-invent the wheel known as dynamic programming. 
Here is my code; apparently it is called bottom-up dynamic programming. It runs pretty quickly, and returns (what I hope is :-) the optimal sum of squares and the cut-points.

function(X=bom3$Verses, days=128){
  # find optimal BOM reading schedule for Greg Snow
  # minimize variance of quantity to read per day over 128 days
  #
  N = length(X)
  Nm1 = N-1
  SSQ <- matrix(NA, nrow=days, ncol=N)
  Cuts <- list()
  #
  # SSQ[i,j]: the ssqs about the overall mean for the optimal partition
  # for i days on the chapters 1 to j
  #
  M = sum(X)/days
  CS = cumsum(X)
  SSQ[1,] = (CS-M)^2
  Cuts[[1]] = as.list(1:N)
  #
  for(m in 2:days){
    Cuts[[m]] = list()
    # for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
    for(n in m:N){
      CS = cumsum(X[n:1])[n:1]
      SSQ1 = (CS-M)^2
      j = (m-1):(n-1)
      TS = SSQ[m-1,j] + (SSQ1[j+1])
      SSQ[m,n] = min(TS)
      k = min(which((min(TS) == TS))) + m - 1
      Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]], n)
    }
  }
  list(SSQ=SSQ[days,N], Cuts=Cuts[[days]][[N]])
}

$SSQ
[1] 11241.05

$Cuts
  [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
 [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
 [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
 [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
 [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
 [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
[109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
[127] 238 239

On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote: I have a challenge that I want to share with the group. This is not homework (but I may assign it as such if I teach the appropriate class again) and I have found one solution, so don't need anything urgent. This is more for fun to see if others can find a better solution than I did. The challenge: I want to read a book in a given number of days. 
I want to read an integer number of chapters each day (there are more chapters than days), no stopping part way through a chapter, and at least 1 chapter each day. The chapters are very non-uniform in length (some very short, a few very long, many in between), so I would like to come up with a reading schedule that minimizes the variance of the length of each day's readings (read multiple short chapters on the same day; long chapters are the only one read that day). I also want to read through the book in order (no skipping ahead to combine short chapters that are not naturally next to each other). My thought was that the optim function with method=SANN would be an appropriate approach, but my first couple of tries did
Re: [R] optimization challenge
Hi Aaron! It's always nice to see a former student doing well. Thanks for the notes and references, too! albyn On Wed, Jan 13, 2010 at 07:29:57PM -0500, Aaron Mackey wrote: FYI, in bioinformatics, we use dynamic programming algorithms in similar ways to solve similar problems of finding guaranteed-optimal partitions in streams of data (usually DNA or protein sequence, but sometimes numerical data from chip-arrays). These path optimization algorithms are often called Viterbi algorithms, a web search for which should provide multiple references. The solutions are not necessarily unique (there may be multiple paths/partitions with identical integer maxima in some systems) and there is much research on whether the optimal solution is actually the one you want to work with (for example, there may be a fair amount of probability mass within an area/ensemble of suboptimal solutions that overall have greater posterior probabilities than does the optimal solution singleton). See Chip Lawrence's PNAS paper for more erudite discussion, and references therein: www.pnas.org/content/105/9/3209.abstract -Aaron P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at Reed back in 1993, which started me down a somewhat windy road to statistical genomics! -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan rvarad...@jhmi.edu wrote: Greg - thanks for posting this interesting problem. Albyn - thanks for posting a solution. Now, I have some questions: (1) is the algorithm guaranteed to find a best solution? (2) can there be multiple solutions (it seems like there can be more than 1 solution depending on the data)?, and (3) is there a good reference for this and similar algorithms? Thanks Best, Ravi. --- Ravi Varadhan, Ph.D. 
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h tmlhttp://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h%0Atml -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Albyn Jones Sent: Wednesday, January 13, 2010 1:19 PM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] optimization challenge The key idea is that you are building a matrix that contains the solutions to smaller problems which are sub-problems of the big problem. The first row of the matrix SSQ contains the solution for no splits, ie SSQ[1,j] is just the sum of squares about the overall mean for reading chapters1 through j in one day. The iteration then uses row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j chapters in m-1 days) is part of the overall optimal solution, you have already computed it, and so don't ever need to recompute it. TS = SSQ[m-1,j]+(SSQ1[j+1]) computes the vector of possible solutions for SSQ[m,n] (n chapters in n days) breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum over choices of j, ie SSQ[m,n]. At the end, SSQ[128,239] is the optimal value for reading all 239 chapters in 128 days. That's just the objective function, so the rest involves constructing the list of optimal cuts, ie which chapters are grouped together for each day's reading. That code uses the same idea... constructing a list of lists of cutpoints. statisticians should study a bit of data structures and algorithms! albyn On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote: WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). 
Can you describe a little what the algorithm is doing? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Albyn Jones [mailto:jo...@reed.edu] Sent: Tuesday, January 12, 2010 5:31 PM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] optimization challenge Greg Nice problem: I wasted my whole day on it :-) I was explaining my plan for a solution to a colleague who is a computer scientist, he pointed out that I was trying to re-invent the wheel known as dynamic programming. here is my code, apparently it is called bottom up dynamic
Re: [R] optimization challenge
Greg Nice problem: I wasted my whole day on it :-) I was explaining my plan for a solution to a colleague who is a computer scientist; he pointed out that I was trying to re-invent the wheel known as dynamic programming. Here is my code; apparently it is called bottom-up dynamic programming. It runs pretty quickly, and returns (what I hope is :-) the optimal sum of squares and the cut-points.

function(X=bom3$Verses, days=128){
  # find optimal BOM reading schedule for Greg Snow
  # minimize variance of quantity to read per day over 128 days
  #
  N = length(X)
  Nm1 = N-1
  SSQ <- matrix(NA, nrow=days, ncol=N)
  Cuts <- list()
  #
  # SSQ[i,j]: the ssqs about the overall mean for the optimal partition
  # for i days on the chapters 1 to j
  #
  M = sum(X)/days
  CS = cumsum(X)
  SSQ[1,] = (CS-M)^2
  Cuts[[1]] = as.list(1:N)
  #
  for(m in 2:days){
    Cuts[[m]] = list()
    # for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
    for(n in m:N){
      CS = cumsum(X[n:1])[n:1]
      SSQ1 = (CS-M)^2
      j = (m-1):(n-1)
      TS = SSQ[m-1,j] + (SSQ1[j+1])
      SSQ[m,n] = min(TS)
      k = min(which((min(TS) == TS))) + m - 1
      Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]], n)
    }
  }
  list(SSQ=SSQ[days,N], Cuts=Cuts[[days]][[N]])
}

$SSQ
[1] 11241.05

$Cuts
  [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
 [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
 [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
 [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
 [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
 [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
[109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
[127] 238 239

On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote: I have a challenge that I want to share with the group. This is not homework (but I may assign it as such if I teach the appropriate class again) and I have found one solution, so don't need anything urgent. 
This is more for fun to see if others can find a better solution than I did. The challenge: I want to read a book in a given number of days. I want to read an integer number of chapters each day (there are more chapters than days), no stopping part way through a chapter, and at least 1 chapter each day. The chapters are very non uniform in length (some very short, a few very long, many in between) so I would like to come up with a reading schedule that minimizes the variance of the length of the days readings (read multiple short chapters on the same day, long chapters are the only one read that day). I also want to read through the book in order (no skipping ahead to combine short chapters that are not naturally next to each other. My thought was that the optim function with method=SANN would be an appropriate approach, but my first couple of tries did not give very good results. I have since come up with an optim with SANN solution that gives what I consider good results (but I accept that better is possible). Below is a data frame with the lengths of the chapters for the book that originally sparked the challenge for me (but the general idea should work for any book). Each row represents a chapter (in order) with 3 different measures of the length of the chapter. For this challenge I want to read the book in 128 days (there are 239 chapters). I will post my solutions in a few days, but I want to wait so that my direction does not influence people from trying other approaches (if there is something better than optim, that is fine). 
Good luck for anyone interested in the challenge, The data frame: bom3 - structure(list(Chapter = structure(1:239, .Label = c(1 Nephi 1, 1 Nephi 2, 1 Nephi 3, 1 Nephi 4, 1 Nephi 5, 1 Nephi 6, 1 Nephi 7, 1 Nephi 8, 1 Nephi 9, 1 Nephi 10, 1 Nephi 11, 1 Nephi 12, 1 Nephi 13, 1 Nephi 14, 1 Nephi 15, 1 Nephi 16, 1 Nephi 17, 1 Nephi 18, 1 Nephi 19, 1 Nephi 20, 1 Nephi 21, 1 Nephi 22, 2 Nephi 1, 2 Nephi 2, 2 Nephi 3, 2 Nephi 4, 2 Nephi 5, 2 Nephi 6, 2 Nephi 7, 2 Nephi 8, 2 Nephi 9, 2 Nephi 10, 2 Nephi 11, 2 Nephi 12, 2 Nephi 13, 2 Nephi 14, 2 Nephi 15, 2 Nephi 16, 2 Nephi 17, 2 Nephi 18, 2 Nephi 19, 2 Nephi 20, 2 Nephi 21, 2 Nephi 22, 2 Nephi 23, 2 Nephi 24, 2 Nephi 25, 2 Nephi 26, 2 Nephi 27, 2 Nephi 28, 2 Nephi 29, 2 Nephi 30, 2 Nephi 31, 2 Nephi 32, 2 Nephi 33, Jacob 1, Jacob 2, Jacob 3, Jacob 4, Jacob 5, Jacob 6, Jacob 7, Enos 1, Jarom 1, Omni 1, Words of Mormon 1, Mosiah 1, Mosiah 2, Mosiah 3, Mosiah 4, Mosiah 5, Mosiah 6, Mosiah 7, Mosiah 8, Mosiah 9, Mosiah 10, Mosiah 11, Mosiah 12, Mosiah 13, Mosiah 14, Mosiah 15, Mosiah 16, Mosiah 17, Mosiah 18, Mosiah 19, Mosiah 20, Mosiah 21, Mosiah 22, Mosiah 23, Mosiah 24, Mosiah 25, Mosiah 26, Mosiah 27, Mosiah 28, Mosiah 29, Alma
Re: [R] physical distributions
Is the Lorentz distribution another name for the Cauchy distribution, with density f(x) = \frac{1}{\pi} \frac{1}{1+x^2}, possibly with location and scale parameters? albyn On Mon, Dec 21, 2009 at 05:04:05PM +0100, Markus Häge wrote: Hi there, I was looking for a Lorentz distribution in R but I didn't find any physical distribution. Is there something which I can use for fitting? thanks Markus __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
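It is: the Lorentz (Breit-Wigner) distribution is the Cauchy, which R supplies directly as dcauchy/pcauchy/qcauchy/rcauchy with location and scale parameters. A fitting sketch using MASS::fitdistr (the simulated data here are illustrative):

```r
library(MASS)

# Check the standard density against the formula above: f(0) = 1/pi
dcauchy(0)                      # equals 1/pi

# Fit location and scale to data by maximum likelihood
set.seed(42)
x   <- rcauchy(500, location = 2, scale = 0.5)
fit <- fitdistr(x, "cauchy")    # "cauchy" is a supported densfun
fit$estimate                    # estimated location and scale
```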
Re: [R] MLE for a t distribution
k -> infinity gives the normal distribution. You probably don't care much about the difference between k=1000 and k=10, so you might try reparametrizing df on [1, infinity) to a parameter on [0,1]... albyn On Thu, Dec 10, 2009 at 02:14:26PM -0600, Barbara Gonzalez wrote: Given X1,...,Xn ~ t_k(mu,sigma), the Student t distribution with k degrees of freedom, mean mu and standard deviation sigma, I want to obtain the MLEs of the three parameters (mu, sigma and k). When I try traditional optimization techniques I don't find the MLEs. Usually I just get k -> infty. Does anybody know of any algorithms/functions in R that can help me obtain the MLEs? I am especially interested in the MLE for k, the degrees of freedom. Thank you! Barbara __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
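A sketch of that reparametrization: map the degrees of freedom to a bounded quantity via k = 1/u with u in (0,1), so the normal limit corresponds to u near 0 and optim never chases k off to infinity. The function name, transforms, and starting values here are illustrative choices, not from any package:

```r
# Negative log-likelihood of t_k(mu, sigma) with unconstrained parameters:
# par[1] = mu, par[2] = log(sigma), par[3] = logit(u) where k = 1/u.
negll <- function(par, x) {
  mu    <- par[1]
  sigma <- exp(par[2])            # keeps sigma > 0
  u     <- plogis(par[3])         # keeps u in (0, 1), so k = 1/u > 1
  k     <- 1 / u
  -sum(dt((x - mu) / sigma, df = k, log = TRUE) - log(sigma))
}

set.seed(7)
x   <- 2 + 1.5 * rt(500, df = 5)
fit <- optim(c(mean(x), log(sd(x)), 0), negll, x = x)
c(mu = fit$par[1], sigma = exp(fit$par[2]), df = 1 / plogis(fit$par[3]))
```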
Re: [R] Plots with k-means
What is the dimension of your data? You might try projecting the points into planes defined by 3 cluster centers. Plot, for each cluster, a density plot or histogram of distances to the cluster center, and perhaps overlay the density curve for points not in the cluster. albyn Quoting Iuri Gavronski i...@proxima.adm.br: Hi, I'm doing a k-means cluster with 6 clusters and 15 variables. Any suggestions on how to plot the results? I've tried the standard xy plot, but couldn't get much of it. Thanks in advance, Iuri. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
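Another common first plot for a k-means fit in many dimensions is the projection onto the first two principal components, colored by cluster. A sketch on scaled example data (iris stands in for the poster's 15-variable data set):

```r
x  <- scale(as.matrix(iris[, 1:4]))   # stand-in for the 15-variable data
km <- kmeans(x, centers = 3, nstart = 25)
pc <- prcomp(x)

plot(pc$x[, 1:2], col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "k-means clusters in PC space")
# project the cluster centers into the same plane
points(predict(pc, km$centers)[, 1:2], pch = 8, cex = 2)
```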
Re: [R] Plots with k-means
oops, I see the answer in your question: 15... Quoting Albyn Jones jo...@reed.edu: what is the dimension of your data? you might try projecting the points into planes defined by 3 cluster centers. plot, for each cluster, a density plot or histogram of distances to the cluster center, and perhaps overlay the density curve for points not in the cluster. albyn Quoting Iuri Gavronski i...@proxima.adm.br: Hi, I'm doing a k-means cluster with 6 clusters and 15 variables. Any suggestions on how to plot the results? I've tried the standard xy plot, but couldn't get much of it. Thansk in advance, Iuri. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fast cumulative matrix multiplication
If the matrices are not all the same size, then the order of computation will make a difference. Simple example: A is 1xn, B is nx1, C is 1xn. Computing A(BC) takes about 2n^2 multiplies (the outer product BC is n x n), while (AB)C requires only about 2n. albyn Quoting Todd Schneider todd.w.schnei...@gmail.com: Hi all, I am looking for a function like cumprod() that works for matrix multiplication. In other words, I have matrices [M1, M2, ..., Mn], and I want to calculate [M1, M1%*%M2, M1%*%M2%*%M3, ..., M1%*%...%*%Mn] as quickly as possible. Right now I'm using a for() loop but it seems like there should be a faster way. Any help is appreciated! Thanks, Todd Schneider todd.w.schnei...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
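For the loop itself there is a compact idiom: Reduce() with accumulate = TRUE produces exactly the running products. It performs the same sequence of multiplications as the for() loop, so the gain is clarity rather than asymptotics:

```r
# Cumulative matrix products: M1, M1 %*% M2, M1 %*% M2 %*% M3, ...
set.seed(1)
mats <- replicate(4, matrix(rnorm(9), 3, 3), simplify = FALSE)

cumprods <- Reduce(`%*%`, mats, accumulate = TRUE)

length(cumprods)                                   # 4 running products
all.equal(cumprods[[2]], mats[[1]] %*% mats[[2]])  # TRUE
```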
Re: [R] what's the R code for wavelet decomposition (Haar transformation)?
There is a dwt() in package:waveslim; reading the help file:

dwt(x, wf="la8", n.levels=4, boundary="periodic")

wf: Name of the wavelet filter to use in the decomposition. By default this is set to 'la8', the Daubechies orthonormal compactly supported wavelet of length L=8 (Daubechies, 1992), least asymmetric family.

I don't see a list of wavelet filter names in the documentation, but as I recall, there is one in one of the functions: wave.filter(). Look at the code for wave.filter; it includes the following:

switch(name,
  haar = select.haar(), d4 = select.d4(), mb4 = select.mb4(),
  w4 = select.w4(), bs3.1 = select.bs3.1(), fk4 = select.fk4(),
  d6 = select.d6(), fk6 = select.fk6(), d8 = select.d8(),
  fk8 = select.fk8(), la8 = select.la8(), mb8 = select.mb8(),
  bl14 = select.bl14(), fk14 = select.fk14(), d16 = select.d16(),
  la16 = select.la16(), mb16 = select.mb16(), la20 = select.la20(),
  bl20 = select.bl20(), fk22 = select.fk22(), mb24 = select.mb24(),
  stop("Invalid selection for wave.filter"))

So try dwt(x, wf="haar"). albyn On Mon, Oct 19, 2009 at 02:41:56PM -0500, stephen sefick wrote: What package are you using? There are quite a few functions that do wavelet decomposition. Have you tried an R site search? Stephen On Fri, Oct 16, 2009 at 3:43 PM, Zhen Li z...@bios.unc.edu wrote: Dear all, Using the R function dwt, it seems that I cannot specify the wavelet transformation, like Haar. What's the R code for wavelet decomposition which allows me to specify the Haar wavelet transformation? Of course, if it can include db2, that is even better. In general, I want an R function like the matlab code dwt. Thanks in advance! Zhen Li __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
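Putting the suggestion together, a minimal Haar example; a sketch that assumes the waveslim package is installed and that the series length is divisible by 2^n.levels:

```r
library(waveslim)

set.seed(1)
x <- rnorm(64)                            # length 64 = 2^6
w <- dwt(x, wf = "haar", n.levels = 3)    # Haar decomposition, 3 levels
str(w, max.level = 1)                     # detail (d1..d3) and smooth (s3) parts
y <- idwt(w)                              # inverse transform
max(abs(x - y))                           # reconstruction error, numerically zero
```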
Re: [R] Generating a stochastic matrix with a specified second dominant eigenvalue
ooops! I tested it on a 3x3, and a 4x4, and it worked both times. Then I ran it once more and pasted it in without looking carefully... Here's another half-baked idea. Generate a random graph with given degree distribution, turn the incidence matrix into a stochastic matrix (ie the transition probability is 1/d where d is the degree of the vertex). Now all we need is the connection between the degree distribution and the second eigenvalue :-) If you want lambda close to one, perhaps you could generate two graphs, then add vertices joining the two graphs until the second eigenvalue is the right size of course, this will give a random lambda, not exactly your chosen value. albyn On Fri, Oct 16, 2009 at 10:32:48AM -0400, Ravi Varadhan wrote: A valiant attempt, Albyn! Unfortunately, the matrix B is not guaranteed to be a stochastic matrix. In fact, it is not even guaranteed to be a real matrix. Your procedure can generate a B that contains negative elements or even complex elements. M = matrix(runif(9),nrow=3) M = M/apply(M,1,sum) e=eigen(M) e$values[2]= .7 Q = e$vectors Qi = solve(Q) B = Q %*% diag(e$values) %*% Qi eigen(B)$values [1] 1. 0.7000 -0.04436574 apply(B,1,sum) [1] 1 1 1 B [,1] [,2] [,3] [1,] 0.77737077 0.3340768 -0.1114476 [2,] 0.20606226 0.2601840 0.5337537 [3,] 0.08326022 0.2986603 0.6180794 Note that B[1,3] is negative. Another example: M = matrix(runif(9),nrow=3) M = M/apply(M,1,sum) e=eigen(M) e$values[2]= .95 Q = e$vectors Qi = solve(Q) B = Q %*% diag(e$values) %*% Qi eigen(B)$values [1] 1.-0.i 0.9500+0.i -0.09348883-0.02904173i apply(B,1,sum) [1] 1+0i 1-0i 1+0i B [,1] [,2] [,3] [1,] 0.6558652-0.550613i 0.2408879+0.2212234i 0.1032469+0.3293896i [2,] 0.1683119+1.594515i 0.6954317-0.7378503i 0.1362564-0.8566647i [3,] 0.2812210-2.462385i 0.2135648+1.2029636i 0.5052143+1.2594216i Note that B has complex elements. 
So, I took your algorithm and embedded it in an iterative procedure to keep repeating your steps until it finds a B matrix that is real and non-negative. Here is that function:

e2stochMat <- function(N, e2, maxiter) {
  iter <- 0
  while (iter <= maxiter) {
    iter <- iter + 1
    M <- matrix(runif(N*N), nrow=N)
    M <- M / apply(M, 1, sum)
    e <- eigen(M)
    e$values[2] <- e2
    Q <- e$vectors
    B <- Q %*% diag(e$values) %*% solve(Q)
    real <- all(abs(Im(B)) < 1.e-16)
    positive <- all(Re(B) > 0)
    if (real && positive) break
  }
  list(stochMat=B, iter=iter)
}

e2stochMat(N=3, e2=0.95, maxiter=1) # works
e2stochMat(N=5, e2=0.95, maxiter=1) # fails

This works for very small N, say N = 3, but it fails for larger N. The probability of success is a decreasing function of N and e2. So, the algorithm fails for large N and for values of e2 close to 1. Thanks for trying. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html -Original Message- From: Albyn Jones [mailto:jo...@reed.edu] Sent: Thursday, October 15, 2009 6:56 PM To: Ravi Varadhan Cc: r-help@r-project.org Subject: Re: [R] Generating a stochastic matrix with a specified second dominant eigenvalue I just tried the following shot in the dark: generate an N by N stochastic matrix, M. I used

M = matrix(runif(9), nrow=3)
M = M/apply(M,1,sum)
e = eigen(M)
e$values[2] = .7

(pick your favorite lambda; you may need to fiddle with the others to guarantee this is second largest.)

Q = e$vectors
Qi = solve(Q)
B = Q %*% diag(e$values) %*% Qi
eigen(B)$values
[1]  1.00000000  0.70000000 -0.08518772
apply(B,1,sum)
[1] 1 1 1

I haven't proven that this must work, but it seems to. Since you can verify that it worked afterwards, perhaps the proof is in the pudding. 
albyn On Thu, Oct 15, 2009 at 06:24:20PM -0400, Ravi Varadhan wrote: Hi, Given a positive integer N and a real number \lambda such that 0 < \lambda < 1, I would like to generate an N by N stochastic matrix (a matrix with all the rows summing to 1) such that it has the second largest eigenvalue equal to \lambda (note: the dominant eigenvalue of a stochastic matrix is 1). I don't care what the other eigenvalues are. The second eigenvalue is important in that it governs the rate at which the random process given by the stochastic matrix converges to its
Re: [R] Generating a stochastic matrix with a specified second dominant eigenvalue
I just tried the following shot in the dark: generate an N by N stochastic matrix, M. I used M = matrix(runif(9),nrow=3) M = M/apply(M,1,sum) e=eigen(M) e$values[2]= .7 (pick your favorite lambda, you may need to fiddle with the others to guarantee this is second largest.) Q = e$vectors Qi = solve(Q) B = Q %*% diag(e$values) %*% Qi eigen(B)$values [1] 1. 0.7000 -0.08518772 apply(B,1,sum) [1] 1 1 1 I haven't proven that this must work, but it seems to. Since you can verify that it worked afterwards, perhaps the proof is in the pudding. albyn On Thu, Oct 15, 2009 at 06:24:20PM -0400, Ravi Varadhan wrote: Hi, Given a positive integer N, and a real number \lambda such that 0 \lambda 1, I would like to generate an N by N stochastic matrix (a matrix with all the rows summing to 1), such that it has the second largest eigenvalue equal to \lambda (Note: the dominant eigenvalue of a stochastic matrix is 1). I don't care what the other eigenvalues are. The second eigenvalue is important in that it governs the rate at which the random process given by the stochastic matrix converges to its stationary distribution. Does anyone know of an algorithm to do this? Thanks for any help, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan. html http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h tml [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Parameters of Beta distribution
Maithili, I find it really hard to believe that a beta distribution would be a reasonable probability model for loss data. There would have to be an upper bound on the size of losses. What is the process that generates the data? Is there any natural upper bound? Why is there a lower bound greater than zero? That said, the MLEs would be the min and max, but those will underestimate the range of a beta. It is an elementary exercise to see why with the uniform[0,B] (ie a rescaled beta(1,1)), for which the expected value of the max of a sample of size n is B*n/(n+1). If you have a lot of data, this may not bother you. For an arbitrary beta distribution you would have 4 parameters to estimate... probably a Bayes estimator would be easiest. I'll put this one away as an exercise for my next math stats course... albyn Quoting Maithili Shiva maithili_sh...@yahoo.com: Dear Albyn, Thanks for your reply. Yes, A and B are unknown. I was just thinking to assign A = min(amounts) and B = max(amounts). The actual loss data I am dealing with is large. I am trying to fit some statistical distributions to this data. I have already written R code pertaining to many other distributions like the Normal, Weibull, Pareto, Generalized Extreme Value distribution etc. So I just want to know how to estimate the parameters if I need to check whether the Beta distribution fits the loss data. Is it possible for you to guide me how to estimate A and B, or can I assume A = min(amounts) and B = max(amounts)? Regards Maithili --- On Wed, 7/10/09, Albyn Jones jo...@reed.edu wrote: From: Albyn Jones jo...@reed.edu Subject: Re: [R] Parameters of Beta distribution To: jlu...@ria.buffalo.edu Cc: Maithili Shiva maithili_sh...@yahoo.com, r-help@r-project.org, r-help-boun...@r-project.org Date: Wednesday, 7 October, 2009, 3:30 PM Are A and B known? That is, are there known upper and lower bounds for this credit loss data? If not, you need to think about how to estimate those bounds.
Why do you believe the data have a beta distribution? albyn On Wed, Oct 07, 2009 at 09:03:31AM -0400, jlu...@ria.buffalo.edu wrote: Rescale your data x to (x-A)/(B-A). Maithili Shiva maithili_sh...@yahoo.com Sent by: r-help-boun...@r-project.org 10/07/2009 08:39 AM To r-help@r-project.org cc Subject [R] Parameters of Beta distribution Suppose I have data pertaining to credit loss, as

amounts <- c(46839.50, 31177.12, 35696.69, 21192.57, 29200.91,
             42049.64, 42422.19, 44976.18, 32135.36, 47936.57,
             27322.91, 37359.09, 43179.60, 48381.02, 45872.38,
             28057.30, 44643.83, 36156.33, 16037.62, 45432.28)

I am trying to fit a Beta distribution (a two-parameter distribution, but where the lower and upper bounds are NOT 0 and 1 respectively). For this I need to estimate the two parameters of the Beta distribution. I found some code in the VGAM package but it deals with the standard Beta distribution, i.e. lower bound (say A) = 0 and upper bound (say B) = 1. How do I estimate the parameters of the Beta distribution for the above data where A and B are not 0 and 1? Please guide. Thanking you in advance Maithili __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
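For a concrete sketch of fitting a four-parameter beta in the spirit of the rescaling advice above: fix A and B just outside the sample range, rescale to (0, 1), and maximize the beta log-likelihood over the two shape parameters. The variable names and the widening factors are my own assumptions, not from the thread.

```r
amounts <- c(46839.50, 31177.12, 35696.69, 21192.57, 29200.91,
             42049.64, 42422.19, 44976.18, 32135.36, 47936.57,
             27322.91, 37359.09, 43179.60, 48381.02, 45872.38,
             28057.30, 44643.83, 36156.33, 16037.62, 45432.28)
# Widen the bounds slightly so no observation sits exactly on 0 or 1,
# where the beta log-likelihood would be -Inf.
A <- 0.99 * min(amounts)
B <- 1.01 * max(amounts)
u <- (amounts - A) / (B - A)
negll <- function(p) -sum(dbeta(u, exp(p[1]), exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), negll)   # optimize on the log scale: shapes stay positive
shapes <- exp(fit$par)
shapes                         # estimated shape1, shape2
```

As Albyn notes, plugging in the sample min and max understates the true range; a fuller treatment estimates all four parameters jointly.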
Re: [R] Parameters of Beta distribution
Quoting David Winsemius dwinsem...@comcast.net: In insurance situations there is typically a cap on the covered losses, and there is also typically an amount below which it would not make sense to offer a policy. So a minimum and a maximum are sensible assumptions about loss distributions in many real modeling situations. -- David. Is that cap the same for every policy, or are there different types of policies with different bounds? If there are caps, then other probability models mentioned, like the Weibull, don't seem reasonable unless they are truncated distribution models. albyn __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Parameters of Beta distribution
Are A and B known? That is, are there known upper and lower bounds for this credit loss data? If not, you need to think about how to estimate those bounds. Why do you believe the data have a beta distribution? albyn On Wed, Oct 07, 2009 at 09:03:31AM -0400, jlu...@ria.buffalo.edu wrote: Rescale your data x to (x-A)/(B-A). Maithili Shiva maithili_sh...@yahoo.com Sent by: r-help-boun...@r-project.org 10/07/2009 08:39 AM To r-help@r-project.org cc Subject [R] Parameters of Beta distribution Suppose I have data pertaining to credit loss, as

amounts <- c(46839.50, 31177.12, 35696.69, 21192.57, 29200.91,
             42049.64, 42422.19, 44976.18, 32135.36, 47936.57,
             27322.91, 37359.09, 43179.60, 48381.02, 45872.38,
             28057.30, 44643.83, 36156.33, 16037.62, 45432.28)

I am trying to fit a Beta distribution (a two-parameter distribution, but where the lower and upper bounds are NOT 0 and 1 respectively). For this I need to estimate the two parameters of the Beta distribution. I found some code in the VGAM package but it deals with the standard Beta distribution, i.e. lower bound (say A) = 0 and upper bound (say B) = 1. How do I estimate the parameters of the Beta distribution for the above data where A and B are not 0 and 1? Please guide. Thanking you in advance Maithili __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] scale or not to scale that is the question - prcomp
Scaling changes the metric, ie which things are close to each other. There is no reason to expect the picture to look the same when you change the metric. On the other hand, your two pictures don't look so different to me. It appears that the scaled plot is similar to the unscaled plot, with the roles of the second and third pc reversed, ie the scaled plot is similar but rotated and distorted. For example, the observations forming the strip across the bottom of the first plot form a vertical strip on the right hand side of the second plot. albyn On Wed, Aug 19, 2009 at 02:31:23PM +0200, Petr PIKAL wrote: Dear all, here is my data, called rglp:

structure(list(vzorek = structure(1:17, .Label = c("179/1/1", "179/2/1", "180/1", "181/1", "182/1", "183/1", "184/1", "185/1", "186/1", "187/1", "188/1", "189/1", "190/1", "191/1", "192/1", "R310", "R610L"), class = "factor"), iep = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5, 7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34), skupina = c(7.34, 7.34, 5.14, 6.23, 6.23, 7.34, 6.23, 7.34, 6.23, 7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23, 6.23), sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898, 0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1), p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775, 0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2), al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828, 4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5), dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class = "factor")), .Names = c("vzorek", "iep", "skupina", "sio2", "p2o5", "al2o3", "dus"), class = "data.frame", row.names = c(NA, -17L))

and I try to do principal component analysis.
Here is one without scaling:

fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2)
biplot(fit, choices=2:3, xlabs=rglp$vzorek, cex=.8)

You can see that the data make 3 groups according to the variables sio2 and dus, which seems reasonable, as the lowest group has a different value of dus = "ano" while the highest group has a low value of sio2. But when I do the same with scale=T

fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2, scale=T)
biplot(fit, choices=2:3, xlabs=rglp$vzorek, cex=.8)

I get a completely different picture which is not possible to interpret in such an easy way. So can anybody advise me whether I shall follow the recommendation from the help page (which says "The default is FALSE for consistency with S, but in general scaling is advisable.") or whether I shall stay with scale = FALSE and the simply interpretable result? Thank you. Petr Pikal petr.pi...@precheza.cz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
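A tiny synthetic illustration of Albyn's point that scaling changes the metric (the data and names here are invented): when one variable has a much larger variance, it dominates the unscaled PCA, while scaling puts the variables on an equal footing.

```r
set.seed(42)
d <- data.frame(big = rnorm(50, sd = 100), small = rnorm(50, sd = 1))
p_raw    <- prcomp(d)                 # unscaled: 'big' owns PC1
p_scaled <- prcomp(d, scale. = TRUE)  # correlation-based PCA
summary(p_raw)$importance[2, ]        # proportion of variance per PC
summary(p_scaled)$importance[2, ]     # now split far more evenly
```

Note the argument is spelled `scale.` (with a trailing dot) in prcomp.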
Re: [R] truncating values into separate categories
It appears that your difficulty lies in miscounting the number of intervals.

cut(NP, breaks=c(0,1,2,3,4,max(NP)))
 [1] (0,1] (0,1] (1,2] (0,1] (0,1] (1,2] (1,2] (0,1] (3,4] (0,1] <NA>  (4,6] (2,3] (2,3] (0,1]
[16] (4,6] (2,3] (4,6] (0,1] (4,6] (0,1] (1,2] (1,2] (1,2] (3,4] (3,4] (0,1] (1,2] (0,1] (2,3]
[31] (2,3] (0,1] (1,2] (1,2] (0,1] (1,2] (0,1] (1,2] (1,2] (2,3] (0,1] (0,1] (3,4] (3,4] (0,1]
[46] (0,1] (0,1] (1,2] (1,2] (1,2]
Levels: (0,1] (1,2] (2,3] (3,4] (4,6]

cut(NP, breaks=c(0,1,2,3,max(NP)), labels=c("1","2","3","4+"))
 [1] 1    1    2    1    1    2    2    1    4+   1    <NA> 4+   3    3    1    4+   3    4+
[19] 1    4+   1    2    2    2    4+   4+   1    2    1    3    3    1    2    2    1    2
[37] 1    2    2    3    1    1    4+   4+   1    1    1    2    2    2
Levels: 1 2 3 4+

a <- cut(NP, breaks=c(0,1,2,3,max(NP)), labels=c("1","2","3","4+"))
table(a, exclude=NULL)
a
   1    2    3   4+ <NA>
  19   15    6    9    1

Generally it is better to let R keep track of the NA's for you. albyn Quoting PDXRugger j_r...@hotmail.com: Hi all, Simple question which I thought I had the answer to, but it isn't so simple for some reason. I am sure someone can easily help. I would like to categorize the values in NP into 1 of the five values in Per, with the last category (4) representing values >= 4 (hence 4:max(NP)). The problem is that R is reading 4:max(NP) as multiple values instead of a range, so the lengths of the labels and the breaks do not match. Suggestions?

Per <- c(NA, 1, 2, 3, 4)
NP <- c(1,1,2,1,1,2,2,1,4,1,0,5,3,3,1,5,3,5,1,6,1,2,2,2,4,4,1,2,1,3,
        3,1,2,2,1,2,1,2,2,3,1,1,4,4,1,1,1,2,2,2)
Person_CAT <- cut(NP, breaks=c(0,1,2,3,4:max(NP)), labels=Per)

-- View this message in context: http://www.nabble.com/truncating-values-into-separate-categories-tp24749046p24749046.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
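One further simplification, assuming values above the observed maximum should also land in the top bin: use Inf as the last break, so the binning does not depend on max(NP) at all.

```r
NP <- c(1,1,2,1,1,2,2,1,4,1,0,5,3,3,1,5,3,5,1,6,1,2,2,2,4,4,1,2,1,3,
        3,1,2,2,1,2,1,2,2,3,1,1,4,4,1,1,1,2,2,2)
# Inf as the top break makes the last interval (3, Inf], i.e. "4+".
a <- cut(NP, breaks = c(0, 1, 2, 3, Inf), labels = c("1", "2", "3", "4+"))
table(a, exclude = NULL)
```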
Re: [R] how to calculate growth rate of a variable
Your data will have all sorts of patterns (diurnal, seasonal) in addition to the long term trend. I'd start by smoothing out the cyclic patterns with loess or gam, then use a secant approximation to the slope on the smoothed series. albyn On Fri, Jul 24, 2009 at 06:13:00PM +0530, Yogesh Tiwari wrote: Dear R Users, If a variable, say CO2 (ppm), is varying with time, then how does one calculate the CO2 growth rate (ppm per year, a-1)? I have a CO2 time series (1991-2000), as: time, year, month, day, hour, min, sec, lat, long, height, CO2 1991.476722 1991 6 24 0 5 0 -38.93 145.15 4270 353.680 1991.476741 1991 6 24 0 15 0 -39.20 145.22 4270 353.950 1991.476747 1991 6 24 0 18 0 -39.43 145.28 4270 353.510 --- 2000.740788 2000 9 28 3 5 0 -38.00 145.00 2280 366.750 2000.740794 2000 9 28 3 8 0 -38.00 145.00 1830 366.550 2000.740803 2000 9 28 3 13 0 -38.00 145.00 1220 370.550 -- Yogesh K. Tiwari (Dr.rer.nat), Scientist, Indian Institute of Tropical Meteorology, Homi Bhabha Road, Pashan, Pune-411008 INDIA Phone: 0091-99 2273 9513 (Cell) : 0091-20-25904350 (O) Fax: 0091-20-258 93 825 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
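A sketch of that smooth-then-secant idea on synthetic data (the series, the span, and all names here are made up; real data would need its own span choice):

```r
set.seed(7)
t_dec <- seq(1991, 2000, by = 1/12)               # decimal years
co2   <- 350 + 1.5 * (t_dec - 1991) +             # trend: 1.5 ppm per year
         3 * sin(2 * pi * t_dec) +                # seasonal cycle
         rnorm(length(t_dec), sd = 0.5)           # noise
fit    <- loess(co2 ~ t_dec, span = 0.3)          # smooth out the cycle
smooth <- predict(fit)
rate   <- diff(smooth) / diff(t_dec)              # secant slope, ppm per year
mean(rate)                                        # near the true trend, 1.5
```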
Re: [R] assign question
I don't think you want assign() here.

x1 = rnorm(20)
min(x1)
[1] -0.9723398
min(eval(paste("x", 1, sep="")))          # not the solution
[1] "x1"
min(eval(as.name(paste("x", 1, sep="")))) # a solution
[1] -0.9723398

try:

for(i in 1:27) {
  xener[i] <- min(eval(as.name(paste("sa", i, sep=""))))
}

albyn On Mon, Jul 20, 2009 at 01:26:05PM -0500, Erin Hodgess wrote: Dear R People: I have several vectors, sa1, sa2, ..., sa27 of varying lengths. I want to produce one vector xener[1:27] which has the minimum of each sa[i]. I'm trying to set up a loop and use the assign statement, but here are my results:

for(i in 1:27) {
+ xener[i] <- min(assign(paste("sa", i, sep="")))
+ }
Error in assign(paste("sa", i, sep = "")) : element 2 is empty; the part of the args list of '.Internal' being evaluated was: (x, value, envir, inherits)

Any suggestions would be most welcome. Thanks in advance, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
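get() is the direct retrieval counterpart of assign() and avoids the eval(as.name(...)) construction; a toy sketch with three stand-in vectors:

```r
# get("sa1") returns the object named "sa1", so the loop becomes:
sa1 <- c(3, 1, 4); sa2 <- c(1, 5); sa3 <- c(9, 2, 6)
xener <- numeric(3)
for (i in 1:3) {
  xener[i] <- min(get(paste("sa", i, sep = "")))
}
xener
```

With many such vectors it is usually cleaner still to keep them in a list and write sapply(mylist, min).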
Re: [R] putting circles/buffers arround points on a 2D graph/plot
It sounds like you might want to draw the convex hull for each group of points. There is a package called chplot which appears to do this, though I haven't used it... albyn On Thu, Jul 16, 2009 at 06:23:54PM +0100, Ahmed, Sadia wrote: Hi, I'll try to keep my question brief and to the point, to anyone who helps 'THANK YOU!' I want to get a circle/ring/buffer or some other form of drawn line around certain points on a x,y plot. The points usually don't form a circle, often they form a 'wobbly' shape, so ideally I'd like something that follows the shape the points make. I'll start by explaining what I've got (I'm sorry if it's a little long, but it's all important for explaining what I want to do): I have a simple 2D plot of an x variable, called 'percent.forest', and a y variable called 'metric'. The data for this plot is in a large data frame with a third variable called 'Buffer'; there are 5 buffer sizes (250, 500, 1000, 3000, 1). To create the plot I simply used

plot(percent.forest, metric)

To add in the data about buffer size, I have extracted the percent forest and metric data by:

buff250 <- subset(data1, data1$Buffer==250)
buff500 <- subset(data1, data1$Buffer==500)
buff1000 <- subset(data1, data1$Buffer==1000)
buff3000 <- subset(data1, data1$Buffer==3000)
buff1 <- subset(data1, data1$Buffer==1)

I then used the 'points' function to plot the percent forest, metric data points in different colours according to buffer size, using this code (the first 2 lines divide the data appropriately, the 3rd line is points):

metric.250 <- log(buff250$metric)
percent.forest.250 <- buff250$percent.forest
points(percent.forest.250, metric.250, col='skyblue')

I repeated this for all 5 buffer sizes, so on the plot (percent.forest, metric) the points are coloured differently according to the size of the buffer used.
This works fine (the next bit is the problem): Does anyone know how I can get R to draw around the different coloured points, so that I get 5 rings/wiggly lines on the plot that clearly show the extent of each of the buffers? You can sort of see a pattern in the colours, but it's not as clear as it would be if I could draw around it. If anyone wants to see a picture of the plot (if my explanation isn't clear enough) I'd be happy to email one directly to you (the R instructions say that the mailing list scraps attachments, so I've not put one on here). Thank you for reading this. Any help will be greatly appreciated, sadia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
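Besides chplot, base R can draw the hulls directly: chull() returns the indices of the convex hull vertices in order, and polygon() closes the outline. A sketch with two invented groups standing in for the buffer subsets:

```r
set.seed(1)
g1 <- cbind(x = rnorm(30, mean = 2), y = rnorm(30, mean = 2))
g2 <- cbind(x = rnorm(30, mean = 5), y = rnorm(30, mean = 5))
plot(rbind(g1, g2), col = rep(c("skyblue", "tomato"), each = 30))
for (g in list(g1, g2)) {
  h <- chull(g)                                 # hull vertex indices, in order
  polygon(g[c(h, h[1]), ], border = "grey40")   # close the outline
}
```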
Re: [R] Matrix inversion-different answers from LAPACK and LINPACK
As you seem to be aware, the matrix is poorly conditioned:

kappa(PLLH, exact=TRUE)
[1] 115868900869

It might be worth your while to think about reparametrizing. albyn On Wed, Jun 17, 2009 at 11:37:48AM -0400, avraham.ad...@guycarp.com wrote: Hello. I am trying to invert a matrix, and I am finding that I can get different answers depending on whether I set LAPACK true or false using qr. I had understood that LAPACK is, in general, more robust and faster than LINPACK, so I am confused as to why I am getting what seem to be invalid answers. The matrix is ostensibly the Hessian for a function I am optimizing. I want to get the parameter correlations, so I need to invert the matrix. There are times where the standard solve(X) returns an error, but solve(qr(X, LAPACK=TRUE)) returns values. However, there are times where the latter returns what seem to be bizarre results. For example, an example matrix is PLLH (Pareto LogLikelihood Hessian):

                      alpha                theta
alpha 1144.6262175141619082 -0.01290205604828788
theta  -0.01290205604828788  0.00155437688485563

Running plain solve(PLLH) or solve(qr(PLLH)) returns:

                       [,1]                  [,2]
alpha    0.0137466171688024      1141.53956787721
theta 1141.5395678772131305 101228592.41439932585

However, running solve(qr(PLLH, LAPACK=TRUE)) returns:

                       [,1]                  [,2]
[1,]    0.0137466171688024    0.0137466171688024
[2,] 1141.5395678772131305 1141.5395678772131305

which seems wrong, as the original matrix had identical entries on the off diagonals. I am neither a programmer nor an expert in matrix calculus, so I do not understand why I should be getting different answers using different libraries to perform the ostensibly same function. I understand the extremely small value of d²X/d(theta)² (PLLH[2,2]) may be contributing to the error, but I would appreciate confirmation, or correction, from the experts on this list.
Thank you very much, --Avraham Adler PS: For completeness, the QR decompositions per R under LINPACK and LAPACK are shown below:

qr(PLLH)
$qr
                          alpha                 theta
alpha -1144.6262175869414932095 0.0129020653695122277
theta        -0.112768491646264     0.987863187747112

$rank
[1] 2

$qraux
[1] 1.993641619511209 0.987863187747112

$pivot
[1] 1 2

attr(,"class")
[1] "qr"

qr(PLLH, LAPACK=TRUE)
$qr
                           alpha                 theta
alpha -1144.62621758694149320945 0.0129020653695122277
theta        -0.0563842458249248     0.987863187747112

$rank
[1] 2

$qraux
[1] 1.993642 0.000000

$pivot
[1] 1 2

attr(,"useLAPACK")
[1] TRUE
attr(,"class")
[1] "qr"

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
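To make the reparametrization advice concrete: the huge condition number reflects the very different scales of the two parameters, and a simple diagonal rescaling (equivalent to measuring theta in smaller units) brings it down to order one. The scale factor below is my own illustrative choice, and the exact condition number depends on digits the email display has rounded away.

```r
PLLH <- matrix(c(1144.6262175141619082, -0.01290205604828788,
                 -0.01290205604828788,  0.00155437688485563),
               nrow = 2,
               dimnames = list(c("alpha", "theta"), c("alpha", "theta")))
kappa(PLLH, exact = TRUE)     # very large: the matrix is poorly conditioned

s  <- 1000                    # work with theta* = 1000 * theta
D  <- diag(c(1, s))
H2 <- D %*% PLLH %*% D        # Hessian in the rescaled parametrization
kappa(H2, exact = TRUE)       # close to 1: inversion is now stable
```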
[R] SAGE: was large factorials
On Wed, Apr 22, 2009 at 08:26:51PM -0700, Ben Bolker wrote: ??? octave is a Matlab clone, not a Mathematica clone (I would be interested in an open source Mathematica clone ...) ??? You might take a look at Sage. It is not a mathematica clone, but open source mathematical software which has a development community similar to that of R. http://www.sagemath.org/ albyn __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-positive definite matrix remedies?
That's an interesting problem. My first thought was to choose the closest positive definite matrix to the given matrix, say in the least squares sense. However, the 2x2 diagonal matrix with diagonal (1,0) makes it clear that there isn't always a closest pd symmetric matrix. Perhaps multiple imputation would work: impute a complete data matrix X, compute polycor(X), and repeat. Will the average of these positive definite matrices be positive definite? I think it would be if you were computing Pearson correlations, but I am not sure about the polychoric case. albyn On Wed, Mar 11, 2009 at 04:20:27PM -0600, Matthew Keller wrote: Hi all, For computational reasons, I need to estimate an 18x18 polychoric correlation matrix two variables at a time (rather than trying to estimate them all simultaneously using ML). The resulting polychoric correlation matrix I am getting is non-positive definite, which is problematic because I'm using this matrix later on as if it were a legitimately estimated correlation matrix (in order to fit an SEM model). I could add to the diagonal I suppose until it becomes positive definite. Does anyone have any other ideas on how to deal with this problem, and what the strengths and weaknesses of different approaches are? Thanks in advance, Matt -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
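Since this thread, note that the Matrix package (shipped with R) provides nearPD(), which finds the nearest positive semi-definite matrix in the Frobenius norm; with corr = TRUE it preserves the unit diagonal. A sketch with a made-up indefinite "correlation" matrix:

```r
library(Matrix)
C <- matrix(c(1.0,  0.9,  0.7,
              0.9,  1.0, -0.9,
              0.7, -0.9,  1.0), nrow = 3)
eigen(C)$values                        # one eigenvalue is negative
np <- nearPD(C, corr = TRUE)           # nearest correlation matrix
round(as.matrix(np$mat), 3)
min(eigen(as.matrix(np$mat))$values)   # no longer negative
```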
Re: [R] Cholesky Decomposition in R
try Cholesky() in package Matrix. albyn On Tue, Mar 10, 2009 at 02:33:01PM -0700, Manli Yan wrote: Hi everyone: I am trying to use R to do the Cholesky decomposition A = LDL'. So far I have only found how to decompose A into LL' using chol(A); the function Cholesky(A) doesn't work. Does anyone know another command to decompose A into LDL'? My R code is:

library(Matrix)
A = matrix(c(1,1,1,1,5,5,1,5,14), nrow=3)
chol(A)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    0    2    2
[3,]    0    0    3
Cholesky(A)
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "Cholesky", for signature "matrix"

What's wrong??? thanks~ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
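If Cholesky() is awkward here (it dispatches on the Matrix package's own sparse classes, not plain matrices, hence the error above), the LDL' form can be recovered from the plain chol() result: with A = R'R and d = diag(R), take L = t(R/d) and D = diag(d^2). A sketch on the matrix from the question:

```r
A <- matrix(c(1, 1, 1, 1, 5, 5, 1, 5, 14), nrow = 3)
R <- chol(A)           # upper triangular, A = t(R) %*% R
d <- diag(R)
L <- t(R / d)          # unit lower triangular (R/d divides row i by d[i])
D <- diag(d^2)
L %*% D %*% t(L)       # reproduces A, so A = L D L'
```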
Re: [R] Extract statistics from lm()
Look at the data structure produced by summary():

names(summary(lm.D9))
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

Now look at the data structure for the coefficients in the summary:

summary(lm.D9)$coefficients
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)    5.032  0.2202177 22.850117 9.547128e-15
groupTrt      -0.371  0.3114349 -1.191260 2.490232e-01

class(summary(lm.D9)$coefficients)
[1] "matrix"

summary(lm.D9)$coefficients[,3]
(Intercept)    groupTrt
  22.850117   -1.191260

albyn Quoting Bogaso bogaso.christo...@gmail.com: Hi, perhaps this question was answered previously, however I could not find it. My problem is how to extract a particular statistic from the result given by lm(). For e.g.

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
summary(lm.D9 <- lm(weight ~ group))

Call: lm(formula = weight ~ group) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.0320 0.2202 22.850 9.55e-15 *** groupTrt -0.3710 0.3114 -1.191 0.249 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158 F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249

Here I want to extract the values of the t-statistic and Pr(>|t|) individually. Can anyone please guide me how to get them? -- View this message in context: http://www.nabble.com/Extract-statistics-from-lm%28%29-tp22265770p22265770.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
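The extractor coef() applied to the summary returns the same matrix, and indexing by name is more readable than column numbers; a self-contained sketch with the same data:

```r
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group  <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
fit <- lm(weight ~ group)
ct  <- coef(summary(fit))       # same as summary(fit)$coefficients
ct["groupTrt", "t value"]       # the t statistic
ct["groupTrt", "Pr(>|t|)"]      # the p value
```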
Re: [R] Incorrect p value for binom.test?
The computation 2*sum(dbinom(c(10:25),25,0.061)) does not correspond to any reasonable definition of p-value. For a symmetric distribution, it is fine to use 2 times the tail area of one tail. For an asymmetric distribution, this is silly. The standard definition given in elementary texts is usually something like "the probability of observing a test statistic at least as extreme as the observed value", or more formally "the smallest significance level at which the observed result would lead to rejection of the null hypothesis". Either definition requires further decisions (what does "at least as extreme" mean?). In an asymmetric distribution, "at least as far from E(X|H0)" is not a good interpretation, since deviations in one direction may be much less probable than deviations in the other direction. Peter's interpretation corresponds both to the interpretation of "at least as extreme" as "at least as improbable", and also to the smallest-significance-level interpretation for the test implemented in binom.test, ie the Clopper-Pearson exact test. 2 times the upper tail area corresponds to neither. The fact that it is implemented in SAS and appears in a text does not rescue it from that fundamental failure to make sense. albyn On Thu, Feb 05, 2009 at 09:48:11PM +0100, Peter Dalgaard wrote: Michael Grant wrote: I believe the binom.test procedure is producing one-tailed p values rather than the two-tailed value implied by the alternative hypothesis language. A textbook and SAS both show 2*9.94e-07 = 1.988e-06 as the two-tailed value. As does the R summation syntax below. It looks to me like the alternative hypothesis language should be revised to something like "... greater than or equal to ...". Am I mistaken? Yes. Or maybe, it is a matter of definition. The problem is that

(0:25)[dbinom(0:25,25,.061) <= dbinom(10,25,.061)]
[1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

so with R's definition of "more extreme", all such values are in the upper tail.
Actually, if you look at the actual distribution, I think you'll agree that it is rather difficult to define a lower tail with positive probability that corresponds to X >= 10.

round(dbinom(0:25,25,.061),6)
 [1] 0.207319 0.336701 0.262476 0.130726 0.046708 0.012744
 [7] 0.002760 0.000487 0.000071 0.000009 0.000001 0.000000
[13] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[19] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[25] 0.000000 0.000000

In any case, you would be hard pressed to find a subset of 0:25 that has the probability that SAS and your textbook claim as the p value. M.C.Grant

2*sum(dbinom(c(10:25),25,0.061))
[1] 1.987976e-06

binom.test(10,25,0.061)
Exact binomial test
data: 10 and 25
number of successes = 10, number of trials = 25, p-value = 9.94e-07
alternative hypothesis: true probability of success is not equal to 0.061
95 percent confidence interval:
 0.2112548 0.6133465
sample estimates:
probability of success
                   0.4

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalga...@biostat.ku.dk) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
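Albyn's "at least as improbable" definition can be computed directly, and it reproduces the binom.test p-value here:

```r
x <- 10; n <- 25; p0 <- 0.061
d <- dbinom(0:n, n, p0)
# Sum the probabilities of all outcomes no more likely than the observed one.
pval <- sum(d[d <= dbinom(x, n, p0)])
pval
binom.test(x, n, p0)$p.value   # matches (up to a tiny tolerance)
```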
Re: [R] testing for bimodal distribution
dip {diptest} is Hartigan's dip test. albyn On Tue, Feb 03, 2009 at 05:42:34PM -0500, Andrew Yee wrote: I'm not sure where to begin with this, but I was wondering if someone could refer me to an R package that would test to see if a distribution fits a bimodal distribution better than a unimodal distribution. Thanks, Andrew __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
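A quick sketch of the dip statistic in action (diptest is a CRAN package, not part of base R, so the call is guarded in case it is not installed; the sample sizes and means are my own invention):

```r
if (requireNamespace("diptest", quietly = TRUE)) {
  set.seed(1)
  uni <- rnorm(200)                          # unimodal sample
  bi  <- c(rnorm(100, -3), rnorm(100, 3))    # clearly bimodal sample
  print(diptest::dip(uni))   # small dip statistic
  print(diptest::dip(bi))    # much larger: departure from unimodality
}
```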
Re: [R] Goodness of fit for gamma distributions
it is easy to make a qq-plot for the gamma; suppose that the fitted parameters are 1.101 and 2.49, and the data are in x:

plot(qgamma(ppoints(x), 1.101, 2.49), sort(x))

see also lattice::qqmath

albyn

Quoting Dan31415 d.m.mitch...@reading.ac.uk:

Ah yes, that does produce a nice plot. Can I just ask what exactly it is showing? It seems to me to be a sort of Q-Q plot, but with a different set of axes. Is this correct, and if so, do the same interpretation rules apply to this plot, i.e. departures at either end of the curve show poor fitting of the extreme data? Thanks for your help, Remko, it's been very helpful. Dann

Remko Duursma-2 wrote:

It sounds like you just want to graph it though. For gammas, it's nice to graph the log of the density, because the tail is so thin and long, so you don't see much otherwise:

mydata <- rgamma(1, shape=1.1, rate=2.5)
# now suppose you fit a gamma distribution, and get these estimated parameters:
shapeest <- 1.101
rateest <- 2.49
h <- hist(mydata, breaks=50, plot=FALSE)
plot(h$mids, log(h$density))
curve(log(dgamma(x, shape=shapeest, rate=rateest)), add=TRUE)

Remko

-
Remko Duursma, Post-Doctoral Fellow
Centre for Plant and Food Science, University of Western Sydney, Hawkesbury Campus, Richmond NSW 2753
Dept of Biological Science, Macquarie University, North Ryde NSW 2109, Australia
Mobile: +61 (0)422 096908

On Wed, Jan 28, 2009 at 1:13 AM, Dan31415 d.m.mitch...@reading.ac.uk wrote:

Thanks for that Remko, but I'm slightly confused, because isn't this testing the goodness of fit of two slightly different gamma distributions, not how well a gamma distribution represents the data? e.g.

data.vec <- as.vector(data)
(do some mle to find the parameters of a gamma distribution for data.vec)
xrarea <- seq(-2, 9, 0.05)
yrarea <- dgamma(xrarea, shape=7.9862, rate=2.6621)

so now yrarea is the gamma distribution and I want to compare it with data.vec to see how well it fits.
regards, Dann

Remko Duursma-2 wrote:

Hi Dann, there is probably a better way to do this, but this works anyway:

# your data
gamdat <- rgamma(1, shape=1, rate=0.5)
# comparison to gamma:
gamsam <- rgamma(1, shape=1, rate=0.6)
qqplot(gamsam, gamdat)
abline(0, 1)

greetings, Remko

-
Remko Duursma, Post-Doctoral Fellow
Centre for Plant and Food Science, University of Western Sydney, Hawkesbury Campus, Richmond NSW 2753
Dept of Biological Science, Macquarie University, North Ryde NSW 2109, Australia
Mobile: +61 (0)422 096908

On Tue, Jan 27, 2009 at 3:38 AM, Dan31415 d.m.mitch...@reading.ac.uk wrote:

I'm looking for goodness-of-fit tests for gamma distributions with large data sizes. I have a matrix with around 10,000 data values in it, and I have fitted a gamma distribution over a histogram of the data. The problem is testing how well that distribution fits. Chi-squared seems to be used more for discrete distributions, and for Kolmogorov-Smirnov, large sample sizes seem to make it hard to evaluate the D statistic. Also, I haven't found a qq plot for the gamma, although I think this might be an appropriate test. In summary:

- is there a gamma goodness-of-fit test that doesn't depend on the sample size?
- is there a way of using qqplot for gamma distributions, and if so, how would you calculate it from a matrix of data values?

regards, Dann
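Tying the two threads together: a gamma qq-plot against parameters estimated from the data itself. The data are simulated here, and method-of-moments estimates are used just to keep the sketch dependency-free (MASS::fitdistr would give maximum-likelihood estimates instead):

```r
set.seed(123)
x <- rgamma(500, shape = 1.1, rate = 2.5)    # stand-in for the real data

## method-of-moments fit: mean = shape/rate, var = shape/rate^2
rate.hat  <- mean(x) / var(x)
shape.hat <- mean(x) * rate.hat

theo <- qgamma(ppoints(length(x)), shape.hat, rate.hat)
plot(theo, sort(x),
     xlab = "fitted gamma quantiles", ylab = "sample quantiles")
abline(0, 1)
```

Departures from the line at either end flag poor fit in the tails, exactly as with a normal qq-plot.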
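For a formal number to go with a gamma qq-plot, ks.test() accepts a distribution name plus parameters. One caveat worth flagging in this sketch (simulated data, the thread's parameter values): if the parameters were estimated from the same data, the reported p-value is conservative (too large), so treat the result as descriptive:

```r
set.seed(42)
x <- rgamma(10000, shape = 7.9862, rate = 2.6621)   # simulated stand-in

## KS test against a fully specified gamma distribution
ks.test(x, "pgamma", shape = 7.9862, rate = 2.6621)
```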
Re: [R] Precision in R
Yes, computing WB %*% t(WB) may be the problem, by either method. If the goal is to compute the inverse of WB %*% t(WB), you should consider computing the singular value or QR decomposition of the matrix WB. If t(WB) = Q %*% R, where Q has orthonormal columns and R is upper triangular, then WB %*% t(WB) = t(R) %*% R, and the inverse of WB %*% t(WB) is solve(R) %*% solve(t(R)). Computing WB %*% t(WB) directly squares the condition number of the matrix. This is similar to the loss of precision that occurs when you compute the variance via mean(X^2) - mean(X)^2.

albyn

Quoting dos Reis, Marlon marlon.dosr...@agresearch.co.nz:

Dear All, I'm preparing a simple algorithm for matrix multiplication for a specific purpose, but I'm getting some unexpected results. If anyone could give a clue, I would really appreciate it. Basically what I want to do is a simple matrix multiplication: (WB) %*% t(WB). WB is on disk, so I compared two approaches:

- Load WB using read.table (put it in WB.tmp) and then do the simple matrix multiplication WB.tmp %*% t(WB.tmp).
- Scan each row of WB and form the cross products sum(WB.i*WB.i) and sum(WB.i*WB.j), which with proper arrangement leads to WBtWB.

Comparing these two matrices, I get very similar values; however, when I tried their inverses, WBtWB leads to a singular system. I've tried different tests, and my conclusion is that my precision problem is related to the cross products sum(WB.i*WB.i) and sum(WB.i*WB.j). Does that make sense?
Thanks, Marlon

WB.tmp %*% t(WB.tmp)
           WB.i       WB.i       WB.i
WB.i 1916061939 2281366606  678696067
WB.i 2281366606 3098975504 1092911209
WB.i  678696067 1092911209  452399849

WBtWB
           [,1]       [,2]       [,3]
[1,] 1916061939 2281366606  678696067
[2,] 2281366606 3098975504 1092911209
[3,]  678696067 1092911209  452399849

WBtWB - WB.tmp %*% t(WB.tmp)
             WB.i          WB.i          WB.i
WB.i 2.861023e-06  4.768372e-07  4.768372e-07
WB.i 4.768372e-07  3.814697e-06 -2.622604e-06
WB.i 4.768372e-07 -2.622604e-06  5.960464e-08

solve(WB.tmp %*% t(WB.tmp))
          WB.i      WB.i       WB.i
WB.i -41692.80  58330.89  -78368.17
WB.i  58330.89 -81608.66  109642.09
WB.i -78368.17 109642.09 -147305.32

solve(WBtWB)
Error in solve.default(WBtWB) : system is computationally singular:
  reciprocal condition number = 2.17737e-17

WB.tmp <- NULL
WBtWB <- matrix(NA, n, n)
for (i in 1:n) {
  setwd(Home.dir)
  WB.i <- scan("WB.dat", skip = (i-1), nlines = 1)
  WB.tmp <- rbind(WB.tmp, WB.i)
  WBtWB[i,i] <- sum(WB.i*WB.i)
  if (i < n) {
    for (j in (i+1):n) {
      setwd(Home.dir)
      WB.j <- scan("WB.dat", skip = (j-1), nlines = 1)
      WBtWB[i,j] <- sum(WB.i*WB.j)
      WBtWB[j,i] <- sum(WB.i*WB.j)
    }
  }
}
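A sketch of the QR route on simulated data shaped like the poster's (3 rows, many columns): with t(WB) = Q %*% R, chol2inv(R) returns the inverse of t(R) %*% R, i.e. of WB %*% t(WB), without ever forming the ill-conditioned cross-product explicitly:

```r
set.seed(1)
WB <- matrix(rnorm(3 * 200), nrow = 3)   # 3 x 200 stand-in for the real data

## direct route: form the cross-product, then invert
A    <- WB %*% t(WB)
inv1 <- solve(A)

## QR route: t(WB) = Q %*% R  =>  WB %*% t(WB) = t(R) %*% R
R    <- qr.R(qr(t(WB)))
inv2 <- chol2inv(R)                      # (t(R) %*% R)^{-1}

all.equal(inv1, inv2)                    # TRUE
```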
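The variance analogy is easy to see directly: when the data sit far from zero, the one-pass formula mean(X^2) - mean(X)^2 cancels catastrophically, while the two-pass version is unaffected. A small illustration:

```r
x <- 1e8 + c(1, 2, 3)     # tiny spread around a huge mean

one.pass <- mean(x^2) - mean(x)^2     # catastrophic cancellation
two.pass <- mean((x - mean(x))^2)     # two-pass population variance

one.pass    # 0 on a typical IEEE-754 platform: the answer is lost
two.pass    # 0.6666667, the true value 2/3
```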
Re: [R] how to plot histogram plot and fitted distributions on the same graph
You are plotting the histogram on the frequency scale. A quick look at the doc page for hist() would reveal the freq option:

hist(x, freq=FALSE)

then you can add the densities with lines().

albyn

Quoting Xin Shi jasonshi...@hotmail.com:

Dear: I am trying to plot the histogram for my observed data, and then plot the fitted distributions on the same graph in R:

1. histogram plot of y;
2. based on 1, plot y1 v. x;
3. based on 1, plot y2 v. x;
4. based on 1, plot y3 v. x.

All four plots must be on the same graph. However, I found the difficulty is that the y-axis and x-axis for the histogram plot and the fitted distribution plots are different. For the histogram plot, y presents the frequency and x presents events. For the fitted distribution plots, y presents the probability and x presents another variable. However, I found out I need a histogram plot rather than a barplot. This is the major problem of this work. The code I used:

par(font=1, font.lab=10, font.axis=6)
pts18 <- barplot(y, ylim=c(0,0.2), xlim=c(2,52), axes=FALSE, border=TRUE, names.arg=x, col="white")
axis(2, las=1)
lines(spline(pts18, y1, n=300, method="natural"), type="l", lty=1)
lines(spline(pts18, y2, n=300, method="natural"), type="l", lty=2)
lines(spline(pts18, y3, n=300, method="natural"), type="l", lty=5)

The data are:

The observed data:
y <- c(0.098441926, 0.166430595, 0.121813031, 0.104815864, 0.074362606, 0.075779037, 0.055949008, 0.040368272, 0.03470255, 0.029745042, 0.032577904, 0.02266289, 0.014872521, 0.014872521, 0.010623229, 0.01203966, 0.01203966, 0.008498584, 0.009206799, 0.009915014, 0.006373938, 0.003541076, 0.001416431, 0.001416431, 0.005665722, 0.002124646, 0.000708215, 0.001416431, 0.004249292, 0.002832861, 0.004957507, 0.002124646, 0.000708215, 0, 0.000708215, 0.002124646, 0.001416431, 0.001416431, 0.001416431, 0, 0.000708215)

Fitted distribution 1:
y1 <- c(0.03419162, 0.154201321, 0.129581481, 0.108892454, 0.091506645, 0.07689666, 0.064619311, 0.054302168, 0.045632264, 0.0383466, 0.032224168, 0.027079245, 0.022755763,
0.01912257, 0.016069453, 0.013503798, 0.01134, 0.009535987, 0.008013468, 0.006734034, 0.005658876, 0.004755378, 0.003996132, 0.003358108, 0.002821952, 0.002371398, 0.00199278, 0.001674612, 0.001407243, 0.001182562, 0.000993753, 0.00083509, 0.00070176, 0.000589716, 0.000495562, 0.00041644, 0.000349951, 0.000294078, 0.000247125, 0.000207669, 0.000174513)

Fitted distribution 2:
y2 <- c(0.078909441, 0.188048499, 0.117871979, 0.089827482, 0.072368317, 0.059928019, 0.050453301, 0.042948906, 0.036851702, 0.031809247, 0.027584779, 0.024010745, 0.020963795, 0.01835029, 0.016097393, 0.014147335, 0.012453559, 0.010978051, 0.009689433, 0.008561564, 0.007572497, 0.006703683, 0.005939358, 0.005266055, 0.00467, 0.004147912, 0.003684531, 0.003274633, 0.002911751, 0.00259025, 0.002305216, 0.002052353, 0.001827898, 0.001628552, 0.001451415, 0.001293939, 0.001153881, 0.001029262, 0.000918338, 0.000819567, 0.000731589)

Fitted distribution 3:
y3 <- c(0.09844545, 0.174856171, 0.1190666, 0.093021492, 0.075639902, 0.062740817, 0.052668044, 0.044568247, 0.037931599, 0.032423244, 0.027808545, 0.023915327, 0.020612892, 0.01779946, 0.015394205, 0.013331948, 0.011559483, 0.010032949, 0.008715898, 0.007577845, 0.006593146, 0.005740133, 0.005000424, 0.004358371, 0.003800615, 0.003315725, 0.002893892, 0.002526689, 0.002206859, 0.001928146, 0.001685148, 0.001473194, 0.001288243, 0.001126794, 0.00098581, 0.000862657, 0.000755047, 0.000660991, 0.000578759, 0.000506847, 0.000443945)

x <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 200)

Many Thanks! Xin
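A self-contained version of the advice, with simulated data and hypothetical parameter values standing in for the poster's vectors: freq=FALSE puts the bars on the density scale, so fitted curves can be overlaid in the same units:

```r
set.seed(1)
x <- rgamma(1000, shape = 2, rate = 0.5)   # stand-in data

hist(x, breaks = 30, freq = FALSE, col = "white",
     main = "histogram on the density scale")
curve(dgamma(x, shape = 2, rate = 0.5), add = TRUE, lwd = 2)
```

With freq=FALSE the bar areas sum to one, matching the density curve.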
Re: [R] Drawing from an empirical distribution
the empirical distribution gives probability 1/n to each of the n observations. Rather than sampling the unit interval, just resample the dataset. If x is your dataset, and you want an independent sample of size k:

sample(x, size=k, replace=TRUE)

albyn

On Tue, Jan 06, 2009 at 02:39:17PM -0800, culpritNr1 wrote:

Hi All, does anybody know if there is a simple way to draw numbers from an empirical distribution? I know that I can plot the empirical cumulative distribution function this easily: plot(ecdf(x)). Now I want to pick a number between 0 and 1 and go back to the domain of x. Sounds simple to me. Any suggestion? Thank you, Your culprit (everybody needs a culprit)
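Both routes give draws from the same empirical distribution — direct resampling, and the poster's "pick a number in (0, 1), go back to the domain of x" idea, which is the inverse of the ECDF (quantile() with type = 1):

```r
set.seed(1)
x <- rnorm(100)    # any dataset

draw1 <- sample(x, size = 1000, replace = TRUE)    # direct resampling
draw2 <- quantile(x, runif(1000), type = 1,        # inverse-ECDF route
                  names = FALSE)

all(draw1 %in% x)    # TRUE: only observed values are returned
all(draw2 %in% x)    # TRUE
```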
Re: [R] uniroot() problem
One can't tell for sure without seeing the function, but I'd guess that you have a numerical issue. Here is an example to reflect upon:

f <- function(x) (exp(x) - exp(50)) * (exp(x) + exp(50))
uniroot(f, c(0, 100))
$root
[1] 49.7
$f.root
[1] -1.640646e+39
$iter
[1] 4
$estim.prec
[1] 6.103516e-05

.Machine$double.eps^0.25/2
[1] 6.103516e-05

uniroot thinks it has converged, at least in relative terms. Note that the estimated precision is related to the machine epsilon, which is used in the default value for tol. Try fiddling with the tol argument:

uniroot(f, c(0, 100), tol=1/10^12)
$root
[1] 50
$f.root
[1] 1.337393e+31
$iter
[1] 4
$estim.prec
[1] 5.186962e-13

albyn

Quoting megh megh700...@yahoo.com:

I have a strange problem with the uniroot() function. Here is the result:

uniroot(th, c(-20, 20))
$root
[1] 4.216521e-05
$f.root
[1] 16.66423
$iter
[1] 27
$estim.prec
[1] 6.103516e-05

Please forgive me for not reproducing the whole code; my question here is how f.root can be 16.66423. As it is finding the root of a function, it should be near zero. Am I missing something?
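The tol effect is easy to verify with the example function above: tightening tol drives the root to 50 essentially to machine precision, while f.root remains enormous simply because the slope at the root is of order 2*exp(100):

```r
f <- function(x) (exp(x) - exp(50)) * (exp(x) + exp(50))

r.default <- uniroot(f, c(0, 100))               # default tol ~ 1.22e-4
r.tight   <- uniroot(f, c(0, 100), tol = 1e-12)

abs(r.default$root - 50)   # small, but limited by the default tolerance
abs(r.tight$root - 50)     # essentially zero
```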
Re: [R] comparing distributions
On Fri, Dec 19, 2008 at 05:44:47AM -0800, kdebusk wrote:

I have data sets from three different sites. None of the data sets are normally distributed, and they cannot be log-transformed to a normal distribution. I would like to compare the data sets to see if any of the sites are similar to each other, and would like something more powerful than a non-parametric Wilcoxon test. Does anyone have any suggestions on more powerful ways of comparing non-normal data sets? If it helps, the data sets follow a gamma distribution. Thanks for all input.

If the data are gamma distributed, then a glm with family Gamma should be a good option.

albyn
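A sketch of the suggestion with three hypothetical sites and simulated gamma responses (the log link keeps the fitted means positive); an F test on the Gamma GLM then compares the sites:

```r
set.seed(7)
site <- factor(rep(c("A", "B", "C"), each = 50))
mu   <- rep(c(1, 1.5, 3), each = 50)             # hypothetical site means
y    <- rgamma(150, shape = 2, rate = 2 / mu)    # gamma with mean mu

fit <- glm(y ~ site, family = Gamma(link = "log"))
anova(fit, test = "F")        # overall test for differences among sites
summary(fit)$coefficients     # contrasts of sites B and C against A
```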