Re: [R] aparchFit()$fitted.value
Lisa wrote: Dear R people, I'm not able to have the component residuals, fitted.value from an aparchFit() estimation as explain in the Value of aparchFit Help, package fSeries. OK, let's try together: # starting with the second example on the cited help page: temp - aparchFit(ts, order = list(alpha.lags = 1, beta.lags = c(1, 5), delta = 1), opt = list(gamma=FALSE, delta=FALSE, disparm=FALSE), doprint=FALSE) temp$fitted.values # Hmmm .. NULL str(temp) # Ahh! List of 13 $ par: num [1:4] 12.807 -1.833 -1.174 -0.687 $ value : num -51190 $ counts : Named int [1:2] 167 NA ..- attr(*, names)= chr [1:2] function gradient $ convergence: int 0 [SNIP] - attr(*, class)= chr fAPARCH Obviously, this is a documentation bug. Please send bug reports re. contributed packages to the package maintainer! I'm CCing to the maintainer, Diethelm Wuertz. Uwe Ligges Could someone help me? Thanks in advance. Lisa __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Issue with predict() for glm models
[EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a training data set and try to obtain predictions for a set of new predictors from a test data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p - 2 train.n - 20 test.n - 25 mean.vec.1 - c(1,1) mean.vec.2 - c(0,0) Sigma.1 - matrix(c(1,.5,.5,1),p,p) Sigma.2 - matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train - rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test - rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg - glm(train.class.var~predictors.train, family=binomial(link=logit)) Well, you haven't specified the data argument, but given the two variables directly. Exactly those variables will be used in the predict() step below! If you want the predict() step to work, use something like: train - data.frame(class = train.class.var, predictors = predictors.train) log.reg - glm(class ~ ., data = train, family=binomial(link=logit)) log.reg # log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = logit)) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data - data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test - predict.glm(log.reg,New.Data,type=response) logreg.pred.prob.test Instead, use: test - data.frame(predictors = predictors.test) predict(log.reg, newdata = test, type=response) note also: please call the generic predict() rather than its glm method. Uwe Ligges #logreg.pred.prob.test # [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091 # [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139 #[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339 #[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005 #[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882 #[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955 #[37] 0.80855782 0.90835588 0.54809117 0.11568637 Of course, notice that the vector for the predicted probabilities has only 40 elements, while the New.Data has 50 elements (since n.test has 25 per group for 2 groups) and thus should have 50 predicted probabilities. As it turns out, the output is for the training data predictors and not for the New.Data as I would like it to be. I should also note that I have made sure that the names for the predictors in the New.Data are the same as the names for the predictors within the glm object (i.e., within log.reg) as this is what is done in the example for predict.glm() within the help files. Could some one help me understand either what I am doing incorrectly or what problems there might be within the predict() function? I should mention that I tried the same program using predict.glm() and obtained the same problematic results. Thanks and take care, Joe Joe Rausch, M.A. Psychology Liaison Lab for Social Research 917 Flanner Hall University of Notre Dame Notre Dame, IN 46556 (574) 631-3910 If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein
Re: [R] Facing problems with C code compilation - Please help.
Liaw, Andy [EMAIL PROTECTED] writes: Read and follow the instructions in c:\Program Files\R\rw1091\README.packages _very_, _very_ carefully. Stray from it even a bit and you get what you deserve. Well, not what you want, anyway... (In this case, the first problem seems to be that you need make.exe from the toolkit *and* to have it in your path.) Andy From: neha chaudhry Hello, I started using R a month ago - so I am a novice in this area. I am stuck with a problem and need some help urgently. I am using windows version of R 1.9.1. I am trying to compile C code in it. I have my C code - hello.c is lying in C:\Program Files\R\rw1091 This code is - #include R.h void hello(int *n) { int i; for(i=0;i *n; i++) { Rprintf(Hello World ! \n); } } === Code hello1.R is also lying in the same directory. This code is - hello2 - function(n) { .C(hello, as.integer)) Make that as.integer(n) or things will surely come crashing down... } -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] t test problem?
On Wednesday 22 September 2004 13:07, Ted Harding wrote: On 22-Sep-04 kan Liu wrote: Hi, Many thanks for your helpful comments and suggestions. The attached are the data in both log10 scale and original scale. It would be very grateful if you could suggest which version of test should be used. By the way, how to check whether the variation is additive (natural scale) or multiplicative (log scale) in R? How to check whether the distribution of the data is normal? As for additive vs multiplicative, this can only be judged in terms of the process by which the values are created in the real world. Just my 2 cents: I often find it helpful to ask myself (or the client) whether, if there was a difference (something) between the two samples, I/she/he thinks the appropriate model is (please, read the = as approx. equal) sample.1 = sample.2 + something [1] OR sample.1 = sample.2 * something [2] (i.e., the ratio of means is a constant: sample.1/sample.2 = something) which, by log transforming becomes log(sample.1) = log(sample.2) + log(something) I am not including here the issue of error distribution, but often times when the model for the means is like [2] the error terms are multiplicative (i.e., additive in the log scale). At least in many biological and engineering problems it is often evident whether [1] or [2] should be appropriate for the data, given what we know about the subject. Best, R. As for normality vs non-normality, an appraisal can often be made simply by looking at a histogram of the data. In your case, the commands hist(x,breaks=1*(0:100)) hist(y,breaks=1*(0:100)) indicate that the distributions of x and y do not look at all normal, since they both have considerable positive skewness (i.e. long upper tails relative to the main mass of the distribution). This does strongly suggest that a logarithmic transformation would give data which are more nearly normally distributed, as indeed is confirmed by the commands hist(log(x)) hist(log(y)) though in both cases the histograms show some irregularity compared with what you would expect from a sample from a normal distribution: the commands hist(log(x),breaks=0.2*(40:80)) hist(log(y),breaks=0.2*(40:80)) show that log(x) has an excessive peak at around 11.7, while log(y) has holes at around 11.1 and 12.1. Nevertheless, this inspection of the data shows that the use of log(x) and log(y) will come much closer to fulfilling the conditions of validity of the t test than using the raw data x and y. However, it is not merely the *normality* of each which is needed: the conditions for the usual t test also require that the two populations sampled for log(x) and log(y) should have the same standard deviations. In your case, this also turns out to be nearly enough true: sd(log(x)) [1] 0.902579 sd(log(y)) [1] 0.9314807 PS, Can I confirm that do your suggestions mean that in order to check whether there is a difference between x and y in terms of mean I need check the distribution of x and that of y in both natual and log scales and to see which present normal distribution? See above for an approach to this: the answer to your question is, in effect, yes. It could of course have happened that neither the raw nor the log scale would be satisfactory, in which case you would need to consider other possibilities. And, if the SDs had turned out to be very different, you should not use the standard t test but a variant which is adpated to the situation (e.g. the Welch test). You can, of course, also perform formal tests for skewness, for normality, and for equality of variances. Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 22-Sep-04 Time: 12:07:07 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Best device for printing quality
Hi all; Just to ask you for your advise about what is the best way to get the best quality for graphics to be incorporated into a printed article (I'm mainly a Linux useR, but also use the windows R version) Thanks and best regards, Javier Garcia __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Best device for printing quality
Hi! I would first check what kind of graphic format the article publisher of the article accepts. If ps/pdf is ok I would use ?postscript. Otherwise I would change the journal. /E *** REPLY SEPARATOR *** On 9/24/2004 at 11:12 AM javier garcia - CEBAS wrote: Hi all; Just to ask you for your advise about what is the best way to get the best quality for graphics to be incorporated into a printed article (I'm mainly a Linux useR, but also use the windows R version) Thanks and best regards, Javier Garcia __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Dipl. bio-chem. Witold Eryk Wolski @ MPI-Moleculare Genetic Ihnestrasse 63-73 14195 Berlin'v' tel: 0049-30-83875219/ \ mail: [EMAIL PROTECTED]---W-Whttp://www.molgen.mpg.de/~wolski [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] aparchFit()$fitted.value
...unfortunately I was suspicious of this. Could You give me some insight to obtain that values? Thanks a lot! Lisa __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] block statistics with POSIX classes
Kahra Hannu kahra at mpsgr.it writes: : : I have a monthly price index series x, the related return series y = diff(log (x)) and a POSIXlt date-time : variable dp. I would like to apply annual blocks to compute for example annual block maxima and mean of y. : : When studying the POSIX classes, in the first stage of the learning curve, I computed the maximum drawdown : of x: : mdd - maxdrawdown(x) : max.dd - mdd$maxdrawdown : from - as.character(dp[mdd$from]) : to - as.character(dp[mdd$to]) : from; to : [1] 2000-08-31 : [1] 2003-03-31 : that gives me the POSIX dates of the start and end of the period and suggests that I have done something correctly. : : Two questions: : (1) how to implement annual blocks and compute e.g. annual max, min and mean of y (each year's max, min, mean)? : (2) how to apply POSIX variables with the 'block' argument in gev in the evir package? : : The S+FinMetrics function aggregateSeries does the job in that module; but I do not know, how handle it in R. : My guess is that (1) is done by using the function aggregate, but how to define the 'by' argument with POSIX variables? 1. To create a ts monthly time series you specify the first month and a frequency of 12 like this. z.m - ts(rep(1:6,4), start = c(2000,1), freq = 12) z.m # Annual aggregate is done using aggregate.ts with nfreq = 1 z.y - aggregate(z.m, nfreq = 1, max) z.y # To create a POSIXct series of times using seq # (This will use GMT. Use tz= arg to ISOdate if you want current tz.) z.y.times - seq(ISOdate(2000,1,1), length = length(z.y), by = year) z.y.times 2. Have not used evir but looking at ?gev it seems you can use block = 12 if you have monthly data and want the blocks to be successive 12 month periods or you can add a POSIXct times attribute to your data as below (also see comment re tz above) and then use block = year in your gev call. attr(z.m, times) - seq(ISOdate(2000,1,1), length=length(z.m), by=month) str(z.m) # display z.m along with attribute info __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] multinomial logistic regression
On 23 Sep 2004, at 01:36, array chip wrote: Hi, how can I do multinomial logistic regression in R? I think glm() can only handle binary response variable, and polr() can only handle ordinal response variable. how to do logistic regression with multinomial response variable? Perhaps multinom() in package nnet (in bundle VR) will do what you want? Regards, David Thanks __ Y! Messenger - Communicate in real time. Download now. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nnet with weights parameter: odd error
Dear R-users I use nnet for a classification (2 classes) problem. I use the code CVnn1, CVnn2 as described in VR. The thing I changed to the code is: I define the (class) weight for each observation in each cv 'bag' and give the vector of weights as parameter of nnet(..weights = weight.vector...) Unfortunately I get an error during some (but not all!) inner-fold cv runs: Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : variable lengths differ If you just remove the weights parameter in nnet() it runs fine!! I debugged the code but could not resolve the problem- it is really very strange and I need your help! I tried it very simple in defining the weights as = 1 for each obs (as it is by default)!: CVnn2 - function(formula, data, size = c(0,4,4,10,10), lambda = c(0, rep(c(0.001, 0.01),2)), nreps = 1, nifold = 5, verbose = 99, ...) { resmatrix - function(predict.matrix, learn, data, ri, i) { rae.matrix - predict.matrix rae.matrix[,] - 0 rae.vector - as.numeric(as.factor((predict(learn, data[ri == i,], type = class for (k in 1:dim(rae.matrix)[1]) { if (rae.vector[k] == 1) rae.matrix[k,1] - rae.matrix[k,1] + 1 else rae.matrix[k,2] - rae.matrix[k,2] + 1 } rae.matrix } CVnn1 - function(formula, data, nreps=1, ri, verbose, ...) { totalerror - 0 truth - data[,deparse(formula[[2]])] res - matrix(0, nrow(data), length(levels(truth))) if(verbose 20) cat( inner fold) for (i in sort(unique(ri))) { if(verbose 20) cat( , i, sep=) data.training - data[ri != i,]$GROUP weight.vector - rep(1, dim(data[ri !=i,])[1]) for(rep in 1:nreps) { learn - nnet(formula, data[ri !=i,], weights = weight.vector, trace = F, ...) #res[ri == i,] - res[ri == i,] + predict(learn, data[ri == i,]) res[ri == i,] - res[ri == i,] + resmatrix(res[ri == i,], learn, data, ri, i) } } if(verbose 20) cat(\n) sum(as.numeric(truth) != max.col(res/nreps)) } truth - data[,deparse(formula[[2]])] res - matrix(0, nrow(data), length(levels(truth))) choice - numeric(length(lambda)) for (i in sort(unique(rand))) { if(verbose 0) cat(fold , i,\n, sep=) set.seed(i*i) ri - sample(nifold, sum(rand!=i), replace=T) for(j in seq(along=lambda)) { if(verbose 10) cat( size =, size[j], decay =, lambda[j], \n) choice[j] - CVnn1(formula, data[rand != i,], nreps=nreps, ri=ri, size=size[j], decay=lambda[j], verbose=verbose, ...) } decay - lambda[which.is.max(-choice)] csize - size[which.is.max(-choice)] if(verbose 5) cat( #errors:, choice, ) # if(verbose 1) cat(chosen size = , csize, decay = , decay, \n, sep=) for(rep in 1:nreps) { data.training - data[rand != i,]$GROUP weight.vector - rep(1, dim(data[rand !=i,])[1]) learn - nnet(formula, data[rand != i,], weights = weight.vector, trace=F, size=csize, decay=decay, ...) #res[rand == i,] - res[rand == i,] + predict(learn, data[rand == i,]) res[rand == i,] - res[rand == i,] + resmatrix(res[rand == i,],learn,data, rand, i) } } factor(levels(truth)[max.col(res/nreps)], levels = levels(truth)) } res.nn2 - CVnn2(GROUP ~ ., rae.data.subsetted1, skip = T, maxit = 500, nreps = cv.repeat) con(true = rae.data.subsetted$GROUP, predicted = res.nn2) ### Coordinates: platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major1 minor9.1 year 2004 month06 day 21 language R Thanks a lot Best regards Christoph -- Christoph LehmannPhone: ++41 31 930 93 83 Department of Psychiatric NeurophysiologyMobile: ++41 76 570 28 00 University Hospital of Clinical Psychiatry Fax:++41 31 930 99 61 Waldau[EMAIL PROTECTED] CH-3000 Bern 60 http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] multinomial logistic regression
hi all, you could have look at the library(VGAM). http://www.stat.auckland.ac.nz/~yee/VGAM/ hope this helps P.BADY At 17:36 22/09/2004 -0700, array chip wrote: Hi, how can I do multinomial logistic regression in R? I think glm() can only handle binary response variable, and polr() can only handle ordinal response variable. how to do logistic regression with multinomial response variable? Thanks __ Y! Messenger - Communicate in real time. Download now. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Pierre BADY °) Université Claude Bernard Lyon 1 UMR CNRS 5023, LEHF bat Alphonse Forel 43 boulevard du 11 novembre 1918 F-69622 VILLEURBANNE CEDEX FRANCE TEL : +33 (0)4 72 44 62 34 FAX : +33 (0)4 72 43 28 92 MEL : [EMAIL PROTECTED] http://limnologie.univ-lyon1.fr [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ordered probit and cauchit
roger koenker wrote: What is the current state of the R-art for ordered probit models, and more esoterically is there any available R strategy for ordered cauchit models, i.e. ordered multinomial alternatives with a cauchy link function. MCMC is an option, obviously, but for a univariate latent variable model this seems to be overkill... standard mle methods should be preferable. (??) Googling reveals that spss provides such functions... just to wave a red flag. I find polr(MASS) Ordered Logistic or Probit Regression MCMCoprobit(MCMCpack) Markov chain Monte Carlo for Ordered Probit Regression and in Jim Lindsey's gnlm there is nordr Nonlinear Ordinal Regression ordglm Generalized Linear Ordinal Regression Kjetil -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] FYI: DeveloperWorks News article on R
First of a 3 part series, apparently: http://www-106.ibm.com/developerworks/linux/library/l-r1/?ca=dgr-lnxw02aAre http://www-106.ibm.com/developerworks/linux/library/l-r1/?ca=dgr-lnxw02aAre (Apologies if someone has already posted this link to R-help) Best Regards, Bill Bill Pikounis, Ph.D. Biometrics Research Department Merck Research Laboratories PO Box 2000, MailDrop RY33-300 126 E. Lincoln Avenue Rahway, New Jersey 07065-0900 USA Phone: 732 594 3913 Fax: 732 594 1565 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] [Job Ad] Position at Merck Research Laboratories, NJ USA
Please direct *all* inquiries to Vladimir Svetnik, the hiring manager. His contact information is below. (Also, please accept my apologies, for cross-posting to those subscribed to both R-help and S-news, and for any offense since this is a repost I made of the same position month ago. That short time spacing will not be repeated.) Thanks, Bill Job description: Computational statistician/biometrician The Biometrics Research Department at Merck Research Laboratories, Merck Co., Inc. in Rahway, NJ, is seeking a highly motivated statistician/data analyst to work in its basic research and drug discovery area. The applicant should have broad expertise in statistical computing. Experience and/or education relevant to signal processing, image processing, pattern recognition and machine learning are preferred. The position will involve providing statistical, mathematical, and software development support for one or more of following areas: medical imaging, biological signal analysis, and computational chemistry. We are looking for a Ph.D. with a background and/or post-doctoral experience in at least one of the following fields: Statistics, Electrical/Computer or Biomedical Engineering, Computer Science, Applied Mathematics, or Physics. Advanced computer programming skills (including, but not limited to R, S-PLUS, Matlab, C/C++) and excellent communication skills are essential. An ability to lead statistical analysis efforts within a multidisciplinary team is required. The position may also involve general statistical consulting and training. Our dedication to delivering quality medicines in innovative ways and our commitment to bringing out the best in our people are just some of the reasons why we're ranked among Fortune magazine's 100 Best Companies to Work for in America. We offer a competitive salary, an outstanding benefits package, and a professional work environment with a company known for scientific excellence. To apply, please forward your CV or resume and cover letter to ATTENTION: Open Position Vladimir Svetnik, Ph.D. Biometrics Research Dept. Merck Research Laboratories RY33-300 126 E. Lincoln Avenue Rahway, NJ 07065-0900 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 - X[,1] x2 - X[,2] dat.2 - data.frame(y=y, x1=x1, x2=x2) mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges Sent: Thursday, September 23, 2004 1:33 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models [EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a training data set and try to obtain predictions for a set of new predictors from a test data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p - 2 train.n - 20 test.n - 25 mean.vec.1 - c(1,1) mean.vec.2 - c(0,0) Sigma.1 - matrix(c(1,.5,.5,1),p,p) Sigma.2 - matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train - rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test - rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg - glm(train.class.var~predictors.train, family=binomial(link=logit)) Well, you haven't specified the data argument, but given the two variables directly. Exactly those variables will be used in the predict() step below! If you want the predict() step to work, use something like: train - data.frame(class = train.class.var, predictors = predictors.train) log.reg - glm(class ~ ., data = train, family=binomial(link=logit)) log.reg # log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = logit)) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data - data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test -
Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 - X[,1] x2 - X[,2] dat.2 - data.frame(y=y, x1=x1, x2=x2) mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges Sent: Thursday, September 23, 2004 1:33 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models [EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a training data set and try to obtain predictions for a set of new predictors from a test data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p - 2 train.n - 20 test.n - 25 mean.vec.1 - c(1,1) mean.vec.2 - c(0,0) Sigma.1 - matrix(c(1,.5,.5,1),p,p) Sigma.2 - matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train - rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test - rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg - glm(train.class.var~predictors.train, family=binomial(link=logit)) Well, you haven't specified the data argument, but given the two variables directly. Exactly those variables will be used in the predict() step below! If you want the predict() step to work, use something like: train - data.frame(class = train.class.var, predictors = predictors.train) log.reg - glm(class ~ ., data = train, family=binomial(link=logit)) log.reg # log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = logit)) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data - data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test -
Re: [R] block statistics with POSIX classes
I am not sure that I understand what you are looking for since you did not provide any sample data with corresponding output to clarify your question. Here is another guess. If y is just a numeric vector with monthly data and dp is a POSIXlt vector of the same length then: aggregate(list(y=y), list(dp$year), mean)$y will perform aggregation, as will aggregate(ts(y, start=c(d$year[1],d$mon[1]+1), freq = 12), nfreq=1, mean) which converts y to ts and then performs the aggregation. The first one will work even if y is irregular while the second one assumes that y must be monthly. The second one returns a ts object. By the way, I had a look at gev's source and it seems that despite the documentation it does not use POSIXct anywhere internally. If the block is year or other character value then it simply assumes that whatever datetime class is used has an as.POSIXlt method. If your dates were POSIXct rather than POSIXlt then it would be important to ensure that whatever timezone is assumed (which I did not check) in the conversion is the one you are using. You could use character dates or Date class to avoid this problem. Since you appear to be using POSIXlt datetimes from the beginning I think you should be ok. Kahra Hannu kahra at mpsgr.it writes: : : Thank you Petr and Gabor for the answers. : : They did not, however, solve my original problem. When I have a monthly time series y with a POSIX date : variable dp, the most obvious way to compute e.g. the annual means is to use aggregate(y, 1, mean) that : works with time series. However, I got stuck with the idea of using the 'by' argument as by = dp$year. But in : that case y has to be a data.frame. The easiest way must be the best way. : : Regards, : Hannu : : -Original Message- : From: r-help-bounces at stat.math.ethz.ch : [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Gabor Grothendieck : Sent: Thursday, September 23, 2004 12:56 PM : To: r-help at stat.math.ethz.ch : Subject: Re: [R] block statistics with POSIX classes : : : Kahra Hannu kahra at mpsgr.it writes: : : : : : I have a monthly price index series x, the related return series y = diff (log : (x)) and a POSIXlt date-time : : variable dp. I would like to apply annual blocks to compute for example : annual block maxima and mean of y. : : : : When studying the POSIX classes, in the first stage of the learning curve, I : computed the maximum drawdown : : of x: : : mdd - maxdrawdown(x) : : max.dd - mdd$maxdrawdown : : from - as.character(dp[mdd$from]) : : to - as.character(dp[mdd$to]) : : from; to : : [1] 2000-08-31 : : [1] 2003-03-31 : : that gives me the POSIX dates of the start and end of the period and : suggests that I have done something correctly. : : : : Two questions: : : (1) how to implement annual blocks and compute e.g. annual max, min and mean : of y (each year's max, min, mean)? : : (2) how to apply POSIX variables with the 'block' argument in gev in the : evir package? : : : : The S+FinMetrics function aggregateSeries does the job in that module; but I : do not know, how handle it in R. : : My guess is that (1) is done by using the function aggregate, but how to : define the 'by' argument with POSIX variables? : : 1. To create a ts monthly time series you specify the first month : and a frequency of 12 like this. : : z.m - ts(rep(1:6,4), start = c(2000,1), freq = 12) : z.m : : # Annual aggregate is done using aggregate.ts with nfreq = 1 : z.y - aggregate(z.m, nfreq = 1, max) : z.y : : # To create a POSIXct series of times using seq : # (This will use GMT. Use tz= arg to ISOdate if you want current tz.) : z.y.times - seq(ISOdate(2000,1,1), length = length(z.y), by = year) : z.y.times : : 2. Have not used evir but looking at ?gev it seems you can : use block = 12 if you have monthly data and want the blocks to be : successive 12 month periods or you can add a POSIXct times attribute to : your data as below (also see comment re tz above) and then use : block = year in your gev call. : : attr(z.m, times) - seq(ISOdate(2000,1,1), length=length(z.m), by=month) : str(z.m) # display z.m along with attribute info : : __ : R-help at stat.math.ethz.ch mailing list : https://stat.ethz.ch/mailman/listinfo/r-help : PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html : : __ : R-help at stat.math.ethz.ch mailing list : https://stat.ethz.ch/mailman/listinfo/r-help : PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html : : __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] gsub
Hi A while back I used gsub to do the following temp-000US00231 gsub(something here, , temp) 00231 I think it involved the `meta characters' somehow. I do not know how to do this anymore. I know strsplit will also work but I remember gsub was much faster. In essence the question is how to delete all characters before a particular pattern. If anyone has some help file for this, it will be greatly appreciated. Jean Eid __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] gsub
You might want to look at ?regex. Sean On Sep 23, 2004, at 10:03 AM, Jean Eid wrote: Hi A while back I used gsub to do the following temp-000US00231 gsub(something here, , temp) 00231 I think it involved the `meta characters' somehow. I do not know how to do this anymore. I know strsplit will also work but I remember gsub was much faster. In essence the question is how to delete all characters before a particular pattern. If anyone has some help file for this, it will be greatly appreciated. Jean Eid __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nnet and weights: error analysis using VR example
Dear R-users, dear Prof. Ripley as package maintainer I tried to investigate the odd error, when I call nnet together with a 'weights' parameter, using the 'fgl' example in VR p 348 The error I get is: Error in eval(expr, envir, enclos) : Object w not found I think it is a kind of scoping problem, but I really cannot see, what the problem exactly is. and here is my code: the only thing which changed is the definition of a weight-parameter ('w') which is given to the nnet-call. Of course the weight vector with '1's makes no sense here, but it will be defined according to the class sizes later. ### library(MASS) data(flg) con - function(...) { print(tab - table(...)) diag(tab) - 0 cat(error rate = , round(100*sum(tab)/length(list(...)[[1]]), 2), %\n) invisible() } set.seed(123) rand - sample(10, dim(fgl)[1], replace = T) fgl1 - fgl fgl1[1:9] - lapply(fgl[, 1:9], function(x) {r - range(x); (x - r[1])/diff(r)}) CVnn2 - function(formula, data, size = c(0,4,4,10,10), lambda = c(0, rep(c(0.001, 0.01),2)), nreps = 1, nifold = 5, verbose = 99, ...) { CVnn1 - function(formula, data, nreps=1, ri, verbose, ...) { totalerror - 0 truth - data[,deparse(formula[[2]])] res - matrix(0, nrow(data), length(levels(truth))) if(verbose 20) cat( inner fold) for (i in sort(unique(ri))) { if(verbose 20) cat( , i, sep=) data.training - data[ri != i,]$GROUP w - rep(1, dim(data[ri !=i,])[1]) for(rep in 1:nreps) { learn - nnet(formula, data[ri !=i,], weights = w, trace = F, ...) res[ri == i,] - res[ri == i,] + predict(learn, data[ri == i,]) } } if(verbose 20) cat(\n) sum(as.numeric(truth) != max.col(res/nreps)) } truth - data[,deparse(formula[[2]])] res - matrix(0, nrow(data), length(levels(truth))) choice - numeric(length(lambda)) for (i in sort(unique(rand))) { if(verbose 0) cat(fold , i,\n, sep=) set.seed(i*i) ri - sample(nifold, sum(rand!=i), replace=T) for(j in seq(along=lambda)) { if(verbose 10) cat( size =, size[j], decay =, lambda[j], \n) choice[j] - CVnn1(formula, data[rand != i,], nreps=nreps, ri=ri, size=size[j], decay=lambda[j], verbose=verbose, ...) } decay - lambda[which.is.max(-choice)] csize - size[which.is.max(-choice)] if(verbose 5) cat( #errors:, choice, ) # if(verbose 1) cat(chosen size = , csize, decay = , decay, \n, sep=) for(rep in 1:nreps) { data.training - data[rand != i,]$GROUP w - rep(1, dim(data[rand !=i,])[1]) learn - nnet(formula, data[rand != i,], weights = w, trace=F, size=csize, decay=decay, ...) res[rand == i,] - res[rand == i,] + predict(learn, data[rand == i,]) } } factor(levels(truth)[max.col(res/nreps)], levels = levels(truth)) } res.nn2 - CVnn2(type ~ ., fgl1, skip = T, maxit = 500, nreps = 10) con(true = fgl$type, predicted = res.nn2) ## many thanks for your help Christoph ### Coordinates: platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major1 minor9.1 year 2004 month06 day 21 language R -- Christoph LehmannPhone: ++41 31 930 93 83 Department of Psychiatric NeurophysiologyMobile: ++41 76 570 28 00 University Hospital of Clinical Psychiatry Fax:++41 31 930 99 61 Waldau[EMAIL PROTECTED] CH-3000 Bern 60 http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] detection of outliers
Hi, this is both a statistical and a R question... what would the best way / test to detect an outlier value among a series of 10 to 30 values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a way to identify the last data as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to what extent ?) delete the extreme data then redo.. but is it valid to do so with so few data ? is the (trimmed mean - mean) more efficient ? if so, what would be the maximal tolerable value to use as a threshold ? (I guess it will be experiment dependent...) tests for skweness will probably required a larger dataset ? any suggestions are very welcome ! thanks for your help Philippe Guardiola, MD __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the new data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X - matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values 12345 6 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 0.54712914 789 10 11 12 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 -1.82761518 . . . 91 92 93 94 95 96 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 2.64363405 97 98 99 100 -4.45478627 -2.44973209 2.51587537 -4.09584837 Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named X; rather the matrix is split columnwise: names(dat) [1] y X.1 X.2 names(new) [1] X.1 X.2 Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour: new - list(X = matrix(rnorm(20),10, 2)) predict(mod, new) 1 2 3 4 5 6 7 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318 8 9 10 -0.4347916 8.4678728 -0.8976054 Regards, John Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 - X[,1] x2 - X[,2] dat.2 - data.frame(y=y, x1=x1, x2=x2) mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges Sent: Thursday, September 23, 2004 1:33 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models [EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a training data set and try to obtain predictions for a set of new predictors from a test data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of
Re: [R] gsub
Jean Eid jeaneid at chass.utoronto.ca writes: : : Hi : : A while back I used gsub to do the following : : temp-000US00231 : gsub(something here, , temp) : 00231 : : I think it involved the `meta characters' somehow. : : I do not know how to do this anymore. I know strsplit will also work but I : remember gsub was much faster. In essence the question is how to delete : all characters before a particular pattern. : : If anyone has some help file for this, it will be greatly appreciated. : I think you want sub in this case, not gsub. There are many possibilities here depending on what the general case is. The following all give the desired result for the example but their general cases differ. These are just some of the numerous variations possible. temp-000US00231 sub(.*US, , temp) sub(.*S, , temp) sub([[:digit:]]*[[:alpha:]]*, , temp) sub(.*[[:alpha:]], , temp) sub(.*[[:alpha:]][[:alpha:]], , temp) sub(.*[[:upper:]], , temp) sub(.*[[:upper:]][[:upper:]], , temp) sub(., , temp) substring(temp, 6) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Thanks to John Fox, Andy Liaw, and Uwe Ligges for their help with my problem regarding the use of the predict() function to obtain predictions for a new set of predictor values. It appears that the bottom line (at least for my purposes) is that the names and the setup for the data of the predictors in the glm and the new data need to be consistent. The safest way that I know to do this from reading John, Andy, and Uwe's responses is to label each predictor separately and place them into the glm model separately. Then, when creating a new data frame to utilize in the predict() function, ensure to consistently name the predictors. For illustrative examples, see the reply emails of John, Andy, and Uwe. Thanks again, Joe Joe Rausch, M.A. Psychology Liaison Lab for Social Research 917 Flanner Hall University of Notre Dame Notre Dame, IN 46556 (574) 631-3910 www.nd.edu/~jrausch If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] detection of outliers
Phguardiol at aol.com writes: : : Hi, : this is both a statistical and a R question... : what would the best way / test to detect an outlier value among a series of 10 to 30 values ? for instance if we : have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a way to identify the last data : as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to : what extent ?) delete the extreme data then redo.. but is it valid to do so with so few data ? is the (trimmed : mean - mean) more efficient ? if so, what would be the maximal tolerable value to use as a threshold ? (I guess : it will be experiment dependent...) tests for skweness will probably required a larger dataset ? : any suggestions are very welcome ! : thanks for your help : Philippe Guardiola, MD If z is your vector the following all detect outliers: boxplot(z) # will show the outlier plot(lm(z ~ 1)) # the various plots show this as well require(car) outlier.test(lm(z ~ 1)) # tests most extreme value __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Problem with R 1.9.1
Hi, Could someone help me ? When I´m running the following command in the new R 1.9.1 version the error message appear. It didn´t happen on R 1.8.0 glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), weights=sinistros) Error in x[good, ] * w : non-conformable arrays [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Gridbase basic question
All, I have a simple plot(x,y) and I would like to then insert rectangles of some length (in native coordinates) and height fixed to 0.5 in native coordinates. I can't quite get the code right to do this. Can anyone give me a quick example of how to do this? I looked the gridBase index and the tutorial (from R-news?) but just haven't gotten it down yet. plot(1:10,1:10) par(new=T);vps - baseViewports() pushViewport(vps$inner,vps$figure,vps$plot) viewport[GRID.VP.28] pushViewport(viewport(x=unit(1,native),y=unit(2,native))) viewport[GRID.VP.29] grid.rect(height=unit(0.5,native),width=unit(1.5,native),just='botto m') This draws a very large rectangle going from 2 to 7 (y) and to 8 (x). Thanks, Sean __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R vs EViews - serial correlation
Dear all, I met with some problems when dealing with a time series with serial correlation. FIRST, I generate a series with correlated errors set.seed(1) x=1:50 y=x+arima.sim(n = 50, list(ar = c(0.47))) SECOND, I estimate three constants (a, b and rho) in the model Y=a+b*X+u, where u=rho*u(-1)+eps library(nlme) gls(y~x,correlation = corAR1(0.5)) # Is it the right procedure? Coefficients: (Intercept) x 0.1410465 1.0023341 Correlation Structure: AR(1) Formula: ~1 Parameter estimate(s): Phi 0.440594 Degrees of freedom: 50 total; 48 residual Residual standard error: 0.9835158 THIRD, I do the same procedure with EViews as LS Y C X AR(1) and get Y = 0.1375 + 1.0024*X + [AR(1)=0.3915] My problem is actually connected with the fitting procedure. As far as I understand gls(y~x,correlation = corAR1(0.5))$fit is obtained through the linear equation 0.1410+1.0023*X while in EViews through the nonlinear equation Y=rho*Y(-1) + (1-rho)*a+(X-rho*X(-1))*b where either dynamic or static fitting procedures are applied. X YYF_DYF_S gls.fit 1 1 1.1592 NA NA 1.1434 2 2 3.5866 2.1499 2.1499 2.1457 3 3 4.1355 3.1478 3.7103 3.1480 4 4 3.9125 4.1484 4.5352 4.1504 5 5 2.7442 5.1502 5.0578 5.1527 6 6 6.0647 6.1523 5.2103 6.1551 7 7 6.9855 7.1547 7.1203 7.1574 . 47 47 49.4299 47.2521 47.5288 47.2507 48 48 48.7748 48.2545 49.1072 48.2531 49 49 48.3200 49.2570 49.4607 49.2554 50 50 50.2501 50.2594 49.8926 50.2578 All gls.fit values are on a line, YF_D (D for dynamic) soon begin to follow a line and YF_S (S for static) try to mimic Y. My question is: do R and EViews estimate the same model? If yes, why the estimates are different and which of the two (three?) procedures is correct? Thanking you in advance, Rem __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem with R 1.9.1
On Thu, 23 Sep 2004 11:56:37 -0300, Jacques Mendes Meyohas [EMAIL PROTECTED] wrote : Hi, Could someone help me ? When I´m running the following command in the new R 1.9.1 version the error message appear. It didn´t happen on R 1.8.0 glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), weights=sinistros) Error in x[good, ] * w : non-conformable arrays Someone might recognize this problem, but you're likely to get a better response (or figure out the problem yourself!) if you try to simplify things to something completely self-contained. I'm guessing there's something wrong with your weight vector sinistros, but I can't check to be sure. Duncan Murdoch __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] How to improve the quality of curve/line plots?
Dear list, I'm using the windows version of R. When plotting a curve or a line for time series with annual data , e. g. GDP growth 1991-2003, the line seems to exist of a lot of smaller lines. Printing the results the curves and lines seems to be unclean (because of using small resolution bitmaps?). Comparing the result of R with the same results of Excel the lines in excel seems to havve a higher qualitiy. In Excel you also can produce curves instead of lines. Are there any possibilities how to improve the quality of the plots in R? How can R be influenced to plot clean lines with a higher resolution on the screen (I think it's not a question of the pdf- or png command.). Perhaps, it's a problem of the graphical possibilites of R because the most line plots which can be seen on the web have these problems. Thanks, Dr. Michael Wolf Bezirksregierung Münster Dezernat 61 Domplatz 1-348161 Münster Tel.: ++ 49 (02 51) / 4 11 - 17 95 Fax.: ++ 49 (02 51) / 4 11 - 8 17 95 E-Mail: [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] detection of outliers
On Thu, 23 Sep 2004 [EMAIL PROTECTED] wrote: Hi, this is both a statistical and a R question... what would the best way / test to detect an outlier value among a series of 10 to 30 values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a way to identify the last data as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to what extent ?) delete the extreme data then redo.. but is it valid to do so with so few data ? is the (trimmed mean - mean) more efficient ? if so, what would be the maximal tolerable value to use as a threshold ? (I guess it will be experiment dependent...) tests for skweness will probably required a larger dataset ? any suggestions are very welcome ! thanks for your help Philippe Guardiola, MD __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html You may want to read Davies and Gather, The identification of multiple outliers, JASA 88 (1993), 782-801. The simplest recommendation is to nominate all points with distance larger than c*mad(data) from the median as outliers. Choices of c depending on n are given in the above paper. This is somewhat better founded theoretically than the boxplot method recommended by Gabor G., but it is based on the assumption that the distribution on the non-outliers is close to the normal and especially not strongly skewed (the boxplot method seems to be a bit more robust against skewness). Christian *** Christian Hennig Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg [EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/ ### ich empfehle www.boag-online.de __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] detection of outliers
Hi, give a look to: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm it's the Grubbs' Test for Outliers. It is based on the assumption of normality of data. Other methods of outliers' could: Run-Sequence Plot Histogram Normal Probability Plot Box-plot Best Vito you wrote: Hi, this is both a statistical and a R question... what would the best way / test to detect an outlier value among a series of 10 to 30 values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a way to identify the last data as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to what extent ?) delete the extreme data then redo.. but is it valid to do so with so few data ? is the (trimmed mean - mean) more efficient ? if so, what would be the maximal tolerable value to use as a threshold ? (I guess it will be experiment dependent...) tests for skweness will probably required a larger dataset ? any suggestions are very welcome ! thanks for your help Philippe Guardiola, MD = Diventare costruttori di soluzioni Visitate il portale http://www.modugno.it/ e in particolare la sezione su Palese http://www.modugno.it/archivio/cat_palese.shtml ___ http://it.seriea.fantasysports.yahoo.com/ __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] detection of outliers
Not to oversimplify ... 1. (At least) dozens of books and thousands of papers have been written on this... 2. Most important question is: What is an outlier? (Many smart folks says that the concept is illogical/flawed -- there is no mystical boundary that one crosses to become a statistical pariah; many other smart folks disagree). 3. Equivalently: What is the model with respect to which values are outlying? (with apologies to Winston Churchill's: That is an indignity up with which I will not put.) So good advice here is: Beware of good advice about this. (Of course, I may just be an outlier ...) ;) Cheers, -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, September 23, 2004 7:22 AM To: [EMAIL PROTECTED] Subject: [R] detection of outliers Hi, this is both a statistical and a R question... what would the best way / test to detect an outlier value among a series of 10 to 30 values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a way to identify the last data as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to what extent ?) delete the extreme data then redo.. but is it valid to do so with so few data ? is the (trimmed mean - mean) more efficient ? if so, what would be the maximal tolerable value to use as a threshold ? (I guess it will be experiment dependent...) tests for skweness will probably required a larger dataset ? any suggestions are very welcome ! thanks for your help Philippe Guardiola, MD __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] block statistics with POSIX classes
I have followed Gabor's instructions: aggregate(list(y=y), list(dp$year), mean)$y # returns NULL since y is a time series NULL aggregate(list(y=as.vector(y)), list(dp$year), mean)$y# returns annual means [1] 0.0077656696 0.0224050294 0.001898 0.0240550925 -0.0084085867 [6] -0.0170950194 -0.0355641251 0.0065873997 0.0008253111 aggregate(list(y=y), list(dp$year), mean) # returns the same as the previous one Group.1 Series.1 1 96 0.0077656696 2 97 0.0224050294 3 98 0.001898 4 99 0.0240550925 5 100 -0.0084085867 6 101 -0.0170950194 7 102 -0.0355641251 8 103 0.0065873997 9 104 0.0008253111 Gabor's second suggestion returns different results: aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, mean) Time Series: Start = 96.3 End = 103. Frequency = 1 Series 1 [1,] 0.016120895 [2,] 0.024257131 [3,] 0.007526997 [4,] 0.017466118 [5,] -0.016024846 [6,] -0.017145159 [7,] -0.036047765 [8,] 0.014198501 aggregate(y, 1, mean) # verifies the result above Time Series: Start = 1996.333 End = 2003.333 Frequency = 1 Series 1 [1,] 0.016120895 [2,] 0.024257131 [3,] 0.007526997 [4,] 0.017466118 [5,] -0.016024846 [6,] -0.017145159 [7,] -0.036047765 [8,] 0.014198501 The data is from 1996:5 to 2004:8. The difference of the results must depend on the fact that the beginning of the data is not January and the end is not December? The first two solutions give nine annual means while the last two give only eight means. The block size in the last two must be 12 months, as is said in ?aggregate, instead of a calender year that I am looking for. Gabor's first suggestion solved my problem. Thank you, Hannu -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Gabor Grothendieck Sent: Thursday, September 23, 2004 3:52 PM To: [EMAIL PROTECTED] Subject: Re: [R] block statistics with POSIX classes I am not sure that I understand what you are looking for since you did not provide any sample data with corresponding output to clarify your question. Here is another guess. If y is just a numeric vector with monthly data and dp is a POSIXlt vector of the same length then: aggregate(list(y=y), list(dp$year), mean)$y will perform aggregation, as will aggregate(ts(y, start=c(d$year[1],d$mon[1]+1), freq = 12), nfreq=1, mean) which converts y to ts and then performs the aggregation. The first one will work even if y is irregular while the second one assumes that y must be monthly. The second one returns a ts object. By the way, I had a look at gev's source and it seems that despite the documentation it does not use POSIXct anywhere internally. If the block is year or other character value then it simply assumes that whatever datetime class is used has an as.POSIXlt method. If your dates were POSIXct rather than POSIXlt then it would be important to ensure that whatever timezone is assumed (which I did not check) in the conversion is the one you are using. You could use character dates or Date class to avoid this problem. Since you appear to be using POSIXlt datetimes from the beginning I think you should be ok. Kahra Hannu kahra at mpsgr.it writes: : : Thank you Petr and Gabor for the answers. : : They did not, however, solve my original problem. When I have a monthly time series y with a POSIX date : variable dp, the most obvious way to compute e.g. the annual means is to use aggregate(y, 1, mean) that : works with time series. However, I got stuck with the idea of using the 'by' argument as by = dp$year. But in : that case y has to be a data.frame. The easiest way must be the best way. : : Regards, : Hannu : : -Original Message- : From: r-help-bounces at stat.math.ethz.ch : [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Gabor Grothendieck : Sent: Thursday, September 23, 2004 12:56 PM : To: r-help at stat.math.ethz.ch : Subject: Re: [R] block statistics with POSIX classes : : : Kahra Hannu kahra at mpsgr.it writes: : : : : : I have a monthly price index series x, the related return series y = diff (log : (x)) and a POSIXlt date-time : : variable dp. I would like to apply annual blocks to compute for example : annual block maxima and mean of y. : : : : When studying the POSIX classes, in the first stage of the learning curve, I : computed the maximum drawdown : : of x: : : mdd - maxdrawdown(x) : : max.dd - mdd$maxdrawdown : : from - as.character(dp[mdd$from]) : : to - as.character(dp[mdd$to]) : : from; to : : [1] 2000-08-31 : : [1] 2003-03-31 : : that gives me the POSIX dates of the start and end of the period and : suggests that I have done something correctly. : : : : Two questions: : : (1) how to implement annual blocks and compute e.g. annual max, min and mean : of y (each year's max, min, mean)? : : (2) how to
[R] R glm
Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem with R 1.9.1
Jacques Mendes Meyohas wrote: Hi, Could someone help me ? When I´m running the following command in the new R 1.9.1 version the error message appear. It didn´t happen on R 1.8.0 glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), weights=sinistros) Error in x[good, ] * w : non-conformable arrays We cannot help if we don't know anything about the data you are using. You also might want to try out R-2.0.0 beta (but I guess it is user error). Uwe Ligges [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] fitting weibull distribution
Dear all, I get the following error message. And I cannot quite work out what is wrong. I think the optim gets infinite values. Certainly my data do not have any infinite values. How can I solve this? fitdistr(A1, weibull) Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) : non-finite value supplied by optim I am using R version 1.9.1 on RedHat Linux, Kernel 2.6.8. Ulrich __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the new data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X - matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) John, note that I used glm(y ~ .) (the dot!), because the names are automatically chosen to be X.1 and X.2, hence you cannot use X in the formula in this case ... Best, Uwe new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] R glm
In Venables Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 184 there is a table Families and link functions that gives you the available links with different families. The default and the only link with the gaussian family is identity. ciao, Hannu Kahra -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma Sent: Thursday, September 23, 2004 6:26 PM To: [EMAIL PROTECTED] Subject: [R] R glm Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R glm
Dear Shuangge - There exists a default link function for each family, but you can specify the link, as well. see ?family (e.g., in html help click on the family link from the glm help page). Hank On Sep 23, 2004, at 12:25 PM, Shuangge Ma wrote: Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Dr. Martin Henry H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/botany/bot/henry.html http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ E Pluribus Unum __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] block statistics with POSIX classes
Kahra Hannu kahra at mpsgr.it writes: : : I have followed Gabor's instructions: : : aggregate(list(y=y), list(dp$year), mean)$y # returns NULL since y is a time series : NULL : : aggregate(list(y=as.vector(y)), list(dp$year), mean)$y # returns annual means : [1] 0.0077656696 0.0224050294 0.001898 0.0240550925 -0.0084085867 : [6] -0.0170950194 -0.0355641251 0.0065873997 0.0008253111 : : aggregate(list(y=y), list(dp$year), mean) # returns the same as the previous one : Group.1 Series.1 : 1 96 0.0077656696 : 2 97 0.0224050294 : 3 98 0.001898 : 4 99 0.0240550925 : 5 100 -0.0084085867 : 6 101 -0.0170950194 : 7 102 -0.0355641251 : 8 103 0.0065873997 : 9 104 0.0008253111 : : Gabor's second suggestion returns different results: : : aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, mean) : Time Series: : Start = 96.3 : End = 103. : Frequency = 1 : Series 1 : [1,] 0.016120895 : [2,] 0.024257131 : [3,] 0.007526997 : [4,] 0.017466118 : [5,] -0.016024846 : [6,] -0.017145159 : [7,] -0.036047765 : [8,] 0.014198501 : : aggregate(y, 1, mean) # verifies the result above : Time Series: : Start = 1996.333 : End = 2003.333 : Frequency = 1 : Series 1 : [1,] 0.016120895 : [2,] 0.024257131 : [3,] 0.007526997 : [4,] 0.017466118 : [5,] -0.016024846 : [6,] -0.017145159 : [7,] -0.036047765 : [8,] 0.014198501 : : The data is from 1996:5 to 2004:8. The difference of the results must depend on the fact that the beginning of : the data is not January and the end is not December? The first two solutions give nine annual means while the : last two give only eight means. The block size in the last two must be 12 months, as is said in ?aggregate, : instead of a calender year that I am looking for. Gabor's first suggestion solved my problem. Yes, that seems to be the case. Using length instead of mean we find that the aggregate.data.frame example used calendar years as the basis of aggregation whereas the aggregate.ts example used successive 12 month periods starting from the first month discarding the 4 points at the end which do not fill out a full year. R set.seed(1) R dp - as.POSIXlt(seq(from=as.Date(1996-5-1), to=as.Date(2004-8-1), + by=month)) R y - rnorm(length(dp$year)) R aggregate(list(y=y), list(dp$year), length)$y [1] 8 12 12 12 12 12 12 12 8 R aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, length) Time Series: Start = 96.3 End = 103. Frequency = 1 [1] 12 12 12 12 12 12 12 12 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
followup: Re: [R] Issue with predict() for glm models
I have a follow up question that fits with this thread. Can you force an overlaid plot showing predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values on top of a small section of the x-y scatterplot. x - rnorm(100, 10, 10) e - rnorm(100, 0, 5) y - 5 + 10 *x + e myReg1 - lm (y~x) plot(x,y) newX - seq(1,10,1) myPred - predict(myReg1,data.frame(x=newX)) Now, if I do this, I get 2 graphs overlaid but their axes do not line up. par(new=T) plot(newX,myPred$fit) The problem is that the second one uses the whole width of the graph space, when I'd rather just have it go from the small subset where its x is defined, from 1 to 10. Its stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x I know abline() can do this for lm, but for some other kinds of models, no lines() method is provided, and so I am doing this the old fashioned way. pj John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the new data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X - matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values 12345 6 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 0.54712914 789 10 11 12 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 -1.82761518 . . . 91 92 93 94 95 96 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 2.64363405 97 98 99 100 -4.45478627 -2.44973209 2.51587537 -4.09584837 Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named X; rather the matrix is split columnwise: names(dat) [1] y X.1 X.2 names(new) [1] X.1 X.2 Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour: new - list(X = matrix(rnorm(20),10, 2)) predict(mod, new) 1 2 3 4 5 6 7 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318 8 9 10 -0.4347916 8.4678728 -0.8976054 Regards, John Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 - X[,1] x2 - X[,2] dat.2 - data.frame(y=y, x1=x1, x2=x2) mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John -Original Message- From: [EMAIL
Re: [R] R glm
No! ?family The 'gaussian' family accepts the links 'identity', 'log' and 'inverse'; Kahra Hannu wrote: In Venables Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 184 there is a table Families and link functions that gives you the available links with different families. The default and the only link with the gaussian family is identity. ciao, Hannu Kahra -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma Sent: Thursday, September 23, 2004 6:26 PM To: [EMAIL PROTECTED] Subject: [R] R glm Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Paul E. Johnson email: [EMAIL PROTECTED] Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn 1541 Lilac Lane, Rm 504 University of Kansas Office: (785) 864-9086 Lawrence, Kansas 66044-3177 FAX: (785) 864-5700 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: followup: Re: [R] Issue with predict() for glm models
Could you just use lines(newX, myPred, col=2) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Paul Johnson Sent: Thursday, September 23, 2004 10:3 AM To: r help Subject: followup: Re: [R] Issue with predict() for glm models I have a follow up question that fits with this thread. Can you force an overlaid plot showing predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values on top of a small section of the x-y scatterplot. x - rnorm(100, 10, 10) e - rnorm(100, 0, 5) y - 5 + 10 *x + e myReg1 - lm (y~x) plot(x,y) newX - seq(1,10,1) myPred - predict(myReg1,data.frame(x=newX)) Now, if I do this, I get 2 graphs overlaid but their axes do not line up. par(new=T) plot(newX,myPred$fit) The problem is that the second one uses the whole width of the graph space, when I'd rather just have it go from the small subset where its x is defined, from 1 to 10. Its stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x I know abline() can do this for lm, but for some other kinds of models, no lines() method is provided, and so I am doing this the old fashioned way. pj John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X - matrix(rnorm(200), 100, 2) y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the new data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X - matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y - (X %*% c(1,2) + rnorm(100)) 0 dat - data.frame(y=y, X=X) mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values 12345 6 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 0.54712914 789 10 11 12 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 -1.82761518 . . . 91 92 93 94 95 96 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 2.64363405 97 98 99 100 -4.45478627 -2.44973209 2.51587537 -4.09584837 Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named X; rather the matrix is split columnwise: names(dat) [1] y X.1 X.2 names(new) [1] X.1 X.2 Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour: new - list(X = matrix(rnorm(20),10, 2)) predict(mod, new) 1 2 3 4 5 6 7 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318 8 9 10 -0.4347916 8.4678728 -0.8976054 Regards, John Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 - X[,1] x2 - X[,2] dat.2 - data.frame(y=y, x1=x1, x2=x2) mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)
Re: [R] gsub
Thank you all for the help, specially Gabor that is exactly what I needed. A few examples that do the same thing is very helpful in understanding the structure of the call. Thank you again, Jean On Thu, 23 Sep 2004, Gabor Grothendieck wrote: Jean Eid jeaneid at chass.utoronto.ca writes: : : Hi : : A while back I used gsub to do the following : : temp-000US00231 : gsub(something here, , temp) : 00231 : : I think it involved the `meta characters' somehow. : : I do not know how to do this anymore. I know strsplit will also work but I : remember gsub was much faster. In essence the question is how to delete : all characters before a particular pattern. : : If anyone has some help file for this, it will be greatly appreciated. : I think you want sub in this case, not gsub. There are many possibilities here depending on what the general case is. The following all give the desired result for the example but their general cases differ. These are just some of the numerous variations possible. temp-000US00231 sub(.*US, , temp) sub(.*S, , temp) sub([[:digit:]]*[[:alpha:]]*, , temp) sub(.*[[:alpha:]], , temp) sub(.*[[:alpha:]][[:alpha:]], , temp) sub(.*[[:upper:]], , temp) sub(.*[[:upper:]][[:upper:]], , temp) sub(., , temp) substring(temp, 6) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R glm
Upon examining Table 7.1 on page 184 of VR 4th edition, I can see where the ambiguity would arise, however. Always best to check the help, I guess. Hank On Sep 23, 2004, at 1:04 PM, Paul Johnson wrote: No! ?family The 'gaussian' family accepts the links 'identity', 'log' and 'inverse'; Kahra Hannu wrote: In Venables Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 184 there is a table Families and link functions that gives you the available links with different families. The default and the only link with the gaussian family is identity. ciao, Hannu Kahra -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma Sent: Thursday, September 23, 2004 6:26 PM To: [EMAIL PROTECTED] Subject: [R] R glm Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Paul E. Johnson email: [EMAIL PROTECTED] Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn 1541 Lilac Lane, Rm 504 University of Kansas Office: (785) 864-9086 Lawrence, Kansas 66044-3177 FAX: (785) 864-5700 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Dr. Martin Henry H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/botany/bot/henry.html http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ E Pluribus Unum __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: followup: Re: [R] Issue with predict() for glm models
On Thu, 2004-09-23 at 12:02, Paul Johnson wrote: I have a follow up question that fits with this thread. Can you force an overlaid plot showing predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values on top of a small section of the x-y scatterplot. x - rnorm(100, 10, 10) e - rnorm(100, 0, 5) y - 5 + 10 *x + e myReg1 - lm (y~x) plot(x,y) newX - seq(1,10,1) myPred - predict(myReg1,data.frame(x=newX)) Now, if I do this, I get 2 graphs overlaid but their axes do not line up. par(new=T) plot(newX,myPred$fit) The problem is that the second one uses the whole width of the graph space, when I'd rather just have it go from the small subset where its x is defined, from 1 to 10. Its stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x I know abline() can do this for lm, but for some other kinds of models, no lines() method is provided, and so I am doing this the old fashioned way. Paul, Instead of using plot() for the second set of points, use points(): x - rnorm(100, 10, 10) e - rnorm(100, 0, 5) y - 5 + 10 * x + e myReg1 - lm (y ~ x) plot(x, y) newX - seq(1, 10, 1) myPred - predict(myReg1, data.frame(x = newX)) points(newX, myPred$fit, pch = 19) This will preserve the axis scaling. If you use plot() without explicitly indicating xlim and ylim, it will automatically scale the axes based upon your new data, even if you indicated that the underlying plot should not be cleared. Alternatively, you could also use the lines() function, which will draw point to point lines: lines(newX, myPred$fit, col = red) If you want fitted lines and prediction/confidence intervals, you could use a function like matlines(), presuming that a predict method exists for the model type you want to use. There is an example of using this in R Help Desk in R News Vol 3 Number 2 (October 2003), in the first example, with a standard linear regression model. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] fitting weibull distribution
Hi. I think there may be one or more zeros in your data set, causing the problem: x - rgamma(100) fitdistr(x, weibull) fitdistr(c(x,0), weibull) Maybe you should omit the zeros. Christian __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 11:37 AM To: John Fox Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models . . . John, note that I used glm(y ~ .) (the dot!), because the names are automatically chosen to be X.1 and X.2, hence you cannot use X in the formula in this case ... Best, Uwe OK -- I see. I did notice that you used . in the formula but didn't make the proper connection! Thanks, John __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] decompose a correlation matrix
Is there a simple way to decompose the upper triangle of a correlation matrix to a linear list; For example: X Y Z X 1 2 3 Y 2 1 4 Z 3 4 1 so you get a list like: xy 2 XZ 3 YZ 4 I suspect you can do it with a matrix transformation, but that beyond me at present. Many thanks Mark _ Department of Molecular and Human Genetics, Baylor College of Medicine, __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R vs EViews - serial correlation
The issue here is that you have confused an AR(1) process for the variable of interest with an AR(1) process for its residuals. The former is a true AR(1) process, while the latter is really an MA(1) process, in terms of Box-Jenkins style ARIMA models. Interpreting your original post: I met with some problems when dealing with a time series with serial correlation. FIRST, I generate a series with correlated errors set.seed(1) x=1:50 y=x+arima.sim(n = 50, list(ar = c(0.47))) This generates an ARIMA(1,0,0) dataset of 50 observations. So the d.v. has an AR(1) process. SECOND, I estimate three constants (a, b and rho) in the model Y=a+b*X+u, where u=rho*u(-1)+eps library(nlme) gls(y~x,correlation = corAR1(0.5)) # Is it the right procedure? Coefficients: (Intercept) x 0.1410465 1.0023341 Correlation Structure: AR(1) Formula: ~1 Parameter estimate(s): Phi 0.440594 Degrees of freedom: 50 total; 48 residual Residual standard error: 0.9835158 This estimates an AR(1) error process model -- or an ARIMA(0,0,1). By the Wold decomposition, the AR(1) dgp we should expect the MA(q) process to have a longer order than 1. THIRD, I do the same procedure with EViews as LS Y C X AR(1) and get Y = 0.1375 + 1.0024*X + [AR(1)=0.3915] This fits an AR(1) error process (ARIMA(0,0,1)) if I recall my Eviews syntax. My problem is actually connected with the fitting procedure. As far as I understand gls(y~x,correlation = corAR1(0.5))$fit is obtained through the linear equation 0.1410+1.0023*X while in EViews through the nonlinear equation Y=rho*Y(-1) + (1-rho)*a+(X-rho*X(-1))*b where either dynamic or static fitting procedures are applied. X YYF_DYF_S gls.fit 1 1 1.1592 NA NA 1.1434 2 2 3.5866 2.1499 2.1499 2.1457 3 3 4.1355 3.1478 3.7103 3.1480 4 4 3.9125 4.1484 4.5352 4.1504 5 5 2.7442 5.1502 5.0578 5.1527 6 6 6.0647 6.1523 5.2103 6.1551 7 7 6.9855 7.1547 7.1203 7.1574 . 47 47 49.4299 47.2521 47.5288 47.2507 48 48 48.7748 48.2545 49.1072 48.2531 49 49 48.3200 49.2570 49.4607 49.2554 50 50 50.2501 50.2594 49.8926 50.2578 Again, both of the models you fit are NOT the DGP you simulated. All gls.fit values are on a line, YF_D (D for dynamic) soon begin to follow a line and YF_S try to mimic Y. Correct since you are fitting a model of correlated INNOVATIONS, rather than a model of correlated Y's My question is: do R and EViews estimate the same model? If yes, why the estimates are different and which of the two (three?) procedures is correct? If you want to fit the DGP you proposed above, you should use arima(y, order=c(1,0,0), xreg=x) Hope that helps. -- Patrick T. Brandt Assistant Professor Department of Political Science University of North Texas [EMAIL PROTECTED] http://www.psci.unt.edu/~brandt __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] decompose a correlation matrix
On Thu, 23 Sep 2004 13:08:53 -0500, Mark Strivens [EMAIL PROTECTED] wrote : Is there a simple way to decompose the upper triangle of a correlation matrix to a linear list; For example: X Y Z X 1 2 3 Y 2 1 4 Z 3 4 1 so you get a list like: xy 2 XZ 3 YZ 4 I suspect you can do it with a matrix transformation, but that beyond me at present. x[ row(x) col(x) ] will give the entries you want. Duncan Murdoch __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] R glm
As ?glm says, see ?family. Andy From: Kahra Hannu In Venables Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 184 there is a table Families and link functions that gives you the available links with different families. The default and the only link with the gaussian family is identity. ciao, Hannu Kahra From: Shuangge Ma Hello: would you please help me with the following glm question? for the R function glm, what I understand is: once you specify the family, then the link function is fixed. My question is: is it possible I use, for example, log link function, but the estimation approach for the guassian family? Thanks, Shuangge Ma, Ph.D. * CHSCC, Department of Biostatistics * * University of Washington * * Building 29, Suite 310, 6200 NE 74th ST. * * Seattle, WA 98115* * Tel: 206-685-7123 Fax: 206-616-4075 * __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] decompose a correlation matrix
This may not have things in the order you want but you can see if it gets close to what you want: x - matrix(1:16,ncol=4) x [,1] [,2] [,3] [,4] [1,]159 13 [2,]26 10 14 [3,]37 11 15 [4,]48 12 16 y - x[row(x) col(x)] y [1] 5 9 10 13 14 15 bob -Original Message- From: Mark Strivens [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 2:09 PM To: [EMAIL PROTECTED] Subject: [R] decompose a correlation matrix Is there a simple way to decompose the upper triangle of a correlation matrix to a linear list; For example: X Y Z X 1 2 3 Y 2 1 4 Z 3 4 1 so you get a list like: xy 2 XZ 3 YZ 4 I suspect you can do it with a matrix transformation, but that beyond me at present. Many thanks Mark _ Department of Molecular and Human Genetics, Baylor College of Medicine, __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] decompose a correlation matrix
Thanks for the answers the appear to be just right. i.e. corrmat[upper.tri(corrmat)] or x[col(x)row(x)] The slight problem is I lose all the column row labels from the correlation matrix, so I am not sure of the order of the values (i.e. which combination of markers the value represents). Is there anyway to drag the labels along with the value extraction? _ Department of Molecular and Human Genetics, Baylor College of Medicine, __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] decompose a correlation matrix
From: Duncan Murdoch On Thu, 23 Sep 2004 13:08:53 -0500, Mark Strivens [EMAIL PROTECTED] wrote : Is there a simple way to decompose the upper triangle of a correlation matrix to a linear list; For example: X Y Z X 1 2 3 Y 2 1 4 Z 3 4 1 so you get a list like: xy 2 XZ 3 YZ 4 I suspect you can do it with a matrix transformation, but that beyond me at present. x[ row(x) col(x) ] will give the entries you want. Duncan Murdoch Perhaps slightly more transparent: x[upper.tri(x)] or x[lower.tri(x)] Best, Andy __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] decompose a correlation matrix
From: Mark Strivens Thanks for the answers the appear to be just right. i.e. corrmat[upper.tri(corrmat)] or x[col(x)row(x)] The slight problem is I lose all the column row labels from the correlation matrix, so I am not sure of the order of the values (i.e. which combination of markers the value represents). Is there anyway to drag the labels along with the value extraction? _ Department of Molecular and Human Genetics, Baylor College of Medicine, idx - upper.tri(x) cbind(row=row(x)[idx], col=col(x)[idx], value=x[idx]) Best, Andy __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] decompose a correlation matrix
On 23-Sep-04 Mark Strivens wrote: Is there a simple way to decompose the upper triangle of a correlation matrix to a linear list; For example: X Y Z X 1 2 3 Y 2 1 4 Z 3 4 1 so you get a list like: xy 2 XZ 3 YZ 4 I suspect you can do it with a matrix transformation, but that beyond me at present. Let C be your correlation matrix. Then: C[lower.tri(C)] or equivalently C[upper.tri(C)] E.g. (in your notation): C [,1] [,2] [,3] [,4] [1,]1234 [2,]2156 [3,]3517 [4,]4671 (C[lower.tri(C)]) [1] 2 3 4 5 6 7 See ?lower.tri Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 23-Sep-04 Time: 20:37:36 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 11:37 AM To: John Fox Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models . . . John, note that I used glm(y ~ .) (the dot!), because the names are automatically chosen to be X.1 and X.2, hence you cannot use X in the formula in this case ... Best, Uwe OK -- I see. I did notice that you used . in the formula but didn't make the proper connection! Sorry, my first reply was too short and imprecisely. Thank you to help clarifying things. Uwe Thanks, John __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] folding table into a matrix
I'm just getting started with R, so feel free to point me to the appropriate documentation if this is already answered somewhere (though I've been unable to find it myself). This does seem like a rather basic question. I want to fold a table into a matrix. The table is formatted like so: Column_Index Value 1 486 2 688 3 447 4 555 5 639 1 950 2 881 3 1785 4 1216 1 612 2 790 3 542 4 1310 5 976 And I want to end up with something like this: [,1] [,2] [,3] [,4] [,5] [1,] 486 688 447 555 639 [2,] 950 881 1785 1216NA [3,] 612 790 512 1310 976 Since not all the rows are complete, I can't just reformat using matrix(), I need to go by the index information in the Column_Index column. This seems like something simple to do, but I'm stumped. Thanks. -- Gene __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] folding table into a matrix
Have you looked at index arrays in An Introduction to R, available as the first menu option after help.start() in R? The version I got with R 1.9.1 for Windows includes the following: x - array(1:20,dim=c(4,5)) # Generate a 4 by 5 array. x [,1] [,2] [,3] [,4] [,5] [1,]159 13 17 [2,]26 10 14 18 [3,]37 11 15 19 [4,]48 12 16 20 i - array(c(1:3,3:1),dim=c(3,2)) i # i is a 3 by 2 index array. [,1] [,2] [1,]13 [2,]22 [3,]31 x[i] # Extract those elements [1] 9 6 3 x[i] - 0 # Replace those elements by zeros. x [,1] [,2] [,3] [,4] [,5] [1,]150 13 17 [2,]20 10 14 18 [3,]07 11 15 19 [4,]48 12 16 20 The key point here is that i is a 3x2 array, and x[i] references the 3 elements of x that have the row and column indices in x. Does this provide the information you need? hope this helps. spencer graves Gene Cutler wrote: I'm just getting started with R, so feel free to point me to the appropriate documentation if this is already answered somewhere (though I've been unable to find it myself). This does seem like a rather basic question. I want to fold a table into a matrix. The table is formatted like so: Column_Index Value 1 486 2 688 3 447 4 555 5 639 1 950 2 881 3 1785 4 1216 1 612 2 790 3 542 4 1310 5 976 And I want to end up with something like this: [,1] [,2] [,3] [,4] [,5] [1,] 486 688 447 555 639 [2,] 950 881 1785 1216NA [3,] 612 790 512 1310 976 Since not all the rows are complete, I can't just reformat using matrix(), I need to go by the index information in the Column_Index column. This seems like something simple to do, but I'm stumped. Thanks. -- Gene __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problems with boot and optim
Hi Luke, I think your problem is in the function lik.hetprobit and not remembering that R is case sensitive so X and x are not the same. The parameters passed in are called X, Y and Z which change for each bootstrap dataset. Within the function, however, your first three lines are Y - as.matrix(y) X - as.matrix(x) Z - as.matrix(z) since x, y, z (lowercase) do not exist in the function, they are being sought in the global workspace which remains the same for each bootstrap dataset so that after these three lines your X, Y and Z (uppercase) take on these values no matter what was input to the function. Replace x, y and z in the function by X, Y and Z and it should work. HTH, Angelo On Tue, 21 Sep 2004, Luke Keele wrote: I am trying to bootstrap the parameters for a model that is estimated through the optim() function and find that when I make the call to boot, it runs but returns the exact same estimate for all of the bootstrap estimates. I managed to replicate the same problem using a glm() model but was able to fix it when I made a call to the variables as data frame by their exact names. But no matter how I refer to the variables in the het.fit function (see below) I get the same result. I could bootstrap it with the sample command and a loop, but then the analysis in the next step isn't as nice The code for the likelihood and the call to boot is below. I have tried numerous other permutations as well. I am using R 1.9.1 on Windows XP pro. Thanks Luke Keele #Define Likelihood lik.hetprobit -function(par, X, Y, Z){ #Pull Out Parameters Y - as.matrix(y) X - as.matrix(x) Z - as.matrix(z) K -ncol(X) M -ncol(Z) b - as.matrix(par[1:K]) gamma - as.matrix(par[K+1:M]) mu - (X%*%b) sd - exp(Z%*%gamma) mu.sd -(mu/sd) #Form Likelihood log.phi - pnorm(ifelse(Y == 0, -1, 1) * mu.sd, log.p = TRUE) 2 * sum(log.phi) } y - as.matrix(abhlth) x - as.matrix(reliten) ones = rep(1, nrow(x)) x = cbind(ones,x) z = as.matrix(abinfo) data.het - as.matrix(cbind(y,x,z)) het.fit - function(data){ mod - optim(c(1,0,0), lik.hetprobit, Y=data$y, X=data$x, Z=data$z, method=BFGS, control=list(fnscale=-1), hessian=T) c(mod$par) } case.fun - function(d,i) het.fit(d[i,]) het.case - boot(data.het, case.fun, R=50) Luke Keele Post-Doctoral Fellow in Quantitative Methods Nuffield College, Oxford University Oxford, UK [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- -- | Angelo J. CantyEmail: [EMAIL PROTECTED] | | Mathematics and Statistics Phone: (905) 525-9140 x 27079 | | McMaster UniversityFax : (905) 522-0935 | | 1280 Main St. W. | | Hamilton ON L8S 4K1 | __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] folding table into a matrix
Gene Cutler [EMAIL PROTECTED] writes: I'm just getting started with R, so feel free to point me to the appropriate documentation if this is already answered somewhere (though I've been unable to find it myself). This does seem like a rather basic question. I want to fold a table into a matrix. The table is formatted like so: Column_Index Value 1 486 2 688 3 447 4 555 5 639 1 950 2 881 3 1785 4 1216 1 612 2 790 3 542 4 1310 5 976 And I want to end up with something like this: [,1] [,2] [,3] [,4] [,5] [1,] 486 688 447 555 639 [2,] 950 881 1785 1216NA [3,] 612 790 512 1310 976 Since not all the rows are complete, I can't just reformat using matrix(), I need to go by the index information in the Column_Index column. This seems like something simple to do, but I'm stumped. It's not completely trivial, since you're relying on ordering information: the missing col.5 value goes in the 2nd row, but you only know that because values are ordered in row blocks. If you supply the rows that things belong in, the task does becomes simple: Row_Index - rep(1:3,c(5,4,5)) M - matrix(NA, 3, 5) M[cbind(Row_Index,Column_Index)] - Value Now how to compute Row_Index from Column_Index? If you know that each group starts with a 1, you might use (rename Column_Index as cc for brevity) cumsum(cc==1) [1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3 If you can't make that assumption, you might consider something like cumsum(c(1,diff(cc)0)) [1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3 -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] decompose a correlation matrix
Thanks guys for the help, here's the final (grizzly?) solution: #generate a correlation matrix cm-cor(someDataFrame, y = NULL ...) # get the list of labels (included in the dataframe) labels-labels(cm)[[1]] # retrieve the uppper portion of the correlation matrix (as logical values) idx - upper.tri(cm) # combine the logical values with the marker list mcm-cbind(marker=paste((markers[col(t(cm))[idx]],'',markers[row(t(cm))[idx ]]), row=row(t(cm))[idx], col=col(t(cm))[idx], value=cm[idx]) gives the format: label2 label1 1 2 0.97712 label3 label1 1 3 0.84587 label4 label1 1 4 0.92184 label5 label1 1 5 0.54698 There seemed to be some issues with the row()and col() commands not thinking the output of cor() was a matrix so I used t() to force it into a table... _ Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030. 713-798-1947 (work) 832-876-7956 (cell) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Cox proportional hazards model
Min-Han Tan wrote: Good afternoon, I am currently trying to do some work on survival analysis. - I hope to seek your advice re: 2 questions (1 general and 1 specific) (1) I'm trying to do a stratified Cox analysis and subsequently plot(survfit(object)). It seems to work for some strata, but not for others. I have tumor grade, which is a range of 1 - 4. When I divide this range of 1:4 into 2 groups, it works fine for strata(grade2) and strata(grade 3). However, if I do a strata(grade1), there is an error when I do a survfit( coxph object ) Call: survfit.coxph(object = s) Error in print.survfit(structure(list(n = as.integer(46), time = c(22, : length of dimnames [1] not equal to array extent (2) As a general question, is it possible to distinguish between confounding and interaction in the Cox proportional hazards model? Thanks! Min-Han The Design package provides another way to use the survival package. library(Design) # also issues library(Hmisc) so need to install both d - datadist(mydataframe); options(datadist='d') f - cph(Surv(...) ~ ... + strat(z), surv=TRUE) # note strat instead of strata survplot(f, z=vector of strata values to plot survival curves for) survplot has many options -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] moving average method for time series objects
Dear R-Help readers, I suspect that this question must be a FAQ, but my investigation of the archives has not been very revealing. Is there an R function for calculating moving averages of time series objects? Thank you for your time and patience. -Paul Paul Schwarz, Ph.D. Associate Director of Methodology Gartner Vendor Marketing Solutions, Custom Research +1 503 241 8036 x186 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] RMySQL and Blob
Hi, I tried your suggestion to blindly import the blob into R, doing the following: con - dbConnect(MySQL, host=host, user=user, dbname=db) rs - dbGetQuery(con, statement=paste(select picture from db where id=1) It didn't crash R. But rs is not usable. It seems that it has been converted to a character object. In addition, we need to pass an image format to R, like png, bmp or something. It's unclear to me how to achieve this even if we can read rs as a binary object as you suggested. Any ideas? Thanks for suggestions! Jonathan -Original Message- From: David James [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 22, 2004 10:06 AM To: LI,JONATHAN (A-Labs,ex1) Cc: David James Subject: Re: [R] RMySQL and Blob [EMAIL PROTECTED] wrote: Hi David, The application I have in mind is for images. In my case, size of images is known and they are not big. As an example, a 64*32 image will have 2048 pixels. If they are 8-bit grey-level pixels, the image occupies 2KB memory. I may venture to guess that the unknown size and type of a blob object in MySQL prevent it from being very usable in R since R doesn't have a datatype for a binary blob? You could just blindly try to import it into R (but do it on a clean workspace, since it may crash R and you could loose your data!). The underlying C code clearly identifies FIELD_TYPE_BLOB and goes ahead and puts it in an R character vector (with comments clearly stating that it is a hack). Once it moves the data from the MySQL result set buffer to the R vector, it computes the length in both places and prints a warning if they differ. Or you could try to hack something. For instance, what happens if instead of bringing the blob you import, say, as a string? con - dbConnect(MySQL, ) rs - dbSendQuery(con, select SUBSTRING(blob, 0) from table) dd - fetch(rs) One possible general solution would be to define a new class binaryConnection simmilar to textConnection, so that you can readBin() and writeBin() from it. In this way, blobs could return a binary buffer (just a pointer to a block of C memory) that could be given to binaryConnection: data - fetch(rs) for(i in seq(nrow(data)){ ## extract blobs from each row and create a binary connection bcon = binaryConnection(blobs$image[1]) img = readBin(bcon, integer, n = 2048) ## work with the image } let me know what happens if you try to naively import a blob... -- David Thanks! Jonathan -Original Message- From: David James [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 22, 2004 7:05 AM To: LI,JONATHAN (A-Labs,ex1) Cc: [EMAIL PROTECTED] Subject: Re: [R] RMySQL and Blob Hi Jonathan, Currently RMySQL doesn't handle blob objects. The mechanics of inserting and extracting blob objects by itself is not too hard, but issues such as how should blobs be made available to R, how to prevent buffers overflows, how to prevent huge blobs from exhausting the available memory, should R callback functions be invoked as chunks of the blob are brought in, etc., need more consideration. And these issues are not R/MySQL specific, but also relevant to other databases and other non-dbms interfaces. BTW there are R facilities (e.g., external pointers, finalizers) that seems quite important for this type of implementation. What type and how big are the blobs that want to import? -- David [EMAIL PROTECTED] wrote: Dear R experts, Does RMySQL package handle Blob datatype in a MySQL database? Blob can represent an image, a sound or some other large and complex binary objects. In an article published by R-database special interest group, named A common database interface (DBI) (updated June 2003), it's mentioned in open issues and limitations that We need to carefully plan how to deal with binary objects. Before I invest time to try, I would appreciate any experts' opinions. Thanks, Jonathan __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- David A. James Statistics Research, Room 2C-276Phone: (908) 582-3082 Bell Labs, Lucent Technologies Fax:(908) 582-3340 Murray Hill, NJ 09794-0636 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Intrduction of function
Hi all: I've written a function and saved the worksapce.When the workspace of my function is re-open,I want display some introduction or step by step guidance of the function for the users.How can I do it? Thanks a lot! My best regards! __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] folding table into a matrix
Gene Cutler gcutler at amgen.com writes: : : I'm just getting started with R, so feel free to point me to the : appropriate documentation if this is already answered somewhere (though : I've been unable to find it myself). This does seem like a rather : basic question. : I want to fold a table into a matrix. The table is formatted like so: : : Column_Index Value : 1 486 : 2 688 : 3 447 : 4 555 : 5 639 : 1 950 : 2 881 : 3 1785 : 4 1216 : 1 612 : 2 790 : 3 542 : 4 1310 : 5 976 : : And I want to end up with something like this: : :[,1] [,2] [,3] [,4] [,5] : [1,] 486 688 447 555 639 : [2,] 950 881 1785 1216NA : [3,] 612 790 512 1310 976 : : Since not all the rows are complete, I can't just reformat using : matrix(), I need to go by the index information in the Column_Index : column. This seems like something simple to do, but I'm stumped. Suppose z is the two column matrix which we wish to reshape. Using Peter's cumsum expression to distinguish the groups, tapply ts to each group turning each into a ts object. cbind these ts objects together (which has the effect of adding the NAs at the end automatically) and finally transpose the result (which also turns it back into a matrix). The following one liner solves the example problem: t(do.call(cbind,tapply(z[,2], cumsum(z[,1]==1), ts))) This approach also extends to the case where the indices in column 1 have gaps. In this case we need an irregular time series package. Using zoo we carry out the analogous operations using the indices in column 1 for a group as the times and the values in column 2 as the time series values. We make use of zoo's multiway merge and Peter's second, more general, cumsum expression to define the groups: require(zoo) f - function(i) zoo(z[i,2], z[i,1]) g - cumsum(c(1,diff(z[,1])0)) t(as.matrix(do.call(merge, tapply(1:nrow(z), g, f __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] moving average method for time series objects
Schwarz,Paul Paul.Schwarz at gartner.com writes: : : Dear R-Help readers, : : I suspect that this question must be a FAQ, but my investigation of the archives has not been very revealing. : Is there an R function for calculating moving averages of time series objects? : : Thank you for your time and patience. : : -Paul Check out: https://stat.ethz.ch/pipermail/r-sig-finance/2004q3/000104.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] can I get these result with R?
i want to check if the previously stock of social capital has impact on the present economoc performace.so i get the following variables:economic performace ,stock of socail capital ;each of them from year 1996 to 2002,from 30 different areas. the data as following: performance stockyearereas 12,321996 1 11 12 1996 2 123 34 1996 3 ¡¡¡¡¡¡ 12,3321997 1 11 212 1997 2 123 32 1997 3 ¡¡¡¡ 13,2322002 1 12 212 2002 2 1233 2002 3 so i want to see if the 1/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000],stock[2001],stock[2002] has impact on performance[2002]? 2/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000],stock[2001] has impact on performance[2001], 3/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000], has impact on performance[2000], ¡¡ and to see if the 1/performace of year 1996,1997,1998,1999,2000,2001 has impact on performace of year 2002, 2/performace of year 1996,1997,1998,1999,2000 has impact on performace of year 2001, 1/performace of year 1996,1997,1998,1999, has impact on performace of year 2000, ¡¡ arima model seems not work,as i have data from 31 eareas,and i want to see if the differeces in economic performace of differenct areas has relation to the differences in the stock of social capital in each areas. is it impossible mission? any help will be appreciated. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] SJava Issue
Hi, I have downloaded and installed SJava-0.68-0 on my linux box. When I try to initialise the JVM from within R this is what I get: .JavaInit() (1) error initializing manager class can't find class java/lang/Hashtable Error in .JavaInit() : Couldn't start Java Virtual Machine: can't find class java/lang/Hashtable My JAVA_HOME variable is set in my .tcshrc script, so it shouldn't really have any problem finding java. (And its strange that its looking for Hashtable in java.lang!) This problem was posted on the mailing list by someone else also using the j2sdk1.4.2_01 JDK from Sun last year (http://www.mail-archive.com/[EMAIL PROTECTED]/msg09080.html) and I was wondering if anyone had figured out how to resolve it? Cheers, Anar __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html