[R] Difficult survival model
Hi all, I would appreciate your advice how to model the survival model for the following data, especially if it can be modeled in one model or if I should (have to) break it up (i.e. assume that some events are independent of each other, etc.). Data is on an experimental stock market simulation. Dependent variable is the hazard of transition from either holding the stock to closing the position (-1) or from not holding the stock to opening a position in that stock (1) in the variable CHNG. Main variables of interest are the regressors IV1 and IV2 (which are covariates, not factors). BUY and SELL indicate the number of stocks of type STOCK that were purchased/sold in ROUND. Accordingly, NBT and NET record the number of STOCK held at the beginning and the end of period ROUND, respectively. HP is a 0/1 dummy indicating whether or not shares of STOCK were held by an investor in any given period. And these are the issues I see in the data: 1. Over time, the stock may be held several times. This suggests a conditional hazard. TIMESHELD measures the number of times a stock has been held (meaning that the stock position must have been closed inbetween). Alternatively, it could measure the number of prior transactions in the stock (all kinds). This I intend to model with strata. 2. I also have repeated measures for individuals. As some investor may be more of a trader than others, I want to model this with a random effect (frailty). 3. Different stocks may have a different risk of being traded. This I would model with another frailty term (or would I rather model it differently?). 4. In case that an investor holds a stock, there are competing risks between increasing or decreasing the position. 5. There are also competing risks, so to speak, between the different stocks. That is, when an investor makes a buy or sell decision, s/he decides between all available stocks. One stocks competes with all other stocks to be purchased. When selling a stock though, one stock only competes with the other different stocks in the portfolio to be sold, of course. I have no idea how to model 4. and 5. though and if this is possible. I would also appreciate your feedback on the suggested way of modeling 1., 2., and 3. Apologies for the lengthy description and thanks much for your support, Daniel ID ROUND STOCK BUY SELLNBT NET IV1 IV2 HP CHNGTIMESHELD 1 1 A 0 0 0 0 3 4 0 0 0 1 2 A 10 0 0 10 3 5 1 1 1 1 3 A 0 0 10 10 5 4 1 0 1 1 4 A 0 10 10 0 2 1 1 -1 1 1 5 A 0 0 0 0 3 4 0 0 1 1 1 A 20 0 0 20 2 5 1 1 2 1 2 A 0 0 20 20 4 5 1 0 2 1 3 A 0 0 20 20 2 3 1 0 2 1 4 A 0 0 20 20 5 3 1 0 2 1 5 A 0 20 20 0 4 5 1 -1 2 2 1 B 15 0 0 15 1 2 1 1 1 2 2 B 0 0 15 15 1 2 1 0 1 2 3 B 0 0 15 15 2 2 1 0 1 2 4 B 0 15 15 0 5 1 1 -1 1 2 5 B 0 0 0 0 4 4 0 0 1 2 1 B 5 0 0 5 3 4 1 1 2 2 2 B 0 0 5 5 5 2 1 0 2 2 3 B 0 5 5 0 4 2 1 -1 2 2 4 B 0 0 0 0 4 1 0 0 2 2 5 B 0 0 0 0 4 1 0 0 2 PhD Program Strategy Dept. of Management and Organization Robert H. Smith School of Business University of Maryland Van Munching Hall College Park, MD 20742 www.rhsmith.umd.edu www.umd.edu mailto:[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] residual plots for lmer in lme4 package
Hi, I was wondering if I might be able to ask some advice about doing residual plots for the lmer function in the lme4 package. Our group's aim is to find if the expression staining of a particular gene in a sample (or core) is related to the pathology of the core. To do this, we used the lmer function to perform a logistic mixed model below. I apologise in advance for the lack of subscripts. logit P(yij=1) = â0 + Ui + â1Patholij where Ui~N(0, óu2), i indexes patient, j indexes measurement, Pathol is an indicator variable (0,1) for benign epithelium versus cancer and yij is the staining indicator (0,1) for each core where yij equals 1 if the core stains positive and 0 otherwise. (I have inserted some example R code at the end of this message) I was wondering if you knew how I could test that the errors Ui are normally distributed in my fit. I am not familiar with how to do residual plots for a mixed logistic regression (or even for any logistic regression!). Any advice would be greatly appreciated! Thanks and Regards Marg Example code: lmer(Intensity.over2.hyp.canc~Pathology + (1|Patient.ID), data= HSD17beta4.hyp.canc, family=binomial, na.action=na.omit) #Family: binomial(logit link) #AIC BIClogLik deviance # 414.1101 431.4147 -203.0550 406.1101 #Random effects: # GroupsNameVarianceStd.Dev. # Patient.ID (Intercept) 4.9558 2.2262 # of obs: 559, groups: Patient.ID, 177 #Estimated scale (compare to 1) 0.6782544 #Fixed effects: #Estimate Std. Error z value Pr(|z|) #(Intercept) -2.057340.24881 -8.2686 2.2e-16 *** #PathologyHyperplasia -1.766270.44909 -3.9330 8.389e-05 *** NB. Intensity.over2.hyp.canc is the staining of the core (ie 0 or 1) Pathology is Hyperplasia or Cancer Dr Margaret Gardiner-Garden Garvan Institute of Medical Research 384 Victoria Street Darlinghurst Sydney NSW 2010 Australia Phone: 61 2 9295 8348 Fax: 61 2 9295 8321 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] binomial simulation
On Wed, 15 Aug 2007, Moshe Olshansky wrote: Thank you - I wasn't aware of this function. One can even use lchoose which allows really huge arguments (more than 2^1000)! Using dbinom() for binomial probabilities would be even better, and that has a log=TRUE argument to return results on natural log scale. dbinom(k,N,p,log=TRUE) + dbinom(m,k,q,log=TRUE) [1] -92.52584 log(choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m)) [1] -92.52584 --- Lucke, Joseph F [EMAIL PROTECTED] wrote: C is an R function for setting contrasts in a factor. Hence the funky error message. ?C Use choose() for your C(N,k) ?choose choose(200,2) 19900 choose(200,100) 9.054851e+58 N=200; k=100; m=50; p=.6; q=.95 choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m) 6.554505e-41 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Moshe Olshansky Sent: Wednesday, August 15, 2007 2:06 AM To: sigalit mangut-leiba; r-help Subject: Re: [R] binomial simulation No wonder that you are getting overflow, since gamma(N+1) = n! and 200! (200/e)^200 10^370. There exists another way to compute C(N,k). Let me know if you need this and I will explain to you how this can be done. But do you really need to compute the individual probabilities? May be you need something else and there is no need to compute the individual probabilities? Regards, Moshe. --- sigalit mangut-leiba [EMAIL PROTECTED] wrote: Thank you, I'm trying to run the joint probabilty: C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m) and get the error: Error in C(N, k) : object not interpretable as a factor so I tried the long way: gamma(N+1)/(gamma(k+1)*(gamma(N-k))) and the same with k, and got the error: 1: value out of range in 'gammafn' in: gamma(N + 1) 2: value out of range in 'gammafn' in: gamma(N - k) Do you know why it's not working? Thanks again, Sigalit. On 8/14/07, Moshe Olshansky [EMAIL PROTECTED] wrote: As I understand this, P(T+ | D-)=1-P(T+ | D+)=0.05 is the probability not to detect desease for a person at ICU who has the desease. Correct? What I asked was whether it is possible to mistakenly detect the desease for a person who does not have it? Assuming that this is impossible the formula is below: If there are N patients, each has a probability p to have the desease (p=0.6 in your case) and q is the probability to detect the desease for a person who has it (q = 0.95 for ICU and q = 0.8 for a regular unit), then P(k have the desease AND m are detected) = P(k have the desease)*P(m are detected / k have the desease) = C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m) where C(a,b) is the Binomial coefficient a above b - the number of ways to choose b items out of a (when the order does not matter). You of course must assume that N = k = m = 0 (otherwise the probability is 0). To generate such pairs (k infected and m detected) you can do the following: k - rbinom(N,1,p) m - rbinom(k,1,q) Regards, Moshe. --- sigalit mangut-leiba [EMAIL PROTECTED] wrote: Hi, The probability of false detection is: P(T+ | D-)=1-P(T+ | D+)=0.05. and I want to find the joint probability P(T+,D+)=P(T+|D+)*P(D+) Thank you for your reply, Sigalit. On 8/13/07, Moshe Olshansky [EMAIL PROTECTED] wrote: Hi Sigalit, Do you want to find the probability P(T+ = t AND D+ = d) for all the combinations of t and d (for ICU and Reg.)? Is the probability of false detection (when there is no disease) always 0? Regards, Moshe. --- sigalit mangut-leiba [EMAIL PROTECTED] wrote: hello, I asked about this simulation a few days ago, but still i can't get what i need. I have 2 units: icu and regular. from icu I want to take 200 observations from binomial distribution, when probability for disease is: p=0.6. from regular I want to take 300 observation with the same probability: p=0.6 . the distribution to detect disease when disease occurred- *for someone from icu* - is: p(T+ | D+)=0.95. the distribution to detect disease when disease occurred- *for someone from reg.unit* - is: p(T+ | D+)=0.8. I want to compute the joint distribution for each === message truncated === __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list
Re: [R] Possible to import histograms in R?
On 15/08/07, Ted Harding [EMAIL PROTECTED] wrote: So you if you want the density plot, you would need to calculate this for yourself. E.g. H1$density - counts/sum(counts) plot.histogram(H1,freq=FALSE) Oddly, plot.histogram doesn't work in my version of R (which is 2.5.1), even though R can give me the help page for it. plot() works fine though. Thanks again! NC And so on ... there are many relevant details in the help pages ?hist and ?plot.histogram Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 15-Aug-07 Time: 11:53:14 -- XFMail -- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] function to find coodinates in an array
Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord Can anyone help me? Thanks! Ana __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] summarising systemfit with saveMemory
Hi all - I'm on R 2.5.1 for XP. in the systemfit package, the summary is set to print the McElroy's measure of fit unless it's NULL. When the option saveMemory = TRUE, the McElroy isn't included, instead it defaults to NA. Thus I am unable to use summary.systemfit. library(systemfit) example(systemfit) surfit2 - systemfit(SUR,system,data=Kmenta,saveMemory=T) summary(surfit2) As far as I can tell, this is a result of the following code. print.summary.systemfit --SNIP-- if (!is.null(x$mcelr2)) { cat(McElroy's R-squared value for the system: ) cat(formatC(x$mcelr2, digits = digits, width = -1)) cat(\n) } --SNIP-- combined with systemfit --SNIP-- if (!saveMemory) { --SNIP-- mcelr2 - 1 - (rtOmega %*% resids)/denominator } else { mcelr2 - NA } --SNIP-- Or am I missing something here. Thanks in advance. -- Tim Calkins 0406 753 997 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] an easy way to construct this special matirx
Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] time series with quality codes
list(...), I am working with environmental time series (eg rainfall, stream flow) that have attached quality codes for each data point. The quality codes have just a few factor levels, like good, suspect, poor, imputed. I use the quality codes in plots and summaries. They are carried through when a time series is aggregated to a longer time-step, according to rules like worst, median or mode. I need to support time steps of anything from hours to years. I can assume the data are regular time series -- they might be irregular initially but could be 'regularized'. But I would want to plot irregular time series along with regular ones. So far I have been using a data frame with a POSIXct column, a numeric column and a factor column. However I would like to use zoo instead, because of its many utility functions and easy conversion to ts. Is there any prospect of zoo handling such numeric + factor data? Other suggestions on elegant ways to do it are also welcome. Felix -- Felix Andrews / 安福立 PhD candidate Integrated Catchment Assessment and Management Centre The Fenner School of Environment and Society The Australian National University (Building 48A), ACT 0200 Beijing Bag, Locked Bag 40, Kingston ACT 2604 http://www.neurofractal.org/felix/ xmpp:[EMAIL PROTECTED] 3358 543D AAC6 22C2 D336 80D9 360B 72DD 3E4C F5D8 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an easy way to construct this special matirx
On 16-Aug-07 03:10:17, [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1000 r^2 r100 r^3 r^2 r10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen I dare say there's an even simpler way (and I feel certain someone will post one ... ); but (example): r-0.1 r1-r^((-3):0); r2-r^(3:0) R-r1%*%t(r2) R ## [,1] [,2] [,3] [,4] ##[1,] 1.000 10.00 100.0 1000 ##[2,] 0.100 1.00 10.0 100 ##[3,] 0.010 0.10 1.0 10 ##[4,] 0.001 0.01 0.11 R[upper.tri(R)]-0 R ## [,1] [,2] [,3] [,4] ##[1,] 1.000 0.00 0.00 ##[2,] 0.100 1.00 0.00 ##[3,] 0.010 0.10 1.00 ##[4,] 0.001 0.01 0.11 Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 16-Aug-07 Time: 09:37:28 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an easy way to construct this special matirx
?toeplitz ?lower.tri since it is the lower triangle of a Toeplitz matrix (or drop the top row) r - 0.95 R - toeplitz(r^(0:4)) R[upper.tri(R)] - 0 R[-1,] On Thu, 16 Aug 2007, [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time series with quality codes
On Thu, 16 Aug 2007, Felix Andrews wrote: list(...), I am working with environmental time series (eg rainfall, stream flow) that have attached quality codes for each data point. The quality codes have just a few factor levels, like good, suspect, poor, imputed. I use the quality codes in plots and summaries. They are carried through when a time series is aggregated to a longer time-step, according to rules like worst, median or mode. I need to support time steps of anything from hours to years. I can assume the data are regular time series -- they might be irregular initially but could be 'regularized'. But I would want to plot irregular time series along with regular ones. So far I have been using a data frame with a POSIXct column, a numeric column and a factor column. However I would like to use zoo instead, because of its many utility functions and easy conversion to ts. Is there any prospect of zoo handling such numeric + factor data? Other suggestions on elegant ways to do it are also welcome. There is some limited support for this in zoo. You can do z - zoo(myfactor, myindex) and work with it like a zoo series and then coredata(z) will recover a factor. However, you cannot bind this to other series without losing the factor structure. At least not in a plain zoo series. But you can do df - merge(z, Z, retclass = data.frame) where every column of the resulting data.frame is a univariate zoo series. The final option would be to just have a data.frame as usual and put your data/index into one column. But then it's more difficult to leverage zoo's functionality. I would like to have more support for things like this, but currently this is what we have. Best, Z Felix -- Felix Andrews / 安福立 PhD candidate Integrated Catchment Assessment and Management Centre The Fenner School of Environment and Society The Australian National University (Building 48A), ACT 0200 Beijing Bag, Locked Bag 40, Kingston ACT 2604 http://www.neurofractal.org/felix/ xmpp:[EMAIL PROTECTED] 3358 543D AAC6 22C2 D336 80D9 360B 72DD 3E4C F5D8 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (coxph, se) Obtaining standard errors of coefficients from coxph to store
Hi all, I'm wanting to be able to find and store the z-score of coxph below: - modz=coxph(Surv(TSURV,STATUS)~RAGE+DAGE+REG_WTIME_M+CLD_ISCH+POLY_VS, data=kidneyT,method=breslow) I know summary(modz) will give me this, but how do i extract the standard error or z-score values in a similar way to obtaining the coefficients by coef(modz) ? I think it must be something to do with modz$var but I'm having a complete mental blank. I need this info so I can write a function to use within a bootstrap so I can record the number of times (proportion) each variable in the Cox PH model is actually significant over all the bootstrap resamples. Any assistance is greatly appreciated DL Click to find local singles for dating, romance and fun. http://tagline.bidsystem.com/fc/Ioyw36XJJVs581mfqGSywy0Z69Mq8VM03oVytPu 8otqP84CBZmNX2G/ span id=m2wTlpfont face=Arial, Helvetica, sans-serif size=2 style=font-size:13.5px___BRGet the Free email that has everyone talking at a href=http://www.mail2world.com target=newhttp://www.mail2world.com/abr font color=#99Unlimited Email Storage #150; POP3 #150; Calendar #150; SMS #150; Translator #150; Much More!/font/font/span [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] to combine bwplot + srt option?
Hi R-users, Could someone help me to combine bwplot and srt option (exemple srt = 45 degree or srt 90 degree)? My graphic contains 146 boxplots, I would like to label all of them. As you know, labels are not readable. Thank you for your help in advance! Lassana KOITA Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique / Project Engineer Airport Safety Studies Statistical analysis Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical Department Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation Headquarters Tel: 01 49 56 80 60 Fax: 01 49 56 82 14 E-mail: [EMAIL PROTECTED] http://www.stac.aviation-civile.gouv.fr/ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time series periodic data
Thank you. I will try to get the book, althoug I am not sure if I with my tiny knowledge of mathematics will be able to digest it. Meanwhile I tried to make 7 min average and then to reanalyze by spectrum, but the output was not very convincing. Regards Petr Pikal [EMAIL PROTECTED] Rolf Turner [EMAIL PROTECTED] napsal dne 15.08.2007 22:12:15: On 16/08/2007, at 12:26 AM, Petr PIKAL wrote: Dear all Please help me with analysis of some periodic data. I have an output from measurement each minute and this output is modulated by rotation of the equipment (approx 6.5 min/revolution). I can easily spot this frequency from spectrum(mydata, some suitable span) However from other analysis I suspect there is a longer term oscilation (about 70-80 min) I am not able to find it from mentioned data. Plese give me some hint how I could prove that such long term modulation of my data exist in presence of quite strong modulation by this short term oscilations. You may be able to get some mileage out of complex demodulation (at, say, frequencies of 1/70, 1/71, ..., 1/80 cycles per minute). See Peter Bloomfield's book (``Fourier Analysis of Time Series: An Introduction'', 2nd ed., Wiley, 2000) for a good introduction to complex demodulation. cheers, Rolf Turner ## Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author. This e-mail has been scanned and cleared by MailMarshal www.marshalsoftware.com ## __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] to combine bwplot + srt option?
KOITA Lassana - STAC/ACE lassana.koita at aviation-civile.gouv.fr writes: Could someone help me to combine bwplot and srt option (exemple srt = 45 degree or srt 90 degree)? My graphic contains 146 boxplots, I would like to label all of them. As you know, labels are not readable. You cannot, since bwplot is part of lattice and srt is standard R-graphics. I agree, this can be very confusing, lattice was added later to R and I am happy we have it now (thanks, Deepayan). So when you see that the function you are using (bwplot) is documented under lattice, always use the other options available in this package. These are mostly documented under xyplot, which is worth a fixed link on the desktop. Dieter library(lattice) bwplot(voice.part ~ height, data = singer, horizontal=FALSE, scales = list(x = list(rot = 45))) ## see xyplot __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmer coefficient distributions and p values
Daniel Lakeland dlakelan at street-artists.org writes: We have used the lmer package to fit various models for the various experiments that she has done (random effects from multiple measurements for each animal or each trial, and fixed effects from developmental stage, and genotype etc). The results are fairly clear cut to me, and I suggested that she publish the results as coefficient estimates for the relevant contrasts, and their standard error estimates. However, she has read the statistical guidelines for the journal and they insist on p values. I personally think that p values, and sharp-null hypothesis tests are misguided and should be banned from publications, but it doesn't much matter what I think compared to what the editors want. Based on searching the archives, and finding this message: https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html . From what you describe, using the stable function lme in nlme by the same author Douglas Bates would do the job better for you. Remember lmer is under development, which does not mean it's bad, but some nice things like weight are still missing. For lme, you have excellent documentation in the book by Pinheiro/Bates. Dieter __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Polynomial fitting
Remember that polynomials of the form y = b1*x + b2*x^2 + ... + bm*x^m fit the linear regression equation form Y = beta_1*x_1 + beta_2*x_2 + ... + beta_m*x_m If one sets (from the 1st to the 2nd equation) x - x_1 x^2 - x_2 x^3 - x_3 etc. In R this is easy, just use the identity operator I() when specifying the equation. e.g. for a 3rd order polynomial: model - lm(Y ~ x + I(x^2) + I(x^3) + I(x^4)) hth, Jon *** I'm looking some way to do in R a polynomial fit, say like polyfit function of Octave/MATLAB. For who don't know, c = polyfit(x,y,m) finds the coefficients of a polynomial p(x) of degree m that fits the data, p(x[i]) to y[i], in a least squares sense. The result c is a vector of length m+1 containing the polynomial coefficients in descending powers: p(x) = c[1]*x^n + c[2]*x^(n-1) + ... + c[n]*x + c[n+1] For prediction, one can then use function polyval like the following: y0 = polyval( polyfit( x, y, degree ), x0 ) y0 are the prediction values at points x0 using the given polynomial. In R, we know there is lm for 1-degree polynomial: lm( y ~ x ) == polyfit( x, y, 1 ) and for prediction I can just create a function like: lsqfit - function( model, xx ) return( xx * coefficients(model)[2] + coefficients(model)[1] ); and then: y0 - lsqfit(x0) (I've tried with predict.lm( model, newdata=x0 ) but obtain a bad result) For a degree greater than 1, say m, what can I use.?? I've tried with lm( y ~ poly(x, degree=m) ) I've also looked at glm, nlm, approx, ... but with these I can't specify the polynomial degree. Thank you so much! Sincerely, -- Marco Checked by AVG Free Edition. 16:55 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] to combine bwplot + srt option?
Thank you for you quite and useful explanation. And do know how to sort them by median? best regargs Lassana KOITA Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique / Project Engineer Airport Safety Studies Statistical analysis Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical Department Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation Headquarters Tel: 01 49 56 80 60 Fax: 01 49 56 82 14 E-mail: [EMAIL PROTECTED] http://www.stac.aviation-civile.gouv.fr/ Dieter Menne [EMAIL PROTECTED] Envoyé par : [EMAIL PROTECTED] 16/08/2007 12:16 A r-help@stat.math.ethz.ch cc Objet Re: [R] to combine bwplot + srt option? KOITA Lassana - STAC/ACE lassana.koita at aviation-civile.gouv.fr writes: Could someone help me to combine bwplot and srt option (exemple srt = 45 degree or srt 90 degree)? My graphic contains 146 boxplots, I would like to label all of them. As you know, labels are not readable. You cannot, since bwplot is part of lattice and srt is standard R-graphics. I agree, this can be very confusing, lattice was added later to R and I am happy we have it now (thanks, Deepayan). So when you see that the function you are using (bwplot) is documented under lattice, always use the other options available in this package. These are mostly documented under xyplot, which is worth a fixed link on the desktop. Dieter library(lattice) bwplot(voice.part ~ height, data = singer, horizontal=FALSE, scales = list(x = list(rot = 45))) ## see xyplot __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ADF test
Hi all, Hope you people do not feel irritated for repeatedly sending mail on Time series. Here I got another problem on the same, and hope I would get some answer from you. I have following dataset: data[,1] [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98 4.98 4.99 4.99 5.00 5.01 [19] 5.01 5.00 5.01 5.01 5.01 5.01 5.02 5.01 5.02 5.02 5.03 5.03 5.03 5.03 5.03 5.04 5.04 5.04 [37] 5.04 5.04 5.04 5.05 5.05 5.06 5.06 5.06 5.07 5.07 5.07 5.07 5.08 5.07 5.08 5.08 5.09 5.10 [55] 5.10 5.09 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.11 5.11 5.11 5.11 5.11 5.11 5.11 5.12 [73] 5.12 5.12 5.12 5.13 5.14 5.14 5.14 5.14 5.14 5.15 5.15 5.15 5.15 5.14 5.15 5.15 5.15 5.16 [91] 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.17 5.17 5.17 5.17 5.17 5.18 5.18 5.18 [109] 5.18 5.18 5.19 5.19 5.20 5.20 5.20 5.20 5.20 5.21 5.21 5.21 5.21 5.21 5.21 5.22 5.22 5.23 [127] 5.23 5.23 5.23 5.24 5.24 5.24 5.25 5.24 5.24 5.25 5.26 5.26 5.26 5.26 5.26 5.26 5.26 5.27 [145] 5.27 5.26 5.27 5.27 5.28 5.29 5.29 5.29 5.29 5.30 5.30 5.30 5.31 5.31 5.31 5.32 5.32 5.33 [163] 5.33 Now I want to conduct a test for stationarity using ADF test : adf.test((data[,1]), stationary, 0) Augmented Dickey-Fuller Test data: (data[, 1]) Dickey-Fuller = -3.7351, Lag order = 0, p-value = 0.02394 alternative hypothesis: stationary But surprisingly it leads towards rejestion of NULL [p-value is less than 0.05], i.e. indicates a possible stationary series. However ploting a graph of actual data set it doesn't seem so. Am I making any mistakes ? Can anyone give me any suggestion? Regards, Megh - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] combine 2 data frames with missing values
Hi All, I have 2 data frames as follows: abc 1 NA 1 2 NA 2 NA 3 3 So a, b are the input values and c is the output which I am interested in. NA - Missing values. I used rbind, but its not working. Let me know if anyone can help me Thanks, Pratap - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Combine matrix
Hi R user, I am new to R, and I have a very simple question for you. I have two matrix A and B, with internally redundant rownames (but variables are different). Some, but not all the rownames are shared among the two matrix. I want to create a greater matrix that combines the previuos two, and has all the possible combinations of matching rownames lines among matrix A and B. looking for the solution I bumped in merge but actually works on data.frame, and in dataframe there could be no redundancy in names. can you help me?? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ungültige Versionsspezifikation
Thank you. Hereby I send you the return to sessionInfo(). I have meanwhile updated to 2.5.1. R ist ein Gemeinschaftsprojekt mit vielen Beitragenden. Tippen Sie 'contributors()' für mehr Information und 'citation()', um zu erfahren, wie R oder R packages in Publikationen zitiert werden können. Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder 'help.start()' für eine HTML Browserschnittstelle zur Hilfe. Tippen Sie 'q()', um R zu verlassen. library(cairo) Fehler in package_version(vers) : ungültige Versionsspezifikation sessionInfo() R version 2.5.1 (2007-06-27) i486-pc-linux-gnu locale: LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods [7] base Am Mittwoch, den 15.08.2007, 15:26 +0200 schrieb Martin Maechler: JK == John Kane [EMAIL PROTECTED] on Wed, 15 Aug 2007 08:55:59 -0400 (EDT) writes: JK I think we need more information about your system. JK Please run JK sessionInfo() JK and include the information in another posting. Yes, indeed. However R version 2.3.1 seems a bit too old to fit to a current version of cairo (rather 'cairoDevice' ??). And BTW: It is a *package*, not a library!!! Martin Maechler, ETH Zurich JK --- Mag. Ferri Leberl [EMAIL PROTECTED] wrote: Dear everybody, excuse me if this question ist trivial, however, I have now looked for an answer for quite a while and therefore dare placing it here. I want to export .svg-files and got here the advice to employ the cairo-library. I downloaded the *current*-version here and expanded it to /usr/local/lib/R/site-library. library(cairo) returned ungültige Versionsspezifikation (INVALID VERSION SPECIFICATION). I got some answer here (EPM39), but, stupid enough, I could not employ it as I don't know where to look for the variable named there. The R-version I employ is 2.3.1. Can anybody solve the (probably simple) problem? Thank you in advance. Yours, Ferri JK JK __ JK R-help@stat.math.ethz.ch mailing list JK https://stat.ethz.ch/mailman/listinfo/r-help JK PLEASE do read the posting guide http://www.R-project.org/posting-guide.html JK and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: combine 2 data frames with missing values
Hi for this particular task rowSums(cbind(a,b), na.rm=T) gives you c column Petr [EMAIL PROTECTED] [EMAIL PROTECTED] napsal dne 16.08.2007 13:03:51: Hi All, I have 2 data frames as follows: abc 1 NA 1 2 NA 2 NA 3 3 So a, b are the input values and c is the output which I am interested in. NA - Missing values. I used rbind, but its not working. Let me know if anyone can help me Thanks, Pratap - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] combine 2 data frames with missing values
I have 2 data frames as follows: abc 1 NA 1 2 NA 2 NA 3 3 So a, b are the input values and c is the output which I am interested in. NA - Missing values. I used rbind, but its not working. Let me know if anyone can help me What exactly is your problem? To create such a data frame or what? What did you do with rbind? Please be more explicit if you ask questions, sometimes its hard to guess what the people want... Stefan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression tree: labels in the terminal nodes
Dear everybody, I'm a new user of R 2.4.1 and I'm searching for information on improving the output of regression tree graphs. In the terminal nodes I am up to now able to indicate the number of values (n) and the mean of all values in this terminal node by the command text(tree, use.n=T, xpd=T) Yet I would like to indicate automatically in the output graph of the tree some quality measure, e.g. impurity (or standard deviation .) and a character to identify which cases are in one terminal node, e.g. given a ID number or name. Until now I did not discover in my R help scripts or in the R programm help how to do it. So I calculate impurity by hand, and I'm identifying the cases in each node by hand, and I am adding these values with a graphical software to the graph (as e.g. given *.jpg file). Therefore anybody can see that I added these values by hand. I'm quite sure that there is a more easy and faster way which looks (and is!) more professional. Could anybody help me? That would be great! Thank you very much for your support! Dr. Jürgen Kühn Leibniz-Centre for Agricultural Landscape Research (ZALF) Institute of Soil Landscape Research_ http://www.zalf.de/home_zalf/institute/blf/blf_e/index.html_ Eberswalder Str. 84 D-15374 Müncheberg Tel.: ++49/(0)33432/82-123 Fax: ++49/(0)33432/82-280 E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] Internet: http://www.zalf.de/home_zalf/institute/blf/blf/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] use AnnBuilderSourceUrls with local path insted of ftp adress
Hallo everybody. I want to build my own GO package using the function GOPkgBuilder of AnnBuilder. It uses AnnBuilderSourceUrls to collect data from different ftp sites. These data are not complete for my organism, so I would like to change the ftp adresses to a local one. The changing itself is working but when I run the script I get the following Error: Error in loadFromUrl(file.path(sourceURLs[[EG]], gene2go.gz)) : URL /path/to/file/gene2go.gz is incorrect or the target site is not responding! The full code of the script is this: library(AnnBuilder) list-options(AnnBuilderSourceUrls) list$AnnBuilderSourceUrls$EG-/path/to/file/ options(AnnBuilderSourceUrls=list$AnnBuilderSourceUrls) GOPkgBuilder(pkgName = GO, pkgPath = ., filename = go_200706-termdb.obo-xml.gz, version = v.1.0, author = list(author = author, maintainer = maintainer [EMAIL PROTECTED]) ) Thanks in advance for any help. Daniela Albrecht -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kanns mit allen: http://www.gmx.net/de/go/multimessenger __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ADF test
Hi Megh i hope you have confused with 'what is my NULL hypothesis' ? i suggest you to take any ideal dataset about which you know that whether it is stationary or not ? apply the test to know what is the NULL hypothesis used in any software :) usually in many softwares the NULL hypothesis is in negative sense. Please everybody comment on this :) hoping that you series is return series and not price series :). Thus applying adf test for your series :) my test show that your series is not stationary at all as my correlalogram comes as follows. 1 0.998283718 0.997582959 0.99703921 0.99665648 0.996548006 0.99647617 0.995925698 0.995317271 0.994746317 0.994727781 0.99508777 0.99501576 0.99437404 0.993338292 0.992684933 0.992310313 @@@ m Although if i assume that your series is a price series and defining return = 100*ln(pt/pt-1). Returns become as follows 0 -0.201816416 0.201816416 0 0.201409937 0 0 0 0 0.201005093 0 0 0 0 0.200601873 0 0.200200267 0.199800266 0 -0.199800266 0.199800266 0 0 0 0.199401861 -0.199401861 0.199401861 0 0.199005041 0 0 0 0 0.198609797 0 0 0 0 0 0.19821612 0 0.197824001 0 0 0.19743343 0 0 0 0.197044399 -0.197044399 0.197044399 0 0.196656897 0.196270917 0 -0.196270917 0.196270917 0 0 0 0 0 0 0 0.195886449 0 0 0 0 0 0 0.195503484 0 0 0 0.195122013 0.194742028 0 0 0 0 0.194363521 0 0 0 -0.194363521 0.194363521 0 0 0.193986482 0 0 0 0 0 0 0 0 0 0 0.193610903 0 0 0 0 0.193236775 0 0 0 0 0.192864091 0 0.192492841 0 0 0 0 0.192123018 0 0 0 0 0 0.191754613 0 0.191387618 0 0 0 0.191022026 0 0 0.190657827 -0.190657827 0 0.190657827 0.190295015 0 0 0 0 0 0 0.18993358 0 -0.18993358 0.18993358 0 0.189573516 0.189214815 0 0 0 0.188857469 0 0 0.18850147 0 0 0.18814681 0 0.187793482 0 then the value of autocorrelations i.e. correlalogram comes as approx 1 0.089252308 0.058227292 0.017934984 0.025264591 -0.014925678 -0.004668544 0.014890995 0.001625333 0.010669589 -0.010587179 -0.03000206 -0.011863654 0.00772247 0.024272208 -0.019521244 -0.035998575 -0.061608877 -0.048401231 -0.008594859 which show that the values are quite likely to make series stationary :) data[1:10,] V1 V2 1 4.96 0.000 2 4.95 -0.2018164 3 4.96 0.2018164 4 4.96 0.000 5 4.97 0.2014099 6 4.97 0.000 7 4.97 0.000 8 4.97 0.000 9 4.97 0.000 10 4.98 0.2010051 adf.test(data[,1]) Augmented Dickey-Fuller Test data: data[, 1] Dickey-Fuller = -1.1052, Lag order = 5, p-value = 0.9188 alternative hypothesis: stationary adf.test(data[,2]) Augmented Dickey-Fuller Test data: data[, 2] Dickey-Fuller = -6.2265, Lag order = 5, p-value = 0.01 alternative hypothesis: stationary Warning message: p-value smaller than printed p-value in: adf.test(data[, 2]) this explains everything clearly :) your NULL hypothesis is Series is not stationary - hence hypothesis in negative sense prooved by taking ideal data data1-rnorm(1) #normal data adf.test(data1) Augmented Dickey-Fuller Test data: data1 Dickey-Fuller = -21.2118, Lag order = 21, p-value = 0.01 alternative hypothesis: stationary Warning message: p-value smaller than printed p-value in: adf.test(data1) HTH Megh Dal [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 08/16/2007 04:27 PM To r-help@stat.math.ethz.ch cc Subject [R] ADF test Hi all, Hope you people do not feel irritated for repeatedly sending mail on Time series. Here I got another problem on the same, and hope I would get some answer from you. I have following dataset: data[,1] [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98 4.98 4.99 4.99 5.00 5.01 [19] 5.01 5.00 5.01 5.01 5.01 5.01 5.02 5.01 5.02 5.02 5.03 5.03 5.03 5.03 5.03 5.04 5.04 5.04 [37] 5.04 5.04 5.04 5.05 5.05 5.06 5.06 5.06 5.07 5.07 5.07 5.07 5.08 5.07 5.08 5.08 5.09 5.10 [55] 5.10 5.09 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.11 5.11 5.11 5.11 5.11 5.11 5.11 5.12 [73] 5.12 5.12 5.12 5.13 5.14 5.14 5.14 5.14 5.14 5.15 5.15 5.15 5.15 5.14 5.15 5.15 5.15 5.16 [91] 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.17 5.17 5.17 5.17 5.17 5.18 5.18 5.18 [109] 5.18 5.18 5.19 5.19 5.20 5.20 5.20 5.20 5.20 5.21 5.21 5.21 5.21 5.21 5.21 5.22 5.22 5.23 [127] 5.23 5.23 5.23 5.24 5.24 5.24 5.25 5.24 5.24 5.25 5.26 5.26 5.26 5.26 5.26 5.26 5.26 5.27 [145] 5.27 5.26 5.27 5.27 5.28 5.29 5.29 5.29 5.29 5.30 5.30 5.30 5.31 5.31 5.31 5.32 5.32 5.33 [163] 5.33 Now I want to conduct a test for stationarity using ADF test : adf.test((data[,1]), stationary, 0) Augmented Dickey-Fuller Test data: (data[, 1]) Dickey-Fuller = -3.7351, Lag order = 0, p-value = 0.02394 alternative hypothesis: stationary But surprisingly it leads towards rejestion of NULL [p-value is less than 0.05], i.e. indicates a possible stationary series. However ploting a graph of actual data set it doesn't seem so. Am I making any mistakes ? Can anyone give me any suggestion? Regards, Megh
Re: [R] an easy way to construct this special matirx
Here are two solutions. In the first lo has TRUE on the lower diagonal and diagonal. Then we compute the exponents, multiplying by lo to zero out the upper triangle. In the second rn is a matrix of row numbers and rn = t(rn) is the same as lo in the first solution. r - 2; n - 5 # test data lo - lower.tri(diag(n), diag = TRUE) lo * r ^ (row(lo) - col(lo) + 1) Here is another one: rn - row(diag(n)) (rn = t(rn)) * r ^ (rn - t(rn) + 1) On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an easy way to construct this special matirx
Hi wen, I don't think it is easy to construct this matrix in a simple way. I tried and found a way to do it. Try the following codes: i-1:4 j-5 aa-matrix(0,4,5) for (j in 1:5){aa[i,j]-(i+1-j)} r-4 #r could be any number bb-r^aa bb[aa0]=0 bb The matrix bb is what you want. Furthermore,I packaged this process into a function called mtrx as below: mtrx-function(row,clm,r){ i-1:row j-clm aa-matrix(row*clm,row,clm) for (j in 1:clm){aa[i,j]-(i+1-j)} #r could be any number bb-r^aa bb[aa0]=0 bb } Now you can use the function to produce the matrix.The above-mentioned matrix is mtrx(4,5,4) Dejian Zhao On Thu, Aug 16, 2007 11:10, [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- De-Jian Zhao Institute of Zoology,Chinese Academy of Sciences +86-10-64807217 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Polynomial fitting
It is easier to use poly(raw=TRUE), and better to use poly() with orthogonal polynomials. The original poster shows signs of having read neither the help for predict.lm nor the posting guide, and so almost certainly misused the predict method. On Thu, 16 Aug 2007, Jon Minton wrote: Remember that polynomials of the form y = b1*x + b2*x^2 + ... + bm*x^m fit the linear regression equation form Y = beta_1*x_1 + beta_2*x_2 + ... + beta_m*x_m If one sets (from the 1st to the 2nd equation) x - x_1 x^2 - x_2 x^3 - x_3 etc. In R this is easy, just use the identity operator I() when specifying the equation. e.g. for a 3rd order polynomial: model - lm(Y ~ x + I(x^2) + I(x^3) + I(x^4)) hth, Jon *** I'm looking some way to do in R a polynomial fit, say like polyfit function of Octave/MATLAB. For who don't know, c = polyfit(x,y,m) finds the coefficients of a polynomial p(x) of degree m that fits the data, p(x[i]) to y[i], in a least squares sense. The result c is a vector of length m+1 containing the polynomial coefficients in descending powers: p(x) = c[1]*x^n + c[2]*x^(n-1) + ... + c[n]*x + c[n+1] For prediction, one can then use function polyval like the following: y0 = polyval( polyfit( x, y, degree ), x0 ) y0 are the prediction values at points x0 using the given polynomial. In R, we know there is lm for 1-degree polynomial: lm( y ~ x ) == polyfit( x, y, 1 ) and for prediction I can just create a function like: lsqfit - function( model, xx ) return( xx * coefficients(model)[2] + coefficients(model)[1] ); and then: y0 - lsqfit(x0) (I've tried with predict.lm( model, newdata=x0 ) but obtain a bad result) For a degree greater than 1, say m, what can I use.?? I've tried with lm( y ~ poly(x, degree=m) ) I've also looked at glm, nlm, approx, ... but with these I can't specify the polynomial degree. Thank you so much! Sincerely, -- Marco -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time series with quality codes
In addition, we could create a function to.df which converts a zoo object to a data frame assuming that any column that only contains 1:nlevels is a factor with the indicated level names. Use to.df just before plotting: library(zoo) set.seed(1) f - zoo(factor(sample(3, 10, replace = TRUE))) x - zoo(rnorm(10)) y - zoo(rnorm(10)) z - merge(x, y, f) to.df - function(z, levels = letters[1:3], time = FALSE) { zz - as.data.frame(z) for(i in ncol(zz)) if (all(zz[,i] %in% seq_along(levels))) z[,i] - factor(levels[z[,i]]) if (time) cbind(index = index(z), zz) else zz } library(lattice) xyplot(y ~ x | f, data = to.df(z)) On 8/16/07, Achim Zeileis [EMAIL PROTECTED] wrote: On Thu, 16 Aug 2007, Felix Andrews wrote: list(...), I am working with environmental time series (eg rainfall, stream flow) that have attached quality codes for each data point. The quality codes have just a few factor levels, like good, suspect, poor, imputed. I use the quality codes in plots and summaries. They are carried through when a time series is aggregated to a longer time-step, according to rules like worst, median or mode. I need to support time steps of anything from hours to years. I can assume the data are regular time series -- they might be irregular initially but could be 'regularized'. But I would want to plot irregular time series along with regular ones. So far I have been using a data frame with a POSIXct column, a numeric column and a factor column. However I would like to use zoo instead, because of its many utility functions and easy conversion to ts. Is there any prospect of zoo handling such numeric + factor data? Other suggestions on elegant ways to do it are also welcome. There is some limited support for this in zoo. You can do z - zoo(myfactor, myindex) and work with it like a zoo series and then coredata(z) will recover a factor. However, you cannot bind this to other series without losing the factor structure. At least not in a plain zoo series. But you can do df - merge(z, Z, retclass = data.frame) where every column of the resulting data.frame is a univariate zoo series. The final option would be to just have a data.frame as usual and put your data/index into one column. But then it's more difficult to leverage zoo's functionality. I would like to have more support for things like this, but currently this is what we have. Best, Z Felix -- Felix Andrews / �� PhD candidate Integrated Catchment Assessment and Management Centre The Fenner School of Environment and Society The Australian National University (Building 48A), ACT 0200 Beijing Bag, Locked Bag 40, Kingston ACT 2604 http://www.neurofractal.org/felix/ xmpp:[EMAIL PROTECTED] 3358 543D AAC6 22C2 D336 80D9 360B 72DD 3E4C F5D8 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] CCC statistic in cluster analysis
Has anyone implemented CCC statistic (Sarle, 1983) in R? If so I would greatly appreciate the relevant script file. Any help will be much appreciated Best regards, Silvia Mg. Silvia Graciela Valdano Departamento de Ciencias Naturales Facultad de Cs. Exactas, Físico-Químicas y Naturales Universidad Nacional de Río Cuarto [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] several plots on several pages
Hi version _ platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major 2 minor 5.1 year 2007 month 06 day27 svn rev42083 language R version.string R version 2.5.1 (2007-06-27) I want to create a pdf withe three graphs on a page and with two pages: - | 1 | - | 2 | - | 3 | - NEW PAGE - | 4 | - | 5 | - | 6 | - Graph 1 should ALWAYS be at that spot, graph two also, even if graph one produces an error when plotting (the area can be empty, but doesn't have to.) I produced the foolowing code below, but I have a few problems: 1) how can I create a new page in the pdf? 2) how can I make sure that the second graph is in position 2 when graph one produces an error when plotting I(as in the example)? Everything works OK (for the firsat page) when graph one is plotted. I have the feeling, that I am thinking to complicated. Any help welcome, Rainer pdf(test.pdf) try( { ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) } ) dev.off() -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel:+27 - (0)21 650 5776 (w) Fax:+27 - (0)86 516 2782 Fax:+27 - (0)21 650 2440 (w) Cell: +27 - (0)83 9479 042 Skype: RMkrug email: [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] several plots on several pages
Dear Rainer, Have you considered using Sweave? HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 [EMAIL PROTECTED] www.inbo.be Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J.Moroney -Oorspronkelijk bericht- Van: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Namens Rainer M. Krug Verzonden: donderdag 16 augustus 2007 14:58 Aan: r-help Onderwerp: [R] several plots on several pages Hi version _ platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major 2 minor 5.1 year 2007 month 06 day27 svn rev42083 language R version.string R version 2.5.1 (2007-06-27) I want to create a pdf withe three graphs on a page and with two pages: - | 1 | - | 2 | - | 3 | - NEW PAGE - | 4 | - | 5 | - | 6 | - Graph 1 should ALWAYS be at that spot, graph two also, even if graph one produces an error when plotting (the area can be empty, but doesn't have to.) I produced the foolowing code below, but I have a few problems: 1) how can I create a new page in the pdf? 2) how can I make sure that the second graph is in position 2 when graph one produces an error when plotting I(as in the example)? Everything works OK (for the firsat page) when graph one is plotted. I have the feeling, that I am thinking to complicated. Any help welcome, Rainer pdf(test.pdf) try( { ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) } ) dev.off() -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel: +27 - (0)21 650 5776 (w) Fax: +27 - (0)86 516 2782 Fax: +27 - (0)21 650 2440 (w) Cell: +27 - (0)83 9479 042 Skype:RMkrug email:[EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an easy way to construct this special matirx
Hi Gabor, I am glad to see your answer,which gives a hope to resove this question in an easy way. I replied to this question in a more complex way before seeing your answer. However,I think your code needs some revision, because the original matrix is not a diagonal matrix. It has 4 rows and 5 columns.Looking forward to your revised codes. Best regards, On Thu, Aug 16, 2007 20:22, Gabor Grothendieck wrote: Here are two solutions. In the first lo has TRUE on the lower diagonal and diagonal. Then we compute the exponents, multiplying by lo to zero out the upper triangle. In the second rn is a matrix of row numbers and rn = t(rn) is the same as lo in the first solution. r - 2; n - 5 # test data lo - lower.tri(diag(n), diag = TRUE) lo * r ^ (row(lo) - col(lo) + 1) Here is another one: rn - row(diag(n)) (rn = t(rn)) * r ^ (rn - t(rn) + 1) On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- De-Jian Zhao Institute of Zoology,Chinese Academy of Sciences +86-10-64807217 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combine matrix
let say something like this a=matrix(1:25, nrow=5) rownames(a)=letters[1:5] colnames(a)=rep(A, 5) a A A A A A a 1 6 11 16 21 b 2 7 12 17 22 c 3 8 13 18 23 d 4 9 14 19 24 e 5 10 15 20 25 b=matrix(1:40, nrow=8) rownames(b)=c(rep(a,4),rep(b,4)) colnames(b)=rep(B, 5) b B B B B B a 1 9 17 25 33 a 2 10 18 26 34 a 3 11 19 27 35 a 4 12 20 28 36 b 5 13 21 29 37 b 6 14 22 30 38 b 7 15 23 31 39 b 8 16 24 32 40 as a results I wold like something like A A A A A B B B B B a 1 6 11 16 21 1 9 17 25 33 a 1 6 11 16 21 2 10 18 26 34 a 1 6 11 16 21 3 11 19 27 35 a 1 6 11 16 21 4 12 20 28 36 b 2 7 12 17 22 5 13 21 29 37 b 2 7 12 17 22 6 14 22 30 38 b 2 7 12 17 22 7 15 23 31 39 b 2 7 12 17 22 8 16 24 32 40 does it is clear? is there a function that automate this operation? thank you very much! On 8/16/07, jim holtman [EMAIL PROTECTED] wrote: Can you provide an example of what you mean; e.g., the two input matrices and the desired output. On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote: Hi R user, I am new to R, and I have a very simple question for you. I have two matrix A and B, with internally redundant rownames (but variables are different). Some, but not all the rownames are shared among the two matrix. I want to create a greater matrix that combines the previuos two, and has all the possible combinations of matching rownames lines among matrix A and B. looking for the solution I bumped in merge but actually works on data.frame, and in dataframe there could be no redundancy in names. can you help me?? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question about sm.options sm.survival
Hi, there: It's my first time to post question in this forum, so thanks for your tolerance if my question is too naive. I am using a nonparametric smoothing procedure in sm package to generate smoothed survival curves for continuous covariate. I want to truncate the suvival curve and only display the part with covariate value between 0 and 7. The following is the code I wrote: sm.options(list(xlab=log_BSI_min3_to_base, xlim=c(0,7), ylab=Median Progression Prob)) sm.survival(min3.base.prog.cen[,3],min3.base.prog.cen[,2],min3.base.prog.cen[,1],h=sd(min3.base.prog.cen[,3]),status.code=1 ) But the xlim option does not work. Can anyone help me with this problem? Thanks a lot. Rachel -- View this message in context: http://www.nabble.com/Question-about-sm.options---sm.survival-tf4279605.html#a12181263 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] several plots on several pages
Hi Thierry ONKELINX, Thierry wrote: Dear Rainer, Have you considered using Sweave? No - and I am sure it will do what I want, but I guess it might be an overkill. These arew just draft outputs for myself for different datasets which should be easy to compare. SO I guess that Sweave might be an overkill (especially as I found plot.new() which jumpd to a new page). Thanks and I will keep it in mind for the future, Rainer HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 [EMAIL PROTECTED] www.inbo.be Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J.Moroney -Oorspronkelijk bericht- Van: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Namens Rainer M. Krug Verzonden: donderdag 16 augustus 2007 14:58 Aan: r-help Onderwerp: [R] several plots on several pages Hi version _ platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major 2 minor 5.1 year 2007 month 06 day27 svn rev42083 language R version.string R version 2.5.1 (2007-06-27) I want to create a pdf withe three graphs on a page and with two pages: - | 1 | - | 2 | - | 3 | - NEW PAGE - | 4 | - | 5 | - | 6 | - Graph 1 should ALWAYS be at that spot, graph two also, even if graph one produces an error when plotting (the area can be empty, but doesn't have to.) I produced the foolowing code below, but I have a few problems: 1) how can I create a new page in the pdf? 2) how can I make sure that the second graph is in position 2 when graph one produces an error when plotting I(as in the example)? Everything works OK (for the firsat page) when graph one is plotted. I have the feeling, that I am thinking to complicated. Any help welcome, Rainer pdf(test.pdf) try( { ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) } ) dev.off() -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel: +27 - (0)21 650 5776 (w) Fax: +27 - (0)86 516 2782 Fax: +27 - (0)21 650 2440 (w) Cell:+27 - (0)83 9479 042 Skype: RMkrug email: [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel:+27 - (0)21 650 5776 (w) Fax:+27 - (0)86 516 2782 Fax:+27 - (0)21 650 2440 (w) Cell: +27 - (0)83 9479 042 Skype: RMkrug email: [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression tree: labels in the terminal nodes
On Thu, 16 Aug 2007, Juergen Kuehn wrote: Dear everybody, I'm a new user of R 2.4.1 and I'm searching for information on improving the output of regression tree graphs. In the terminal nodes I am up to now able to indicate the number of values (n) and the mean of all values in this terminal node by the command text(tree, use.n=T, xpd=T) Note that this is specific to the rpart implementation. There are also other recursive partitioning algorithms available, see e.g., the MachineLearning task view at http://CRAN.R-project.org/src/contrib/Views/MachineLearning.html The ctree() algorithm in package party provides more flexibility concerning visualization. The plotting methods are all extensible although this is not extensively documented. Maybe you also want to look at Simon Urbanek's interactive KLIMT software for tree visualization. hth, Z Yet I would like to indicate automatically in the output graph of the tree some quality measure, e.g. impurity (or standard deviation .) and a character to identify which cases are in one terminal node, e.g. given a ID number or name. Until now I did not discover in my R help scripts or in the R programm help how to do it. So I calculate impurity by hand, and I'm identifying the cases in each node by hand, and I am adding these values with a graphical software to the graph (as e.g. given *.jpg file). Therefore anybody can see that I added these values by hand. I'm quite sure that there is a more easy and faster way which looks (and is!) more professional. Could anybody help me? That would be great! Thank you very much for your support! Dr. Jürgen Kühn Leibniz-Centre for Agricultural Landscape Research (ZALF) Institute of Soil Landscape Research_ http://www.zalf.de/home_zalf/institute/blf/blf_e/index.html_ Eberswalder Str. 84 D-15374 Müncheberg Tel.: ++49/(0)33432/82-123 Fax: ++49/(0)33432/82-280 E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] Internet: http://www.zalf.de/home_zalf/institute/blf/blf/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multiple colors within same line of text
Hi, I'm interested in using mtext(), but with the option of having multiple colors in the same line of text. For example, creating a line of text where: Red is red and blue is blue How do you create a text argument that lets you do this within mtext()? Thanks, Andrew MGH Cancer Center [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an easy way to construct this special matirx
It was pointed out that the required matrix may not be square and the superdiagonal was missing in my prior post. Here is a revision: r - 2; nr - 4; nc - 5 # test data x - matrix(nr = nr, nc = nc) x - row(x) - col(x) + 1 (x = 0) * r ^ x On 8/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote: Here are two solutions. In the first lo has TRUE on the lower diagonal and diagonal. Then we compute the exponents, multiplying by lo to zero out the upper triangle. In the second rn is a matrix of row numbers and rn = t(rn) is the same as lo in the first solution. r - 2; n - 5 # test data lo - lower.tri(diag(n), diag = TRUE) lo * r ^ (row(lo) - col(lo) + 1) Here is another one: rn - row(diag(n)) (rn = t(rn)) * r ^ (rn - t(rn) + 1) On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, Sorry if this is a repost. I searched but found no results. I am wondering if it is an easy way to construct the following matrix: r 1 0 00 r^2 r 1 00 r^3 r^2 r 10 r^4 r^3 r^2 r1 where r could be any number. Thanks. Wen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combine matrix
[Gianni Burgin] let say something like this a=matrix(1:25, nrow=5) rownames(a)=letters[1:5] colnames(a)=rep(A, 5) a A A A A A a 1 6 11 16 21 b 2 7 12 17 22 c 3 8 13 18 23 d 4 9 14 19 24 e 5 10 15 20 25 b=matrix(1:40, nrow=8) rownames(b)=c(rep(a,4),rep(b,4)) colnames(b)=rep(B, 5) b B B B B B a 1 9 17 25 33 a 2 10 18 26 34 a 3 11 19 27 35 a 4 12 20 28 36 b 5 13 21 29 37 b 6 14 22 30 38 b 7 15 23 31 39 b 8 16 24 32 40 as a results I wold like something like A A A A A B B B B B a 1 6 11 16 21 1 9 17 25 33 a 1 6 11 16 21 2 10 18 26 34 a 1 6 11 16 21 3 11 19 27 35 a 1 6 11 16 21 4 12 20 28 36 b 2 7 12 17 22 5 13 21 29 37 b 2 7 12 17 22 6 14 22 30 38 b 2 7 12 17 22 7 15 23 31 39 b 2 7 12 17 22 8 16 24 32 40 does it is clear? is there a function that automate this operation? Like, maybe: cbind(a[rownames(b),], b) -- François Pinard http://pinard.progiciels-bpi.ca __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] residual plots for lmer in lme4 package
Hi Margaret, Have a look at qqmath in the lattice package. ?qqmath Hank On Aug 16, 2007, at 2:45 AM, Margaret Gardiner-Garden wrote: Hi, I was wondering if I might be able to ask some advice about doing residual plots for the lmer function in the lme4 package. Our group's aim is to find if the expression staining of a particular gene in a sample (or core) is related to the pathology of the core. To do this, we used the lmer function to perform a logistic mixed model below. I apologise in advance for the lack of subscripts. logit P(yij=1) = â0 + Ui + â1Patholij where Ui~N(0, óu2), i indexes patient, j indexes measurement, Pathol is an indicator variable (0,1) for benign epithelium versus cancer and yij is the staining indicator (0,1) for each core where yij equals 1 if the core stains positive and 0 otherwise. (I have inserted some example R code at the end of this message) I was wondering if you knew how I could test that the errors Ui are normally distributed in my fit. I am not familiar with how to do residual plots for a mixed logistic regression (or even for any logistic regression!). Any advice would be greatly appreciated! Thanks and Regards Marg Example code: lmer(Intensity.over2.hyp.canc~Pathology + (1|Patient.ID), data= HSD17beta4.hyp.canc, family=binomial, na.action=na.omit) #Family: binomial(logit link) #AIC BIClogLik deviance # 414.1101 431.4147 -203.0550 406.1101 #Random effects: # GroupsNameVarianceStd.Dev. # Patient.ID (Intercept) 4.9558 2.2262 # of obs: 559, groups: Patient.ID, 177 #Estimated scale (compare to 1) 0.6782544 #Fixed effects: #Estimate Std. Error z value Pr(|z|) #(Intercept) -2.057340.24881 -8.2686 2.2e-16 *** #PathologyHyperplasia -1.766270.44909 -3.9330 8.389e-05 *** NB. Intensity.over2.hyp.canc is the staining of the core (ie 0 or 1) Pathology is Hyperplasia or Cancer Dr Margaret Gardiner-Garden Garvan Institute of Medical Research 384 Victoria Street Darlinghurst Sydney NSW 2010 Australia Phone: 61 2 9295 8348 Fax: 61 2 9295 8321 [[alternative HTML version deleted]] ATT1 Dr. Hank Stevens, Associate Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/~stevenmh/ http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ E Pluribus Unum If you send an attachment, please try to send it in a format anyone can read, such as PDF, text, Open Document Format, HTML, or RTF. Please try not to send me MS Word or PowerPoint attachments- Why? See: http://www.gnu.org/philosophy/no-word-attachments.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] to combine bwplot + srt option?
On 8/16/07, KOITA Lassana - STAC/ACE [EMAIL PROTECTED] wrote: Thank you for you quite and useful explanation. And do know how to sort them by median? See ?reorder.factor Note that traditional practice with bwplot() is to have the categorical variable on the y-axis, in which case the default string rotation is optimal. -Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error message when using zero-inflated count regression model in package zicounts
Dr. Stevens, I've double-checked my variable lengths. All of my variables (Total.vines, Site, Species, and DBH) came in at 549. I did correct one problem in the data entry that had escaped my previous notice: somehow the undergrad who entered all the data managed to make the Acer negundo data split into two separate categories while still appearing to use the same ACNE abbreviation. When I made that correction and re-ran zicounts, R gave me the following error messages: vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site +Species+DBH,distname=ZIP,data=sycamores.1) Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not match the length of object [549] In addition: Warning messages: 1: longer object length is not a multiple of shorter object length in: eta + offset 2: longer object length is not a multiple of shorter object length in: y/mu In addition, zicounts would not run a normal poisson regression on the data, giving me the same error messages as the ZIP regression. Doing a poisson regression with glm did not show any error messages. However, the glm model with full interactions was still over-dispersed. Could the zicounts problem be that the individual sites and species had different population sizes? For instance, Site A had 149 trees, site B had 55 trees, site C had 270 trees, and site D had 75 trees. The species had similar discrepancies in population sizes, with Platanus occidentalis and Acer negundo forming the majority of the trees. Thanks for your help. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 On Aug 15, 2007, at 5:58 AM, Martin Henry H. Stevens wrote: Hi Jim, With regard to same number, I simply wanted to make sure that each variable was the same length. The error message you show is what you would get if, for instance, you misspelled one of the variables and it doesn't exist, in which case it would be NULL, while your other variables would each be 550 elements in length. Hank On Aug 14, 2007, at 4:47 PM, James Milks wrote: Dr. Stevens, Unfortunately, Poisson gives me an over-dispersed model with only 3 out of 14 variables/interactions significant. Doing a step-wise poisson regression still ended up with the same over-dispersed model. Given the high number of zeros in the response variable, Dr. Thad Tarpey (one of our statisticians on campus) suggested zero-inflated poisson regression as a possible solution to the over-dispersion problem. As for variables of the same length, there are different numbers of trees for each species and site since we ran different numbers of transects at each site (some sites were larger than others) and there were different numbers of species and trees within each transect. Acer negundo made up ~33% of all the trees we measured; Platanus occidentalis had 25%; Fraxinus americana was another 12% and ~11% was Ulmus americana. The remaining 25% was divided among 16 other species, all of which were excluded from the analysis due to singularities in the model when they were included (something about glm not liking singularities in the model). So if the zicounts requires that each species and site have the same length, then I will not be able to use it unless I can get R to randomly select x trees from each species and site combination. Thanks for your input. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 On Aug 14, 2007, at 9:37 AM, Hank Stevens wrote: Hi Jim, Two thoughts come to me, unencumbered by the thought process or knowledge of zicounts: 1. Is Poisson really NOT appropriate? (do you have to use zicounts?) 2. Are you 110% certain that all variables are the same length? Would NA's interfere? Cheers, Hank On Aug 13, 2007, at 5:10 PM, James Milks wrote: I have data on number of vines per tree for ~550 trees. Over half of the trees did not have any vines and the data is fairly skewed (median = 0, mean = 1.158, 3rd qu. = 1.000). I am attempting to investigate whether plot location (four sites), species (I'm using only the four most common species), or tree dbh has a significant influence on the number of vines per tree. When I attempted to use the zicounts function, R gave me the following error message: vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site +Species+DBH,distrname=ZIP,data=sycamores.1) Error in ifelse(y == 0, 1, y/mu) : dim- : dims [product 12] do not match the length of object [549] In addition: Warning messages: 1: longer object length is not a multiple of shorter object length in: x[good, ] * w 2: longer object length is not a multiple of shorter object length in: eta + offset 3: longer object
[R] Newbie
Hello, I'm a bit new to the world of R so forgive my ignorance. I'm trying to do a zero-inflated negative binomial regression and have received an error message and i'm not sure what it means. I'm running R 2.5.1 on XP. I have just tried a really simple version of the model to see if it would run before I put all the variables in. I have attached all the variables to the object alan. Here is the table: Date Location Deer Code Sex Age Pel Per Lac Scars Emb Lar Nym Bee Mi Mass 1 12/06/2007 Ballysallagh0 B1 FALSE 1 A 1 1 4 0 0 0 0 0 21.59 2 15/06/2007 Ballysallagh0 B10 FALSE 1 A 1 1 5 0 0 0 0 2 26.19 3 15/06/2007 Ballysallagh0 B12 FALSE 1 A 1 1 0 5 0 0 0 1 28.55 4 15/06/2007 Ballysallagh0 B13 FALSE 1 A 0 1 5 0 0 0 4 45 23.93 5 15/06/2007 Ballysallagh0 B16 FALSE 1 A 1 1 0 4 0 0 0 1 34.19 6 15/06/2007 Ballysallagh0 B17 FALSE 1 A 1 1 5 0 0 0 0 0 25.02 7 15/06/2007 Ballysallagh0 B18 FALSE 1 A 1 1 5 0 0 0 0 7 33.06 8 12/06/2007 Ballysallagh0 B5 FALSE 1 A 1 1 4 0 0 0 0 2 23.55 9 12/06/2007 Ballysallagh0 B6 FALSE 1 A 1 1 4 0 0 0 0 2 22.67 10 12/06/2007 Ballysallagh0 B7 FALSE 1 A 1 1 5 0 0 0 0 2 22.57 11 12/06/2007 Ballysallagh0 B8 FALSE 1 A 1 1 5 0 0 0 0 10 24.01 12 24/06/2007 Caledon1 C12 FALSE 1 A 1 0 0 0 1 0 0 3 21.68 13 25/06/2007 Caledon1 C17 FALSE 1 A 1 0 4 0 1 0 0 23 20.56 14 25/06/2007 Caledon1 C18 FALSE 1 A 1 0 0 0 0 0 2 2 17.47 15 25/06/2007 Caledon1 C20 FALSE 1 A 1 0 8 0 0 0 0 3 19.97 16 25/06/2007 Caledon1 C21 FALSE 1 A 1 0 3 0 2 0 0 20 20.58 17 25/06/2007 Caledon1 C24 FALSE 1 A 1 0 0 4 2 0 0 7 17.37 18 25/06/2007 Caledon1 C27 FALSE 1 A 1 0 0 0 3 0 0 6 24.14 19 25/06/2007 Caledon1 C28 FALSE 1 A 1 0 0 5 1 0 0 6 21.58 20 25/06/2007 Caledon1 C33 FALSE 1 A 0 0 0 0 1 0 0 0 19.06 21 25/06/2007 Caledon1 C35 FALSE 1 A 1 0 3 0 2 0 0 20 24.55 22 24/06/2007 Caledon1 C4 FALSE 1 A 1 1 4 3 0 0 0 5 22.65 23 24/06/2007 Caledon1 C8 FALSE 1 A 0 0 0 0 3 0 0 3 23.08 24 01/06/2007 Hillsborough0 H11 FALSE 1 A 1 1 4 0 0 0 0 0 23.01 25 01/06/2007 Hillsborough0 H16 FALSE 1 A 1 1 5 0 0 0 0 0 18.18 26 01/06/2007 Hillsborough0 H17 FALSE 1 A 1 1 4 0 0 0 0 0 24.45 27 01/06/2007 Hillsborough0 H19 FALSE 1 A 0 1 0 0 0 0 0 3 21.01 28 01/06/2007 Hillsborough0 H2 FALSE 1 A 1 1 5 0 0 0 0 1 24.86 29 01/06/2007 Hillsborough0 H21 FALSE 1 A 1 1 5 0 0 0 0 0 20.48 30 01/06/2007 Hillsborough0 H25 FALSE 1 A 1 1 5 0 0 0 0 0 22.09 31 01/06/2007 Hillsborough0 H26 FALSE 1 A 1 1 5 0 0 0 0 0 20.18 32 01/06/2007 Hillsborough0 H3 FALSE 1 A 0 1 5 0 0 0 0 0 25.82 33 01/06/2007 Hillsborough0 H34 FALSE 1 A 1 0 0 4 0 0 0 1 18.78 34 01/06/2007 Hillsborough0 H35 FALSE 1 A 0 1 4 0 0 0 0 0 20.02 35 01/06/2007 Hillsborough0 H5 FALSE 1 A 0 1 5 0 0 0 0 0 19.58 36 01/06/2007 Hillsborough0 H6 FALSE 1 A 1 1 5 0 0 0 0 6 28.84 37 16/06/2007 LoughNavar1 LN1 FALSE 1 A 1 1 5 0 3 0 0 1 22.71 38 16/06/2007 LoughNavar1 LN10 FALSE 1 A 1 1 5 0 0 0 0 0 23.53 39 16/06/2007 LoughNavar1 LN11 FALSE 1 A 1 1 4 0 0 0 0 2 33.24 40 16/06/2007 LoughNavar1 LN13 FALSE 1 A 1 1 5 0 0 0 0 0 20.20 41 16/06/2007 LoughNavar1 LN14 FALSE 1 A 1 1 0 7 0 0 1 1 31.35 42 16/06/2007 LoughNavar1 LN16 FALSE 1 A 1 1 0 5 0 0 0 2 19.43 43 16/06/2007 LoughNavar1 LN18 FALSE 1 A 1 0 6 0 0 0 0 12 24.30 44 16/06/2007 LoughNavar1 LN23 FALSE 1 A 1 1 5 0 0 0 0 1 24.34 45 16/06/2007 LoughNavar1 LN7 FALSE 1 A 1 1 5 0 1 0 0 0 28.01 46 16/06/2007 LoughNavar1 LN9 FALSE 1 A 1 1 4 0 8 0 0 5 25.43 47 11/07/2007 Mt.Stewart0 M1 FALSE 1 A 1 0 5 0 0 0 0 1 25.31 48 11/07/2007 Mt.Stewart0 M18 FALSE 1 A 1 1 0 0 0 0 0 2 26.66 49 11/07/2007 Mt.Stewart0 M21 FALSE 1 A 0 0 3 0 0 0 0
[R] Trim trailng space from data.frame factor variables
Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame(x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? -Lauri [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Linear models and a simple time series
Working on modeling a wild animal population. Two data vectors: the herd count from year to year (estimated by a sampling procedure), and the number of animals killed by hunters. Task is to find the natural growth rate of the herd (A simplification, but preserves the essentials.) My question is whether the R procedure lm() is an appropriate tool to estimate the growth rate. year -1991:2007 killed -c(7008,6663,8545,7868,9286,9365,10443,6389,6004,8631,13277,12029,10081,989 9,11023,9926,7000) herdsize -c(50697,54804,46462,42410,43593,42138,43037,44495,45968,47376,45469,38815, 37186,37135,31760,31206,28563) year.0 -which(year==1991) year.1 -year.0+1 year.ult -length(year) year.penult-length(year)-1 y-heardsize[year.1:year.ult] x-herdsize[year.0:year.penult]-killed[year.0:year.penult] LM-lm(y~bb-1) summary(LM) #Call: #lm(formula = y ~ x - 1) # #Residuals: # Min 1Q Median 3QMax #-11893 -1114 1137 3553 6069 # #Coefficients: # Estimate Std. Error t value Pr(|t|) #bb 1.212170.03185 38.05 2.45e-16 *** #--- #Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # #Residual standard error: 4372 on 15 degrees of freedom #Multiple R-Squared: 0.9897, Adjusted R-squared: 0.9891 #F-statistic: 1448 on 1 and 15 DF, p-value: 2.453e-16 The model seems to fit the data very well, and I am willing to believe that a growth rate of 1.21217 gives the best fit in a least-squares sense. However, because the dependent and independent variables are highly correlated, I question whether the variance calculations are accurate in this case. Is lm() really the appropriate tool to be using here? Any insights would be welcome. mail2web.com - Microsoft® Exchange solutions from a leading provider - http://link.mail2web.com/Business/Exchange __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Newbie
I'm a bit new to the world of R so forgive my ignorance. That has nothing to do with the knowledge of R but with the model. Age has only 2 different values: 0 and 1 and if it is 0 there is no scars, so what exactly have you expected from the model? I would say if you just want to prove that older deer have more scars try the Mann Whitney non parametric test... Stefan I'm trying to do a zero-inflated negative binomial regression and have received an error message and i'm not sure what it means. I'm running R 2.5.1 on XP. I have just tried a really simple version of the model to see if it would run before I put all the variables in. I have attached all the variables to the object alan. Here is the table: Date Location Deer Code Sex Age Pel Per Lac Scars Emb Lar Nym Bee Mi Mass 1 12/06/2007 Ballysallagh0 B1 FALSE 1 A 1 1 4 0 0 0 0 0 21.59 2 15/06/2007 Ballysallagh0 B10 FALSE 1 A 1 1 5 0 0 0 0 2 26.19 3 15/06/2007 Ballysallagh0 B12 FALSE 1 A 1 1 0 5 0 0 0 1 28.55 4 15/06/2007 Ballysallagh0 B13 FALSE 1 A 0 1 5 0 0 0 4 45 23.93 5 15/06/2007 Ballysallagh0 B16 FALSE 1 A 1 1 0 4 0 0 0 1 34.19 6 15/06/2007 Ballysallagh0 B17 FALSE 1 A 1 1 5 0 0 0 0 0 25.02 7 15/06/2007 Ballysallagh0 B18 FALSE 1 A 1 1 5 0 0 0 0 7 33.06 8 12/06/2007 Ballysallagh0 B5 FALSE 1 A 1 1 4 0 0 0 0 2 23.55 9 12/06/2007 Ballysallagh0 B6 FALSE 1 A 1 1 4 0 0 0 0 2 22.67 10 12/06/2007 Ballysallagh0 B7 FALSE 1 A 1 1 5 0 0 0 0 2 22.57 11 12/06/2007 Ballysallagh0 B8 FALSE 1 A 1 1 5 0 0 0 0 10 24.01 12 24/06/2007 Caledon1 C12 FALSE 1 A 1 0 0 0 1 0 0 3 21.68 13 25/06/2007 Caledon1 C17 FALSE 1 A 1 0 4 0 1 0 0 23 20.56 14 25/06/2007 Caledon1 C18 FALSE 1 A 1 0 0 0 0 0 2 2 17.47 15 25/06/2007 Caledon1 C20 FALSE 1 A 1 0 8 0 0 0 0 3 19.97 16 25/06/2007 Caledon1 C21 FALSE 1 A 1 0 3 0 2 0 0 20 20.58 17 25/06/2007 Caledon1 C24 FALSE 1 A 1 0 0 4 2 0 0 7 17.37 18 25/06/2007 Caledon1 C27 FALSE 1 A 1 0 0 0 3 0 0 6 24.14 19 25/06/2007 Caledon1 C28 FALSE 1 A 1 0 0 5 1 0 0 6 21.58 20 25/06/2007 Caledon1 C33 FALSE 1 A 0 0 0 0 1 0 0 0 19.06 21 25/06/2007 Caledon1 C35 FALSE 1 A 1 0 3 0 2 0 0 20 24.55 22 24/06/2007 Caledon1 C4 FALSE 1 A 1 1 4 3 0 0 0 5 22.65 23 24/06/2007 Caledon1 C8 FALSE 1 A 0 0 0 0 3 0 0 3 23.08 24 01/06/2007 Hillsborough0 H11 FALSE 1 A 1 1 4 0 0 0 0 0 23.01 25 01/06/2007 Hillsborough0 H16 FALSE 1 A 1 1 5 0 0 0 0 0 18.18 26 01/06/2007 Hillsborough0 H17 FALSE 1 A 1 1 4 0 0 0 0 0 24.45 27 01/06/2007 Hillsborough0 H19 FALSE 1 A 0 1 0 0 0 0 0 3 21.01 28 01/06/2007 Hillsborough0 H2 FALSE 1 A 1 1 5 0 0 0 0 1 24.86 29 01/06/2007 Hillsborough0 H21 FALSE 1 A 1 1 5 0 0 0 0 0 20.48 30 01/06/2007 Hillsborough0 H25 FALSE 1 A 1 1 5 0 0 0 0 0 22.09 31 01/06/2007 Hillsborough0 H26 FALSE 1 A 1 1 5 0 0 0 0 0 20.18 32 01/06/2007 Hillsborough0 H3 FALSE 1 A 0 1 5 0 0 0 0 0 25.82 33 01/06/2007 Hillsborough0 H34 FALSE 1 A 1 0 0 4 0 0 0 1 18.78 34 01/06/2007 Hillsborough0 H35 FALSE 1 A 0 1 4 0 0 0 0 0 20.02 35 01/06/2007 Hillsborough0 H5 FALSE 1 A 0 1 5 0 0 0 0 0 19.58 36 01/06/2007 Hillsborough0 H6 FALSE 1 A 1 1 5 0 0 0 0 6 28.84 37 16/06/2007 LoughNavar1 LN1 FALSE 1 A 1 1 5 0 3 0 0 1 22.71 38 16/06/2007 LoughNavar1 LN10 FALSE 1 A 1 1 5 0 0 0 0 0 23.53 39 16/06/2007 LoughNavar1 LN11 FALSE 1 A 1 1 4 0 0 0 0 2 33.24 40 16/06/2007 LoughNavar1 LN13 FALSE 1 A 1 1 5 0 0 0 0 0 20.20 41 16/06/2007 LoughNavar1 LN14 FALSE 1 A 1 1 0 7 0 0 1 1 31.35 42 16/06/2007 LoughNavar1 LN16 FALSE 1 A 1 1 0 5 0 0 0 2 19.43 43 16/06/2007 LoughNavar1 LN18 FALSE 1 A 1 0 6 0 0 0 0 12 24.30 44 16/06/2007 LoughNavar1 LN23 FALSE 1 A 1 1 5 0 0 0 0 1 24.34 45 16/06/2007 LoughNavar1 LN7
Re: [R] Newbie
I would say if you just want to prove that older deer have more scars try the Mann Whitney non parametric test... Forgive me but even that does not really make sense since the values are all 0 so it is to obvious... __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Possible memory leak with R v.2.5.0
Hi Peter -- Here's my guess. Ironically, adding things to broken code reduces the signal to noise ratio. I ended up with get.vars.for.cluster = function( cluster, genes=get.global(gene.ids ), ratios=get.global(ratios)) { cluster - cluster rows - cluster$rows cols - cluster$cols r - ratios[ rows, cols ] avg.rows - apply( r, 2, mean, na.rm=TRUE ) r.all - ratios[ genes, cols ] devs - apply( r.all, 1, -, avg.rows ) apply( devs, 2, var, na.rm=TRUE ) } at what might reproduce your problem (though can't be sure!). The unusual bit is cluster - cluster At first I thought this would be a no-op (assigning cluster to itself), but apparently at this point in the code cluster does not exist in the environment of the function (just in the call) so cluster gets assigned outside the function. So then my guess is that get.vars.for.cluster is part of a package, and the package has a variable called cluster. get.vars.for.cluster then assigns its first argument to the package variable cluster (which is the first variable called cluster that - encounters). rm(list=ls(all=TRUE)) removes everything from the global environment, but (fortunately!) not from the package environment. You might end up storing more than 'just' cluster, depending on what it is. So I think the solution is to rethink the use of - (and also the get.global(), which are either for convenience (in which case it would probably be better to specify a default for the function argument) or out of a sense that copying is bad (but this is probably mistaken, since R's semantics are 'copy on change', so passing a 'big' object into a function does not usually trigger a copy)). You could also try 'detach'ing the package that get.vars.for.cluster is defined in. Hope that points in the right direction, Martin Peter Waltman [EMAIL PROTECTED] writes: I'm working with a very large matrix ( 22k rows x 2k cols) of RNA expression data with R v.2.5.0 on a RedHat Enterprise machine, x86_64 architecture. The relevant code is below, but I call a function that takes a cluster of this data ( a list structure that contains a $rows elt which lists the rows (genes ) in the cluster by ID, but not the actual data itself ). The function creates two copies of the matrix, one containing the rows in the cluster, and one with the rest of the rows in the matrix. After doing some statistical massaging, the function returns a statistical score for each rows/genes in the matrix, producing a vector of 22k elt's. When I run 'top', I see that the memory stamp of R after loading the matrix is ~750M. However, after calling this function on 10 clusters, this jumps to3.7 gig (at least by 'top's measurement), and this will not be reduced by any subsequent calls to gc(). Output from gc() is: gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 377925 20.26819934 364.3 604878 32.4 Vcells 88857341 678.0 240204174 1832.7 90689707 692.0 output from top is: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 1199 waltman 17 0 3844m 3.7g 3216 S 0.0 23.6 29:58.74 R Note, the relevant call that invoked my function is: test - sapply( c(1:10), function(x) get.vars.for.cluster( clusterStack[[x]], opt=rows ) ) Finally, for fun, I rm()'d all variables with the rm( list=ls() ) command, and then called gc(). The memory of this empty instance of R is still 3.4 gig, i.e. R.console: rm( list=ls() ) ls() character(0) gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 363023 19.45455947 291.4 604878 32.4 Vcells 44434871 339.1 192163339 1466.1 90689707 692.0 Subsequent top output: output from top is: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 1199 waltman 16 0 3507m 3.4g 3216 S 0.0 21.5 29:58.92 R Thanks for any help or suggestions, Peter Waltman p.s. code snippet follows. Note, that I've added extra rm() and gc() calls w/in the function to try to reduce the memory stamp to no avail. get.vars.for.cluster = function( cluster, genes=get.global( gene.ids ), opt=c(rows,cols), ratios=get.global(ratios), var.norm=T, r.sig=get.global( r.sig ), allow.anticor=get.global( allow.anticor ) ) { cat( phw dbg msg\n) cluster - cluster opt - match.arg( opt ) rows - cluster$rows cols - cluster$cols if ( opt == rows ) { cat( phw dbg msg: if opt == rows\n ) r - ratios[ rows, cols ] r.all - ratios[ genes, cols ] avg.rows - apply( r, 2, mean, na.rm=T ) ##median ) rm( r ) # phw
Re: [R] Error message when using zero-inflated count regression model in package zicounts
On Thu, 16 Aug 2007, James R. Milks wrote: Dr. Stevens, I've double-checked my variable lengths. All of my variables (Total.vines, Site, Species, and DBH) came in at 549. I did correct one problem in the data entry that had escaped my previous notice: somehow the undergrad who entered all the data managed to make the Acer negundo data split into two separate categories while still appearing to use the same ACNE abbreviation. When I made that correction and re-ran zicounts, R gave me the following error messages: Hmm, I don't know about the error messages in zicounts, but you could try to use the zeroinfl() implementation in package pscl: vines.zip - zeroinfl(Total.vines ~ Site + Species + DBH | Site + Species + DBH, data = sycamores.1) and see whether this produces a similar error. Z vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site +Species+DBH,distname=ZIP,data=sycamores.1) Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not match the length of object [549] In addition: Warning messages: 1: longer object length is not a multiple of shorter object length in: eta + offset 2: longer object length is not a multiple of shorter object length in: y/mu In addition, zicounts would not run a normal poisson regression on the data, giving me the same error messages as the ZIP regression. Doing a poisson regression with glm did not show any error messages. However, the glm model with full interactions was still over-dispersed. Could the zicounts problem be that the individual sites and species had different population sizes? For instance, Site A had 149 trees, site B had 55 trees, site C had 270 trees, and site D had 75 trees. The species had similar discrepancies in population sizes, with Platanus occidentalis and Acer negundo forming the majority of the trees. Thanks for your help. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 On Aug 15, 2007, at 5:58 AM, Martin Henry H. Stevens wrote: Hi Jim, With regard to same number, I simply wanted to make sure that each variable was the same length. The error message you show is what you would get if, for instance, you misspelled one of the variables and it doesn't exist, in which case it would be NULL, while your other variables would each be 550 elements in length. Hank On Aug 14, 2007, at 4:47 PM, James Milks wrote: Dr. Stevens, Unfortunately, Poisson gives me an over-dispersed model with only 3 out of 14 variables/interactions significant. Doing a step-wise poisson regression still ended up with the same over-dispersed model. Given the high number of zeros in the response variable, Dr. Thad Tarpey (one of our statisticians on campus) suggested zero-inflated poisson regression as a possible solution to the over-dispersion problem. As for variables of the same length, there are different numbers of trees for each species and site since we ran different numbers of transects at each site (some sites were larger than others) and there were different numbers of species and trees within each transect. Acer negundo made up ~33% of all the trees we measured; Platanus occidentalis had 25%; Fraxinus americana was another 12% and ~11% was Ulmus americana. The remaining 25% was divided among 16 other species, all of which were excluded from the analysis due to singularities in the model when they were included (something about glm not liking singularities in the model). So if the zicounts requires that each species and site have the same length, then I will not be able to use it unless I can get R to randomly select x trees from each species and site combination. Thanks for your input. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 On Aug 14, 2007, at 9:37 AM, Hank Stevens wrote: Hi Jim, Two thoughts come to me, unencumbered by the thought process or knowledge of zicounts: 1. Is Poisson really NOT appropriate? (do you have to use zicounts?) 2. Are you 110% certain that all variables are the same length? Would NA's interfere? Cheers, Hank On Aug 13, 2007, at 5:10 PM, James Milks wrote: I have data on number of vines per tree for ~550 trees. Over half of the trees did not have any vines and the data is fairly skewed (median = 0, mean = 1.158, 3rd qu. = 1.000). I am attempting to investigate whether plot location (four sites), species (I'm using only the four most common species), or tree dbh has a significant influence on the number of vines per tree. When I attempted to use the zicounts function, R gave me the following error message: vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site +Species+DBH,distrname=ZIP,data=sycamores.1) Error in ifelse(y == 0, 1, y/mu) : dim- : dims [product 12] do not
Re: [R] Trim trailng space from data.frame factor variables
Thanks Marc! What would be the easiest way to coerce char-variables back to factor-variables? Is there a way to prevent the coercion in d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) ? -Lauri 2007/8/16, Marc Schwartz [EMAIL PROTECTED]: On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote: Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): Ayepas I noted in that post, it was untestedmy error. The problem is that by using ifelse() as I did, the test for the column being a factor returns a single result, not one result per element. Hence, the appropriate conditional code is only performed on the first element in each column, rather than being vectorized on the entire column. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame(x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? Try this instead: d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: chr 1 1 1 1 ... $ y: num 6.70 4.42 8.03 4.90 6.98 ... $ f: chr lev1 lev1 lev1 lev1 ... Note that by using grep(), the factors are coerced to character vectors as expected. You would need to coerce back to factors if you need them as such. HTH, Marc Schwartz [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Possible memory leak with large matrices in R v.2.5.0
I'm working with a very large matrix ( 22k rows x 2k cols) of RNA expression data with R v.2.5.0 on a RedHat Enterprise machine, x86_64 architecture. The relevant code is below, but I call a function that takes a cluster of this data ( a list structure that contains a $rows elt which lists the rows (genes ) in the cluster by ID, but not the actual data itself ). The function creates two copies of the matrix, one containing the rows in the cluster, and one with the rest of the rows in the matrix. After doing some statistical massaging, the function returns a statistical score for each rows/genes in the matrix, producing a vector of 22k elt's. When I run 'top', I see that the memory stamp of R after loading the matrix is ~750M. However, after calling this function on 10 clusters, this jumps to 3.7 gig (at least by 'top's measurement), and this will not be reduced by any subsequent calls to gc(). Output from gc() is: gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 377925 20.26819934 364.3 604878 32.4 Vcells 88857341 678.0 240204174 1832.7 90689707 692.0 output from top is: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 1199 waltman 17 0 3844m 3.7g 3216 S 0.0 23.6 29:58.74 R Note, the relevant call that invoked my function is: test - sapply( c(1:10), function(x) get.vars.for.cluster( clusterStack[[x]], opt=rows ) ) Finally, for fun, I rm()'d all variables with the rm( list=ls() ) command, and then called gc(). The memory of this empty instance of R is still 3.4 gig, i.e. R.console: rm( list=ls() ) ls() character(0) gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 363023 19.45455947 291.4 604878 32.4 Vcells 44434871 339.1 192163339 1466.1 90689707 692.0 Subsequent top output: output from top is: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 1199 waltman 16 0 3507m 3.4g 3216 S 0.0 21.5 29:58.92 R Thanks for any help or suggestions, Peter Waltman p.s. code snippet follows. Note, that I've added extra rm() and gc() calls w/in the function to try to reduce the memory stamp to no avail. get.vars.for.cluster = function( cluster, genes=get.global( gene.ids ), opt=c(rows,cols), ratios=get.global(ratios), var.norm=T, r.sig=get.global( r.sig ), allow.anticor=get.global( allow.anticor ) ) { cluster - cluster opt - match.arg( opt ) rows - cluster$rows cols - cluster$cols if ( opt == rows ) { r - ratios[ rows, cols ] r.all - ratios[ genes, cols ] avg.rows - apply( r, 2, mean, na.rm=T ) ##median ) rm( r ) gc() devs - apply( r.all, 1, -, avg.rows ) if ( !allow.anticor ) { rm( r.all, avg.rows ) gc() } vars - apply( devs, 2, var, na.rm=T ) rm( devs ) gc() vars - log10( vars ) gc() if ( allow.anticor ) { ## Get variance against the inverse of the mean profile devs.2 - apply( r.all, 1, -, -avg.rows ) gc() vars.2 - apply( devs.2, 2, var, na.rm=T ) gc() rm( devs.2 ) gc() vars.2 - log10( vars.2 ) gc() ## For each gene take the min of variance or anti-cor variance vars - cbind( vars, vars.2 ) rm( vars.2 ) gc() vars - apply( vars, 1, min ) gc() } ## Normalize the values by the variance over the rows in the cluster if ( var.norm ) { vars - vars - mean( vars[ rows ], na.rm=T ) tmp.sd - sd( vars[ rows ], na.rm=T ) if ( ! is.na( tmp.sd ) tmp.sd != 0 ) vars - vars / ( tmp.sd + r.sig ) } gc() return( vars ) } else { r.all - ratios[ rows, ] ## Mean-normalized variance vars - log10( apply( r.all, 2, var, na.rm=T ) / abs( apply( r.all, 2, mean, na.rm=T ) ) ) names( vars ) - colnames( ratios ) ## Normalize the values by the variance over the rows in the cluster if ( var.norm ) { vars - vars - mean( vars[ cluster$cols ], na.rm=T ) tmp.sd - sd( vars[ cluster$cols ], na.rm=T ) if ( ! is.na( tmp.sd ) tmp.sd != 0 ) vars - vars / ( tmp.sd + r.sig ) } return( vars ) } }
Re: [R] Trim trailng space from data.frame factor variables
On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote: Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): Ayepas I noted in that post, it was untestedmy error. The problem is that by using ifelse() as I did, the test for the column being a factor returns a single result, not one result per element. Hence, the appropriate conditional code is only performed on the first element in each column, rather than being vectorized on the entire column. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame(x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? Try this instead: d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: chr 1 1 1 1 ... $ y: num 6.70 4.42 8.03 4.90 6.98 ... $ f: chr lev1 lev1 lev1 lev1 ... Note that by using grep(), the factors are coerced to character vectors as expected. You would need to coerce back to factors if you need them as such. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trim trailng space from data.frame factor variables
The easiest way might be to modify the lapply() call as follows: d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ... $ y: num 7.01 8.33 5.48 6.51 5.61 ... $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ... This way the coercion back to a factor takes place within the loop as needed. Note that I also meant to type sub() and not grep() below. The default behavior for both is to return a character vector (if 'value = TRUE' in grep()). There is not an argument to override that behavior. HTH, Marc On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote: Thanks Marc! What would be the easiest way to coerce char-variables back to factor-variables? Is there a way to prevent the coercion in d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else x) ? -Lauri 2007/8/16, Marc Schwartz [EMAIL PROTECTED]: On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote: Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): Ayepas I noted in that post, it was untestedmy error. The problem is that by using ifelse() as I did, the test for the column being a factor returns a single result, not one result per element. Hence, the appropriate conditional code is only performed on the first element in each column, rather than being vectorized on the entire column. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20* 1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame (x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? Try this instead: d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: chr 1 1 1 1 ... $ y: num 6.70 4.42 8.03 4.90 6.98 ... $ f: chr lev1 lev1 lev1 lev1 ... Note that by using grep(), the factors are coerced to character vectors as expected. You would need to coerce back to factors if you need them as such. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Urgent Help needed
Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the tapply if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 = == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trim trailng space from data.frame factor variables
On Thu, 16 Aug 2007, Marc Schwartz wrote: The easiest way might be to modify the lapply() call as follows: d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ... $ y: num 7.01 8.33 5.48 6.51 5.61 ... $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ... This way the coercion back to a factor takes place within the loop as needed. Note that I also meant to type sub() and not grep() below. The default behavior for both is to return a character vector (if 'value = TRUE' in grep()). There is not an argument to override that behavior. I would have thought the thing to do was to apply sub() to the levels: chfactor - function(x) { levels(x) - sub( +$, , levels(x)); x } d[] - lapply(d, function(x) if (is.factor(x)) chfactor(x) else x) This has the advantage of not losing the order of the levels. It will merge levels if they only differ in the number of trailing spaces, which is probably what you want. HTH, Marc On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote: Thanks Marc! What would be the easiest way to coerce char-variables back to factor-variables? Is there a way to prevent the coercion in d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else x) ? -Lauri 2007/8/16, Marc Schwartz [EMAIL PROTECTED]: On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote: Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): Ayepas I noted in that post, it was untestedmy error. The problem is that by using ifelse() as I did, the test for the column being a factor returns a single result, not one result per element. Hence, the appropriate conditional code is only performed on the first element in each column, rather than being vectorized on the entire column. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20* 1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame (x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? Try this instead: d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: chr 1 1 1 1 ... $ y: num 6.70 4.42 8.03 4.90 6.98 ... $ f: chr lev1 lev1 lev1 lev1 ... Note that by using grep(), the factors are coerced to character vectors as expected. You would need to coerce back to factors if you need them as such. HTH, Marc Schwartz -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trim trailng space from data.frame factor variables
On Thu, 2007-08-16 at 17:52 +0100, Prof Brian Ripley wrote: On Thu, 16 Aug 2007, Marc Schwartz wrote: The easiest way might be to modify the lapply() call as follows: d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ... $ y: num 7.01 8.33 5.48 6.51 5.61 ... $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ... This way the coercion back to a factor takes place within the loop as needed. Note that I also meant to type sub() and not grep() below. The default behavior for both is to return a character vector (if 'value = TRUE' in grep()). There is not an argument to override that behavior. I would have thought the thing to do was to apply sub() to the levels: chfactor - function(x) { levels(x) - sub( +$, , levels(x)); x } d[] - lapply(d, function(x) if (is.factor(x)) chfactor(x) else x) This has the advantage of not losing the order of the levels. It will merge levels if they only differ in the number of trailing spaces, which is probably what you want. Quite true. As also noted in that prior thread as I recall, is the 'strip.white' option in read.table() et al which would obviate the need for post-import trimming if it makes sense in this application for Lauri. Thanks, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] several plots on several pages
You can set up the 3 plots per page by using: par(mfrow=c(3,1)) Then there are a couple of options for skipping the top graphics position if the graph fails. If you know that the graph failed then you can just use plot.new() (or frame()) to skip the top plot and plot the next one in the 2nd position. Another option is you can call par(mfg=c(2,1)) explicitly before plotting the 2nd plot to force it to plot in the 2,1 position. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer M. Krug Sent: Thursday, August 16, 2007 6:58 AM To: r-help Subject: [R] several plots on several pages Hi version _ platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major 2 minor 5.1 year 2007 month 06 day27 svn rev42083 language R version.string R version 2.5.1 (2007-06-27) I want to create a pdf withe three graphs on a page and with two pages: - | 1 | - | 2 | - | 3 | - NEW PAGE - | 4 | - | 5 | - | 6 | - Graph 1 should ALWAYS be at that spot, graph two also, even if graph one produces an error when plotting (the area can be empty, but doesn't have to.) I produced the foolowing code below, but I have a few problems: 1) how can I create a new page in the pdf? 2) how can I make sure that the second graph is in position 2 when graph one produces an error when plotting I(as in the example)? Everything works OK (for the firsat page) when graph one is plotted. I have the feeling, that I am thinking to complicated. Any help welcome, Rainer pdf(test.pdf) try( { ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) } ) dev.off() -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel: +27 - (0)21 650 5776 (w) Fax: +27 - (0)86 516 2782 Fax: +27 - (0)21 650 2440 (w) Cell: +27 - (0)83 9479 042 Skype:RMkrug email:[EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] several plots on several pages
Oops, I read further down in your original post and see that you already knew about par(mfg=c(2,1)). To get it to advance to page 2 for the 4th plot try calling plot.new() which should move you to the next page, then doing par(mfg=c(1,1)) should cause the next graph to be at the top. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer M. Krug Sent: Thursday, August 16, 2007 6:58 AM To: r-help Subject: [R] several plots on several pages Hi version _ platform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status major 2 minor 5.1 year 2007 month 06 day27 svn rev42083 language R version.string R version 2.5.1 (2007-06-27) I want to create a pdf withe three graphs on a page and with two pages: - | 1 | - | 2 | - | 3 | - NEW PAGE - | 4 | - | 5 | - | 6 | - Graph 1 should ALWAYS be at that spot, graph two also, even if graph one produces an error when plotting (the area can be empty, but doesn't have to.) I produced the foolowing code below, but I have a few problems: 1) how can I create a new page in the pdf? 2) how can I make sure that the second graph is in position 2 when graph one produces an error when plotting I(as in the example)? Everything works OK (for the firsat page) when graph one is plotted. I have the feeling, that I am thinking to complicated. Any help welcome, Rainer pdf(test.pdf) try( { ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF ## Set layout to three rows and only oine column par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) ) ## First row par(mfg=c(1,1)) try( plot(runif(ff)) ) ## Second row par(mfg=c(2,1)) try( plot(runif(100)) ) ## Third row par(mfg=c(3,1)) plot(runif(1000)) } ) dev.off() -- NEW EMAIL ADDRESS AND ADDRESS: [EMAIL PROTECTED] [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT) Plant Conservation Unit Department of Botany University of Cape Town Rondebosch 7701 South Africa Tel: +27 - (0)21 650 5776 (w) Fax: +27 - (0)86 516 2782 Fax: +27 - (0)21 650 2440 (w) Cell: +27 - (0)83 9479 042 Skype:RMkrug email:[EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trim trailng space from data.frame factor variables
If the problem is with the levels of the factor, why not change them directly? d = data.frame(a=1:5, + b=c('one ','two','three ','three ','two')) d$b [1] onetwothree three two Levels: one three two levels(d$b) = sub(' +$','',levels(d$b)) d$b [1] one two three three two Levels: one three two - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley [EMAIL PROTECTED] On Thu, 16 Aug 2007, Marc Schwartz wrote: The easiest way might be to modify the lapply() call as follows: d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ... $ y: num 7.01 8.33 5.48 6.51 5.61 ... $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ... This way the coercion back to a factor takes place within the loop as needed. Note that I also meant to type sub() and not grep() below. The default behavior for both is to return a character vector (if 'value = TRUE' in grep()). There is not an argument to override that behavior. HTH, Marc On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote: Thanks Marc! What would be the easiest way to coerce char-variables back to factor-variables? Is there a way to prevent the coercion in d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else x) ? -Lauri 2007/8/16, Marc Schwartz [EMAIL PROTECTED]: On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote: Hi folks, I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces): Ayepas I noted in that post, it was untestedmy error. The problem is that by using ifelse() as I did, the test for the column being a factor returns a single result, not one result per element. Hence, the appropriate conditional code is only performed on the first element in each column, rather than being vectorized on the entire column. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20* 1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) x - gl(5,12) f - gl(3,20, labels=paste(lev, 1:3,, sep=)) d - data.frame (x=x,y=y, f=f) str(d) d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x)) str(d) How should I modify this? Try this instead: d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x) str(d) 'data.frame': 60 obs. of 3 variables: $ x: chr 1 1 1 1 ... $ y: num 6.70 4.42 8.03 4.90 6.98 ... $ f: chr lev1 lev1 lev1 lev1 ... Note that by using grep(), the factors are coerced to character vectors as expected. You would need to coerce back to factors if you need them as such. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC and logLik for logistic regression in R and S-PLUS
Leandra Desousa sousa at ims.uaf.edu writes: I am using 'R' version 2.2.1 and 'S-PLUS' version 6.0; and I loaded the MASS library in 'S-PLUS'. I am running a logistic regression using glm: summary(mydata.glm) Call: glm(formula = COMU ~ MeanPycUpT + MeanPycUpS, family = binomial,data = mydata) [snip] Null deviance: 30.316 on 21 degrees of freedom Residual deviance: 23.900 on 19 degrees of freedom AIC: 29.9 'R' - AIC(mydata.glm) [1] 29.89986 logLik(mydata.glm) 'log Lik.' -11.94993 (df=3) - 'S-PLUS' - AIC(mydata.glm) [1] 71.03222 logLik(mydata.glm) [1] -31.51611 - 1) Which AIC value is the correct one? 2) Which log-likelihood value is the correct one? AIC and log-likelihood are often defined differently in software packages -- specifically, additive constants can be included or excluded as long as they are done consistently, without affecting inferences from the model. The absolute values of AIC and logLik aren't that important; the only thing that really matters are differences among models. Have you tried comparing models within R and within S-PLUS to establish whether they give the same inferences (I would guess they do)? 3) If 'extractAIC' in 'S-PLUS' and all values in 'R' are the correct ones, and the 'AIC' and 'logLik' in 'S-PLUS' values are wrong then: Why 'S-PLUS' cannot retrieve a log-likelihood value from my glm object('mydata.glm'), even though it is using log-likelihood to calculate its residual deviance? It's very hard for us to debug S-PLUS! Some (most?) of us on the list don't even have S-PLUS installed any more ... Perhaps you should ask this question on an S-PLUS mailing list, or of the (paid) S-PLUS technical support team ... And the obligatory R-list nags: * it would help if you upgraded your R version -- R 2.3.0 came out in April 2006, and we're now up to 2.5.1 * it's the MASS package, not the MASS library cheers Ben Bolker __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent Help needed
try this: t0 = read.table(datatest.txt, header=T) X.mean = ave(t0[,1], as.factor(t0[,3])) you do the rest of Y.mean and make them into a data.fame or whatever. HTH, Weiwei On 8/16/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote: Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the tapply if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 = == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] use AnnBuilderSourceUrls with local path insted of ftp adress
I want to build my own GO package using the function GOPkgBuilder of AnnBuilder. It uses AnnBuilderSourceUrls to collect data from different ftp sites. These data are not complete for my organism, so I would like to change the ftp adresses to a local one. The changing itself is working but when I run the script I get the following Error: Error in loadFromUrl(file.path(sourceURLs[[EG]], gene2go.gz)) : URL /path/to/file/gene2go.gz is incorrect or the target site is not responding! First of all, please post questions about BioConductor packages in BioConductor mailing list instead of R. Is /path/to/file a real local path? If so, you may also need to modify GOPkgBuilder a little. Try to change the line eg = EG(srcUrl = sourceURLs[[EG]]) to eg = EG(srcUrl = sourceURLs[[EG]], fromWeb = FALSE) The full code of the script is this: library(AnnBuilder) list-options(AnnBuilderSourceUrls) list$AnnBuilderSourceUrls$EG-/path/to/file/ options(AnnBuilderSourceUrls=list$AnnBuilderSourceUrls) GOPkgBuilder(pkgName = GO, pkgPath = ., filename = go_200706-termdb.obo-xml.gz, version = v.1.0, author = list(author = author, maintainer = maintainer [EMAIL PROTECTED]) ) Thanks in advance for any help. Daniela Albrecht -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kanns mit allen: http://www.gmx.net/de/go/multimessenger __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Jianhua Zhang Department of Medical Oncology Dana-Farber Cancer Institute 44 Binney Street Boston, MA 02115-6084 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent Help needed
For the 2nd item, perhaps: by(df[,1:2], df$index, FUN=cor) where df is your data.frame. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote: Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the tapply if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 = == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent Help needed
On Thu, 2007-08-16 at 12:33 -0400, AbouEl-Makarim Aboueissa wrote: Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the “tapply” if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.24 12.54 15752.51 33.54 12893.76 01.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68124 1 215.40104 1 195.40204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.83 0.994 10920.60 203 3 6144.00 46 3 4534.48 4534.48 3 4.00 65 4 29500.00 56 4 17100.00 77 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.0655 1 184.806 1 82.94 44 1 16649.00 56 4 4500.00 74 3 1600.00 744 2 = I might be tempted to take the following approach: If your data is a matrix, coerce it to a data frame first. Let's call that 'DF'. str(DF) 'data.frame': 44 obs. of 3 variables: $ x: num 15807 15753 12894 8427 5706 ... $ y: num 12.5 33.5 1.5 22.2 333 560 670 124 104 204 ... $ index: int 4 4 3 3 3 2 2 1 1 1 ... Now use split() to break up the data frame into a list of 4 sub-dataframes, based upon the index value. We can use scale() within a lapply() loop to center the 'x' and 'y' columns for each sub-dataframe: DF.ctr - lapply(split(DF[, -3], DF$index), scale, scale = FALSE) str(DF.ctr) List of 4 $ 1: num [1:8, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:8] 8 9 10 14 ... .. ..$ : chr [1:2] x y ..- attr(*, scaled:center)= Named num [1:2] 157.2 81.7 .. ..- attr(*, names)= chr [1:2] x y $ 2: num [1:16, 1:2] 1469 1129 1727 -1291 -1371 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:16] 6 7 11 12 ... .. ..$ : chr [1:2] x y ..- attr(*, scaled:center)= Named num [1:2] 2513 230 .. ..- attr(*, names)= chr [1:2] x y $ 3: num [1:13, 1:2] 5879 1413 -1308 3906 -870 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:13] 3 4 5 20 ... .. ..$ : chr [1:2] x y ..- attr(*, scaled:center)= Named num [1:2] 7014 1352 .. ..- attr(*, names)= chr [1:2] x y $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931 7431 ... ..- attr(*, dimnames)=List of 2 .. ..$ : chr [1:7] 1 2 19 23 ... .. ..$ : chr [1:2] x y ..- attr(*, scaled:center)= Named num [1:2] 2206943 .. ..- attr(*, names)= chr [1:2] x y Now, create a new single DF comprised of the sub-dataframes from DF.ctr: DF.new - do.call(rbind, DF.ctr) Define colnames: colnames(DF.new) - c(x-mean, y-mean) str(DF.new) num [1:44, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ... - attr(*, dimnames)=List of 2 ..$ : chr [1:44] 8 9 10 14 ... ..$ : chr [1:2] x-mean y-mean Now, use merge() to join DF and DF.new by the rownames: DF.final - merge(DF, DF.new, by = row.names) DF.final Row.namesx y index x-mean y-mean 1 1 15807.24 12.50 4 -6262.12857 -30.498571 2 10 195.40 204.00 138.22750 122.35 3 11 4240.21 22.40 2 1726.93188 -208.037500 4 12 1222.72 45.90 2 -1290.55812 -184.537500 5 13 1142.26 23.60 2 -1371.01812 -206.837500 6 1463.00 90.10 1 -94.17250 8.45 7 15 1216.00 82.40 2 -1297.27812 -148.037500 8 16 2769.60 111.00 2 256.32188 -119.437500 9 17 1790.46 34.70 2 -722.81812 -195.737500 101826.10 26.10 1 -131.07250 -55.55 1119 19676.830.99 4 -2392.53857 -42.008571 12 2 15752.51 33.50 4 -6316.85857-9.498571 1320 10920.60
Re: [R] Possible memory leak with R v.2.5.0
Hi Martin - Thanks for the feedback. Right after I sent the email to the list last night, I realized that I'd forgotten to clear the all the vars we attach() to the environment before rm()'ing everything. Whoops. However, I found that while doing this *does* reduce the memory stamp by about a gig (down to 2.6 from 3.4), subsequent calls to gc() still can't reduce the memory stamp any further. Also, I probably should have added some add'l explanation of our code. I'm working on is legacy code that doesn't implement packages per se (or per the definition that R uses, at least). Instead, they are lists() of functions, i.e. try ( detach( gain-funcs ), silent=T ) gain-funcs = list( func1 = function() { ... }, func2 = function(){ ... } ) attach( gain-funcs ) In this framework, we can source a given .R file and have access to these functions without cluttering up the global namespace. Additionally, the 'ratios' var contains the expression matrix, and is actually an elt of another variable called 'global.data' that is also attach()'d. Similar to what you suggested, I pulled out the get.global( 'ratios' ) parameter from the function (since it's already available globally), and found that that had no effect on reducing the memory stamp. Frustrating. Likewise, after checking, I discovered that the cluster - cluster command was just for debugging purposes, so I've since commented it out in both your version, as well as mine. Additionally, there is no need to pass the cluster into the argument since we keep it in a stack (implemented as a list) that's available in the global namespace, so I re-wrote your version (and mine) to take the index of the cluster, rather than pass the cluster itself. Running your stripped down version of the get.vars.for.cluster() fn on 5 clusters caused R's memory stamp to jump up to as high as 5.7 gig, ending with a final stamp of 4.9 gig. Detach()'ing all the vars we add and the gc()'ing allowed it to drop to 3.4 gig. rm()'ing the var that stored the results of these 5 get.vars.for.cluster() calls and then gc()'ing did not further reduce the memory stamp. rm()'ing everything from the global namespace and gc()'in dropped this further to 3.1 gig, but no further. Adding lines to remove the r and r.all vars and gc() w/in the get.vars.for.cluster() function reduced the running memory footprint to a range of 3.5-4.4 gig, and detach()'ing, rm()'ing and gc()'ing dropped it down to 2.8. The odd thing is that calling gc() at this point reports that R is using far less memory than 'top' reports, for example, from one of my tests: gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 380537 20.46366476 340.1 380701 20.4 Vcells 88715615 676.9 277437132 2116.7 88715745 676.9 versus top: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 1199 waltman 17 0 3844m 3.7g 3216 S 0.0 23.6 29:58.74 R Given this behavior, I can only assume that there's a memory leak in R, at least for v.2.5.0. I'll try to get a version of v.2.4.x installed somewhere to see if I see a similar behavior with it. Peter Martin Morgan wrote: Hi Peter -- Here's my guess. Ironically, adding things to broken code reduces the signal to noise ratio. I ended up with get.vars.for.cluster = function( cluster, genes=get.global(gene.ids ), ratios=get.global(ratios)) { cluster - cluster rows - cluster$rows cols - cluster$cols r - ratios[ rows, cols ] avg.rows - apply( r, 2, mean, na.rm=TRUE ) r.all - ratios[ genes, cols ] devs - apply( r.all, 1, -, avg.rows ) apply( devs, 2, var, na.rm=TRUE ) } at what might reproduce your problem (though can't be sure!). The unusual bit is cluster - cluster At first I thought this would be a no-op (assigning cluster to itself), but apparently at this point in the code cluster does not exist in the environment of the function (just in the call) so cluster gets assigned outside the function. So then my guess is that get.vars.for.cluster is part of a package, and the package has a variable called cluster. get.vars.for.cluster then assigns its first argument to the package variable cluster (which is the first variable called cluster that - encounters). rm(list=ls(all=TRUE)) removes everything from the global environment, but (fortunately!) not from the package environment. You might end up storing more than 'just' cluster, depending on what it is. So I think the solution is to rethink the use of - (and also the get.global(), which are either for convenience (in which case it would probably be better to specify a default for the function argument) or out of a sense that copying is bad (but this is probably mistaken, since R's semantics are 'copy on change', so passing a 'big' object into a function does not usually trigger
Re: [R] Urgent Help needed
Thanks all. it works. Just one more thing: if you look to this out put, by(data1[,2:3], data1[,4], cor)[1] $`1` XY X 1.0000.4400451 Y 0.4400451 1.000 Q. How I can just pick the value of the correlation 0.4400451 from this output and call it sat corxy. Once again thank you all so much for your helps. Abou == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith Henrique Dallazuanna [EMAIL PROTECTED] 8/16/2007 2:05 PM For the 2nd item, perhaps: by(df[,1:2], df$index, FUN=cor) where df is your data.frame. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote: Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the tapply if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 = == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with optimization using GENOUD
Dear Friends, I have been trying to learn how to use the derivative free optimization algorithms implemented in the package RGENOUD by Mebane and Sekhon. However, it does not seem to work for reasons best described as my total ignorance. If anybody has experience using this package, it would be really helpful if you can point out where I'm making a mistake. Thanks in advance Anup Sample code attached library(rgenoud) nobs - 5000 t.beta - c(0,1,-1) X - as.matrix(cbind(rep(1, nobs), runif(nobs), runif(nobs))) # Creating the design matrix prodterm - (X%*%t.beta)+rnorm(nrow(X)) Y - as.matrix(ifelse(prodterm0, 0, 1)) # Defining the likelihood function log.like - function(beta, Y, X) { term1 - pnorm(X%*%beta) term2 - 1-term1 loglik - (sum(Y*log(term1))+sum((1-Y)*log(term2))) # Likelihood function to be maximized } stval - c(0,0,0) opt.output - optim(stval,log.like,Y=Y[,1], X=X[,1:3], hessian=T, method=BFGS, control=c(fnscale=-1,trace=1)) opt.output ### Now using GENOUD gives me errors genoud.output - genoud(log.like,beta=stval,X=X[,1:3], Y=Y[,1], nvars=3, pop.size=3000, max=TRUE) - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent Help needed
Hi, try this: by(df[,1:2], df$index, FUN=function(x)cor(x[1],x[2])) -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote: Thanks all. it works. Just one more thing: if you look to this out put, by(data1[,2:3], data1[,4], cor)[1] $`1` XY X 1.0000.4400451 Y 0.4400451 1.000 Q. How I can just pick the value of the correlation 0.4400451 from this output and call it sat corxy. Once again thank you all so much for your helps. Abou == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith Henrique Dallazuanna [EMAIL PROTECTED] 8/16/2007 2:05 PM For the 2nd item, perhaps: by(df[,1:2], df$index, FUN=cor) where df is your data.frame. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote: Dear All: Urgent help is needed. I have a data set in matrix format of three columns: X, Y and index of four groups (1,2,3,4). What I need to do is the following; 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of this group and create new columns say X-sample mean and Y-sample mean? I tried to use the tapply but I have some difficulties to restore the new data 2- How I can use the tapply if possible or any other R-function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4.? Could not use the tapply. I attached part of the data as txt file. Thank you so much for your attention to this matter, and I look forward to hear from you soon. Regards, Abou Data: x y index 15807.2412.54 15752.5133.54 12893.7601.53 8426.88 22.23 5706.24 333 3 3982.08 560 2 3642.62 670 2 295.68 124 1 215.40 104 1 195.40 204 1 4240.21 22.42 1222.72 45.92 1142.26 23.62 63.00 90.11 1216.00 82.42 2769.60 111 2 1790.46 34.72 26.10 26.10 1 19676.830.994 10920.60203 3 6144.00 46 3 4534.48 4534.48 3 4.0065 4 29500.0056 4 17100.0077 4 9000.00 435 3 6300.00 84 3 3962.88 334 2 5690.00 653 3 3736.00 233 2 2750.00 22 2 1316.00 345 2 4595.00 4595.00 3 5928.00 45 3 2645.70 0.002 2580.24 454 2 6547.34 6547.34 3 1615.68 5 2 194.06 55 1 184.80 6 1 82.94 44 1 16649.0056 4 4500.00 74 3 1600.00 744 2 = == AbouEl-Makarim Aboueissa, Ph.D. Assistant Professor of Statistics Department of Mathematics Statistics University of Southern Maine 96 Falmouth Street P.O. Box 9300 Portland, ME 04104-9300 Tel: (207) 228-8389 Email: [EMAIL PROTECTED] [EMAIL PROTECTED] Office: 301C Payson Smith __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Linear models over large datasets
I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using an 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this? I looked into the biglm package but the problem with it is it uses update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html But it is no longer available. How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection) using a relatively simple modification of standard lm()? Alternatively how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required. Thank you in advance for your help. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] combining P values using Fisher's method
Hi All, Can somebody tell me how to use R to combine p values using Fisher's method? thanks. Jiong The email message (and any attachments) is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message (and any attachments). Thank You. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
hi, i'm new to R and i'm trying to port a quattro pro spreadsheet into R. spreadsheets have optional lower and upper limit parameters on the beta distribution function. i would like to know how to incorporate this with R's pbeta function. thanks in advance, mara. Park yourself in front of a world of choices in alternative vehicles. Visit [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error message when using zero-inflated count regression model in package zicounts
No, zeroinfl() in pscl did not give me any errors when I ran the model. So there may be a bug in zicounts. Only problem now is how to interpret the zeroinfl model. Am I correct in my understanding that zeroinfl runs both the poisson and binomial models without interactions? I'm thinking along those lines since no interaction terms appear in either model. If so, how do I check for any interactions? Thanks. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 On Aug 16, 2007, at 11:19 AM, Achim Zeileis wrote: On Thu, 16 Aug 2007, James R. Milks wrote: Dr. Stevens, I've double-checked my variable lengths. All of my variables (Total.vines, Site, Species, and DBH) came in at 549. I did correct one problem in the data entry that had escaped my previous notice: somehow the undergrad who entered all the data managed to make the Acer negundo data split into two separate categories while still appearing to use the same ACNE abbreviation. When I made that correction and re-ran zicounts, R gave me the following error messages: Hmm, I don't know about the error messages in zicounts, but you could try to use the zeroinfl() implementation in package pscl: vines.zip - zeroinfl(Total.vines ~ Site + Species + DBH | Site + Species + DBH, data = sycamores.1) and see whether this produces a similar error. Z vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site +Species+DBH,distname=ZIP,data=sycamores.1) Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not match the length of object [549] In addition: Warning messages: 1: longer object length is not a multiple of shorter object length in: eta + offset 2: longer object length is not a multiple of shorter object length in: y/mu In addition, zicounts would not run a normal poisson regression on the data, giving me the same error messages as the ZIP regression. Doing a poisson regression with glm did not show any error messages. However, the glm model with full interactions was still over- dispersed. Could the zicounts problem be that the individual sites and species had different population sizes? For instance, Site A had 149 trees, site B had 55 trees, site C had 270 trees, and site D had 75 trees. The species had similar discrepancies in population sizes, with Platanus occidentalis and Acer negundo forming the majority of the trees. Thanks for your help. Jim Milks Graduate Student Environmental Sciences Ph.D. Program 136 Biological Sciences Wright State University 3640 Colonel Glenn Hwy Dayton, OH 45435 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear models over large datasets
Here are a couple of options that you could look at: The biglm package also has the bigglm function which you only call once (no update), you just need to give it a function that reads the data in chunks for you. Using bigglm with a gaussian family is equivalent to lm. You could also write your own function that calls biglm and the necessary updates on it, then just call your function. The SQLiteDF package has an sdflm function that uses the same internal code as biglm, but based on having the data stored in an sqlite database. You don't need to call update with this function. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alp ATICI Sent: Thursday, August 16, 2007 2:24 PM To: r-help@stat.math.ethz.ch Subject: [R] Linear models over large datasets I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using an 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this? I looked into the biglm package but the problem with it is it uses update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html But it is no longer available. How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection) using a relatively simple modification of standard lm()? Alternatively how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required. Thank you in advance for your help. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear models over large datasets
On Thu, Aug 16, 2007 at 03:24:08PM -0500, Alp ATICI wrote: I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using an 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. One option is to simply buy more memory, which might work for you in this case, but in larger cases, is not scalable. I'm not sure how to make R happier with handling large datasets, but you may be able to use the power of random sampling to help you? Read the data from mysql, selecting a random 10% subset. This should use 1.2 Gb or so. You then fit the model to this subset. Repeat the procedure 100 times using independent samples. Now you have bootstrapped the coefficients of your model. Use the average value and standard deviation of the coefficients as your coefficient estimates and standard errors?? Since swapping is typically 1000 times slower or more than disk access, this process might take 1/10 of the time or less compared to letting the R process thrash its disk. It's a thought, not sure how well it works. -- Daniel Lakeland [EMAIL PROTECTED] http://www.street-artists.org/~dlakelan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] combining P values using Fisher's method
On 08/16/07 12:50, Jiong Zhang, PhD wrote: Hi All, Can somebody tell me how to use R to combine p values using Fisher's method? thanks. This might get you started. It is for the sole purpose of combining two two-tailed p-values for effects in the same direction. Thus, it is not very general. I think it gives a one-tailed result. Better check it. combine - function(x,y) return(pchisq(-2*log(x/2)-2*log(y/2),4,low=F)) -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ADF test
The option alternative in adf.test() takes the value 'stationary' or 'explosive'. The value 'explosive' is used to test if the series is stationary about a linear time trend. This means that a constant and trend are to be included in the DF or ADF test regression. In the case here the series is trended. The question is if the trend is stochastic or deterministic. Have a look at the following analysis t=1:length(x) plot(t,x) trend = lm(x~t) abline(lm(x~t)) summary(trend) library(urca) x = ts(x, start=1, end = length(x), frequency=1) x.ct = ur.df(x,lags=0,type='trend') plot(x.ct) library(tseries) adf.test(x,alternative = explosive , k=0) summary(ur.df(x,lags=0,type='trend') which analyses your data using two different libraries. (x is your data and both procs. produce the same DF test). I should remark that your data are rounded and this possibly acts against a full analysis. Some knowledge of the data generating process might suggest a more appropriate way of testing for stationarity. On 16/08/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi Megh i hope you have confused with 'what is my NULL hypothesis' ? i suggest you to take any ideal dataset about which you know that whether it is stationary or not ? apply the test to know what is the NULL hypothesis used in any software :) usually in many softwares the NULL hypothesis is in negative sense. Please everybody comment on this :) hoping that you series is return series and not price series :). Thus applying adf test for your series :) my test show that your series is not stationary at all as my correlalogram comes as follows. 1 0.998283718 0.997582959 0.99703921 0.99665648 0.996548006 0.99647617 0.995925698 0.995317271 0.994746317 0.994727781 0.99508777 0.99501576 0.99437404 0.993338292 0.992684933 0.992310313 @@@ m Although if i assume that your series is a price series and defining return = 100*ln(pt/pt-1). Returns become as follows 0 -0.201816416 0.201816416 0 0.201409937 0 0 0 0 0.201005093 0 0 0 0 0.200601873 0 0.200200267 0.199800266 0 -0.199800266 0.199800266 0 0 0 0.199401861 -0.199401861 0.199401861 0 0.199005041 0 0 0 0 0.198609797 0 0 0 0 0 0.19821612 0 0.197824001 0 0 0.19743343 0 0 0 0.197044399 -0.197044399 0.197044399 0 0.196656897 0.196270917 0 -0.196270917 0.196270917 0 0 0 0 0 0 0 0.195886449 0 0 0 0 0 0 0.195503484 0 0 0 0.195122013 0.194742028 0 0 0 0 0.194363521 0 0 0 -0.194363521 0.194363521 0 0 0.193986482 0 0 0 0 0 0 0 0 0 0 0.193610903 0 0 0 0 0.193236775 0 0 0 0 0.192864091 0 0.192492841 0 0 0 0 0.192123018 0 0 0 0 0 0.191754613 0 0.191387618 0 0 0 0.191022026 0 0 0.190657827 -0.190657827 0 0.190657827 0.190295015 0 0 0 0 0 0 0.18993358 0 -0.18993358 0.18993358 0 0.189573516 0.189214815 0 0 0 0.188857469 0 0 0.18850147 0 0 0.18814681 0 0.187793482 0 then the value of autocorrelations i.e. correlalogram comes as approx 1 0.089252308 0.058227292 0.017934984 0.025264591 -0.014925678 -0.004668544 0.014890995 0.001625333 0.010669589 -0.010587179 -0.03000206 -0.011863654 0.00772247 0.024272208 -0.019521244 -0.035998575 -0.061608877 -0.048401231 -0.008594859 which show that the values are quite likely to make series stationary :) data[1:10,] V1 V2 1 4.96 0.000 2 4.95 -0.2018164 3 4.96 0.2018164 4 4.96 0.000 5 4.97 0.2014099 6 4.97 0.000 7 4.97 0.000 8 4.97 0.000 9 4.97 0.000 10 4.98 0.2010051 adf.test(data[,1]) Augmented Dickey-Fuller Test data: data[, 1] Dickey-Fuller = -1.1052, Lag order = 5, p-value = 0.9188 alternative hypothesis: stationary adf.test(data[,2]) Augmented Dickey-Fuller Test data: data[, 2] Dickey-Fuller = -6.2265, Lag order = 5, p-value = 0.01 alternative hypothesis: stationary Warning message: p-value smaller than printed p-value in: adf.test(data[, 2]) this explains everything clearly :) your NULL hypothesis is Series is not stationary - hence hypothesis in negative sense prooved by taking ideal data data1-rnorm(1) #normal data adf.test(data1) Augmented Dickey-Fuller Test data: data1 Dickey-Fuller = -21.2118, Lag order = 21, p-value = 0.01 alternative hypothesis: stationary Warning message: p-value smaller than printed p-value in: adf.test(data1) HTH Megh Dal [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 08/16/2007 04:27 PM To r-help@stat.math.ethz.ch cc Subject [R] ADF test Hi all, Hope you people do not feel irritated for repeatedly sending mail on Time series. Here I got another problem on the same, and hope I would get some answer from you. I have following dataset: data[,1] [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98 4.98 4.99 4.99 5.00 5.01 [19] 5.01 5.00 5.01 5.01 5.01
Re: [R] Linear models over large datasets
Its actually only a few lines of code to do this from first principles. The coefficients depend only on the cross products X'X and X'y and you can build them up easily by extending this example to read files or a database holding x and y instead of getting them from the args. Here we process incr rows of builtin matrix state.x77 at a time building up the two cross productxts, xtx and xty, regressing Income (variable 2) on the other variables: mylm - function(x, y, incr = 25) { start - xtx - xty - 0 while(start nrow(x)) { idx - seq(start + 1, min(start + incr, nrow(x))) x1 - cbind(1, x[idx,]) xtx - xtx + crossprod(x1) xty - xty + crossprod(x1, y[idx]) start - start + incr } solve(xtx, xty) } mylm(state.x77[,-2], state.x77[,2]) On 8/16/07, Alp ATICI [EMAIL PROTECTED] wrote: I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using an 64 bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is although my data represented in ascii form is 4Gb in size (therefore much smaller considered in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this? I looked into the biglm package but the problem with it is it uses update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html But it is no longer available. How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection) using a relatively simple modification of standard lm()? Alternatively how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required. Thank you in advance for your help. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on output of vectors from a for loop into a matrix
Hello R-help, I am a recent convert to R from SAS and I am having some difficulty with output of a for loop into a matrix. I have searched the help manuals and the archives, but I can't get my code to work. It is probably a syntax error that I am not spotting. I am trying to make a distance matrix by extracting the distances from each point to all others in a vector (the for loop). I can get them all to print and can save each individually to a file, but I cannot get them to be bound into a matrix. Here is the code that I have tried: m-length(Day.1.flower.coords$x) #31 grid points output.matix-matrix(0.0,m,m) for(i in 1:m){ dist-spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,]) dist-as.vector(dist) output.matrix[i,]-dist print(dist)} The error message that I get is: Error in output.matrix[i,] - dist : incorrect number of subscripts on matrix Thanks for your help. Ryan ~~ Ryan D. Briscoe Runquist Population Biology Graduate Group University of California, Davis [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sample size computation for survival data
Does a package exist to help compute sample size for censored data? I am planning a study involving interval censored data. I know that there exists functions in stata for this, so I was wondering if R has similar facilities. So far I have not been able to find anything for R. Thanks for any help. Stacey - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on output of vectors from a for loop into a matrix
On Thursday 16 August 2007 15:35, Ryan Briscoe Runquist wrote: Hello R-help, I am a recent convert to R from SAS and I am having some difficulty with output of a for loop into a matrix. I have searched the help manuals and the archives, but I can't get my code to work. It is probably a syntax error that I am not spotting. I am trying to make a distance matrix by extracting the distances from each point to all others in a vector (the for loop). I can get them all to print and can save each individually to a file, but I cannot get them to be bound into a matrix. Here is the code that I have tried: m-length(Day.1.flower.coords$x) #31 grid points output.matix-matrix(0.0,m,m) for(i in 1:m){ dist-spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,]) dist-as.vector(dist) output.matrix[i,]-dist print(dist)} The error message that I get is: Error in output.matrix[i,] - dist : incorrect number of subscripts on matrix Thanks for your help. Ryan ~~ Ryan D. Briscoe Runquist Population Biology Graduate Group University of California, Davis [EMAIL PROTECTED] Hi Bryan, for all UCD students you have the luxury of not using a loop! :) Would something like this work - for the creation of a 'distance matrix' : # example dataset: 2 x 2 grid d - expand.grid(x=1:2, y=1:2) # an instructive plot plot(d, type='n') text(d, rownames(d)) # create distance object and convert to matrix: m - as.matrix(dist(d)) m 1234 1 0.00 1.00 1.00 1.414214 2 1.00 0.00 1.414214 1.00 3 1.00 1.414214 0.00 1.00 4 1.414214 1.00 1.00 0.00 # is that the kind of distance matrix you are looking for : ?dist Distance Matrix Computation Description: This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix. [...] cheers, Dylan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dylan Beaudette Soils and Biogeochemistry Graduate Group University of California at Davis 530.754.7341 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combine matrix
This does not work in the general case. To see this, try: rownames(a}[5] - a and see what happens then. --- François Pinard [EMAIL PROTECTED] wrote: [Gianni Burgin] let say something like this a=matrix(1:25, nrow=5) rownames(a)=letters[1:5] colnames(a)=rep(A, 5) a A A A A A a 1 6 11 16 21 b 2 7 12 17 22 c 3 8 13 18 23 d 4 9 14 19 24 e 5 10 15 20 25 b=matrix(1:40, nrow=8) rownames(b)=c(rep(a,4),rep(b,4)) colnames(b)=rep(B, 5) b B B B B B a 1 9 17 25 33 a 2 10 18 26 34 a 3 11 19 27 35 a 4 12 20 28 36 b 5 13 21 29 37 b 6 14 22 30 38 b 7 15 23 31 39 b 8 16 24 32 40 as a results I wold like something like A A A A A B B B B B a 1 6 11 16 21 1 9 17 25 33 a 1 6 11 16 21 2 10 18 26 34 a 1 6 11 16 21 3 11 19 27 35 a 1 6 11 16 21 4 12 20 28 36 b 2 7 12 17 22 5 13 21 29 37 b 2 7 12 17 22 6 14 22 30 38 b 2 7 12 17 22 7 15 23 31 39 b 2 7 12 17 22 8 16 24 32 40 does it is clear? is there a function that automate this operation? Like, maybe: cbind(a[rownames(b),], b) -- François Pinard http://pinard.progiciels-bpi.ca __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function to find coodinates in an array
A not very good solution is as below: If your array's dimensions were KxMxN and the linear index is i then n - ceiling(i/(K*M)) i1 - i - (n-1)*(K*M) m - ceiling(i1/K) k - i1 - (m-1)*K and your index is (k,m,n) I am almost sure that there is a function in R which does this (it exists in Matlab). Regards, Moshe. --- Ana Conesa [EMAIL PROTECTED] wrote: Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord Can anyone help me? Thanks! Ana __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function to find coodinates in an array
See arrayIndex() in the R.utils package, e.g. X - array((2*3*4):1, dim=c(2,3,4)) idx - 1:length(X) ijk - arrayIndex(idx, dim=dim(X)) print(ijk) [,1] [,2] [,3] [1,]111 [2,]211 [3,]121 [4,]221 [5,]131 [6,]231 [7,]112 [8,]212 [9,]122 10,]222 11,]132 12,]232 13,]113 14,]213 15,]123 16,]223 17,]133 18,]233 19,]114 20,]214 21,]124 22,]224 23,]134 24,]234 /Henrik On 8/16/07, Moshe Olshansky [EMAIL PROTECTED] wrote: A not very good solution is as below: If your array's dimensions were KxMxN and the linear index is i then n - ceiling(i/(K*M)) i1 - i - (n-1)*(K*M) m - ceiling(i1/K) k - i1 - (m-1)*K and your index is (k,m,n) I am almost sure that there is a function in R which does this (it exists in Matlab). Regards, Moshe. --- Ana Conesa [EMAIL PROTECTED] wrote: Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord Can anyone help me? Thanks! Ana __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function to find coodinates in an array
If I am correctly understanding the problem, I think that this is what you want: set.seed(1) # Create a 3x3x3 array ARR - array(sample(100, 27), c(3, 3, 3)) ARR , , 1 [,1] [,2] [,3] [1,] 27 89 97 [2,] 37 20 62 [3,] 57 86 58 , , 2 [,1] [,2] [,3] [1,]6 61 43 [2,] 19 34 88 [3,] 16 67 83 , , 3 [,1] [,2] [,3] [1,] 32 17 21 [2,] 63 51 29 [3,] 75 101 # Get the ordered indices of the elements in the array order(ARR) [1] 27 10 24 12 22 11 5 25 1 26 19 14 2 16 23 3 9 13 8 20 15 21 [23] 18 6 17 4 7 # Get the actual array elements in order ARR[order(ARR)] [1] 1 6 10 16 17 19 20 21 27 29 32 34 37 43 51 57 58 61 62 63 67 75 [23] 83 86 88 89 97 # Now loop over the above and using which(), get the 3D indices t(sapply(ARR[order(ARR)], function(x) which(ARR == x, arr.ind = TRUE))) [,1] [,2] [,3] [1,]333 [2,]112 [3,]323 [4,]312 [5,]123 [6,]212 [7,]221 [8,]133 [9,]111 [10,]233 [11,]113 [12,]222 [13,]211 [14,]132 [15,]223 [16,]311 [17,]331 [18,]122 [19,]231 [20,]213 [21,]322 [22,]313 [23,]332 [24,]321 [25,]232 [26,]121 [27,]131 See ?which and take note of the arr.ind argument. HTH, Marc Schwartz On Thu, 2007-08-16 at 19:21 -0700, Moshe Olshansky wrote: A not very good solution is as below: If your array's dimensions were KxMxN and the linear index is i then n - ceiling(i/(K*M)) i1 - i - (n-1)*(K*M) m - ceiling(i1/K) k - i1 - (m-1)*K and your index is (k,m,n) I am almost sure that there is a function in R which does this (it exists in Matlab). Regards, Moshe. --- Ana Conesa [EMAIL PROTECTED] wrote: Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord Can anyone help me? Thanks! Ana __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] N.cohen.kappa function for polytomous outcomes
Is there a function analogous to N.cohen.kappa (concord package) which computes sample size needed for a Kappa statistic with polytomous outcomes? I have just started using R and so far I am finding it is missing alot of things that Stata has. - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combine matrix
Try this. We convert to data frame placing the row names in column 1, do the merge, remove column 1 and convert back to matrix: # test input a - matrix(1:25, nrow = 5, dimnames = list(letters[1:5], rep(A, 5))) b - matrix(1:40, nrow = 8, dimnames = list(rep(letters[1:2], each = 4), rep(B, 5))) # 1. process to.DF - function(x) data.frame(rn = row.names(x), x, row.names = 1:nrow(x)) out - as.matrix(merge(to.DF(a), to.DF(b), by = 1)[,-1]) colnames(out) - c(colnames(a), colnames(b)) out # 2. same but merge is done using sqldf # assume same a, b and to.DF as before library(sqldf) DFa - to.DF(a) DFb - to.DF(b) out - as.matrix(sqldf(select * from DFa join DFb using(rn))[-1]) colnames(out) - c(colnames(a), colnames(b)) out # 3. same but uses sqldf and proto (which sqldf automatically loads) # assume same a, b and to.DF as before library(sqldf) out - as.matrix(sqldf(select * from a join b using(rn), envir = proto(a = to.DF(a), b = to.DF(b)))[-1]) colnames(out) - c(colnames(a), colnames(b)) out On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote: let say something like this a=matrix(1:25, nrow=5) rownames(a)=letters[1:5] colnames(a)=rep(A, 5) a A A A A A a 1 6 11 16 21 b 2 7 12 17 22 c 3 8 13 18 23 d 4 9 14 19 24 e 5 10 15 20 25 b=matrix(1:40, nrow=8) rownames(b)=c(rep(a,4),rep(b,4)) colnames(b)=rep(B, 5) b B B B B B a 1 9 17 25 33 a 2 10 18 26 34 a 3 11 19 27 35 a 4 12 20 28 36 b 5 13 21 29 37 b 6 14 22 30 38 b 7 15 23 31 39 b 8 16 24 32 40 as a results I wold like something like A A A A A B B B B B a 1 6 11 16 21 1 9 17 25 33 a 1 6 11 16 21 2 10 18 26 34 a 1 6 11 16 21 3 11 19 27 35 a 1 6 11 16 21 4 12 20 28 36 b 2 7 12 17 22 5 13 21 29 37 b 2 7 12 17 22 6 14 22 30 38 b 2 7 12 17 22 7 15 23 31 39 b 2 7 12 17 22 8 16 24 32 40 does it is clear? is there a function that automate this operation? thank you very much! On 8/16/07, jim holtman [EMAIL PROTECTED] wrote: Can you provide an example of what you mean; e.g., the two input matrices and the desired output. On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote: Hi R user, I am new to R, and I have a very simple question for you. I have two matrix A and B, with internally redundant rownames (but variables are different). Some, but not all the rownames are shared among the two matrix. I want to create a greater matrix that combines the previuos two, and has all the possible combinations of matching rownames lines among matrix A and B. looking for the solution I bumped in merge but actually works on data.frame, and in dataframe there could be no redundancy in names. can you help me?? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function to find coodinates in an array
[Ana Conesa] I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord [Moshe Olshansky] If your array's dimensions were KxMxN and the linear index is i then n - ceiling(i/(K*M)) i1 - i - (n-1)*(K*M) m - ceiling(i1/K) k - i1 - (m-1)*K and your index is (k,m,n) The reshape package might be helpful, here. If I understand the problem correctly, given this artificial example: X - sample(1:24) dim(X) - c(2, 3, 4) you would want: library(reshape) melt(X)[order(X), -4] so getting the indices in a three columns data frame. -- François Pinard http://pinard.progiciels-bpi.ca __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function to find coodinates in an array
Get the indices using expand.grid and then reorder them: set.seed(1); X - array(rnorm(24), 2:4) # input X # look at X do.call(expand.grid, sapply(dim(X), seq))[order(X),] On 8/16/07, Ana Conesa [EMAIL PROTECTED] wrote: Dear list, I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following: - Let X be a 3D array - I find the ordering of the elements of X by ord - order(X) (this returns me a vector) - I now want to find the x,y,z coordinates of each element of ord Can anyone help me? Thanks! Ana __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] comparison of arrays of strings
Hi i have two arrays of genes names,one with18 gene names and the other with 24000 gene names,I have to compare both of them for finding common names. I have both the arrays in .csv format.i loaded the files and tried to compare them using for and if loops but I got the error Error in Ops.factor(cgh[i, 1], cgh[j, 2]) : level sets of factors are different Please suggest me how to solve this problem or any other alternative procedure Thanks ramakanth Get the freedom to save as many mails as you wish. To know how, go to http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] for plots
Hi, All, I am a beginner for R. Now I have installed R 2.5.1 in Window environment. After I run a program such as gam I would like to display a plot for the object. The following is an example. When I did this, only the last plot was presented on my screen. How can I get a plot before the last plot? I mean if the object has several plots how can I get those? gam.object - gam(y ~ s(x,6) + z,data=gam.data) plot(gam.object,se=TRUE) Thank you. Brad. Dr. Guicheng (Brad) Zhang Senior Research Officer School of Paediatrics and Child Health Telethon Institute for Child Health Research 100 Roberts Road, Subiaco Western Australia, 6008 AUSTRALIA Email: [EMAIL PROTECTED] Phone: 93407896 Fax: 93882097 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparison of arrays of strings
Read them into 2 different vectors and then use 'intersect'. On 8/17/07, ramakanth reddy [EMAIL PROTECTED] wrote: Hi i have two arrays of genes names,one with18 gene names and the other with 24000 gene names,I have to compare both of them for finding common names. I have both the arrays in .csv format.i loaded the files and tried to compare them using for and if loops but I got the error Error in Ops.factor(cgh[i, 1], cgh[j, 2]) : level sets of factors are different Please suggest me how to solve this problem or any other alternative procedure Thanks ramakanth Get the freedom to save as many mails as you wish. To know how, go to http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] for plots
Turn 'Recording on for the plots. windows(record=TRUE) or select from the GUI. On 8/17/07, Brad Zhang [EMAIL PROTECTED] wrote: Hi, All, I am a beginner for R. Now I have installed R 2.5.1 in Window environment. After I run a program such as gam I would like to display a plot for the object. The following is an example. When I did this, only the last plot was presented on my screen. How can I get a plot before the last plot? I mean if the object has several plots how can I get those? gam.object - gam(y ~ s(x,6) + z,data=gam.data) plot(gam.object,se=TRUE) Thank you. Brad. Dr. Guicheng (Brad) Zhang Senior Research Officer School of Paediatrics and Child Health Telethon Institute for Child Health Research 100 Roberts Road, Subiaco Western Australia, 6008 AUSTRALIA Email: [EMAIL PROTECTED] Phone: 93407896 Fax: 93882097 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.