[R] USING VSS
Hi, I have started working with the PCA method and found the VSS package very helpful: it shows me the residual matrix together with the number of components to extract. Now I want to verify the normality of the residual matrix and then run simulations. Can anyone help me, please?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
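One common way to check the normality of a residual matrix is to pool its entries and apply a normality test plus a QQ-plot, then simulate new matrices under the fitted normal. A minimal sketch — the matrix `res` below is a simulated stand-in, not an actual VSS object:

```r
# Hypothetical residual matrix standing in for the one VSS reports
set.seed(42)
res <- matrix(rnorm(100, mean = 0, sd = 0.1), nrow = 10)

vals <- as.vector(res)          # pool all residuals into one sample
shapiro.test(vals)              # Shapiro-Wilk normality test
qqnorm(vals); qqline(vals)      # visual check against a normal QQ-line

# Simulating new residual matrices under the normality assumption:
sim <- matrix(rnorm(length(res), mean(vals), sd(vals)), nrow = nrow(res))
```

How to extract the residual matrix from a VSS fit depends on the package version, so treat `res` here purely as a placeholder.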
Re: [R] Help with loops and how R stores data
On 02-Feb-08 21:16:41, R-novice wrote: I am trying to make an array c(3,8) that contains the averages of what is in another array c(9,8). I want to average rows 1:3, 4:6, 7:9 and have a loop replace the generic 1:24 values in my array with the average of the three rows. The problem I am having is that R only replaces the last value from the loops, and I am not experienced enough with R to know how it stores data when looping and how to overcome this problem. By having it print the avg, I was expecting to see 24 numbers, but it only gives me one. Here is my code.

cts.sds <- array(1:24, c(3,8))
for(i in 3){
  for(j in 8){
    avg <- sum(exprdata[3*i-2:3*i, j]/3)
    cts.sds[i,j] <- avg
    print(avg)
  }
}
print(cts.sds)

Any help with this pesky matter will be greatly appreciated.

I think you have made two types of mistake here.

1. You wrote for(i in 3) and for(j in 8). This means that 'i' will take only one value, namely 3, and 'j' will take only one value, namely 8. If you wanted to loop over i = 1,2,3 and j = 1,2,3,4,5,6,7,8 then you should write for(i in (1:3)){ for(j in (1:8)){ ... }}

2. This is more subtle, and you're far from the only person to have tripped over this one. Because of the order of precedence of the operators, in the expression 3*i-2:3*i, for each value of 'i' it will first evaluate 2:3, namely {2,3}, and then evaluate the remainder. So, for say i=2, you will get two numbers 3*i-2*i = 2 and 3*i-3*i = 0, whereas what you presumably intended was (3*i-2):(3*i) = 4:6 = {4,5,6}. The way to avoid this trap is to use parentheses to force the order of evaluation you want:

avg <- sum(exprdata[(3*i-2):(3*i), j]/3)

Example:

i <- 2
3*i-2:3*i
# [1] 2 0
(3*i-2):(3*i)
# [1] 4 5 6

The reason is that the : operator takes precedence over the binary operators *, + and -. Enter ?Syntax to get a listing of the various operators in order of precedence. Hoping this helps, Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 04-Feb-08 Time: 08:18:05 -- XFMail --
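For the task itself, the loop can be avoided entirely: group means over blocks of rows can be computed in one call. A sketch with simulated data standing in for exprdata:

```r
set.seed(1)
exprdata <- matrix(rnorm(72), nrow = 9, ncol = 8)  # stand-in for the real data

# rowsum() sums the rows within each group; divide by group size for the mean
group <- rep(1:3, each = 3)                # rows 1:3, 4:6, 7:9
cts.sds <- rowsum(exprdata, group) / 3

dim(cts.sds)   # 3 x 8, as intended
```

Since every group here has exactly three rows, dividing the group sums by 3 gives the group means without any explicit loop.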
Re: [R] New version 2.6.1- the menu is written in german
What OS is this? If Windows, see rw-FAQ Q3.2. Otherwise, see 'R Installation and Administration', Chapter 7.

On Sat, 2 Feb 2008, cris wrote: Dear all, I have recently updated R on my computer. To my surprise, the menu bar is written in German. This is the first time this has happened to me; usually English is the language used. I downloaded the latest version from several mirrors in different English- and Spanish-speaking countries, but the result is the same: when opening R, the menu appears in German. Can someone tell me how I should change to English? Thanks for your help, Cristin García

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
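For message translations specifically, R consults the LANGUAGE/LC_* environment variables, which normally need to be set before R starts (on Windows the language can also be set in the Rconsole file, per the rw-FAQ entry cited above). A sketch of inspecting and overriding them; whether GUI menus pick this up at runtime is platform-dependent:

```r
# Inspect the locale-related variables R currently sees
Sys.getenv(c("LANGUAGE", "LANG", "LC_ALL"))

# Force English message translations for this session (only affects
# messages looked up after this point; menus may require a restart)
Sys.setenv(LANGUAGE = "en")
Sys.getenv("LANGUAGE")
```

Setting `LANGUAGE=en` in the environment before launching R (for example in the shell profile, or in Rconsole on Windows) is the persistent version of the same idea.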
Re: [R] need help
Use a working directory where you have write permission.

On Sat, 2 Feb 2008, [EMAIL PROTECTED] wrote: Good afternoon, I've installed R on Suse 10.3, compiling from the source files. I open the program from the shell and I have a problem saving the workspace and *.R or *.RData files. I've read the manual and typed the command as it says, but the output is:

save.image()
Errore in gzfile(file, "wb") : impossibile aprire la connessione
Inoltre: Warning message:
In gzfile(file, "wb") : impossibile aprire il file compresso '.RDataTmp'

(in English: "Error in gzfile(file, "wb") : cannot open the connection. In addition: Warning message: In gzfile(file, "wb") : cannot open compressed file '.RDataTmp'.") So I've tried on Windows, but I've had the same problem. How can I save the data and the files in R? I apologize for my English, but it's not my language. Thanks a lot to everybody, Danilo

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
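The diagnosis can be confirmed from within R: check where the session would write and whether that directory is writable, then switch to one that is. A minimal sketch (tempdir() is just a stand-in for any directory you own):

```r
getwd()                          # where save.image() will try to write

# file.access() returns 0 on success; mode = 2 tests write permission
if (file.access(getwd(), mode = 2) != 0) {
  setwd(tempdir())               # any writable directory works, e.g. "~/rwork"
}
save.image()                     # writes .RData in the writable directory
```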
Re: [R] counts of each column that are not NA, and/or greater than column means
On Mon, Feb 04, 2008 at 03:21:10PM +0800, Ng Stanley wrote: Hi, Given a test matrix, test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3)

A) How to compute the counts of each column (excluding the NA), i.e., 3, 2, 1?

apply(test, 2, function(x) sum(!is.na(x)))

B) How to compute the counts of each column (excluding the NA) that are greater than the column means, i.e., 1, 1, 0?

apply(test, 2, function(x) sum(x > mean(x, na.rm=TRUE), na.rm=TRUE))

In general, you need ?apply to calculate something for each row/column of a matrix. Gabor

I could write a for loop, but hope to use a better alternative.

-- Csardi Gabor [EMAIL PROTECTED] UNIL DGM
Re: [R] ARCH LM test for univariant time series
Dear All, one can visually inspect ARCH effects by plotting the acf/pacf of the squared residuals from an OLS estimation; this can be as simple as a demeaned series. Further, one can run an auxiliary regression by regressing q lagged squared values and a constant on the squared series itself. The test statistic (N-q)*R^2 is distributed as chi-squared with q degrees of freedom. Something along the lines of:

archlmtest <- function (x, lags, demean = FALSE) {
  x <- as.vector(x)
  if (demean) x <- scale(x, center = TRUE, scale = FALSE)
  lags <- lags + 1
  mat <- embed(x^2, lags)
  arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
  STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
  names(STATISTIC) <- "Chi-squared"
  PARAMETER <- lags - 1
  names(PARAMETER) <- "df"
  PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
  METHOD <- "ARCH LM-test"
  result <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD,
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"
  return(result)
}

should work and yield results equal to those mentioned earlier in this thread. Best, Bernhard

Spencer, the warning message is sent from VAR; it basically lets you know that the data it used had no column names and it had to supply them using y1, y2, y3, etc. It can be suppressed by including options(warn=-1) in the function. Anyway, it seems that the p-value from my function does not match FinMetrics'. I guess the function doesn't work... hmm...

On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote: Dear Tom: Your revised function eliminates the discrepancy in the degrees of freedom but is still very different from the numbers reported in Tsay, p. 102:

archTest(log(1+as.numeric(m.intc7303)), lag=12)
ARCH test (univariate)
data: Residual of y1 equation
Chi-squared = 13.1483, df = 12, p-value = 0.3584
Warning message:
In VAR(s, p = 1, type = "const") :
No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, instead.

TOM: What can you tell me about the warning message? Thanks for your help with this.
Spencer Graves

tom soyer wrote: Spencer, sorry, I forgot that the default lag in arch is 16. Here is the fix. Can you try it again and see if it gives the correct (or at least similar, compared to a true LM test) result?

archTest <- function(x, lags=12){ # x is a vector
  require(vars)
  s <- embed(x, lags)
  y <- VAR(s, p=1, type="const")
  result <- arch(y, lags.single=lags, multi=FALSE)$arch.uni[[1]]
  return(result)
}

Thanks, and sorry about the bug. On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote: Dear Tom, Bernhard, Ruey: I can't get that to match Tsay's example, but I have other questions about that.

1. I got the following using Tom's 'archTest' function (below):

archTest(log(1+as.numeric(m.intc7303)), lags=12)
ARCH test (univariate)
data: Residual of y1 equation
Chi-squared = 10.8562, df = 16, p-value = 0.8183
Warning message:
In VAR(s, p = 1, type = "const") :
No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, instead.

** First note that the answer has df = 16, even though I supplied lags = 12.

2. For (apparently) this example, the S-PLUS FinMetrics 'archTest' function returned: Test for ARCH Effects: LM Test. Null Hypothesis: no ARCH effects. Test Stat 43.5041, p.value 0.. Dist. under Null: chi-square with 12 degrees of freedom.

3. Starting on p. 101, Ruey mentioned the Lagrange multiplier test of Engle (1982), saying "This test is equivalent to the usual F test for no regression", but refers it to a chi-square, not an F distribution. Clearly, there is a gap here, because the expected value of the F distribution is close to 1 [d2/(d2-2), where d2 = denominator degrees of freedom; http://en.wikipedia.org/wiki/F-distribution], while the expected value of a chi-square is its number of degrees of freedom. Unfortunately, I don't feel I can afford the time to dig into this further right now. Thanks for your help.
Spencer Graves

tom soyer wrote: Spencer, how about something like this:

archTest <- function (x, lags=16){ # x is a vector
  require(vars)
  s <- embed(x, lags)
  y <- VAR(s, p=1, type="const")
  result <- arch(y, multi=FALSE)$arch.uni[[1]]
  return(result)
}

Can you, or maybe Bernhard, check and see whether this function gives the correct result? Thanks. On 2/1/08, Spencer Graves [EMAIL PROTECTED] wrote: Hi, Tom: The 'arch' function in the 'vars' package is supposed to be able to do that. Unfortunately, I was unable to make it work for a univariate series. Bernhard Pfaff, the author of 'vars', said that if I read the code for 'arch', I could easily retrieve the necessary lines and put them
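A quick sanity check of Bernhard's archlmtest() from earlier in the thread is to run it on simulated white noise, where there are no ARCH effects and the p-value should usually be large. A condensed copy of the function is included so this block runs standalone:

```r
# Condensed copy of archlmtest() from the post above, for a standalone run
archlmtest <- function(x, lags, demean = FALSE) {
  x <- as.vector(x)
  if (demean) x <- scale(x, center = TRUE, scale = FALSE)
  lags <- lags + 1
  mat <- embed(x^2, lags)
  arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
  STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
  PARAMETER <- lags - 1
  PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
  structure(list(statistic = c("Chi-squared" = STATISTIC),
                 parameter = c(df = PARAMETER), p.value = PVAL,
                 method = "ARCH LM-test",
                 data.name = deparse(substitute(x))),
            class = "htest")
}

set.seed(123)
x <- rnorm(500)                        # i.i.d. noise: no ARCH effects
archlmtest(x, lags = 12)$p.value       # typically far from 0 under the null
```

Running the same test on the squared-GARCH-type series from Tsay's example would be the complementary check, but that data is not reproduced here.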
[R] Association Measures
Does anybody know if there is an implementation of the Goodman-Kruskal lambda measures of association in R? Also, how can we analyze ordered contingency tables and compute the relative measures of association in R? Thanks in advance.
Re: [R] counts of each column that are not NA, and/or greater than column means
Check the following options:

test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3, 3)

# A
colSums(!is.na(test))

# B
mat <- test > rep(colMeans(test, na.rm = TRUE), each = nrow(test))
colSums(!is.na(mat) & mat)

apply(test, 2, function(x) {
  mus <- mean(x, na.rm = TRUE)
  sum(x > mus, na.rm = TRUE)
})

I hope it helps. Best, Dimitris

Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm

Quoting Ng Stanley [EMAIL PROTECTED]: Hi, Given a test matrix, test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3) A) How to compute the counts of each column (excluding the NA), i.e., 3, 2, 1? B) How to compute the counts of each column (excluding the NA) that are greater than the column means, i.e., 1, 1, 0? I could write a for loop, but hope to use a better alternative.
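The suggestions in this thread can be checked against the expected answers stated in the question (3, 2, 1 for A and 1, 1, 0 for B); a self-contained run:

```r
test <- matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3)

a <- colSums(!is.na(test))                       # counts of non-NA per column
b <- apply(test, 2, function(x)
  sum(x > mean(x, na.rm = TRUE), na.rm = TRUE))  # counts above the column mean

a   # 3 2 1
b   # 1 1 0
```

The column means here are 2, 2.5 and 2 (NAs removed), so exactly one value exceeds the mean in each of the first two columns and none in the third.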
[R] setting up directorie and packages at the startup
Hi! Can someone help me set up a working directory and load additional packages at startup, so that I do not have to do it each time I start an R session? Thanks.
Re: [R] setting up directorie and packages at the startup
Lassana TOURE lassana.toure at ier.ml writes: Can someone help me set up a directory and load additional packages at startup, so that I do not have to do it each time I start an R session?

It depends on your system and the type of installation, but searching for .Rprofile in the manuals and on CRAN should give you a starting point. Dieter
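As a concrete illustration of the .Rprofile route (paths and package names below are placeholders, not anything from the thread), an ~/.Rprofile can set the working directory and load packages via the .First() hook, which R runs at startup — see ?Startup for where R looks for this file:

```r
# ~/.Rprofile -- sourced at startup; see ?Startup for the search order
.First <- function() {
  # setwd("~/projects/analysis")   # placeholder path: your working directory
  # Load packages quietly; plain library() calls would also work here
  for (pkg in c("stats", "utils")) {        # placeholders for your packages
    suppressPackageStartupMessages(require(pkg, character.only = TRUE))
  }
  message("Workspace configured via .Rprofile")
}
```

Loading packages in .Rprofile affects only interactive convenience; scripts that need a package should still declare it themselves so they remain reproducible.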
Re: [R] (no subject)
[EMAIL PROTECTED] wrote on 02.02.2008 07:10:37: bgchen wrote: r-help c(2.43, 3.22, 6.9, 3.03, 5.36, 6.9, 2.29, 6.13, 6.11, 4.25, 3.85, 5.09, 7.44, 2.86, 2.82, 3.64, 3.22, 7, 2.65, 4.5, 3.73, 5.09, 5.8, 7.87, 2.87, 2.9, 6.63, 6.8, 2.45, 7.68, 2.56, 2.54, 7.35, 4.61, 2.58, 3.27, 5.8, 3.13, 3.29, 2.56, 5.79, 2.67, 7.55, 7.13, 8.4, 6.16, 3.34, 6.49, 1.85, 7.4, 7.89, 3.3, 6.28, 6.52, 6.07, 4.26, 4.4, 3.1, 5.97, 6.69, 3.69, 4.62, 5.79, 2.54, 8.91, 3.48, 2.35, 7.28, 6.75, 6.97, 7.29, 2.28, 3.36, 4.79, 2.41, 7.93, 6.17, 2.19, 4.27, 7.17, 2.12, 3.05, 4.4, 5.35, 2.96, 3.54, 3.31, 7.28, 2.47, 2.27, 7.43, 3.69, 3.32, 2.61, 2.19, 5.67, 5.03, 7.58, 5.27, 2.25, 2.59, 4.16, 1.71, 3.64, 2.93, 7.2, 5.99, 7.68, 4.14, 5.83, 3.87, 7.29, 5.95, 3.4, 2.45, 2.85, 6.77, 3.58, 4.31, 5.83, 6.19, 2.56, 6.69, 7.2, 5.04, 4.34, 2.66, 7.68, 6.8, 2.5, 1.95, 7.12, 3.02, 2.58, 3.39, 3.78, 3.53, 4.42, 6.64, 2.19, 4.79, 5.75, 5.55, 5.57, 3.94, 7.27, 3.39, 6.64, 6.13, 4.55, 6.75, 3.79, 1.82, 5.95, 5.88, 2.64, 5.43, 7.9, 3.14, 5.38, 7.97, 5.83, 3.18, 2.47, 3.33, 6.98, 6.35, 6.1, 7.11, 7.34, 5.08, 2, 3.13, 2.96, 2.43, 2.12, 3.24, 3.13, 2.6, 5.15, 3.34, 7.07, 5.42, 3.1, 2.66, 2.42, 7.3, 3.28, 4.68, 5.9, 6.64, 4.79, 1.84, 6.78, 6.1, 3.29, 2.35, 2.16, 6.36, 6.41, 5.39, 2.48, 7.35, 5.87, 6.77, 5.66, 2.81, 5.27, 2.95, 7.01, 2.55, 3.4, 2.84, 2.35, 6.36, 4.31, 7.44, 3.35, 6.42, 5.86, 3.14, 6.95, 5.27, 2.05, 4.85, 3.37, 2.28, 7.59, 6.99, 5.68, 4.5, 2.81, 6.07, 5.27, 6.32, 5.25, 3.69, 3.89, 3.23, 3.5, 6.2, 6.78, 2.51, 5.87, 7, 6.99, 7.27, 7.62, 6.87, 2.38, 6.78, 3.11, 4.28, 7.57, 5.65, 3.32, 2.53, 3.09, 4.99, 2.19, 6.37, 2.68, 7.22, 2.47, 3.05, 4.92, 2.82, 7.72, 3.67, 6.76, 6.38, 3.18, 3.02, 6.2, 3.47, 3.02, 7.22, 2.21, 7.33, 7.58, 6.58, 2.98, 6.7, 2.64, 6, 2.72, 2, 6.35, 5.36, 6.74, 5.29, 6.22, 3.14, 7.15, 4.45, 2.34, 2.63, 4.62, 7.51, 7.4, 3.16, 5.99, 7.07, 4.85, 3.11, 4.6, 6.55, 2.63, 3.15, 5.16, 5.81, 3.99, 6.12, 4.08, 3.69, 3.32, 3.13, 2.41, 2.95, 2.88, 5.79, 5.83, 2.49, 3.8, 4.62, 3.71, 3.18, 7.41, 7.3,
6.51, 2.27, 7.15, 6.04, 8.11, 7.74, 7.09, 7.21, 2.54, 6.74, 3.48, 7.8, 6.41, 5.64, 5.11, 4.91, 2.25, 3.31, 3.95, 6.58, 5.38, 5.45, 3.46, 2.4, 5.79, 3.34, 3.32, 1.97, 2.49, 3.29, 5.69, 1.75, 4.57, 7.04, 5.29, 7.79, 7.84, 3.02, 4.36, 6.87, 2.14, 5.43, 4.11, 7.51, 2.13, 5.33, 2.91, 2.87, 5.95, 2.29, 2.08, 7.78, 3.96, 4.55, 2.82, 2.63, 3.13, 3.43, 5.96, 2.39, 4.98, 3.51, 6.26, 4.03, 2.41, 6.8, 7.64, 3.3, 2.66, 6.51, 3.33, 5.64, 2.5, 6.09, 2.63, 2.55, 2.52, 4.15, 5.3, 6.11, 5.13, 3.42, 7.42, 1.87, 6.64, 4.63, 3.25, 1.97, 5.44, 7.96, 6.67, 7.14, 5.12, 5.27, 7.48, 7.25, 2.51, 4.5, 2.87, 6.7, 2.09, 6.85, 2.16, 6.37, 3.01, 3.55, 4.5, 2.69, 3.52, 5.7, 3.02, 5.94, 7.38, 6.45, 5.33, 2.8, 4.83, 3.01, 6.34, 2.99, 1.98, 3.02, 3.11, 8.65, 6.38, 7.08, 7.74, 6.19, 7.97, 2.7, 5.05, 6.61, 3.75, 5.24, 6.59, 5.91, 2.02, 6.95, 3.05, 2.89, 5.09, 4.9, 6.78, 5.79, 5.42, 3.13, 3.97, 2.24, 5.91, 6.28, 3.44, 6.14, 2.41, 8.03, 3.39, 3.33, 3.21, 2.04, 4.49, 4.94, 3, 7.59, 6.84, 2.11, 6.36, 2.63, 2.01, 2.73, 4.94, 2.7, 2.65)

42

Jim

Nice, have you seen Vogons nearby?

Petr
Re: [R] use classificators learned in R in real-life, e.g. C
sich at gmx.de writes: I am interested in using R for machine learning (supervised classification). Currently I have been investigating especially the rpart, tree, and randomForest packages, and have achieved first results. Are there any experiences with how the learned classifiers could be used in, e.g., C? In other words, I want to transfer the learned predictor from R to C code.

You could use dput to write the representation to a file, and read it from C. Parsing could be nasty, though, so I would prefer to extract the relevant information (e.g. fit$cptable and fit$splits in the rpart example) and write it to a database in numeric form, with additional information if required. Another option could be XML (package XML), and using one of the many XML libraries in C(++). Dieter
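A sketch of Dieter's suggestion, exporting the numeric pieces of a fitted rpart tree so another program can parse them. The iris example and the output file names are placeholders, not anything from the thread:

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris)   # toy classifier for illustration

# Flat numeric/character tables that are easy to parse outside R
write.csv(fit$cptable, file.path(tempdir(), "cptable.csv"))
write.csv(fit$splits,  file.path(tempdir(), "splits.csv"))

# Or dump the whole node frame as an R expression, per the dput() suggestion
dput(fit$frame, file = file.path(tempdir(), "frame.txt"))
```

CSV is usually the path of least resistance for a C consumer; the dput() output is complete but, as Dieter notes, nastier to parse.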
[R] precision in seq
Hi everybody, this is a warning more than a question. I noticed that seq produces approximate results:

seq(0,1,0.05)[19]==0.9
[1] TRUE
seq(0,1,0.05)[20]==0.95
[1] FALSE
seq(0,1,0.05)[21]==1
[1] TRUE
seq(0,1,0.05)[20]-0.95
[1] 1.110223024625157e-16

I do not understand why 0.9 and 1 are correct (within some tolerance, or strictly exact?) and 0.95 is not. This one works:

((0:20)/20)[20]==0.95
[1] TRUE

Eric Elguero
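This is standard binary floating point rather than a seq() bug: 0.05 has no exact binary representation, and the 19th multiple of the double nearest 0.05 is not the double nearest 0.95, while the 18th and 20th multiples happen to round to the same doubles as the literals 0.9 and 1. The usual remedy is to compare with a tolerance:

```r
x <- seq(0, 1, 0.05)

x[20] == 0.95                       # FALSE: exact comparison of doubles
isTRUE(all.equal(x[20], 0.95))      # TRUE: all.equal() allows a tolerance
abs(x[20] - 0.95) < 1e-9            # TRUE: explicit tolerance check
```

The same caution applies to using such a sequence for indexing or in switch-like comparisons; FAQ 7.31 ("Why doesn't R think these numbers are equal?") covers this in detail.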
Re: [R] how to get points from SpatialPolygonsDataFrame
Takatsugu Kobayashi tkobayas at indiana.edu writes: try

tmp <- slot(ex_1.7.selected, 'polygons')
sub.tmp <- slot(tmp[[1]], 'Polygons')

[EMAIL PROTECTED] will get you there. taka

Jarek Jasiewicz wrote: Milton Cezar Ribeiro wrote: Dear all,

grd <- GridTopology(c(1,1), c(1,1), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)
centroids <- coordinates(polys)
x <- centroids[,1]
y <- centroids[,2]
z <- 1.4 + 0.1*x + 0.2*y + 0.002*x*x
ex_1.7 <- SpatialPolygonsDataFrame(polys,
  data = data.frame(x=x, y=y, z=z,
    row.names = sapply(slot(polys, "polygons"), function(i) slot(i, "ID"))))
ex_1.7.selected <- ex_1.7[1,]
slot(ex_1.7.selected, "coords")

A recent thread on the (more relevant for your purposes) R-sig-geo list provides code for doing at least part of what you say you want: https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003075

Here, it would be:

library(sp)
grd <- GridTopology(c(1,1), c(1,1), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)
centroids <- coordinates(polys)
x <- centroids[,1]
y <- centroids[,2]
z <- 1.4 + 0.1*x + 0.2*y + 0.002*x*x
ex_1.7 <- SpatialPolygonsDataFrame(polys,
  data = data.frame(x=x, y=y, z=z,
    row.names = sapply(slot(polys, "polygons"), function(i) slot(i, "ID"))))
pls <- slot(ex_1.7, "polygons")
# from construction, we know that each Polygons object in pls has five 2D
# coordinates and only one part, so we won't check:
res <- t(sapply(pls, function(x) c(slot(slot(x, "Polygons")[[1]], "coords"))))
# steps along the list of Polygons objects, pulling out the coords slot
# of the first and only Polygon object in its Polygons slot, finally
# flattening them from a matrix to a vector as you request, and transposing
str(res)
res[1,]

Hope this helps, Roger
Re: [R] Effect size of comparison of two levels of a factor in multiple linear regression
On 2/3/2008 10:09 AM, Christoph Mathys wrote: Dear R users, I have a linear model of the kind

outcome ~ treatment + covariate

where 'treatment' is a factor with three levels (0, 1, and 2), and the covariate is continuous. Treatments 1 and 2 both have regression coefficients significantly different from 0 when using treatment contrasts with treatment 0 as the baseline. I would now like to determine effect sizes (akin to Cohen's d in a two-sample comparison) for the comparison to baseline of treatments 1 and 2. I have illustrated a way to do this in the reproducible example below and am grateful for any comments on the soundness of what I'm doing. I have not yet found a way to determine confidence intervals for the effect sizes derived below and would appreciate tips on that.

set.seed(123456) # Make session reproducible

# Set up the treatment factor with three levels and 100 observations each
treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))

# Simulate outcomes
outcome <- rep(NA, 300)
outcome[treatment==0] <- rnorm(100, 10, 5) # baseline: mean=10, sd=5
outcome[treatment==1] <- rnorm(100, 30, 5) # effect size 4
outcome[treatment==2] <- rnorm(100, 40, 5) # effect size 6

# Check effect sizes (Cohen's d)
cohens.d <- function (x, y) { (mean(x)-mean(y))/sqrt((var(x)+var(y))/2) }
cohens.d(outcome[treatment==1], outcome[treatment==0])
[1] 3.984774
cohens.d(outcome[treatment==2], outcome[treatment==0])
[1] 6.167798

# Sometimes standardized regression coefficients are recommended
# for determining effect size, but that clearly doesn't work here:
coef(lm(scale(outcome) ~ treatment))
(Intercept)  treatment1  treatment2
  -1.233366    1.453152    2.246946

# The reason it doesn't work is that the difference of outcome
# means is divided by the sd of *all* outcomes:
(mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome)
[1] 1.453152
(mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome)
[1] 2.246946

How about scaling the outcome by the residual standard
error from the unstandardized model? For example:

library(MASS)
cmat <- diag(3)
diag(cmat) <- c(25,25,25)
df <- data.frame(Y = c(mvrnorm(100, mu=c(10,30,40), Sigma=cmat, empirical=TRUE)),
                 TX = factor(rep(c(0,1,2), each=100)))
fm1 <- lm(Y ~ TX, data = df)
fm2 <- lm(scale(Y, scale=summary(fm1)$sigma) ~ TX, data = df)
summary(fm1)

Call: lm(formula = Y ~ TX, data = df)
Residuals: Min 1Q Median 3Q Max -12.7260 -3.5215 -0.1982 3.4517 12.1690
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.0000     0.5000   20.00   <2e-16 ***
TX1          20.0000     0.7071   28.28   <2e-16 ***
TX2          30.0000     0.7071   42.43   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5 on 297 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF, p-value: < 2.2e-16

summary(fm2)

Call: lm(formula = scale(Y, scale = summary(fm1)$sigma) ~ TX, data = df)
Residuals: Min 1Q Median 3Q Max -2.54521 -0.70431 -0.03965 0.69034 2.43380
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.3333     0.1000  -33.33   <2e-16 ***
TX1          4.0000     0.1414   28.28   <2e-16 ***
TX2          6.0000     0.1414   42.43   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1 on 297 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF, p-value: < 2.2e-16

confint(fm2)
                2.5 %    97.5 %
(Intercept) -3.530132 -3.136535
TX1          3.721685  4.278315
TX2          5.721685  6.278315

I've never seen this approach before, and I'm not sure how it would work when the variances within groups are heterogeneous or one or more covariates are added to the model.
hope this helps, Chuck

# Now, create a situation where Cohen's d is impossible to
# calculate directly by introducing a continuous covariate:
covariate <- 1:300
outcome <- outcome + rnorm(300, covariate, 2)
model <- lm(scale(outcome) ~ treatment + scale(covariate))
coef(model)
     (Intercept)       treatment1       treatment2 scale(covariate)
      -0.1720456        0.1994251        0.3167116        0.8753761

# Proposed way to determine effect size: simulate outcomes for each
# treatment level assuming the covariate to have a fixed value (here
# its mean value after standardization: zero)
library(MCMCpack)
no.of.sims <- 1
sims.model <- MCMCregress(model, mcmc=no.of.sims)
sims.model[1:2,]
     (Intercept) treatment1 treatment2 scale(covariate)      sigma2
[1,]  -0.1780735  0.2024111  0.3395233        0.8682119 0.002617449
[2,]  -0.1521623  0.1773623  0.2956053        0.8764573 0.003529013
sims.treat0 <- rnorm(no.of.sims, sims.model[,"(Intercept)"],
                     sqrt(sims.model[,"sigma2"]))
sims.treat1 <-
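On the confidence-interval question from the original post: a nonparametric bootstrap over subjects is one straightforward way to interval-estimate Cohen's d itself. A sketch reusing the cohens.d() helper from the post (redefined here so the block is self-contained, with the same simulation setup):

```r
set.seed(123456)
cohens.d <- function(x, y) (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)

x <- rnorm(100, 30, 5)   # treatment 1 (true d = 4 vs. baseline)
y <- rnorm(100, 10, 5)   # baseline

# Resample each group with replacement and recompute d each time
boot.d <- replicate(2000, cohens.d(sample(x, replace = TRUE),
                                   sample(y, replace = TRUE)))
quantile(boot.d, c(0.025, 0.975))   # percentile bootstrap 95% CI
```

This handles the plain two-group d; with covariates in the model, one would bootstrap entire rows and refit the regression inside the loop instead.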
[R] How to search for packages
Hi everybody, I know this might be very off topic, and it took me quite a while to work up the courage to post this… But I remember a thread some time ago about how we can find the packages we need for specific tasks in R if we don't know beforehand which ones actually do them. Right now all the packages are listed alphabetically on the web site. Since I am not very advanced in writing my own functions, I rely heavily on work already done, and only when I have no other choice do I modify existing functions; usually my modifications are only cosmetic. But sometimes I spend a lot of time just reading package descriptions until I decide that maybe one will do close to what I want. I wonder if there is any way to improve how these packages are displayed on the site to help with this decision. Could the community as a whole come up with some broader categories, such as Bayesian, spatial statistics, bootstrap, vegetation analysis, circular statistics, robust statistics, etc., with the authors of each package choosing the one or two categories they think their package fits best? On the web page we could then have a list of those very broad categories, and within each category the packages themselves in alphabetical order, with their descriptions as now. So if I am interested in vegetation analysis or environmental analysis but have never done it before, I can go to that category and see which packages are geared towards that particular subject. For example, it was by chance alone and some Google searching that I discovered that the package labdsv has anything to do with vegetation analysis, since at first, of course, I looked for packages with "veg" or "env" in the title. I also realize that this might mean a lot of work, but R develops so rapidly that I think it will soon be unmanageable to just peruse the list of packages and read descriptions in order to choose which one to install, when you are not familiar with all of them.
I hope I didn't offend the community with this; I would be very sorry if so, since I get lots of help here and I have learnt a lot from you. I will remain forever grateful. Thanks, Monica
Re: [R] Association Measures
"dt Excellent" [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: Does anybody know if there is an implementation of the Goodman-Kruskal lambda measures of association in R? Also, how can we analyze ordered contingency tables and compute the relative measures of association in R? Thanks in advance.

Laura Thompson's "R (and S-PLUS) Manual to Accompany Agresti's Categorical Data Analysis (2002), 2nd edition" has two versions. A search on "Manual to Accompany Agresti" produces multiple places where it can be found. -- David Winsemius
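Goodman-Kruskal lambda is also simple enough to compute directly from a contingency table. A minimal sketch for predicting the column variable from the row variable (gk.lambda is a hypothetical helper written for this example, not a function from any package mentioned above):

```r
# Goodman-Kruskal lambda: proportional reduction in prediction error for
# the column variable when the row variable is known
gk.lambda <- function(tab) {
  n        <- sum(tab)
  col.max  <- max(colSums(tab))          # best guess ignoring the rows
  row.maxes <- sum(apply(tab, 1, max))   # best guesses within each row
  (row.maxes - col.max) / (n - col.max)
}

tab <- matrix(c(30, 10, 5, 45), 2, 2)    # toy 2x2 table
gk.lambda(tab)                           # 0.625 for this table
```

Values run from 0 (the row variable does not help predict the column variable) to 1 (perfect prediction); swapping rows and columns gives the lambda for the other prediction direction.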
Re: [R] use classificators learned in R in real-life, e.g. C
On Feb 3, 2008 2:16 PM, [EMAIL PROTECTED] wrote: Hi there, I am interested in using R for machine learning (supervised classification). Currently I have been investigating especially the rpart, tree, and randomForest packages, and have achieved first results. Are there any experiences with how the learned classifiers could be used in, e.g., C? In other words, I want to transfer the learned predictor from R to C code.

I don't know of any automated way to do this for those models, but the MARS model in the earth package can export a C-friendly version of the prediction function. See ?format.earth for details. The bagged MARS model in the caret package does the same (see ?format.bagEarth). Max -- Max
Re: [R] make dataframe from table
Perhaps:

affect <- as.data.frame(do.call('rbind', tapply(data$V2, data$V1, table)))
merge(age, affect, by.x=1, by.y=0)

On 04/02/2008, Boks, M.P.M. [EMAIL PROTECTED] wrote: Dear R-experts, I have got a dataframe:

data
  ID disease
  V1      V2
1 p1       1
2 p1       3
3 p3       3
4 p3       5
5 p5       1

From which I extract a useful table:

affect
   1 3 5
p1 1 1 0
p3 0 1 1
p5 1 0 0

I want to merge this with another dataframe:

age
p1 23
p2 24
p3 23
p4 11
p5 45

I have tried as.data.frame(affect) and other solutions to get the following command going: merge(age, affect, by.x=0, by.y=1). QUESTION: I cannot get the merging process going with the outcome of the table command. Any help would be great. R for Windows v2.6.1; code:

data <- as.data.frame(matrix(c("p1","p1","p3","p3","p5",1,3,3,5,1), 5, 2))
age <- as.data.frame(matrix(c("p1","p2","p3","p4","p5",23,24,23,11,45), 5, 2))
affect <- table(data[,1], data[,2])
merge(age, affect, by.x=0, by.y=1)

BW Marco

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
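The suggested one-liner can be run end to end; the key detail is that V2 should be a factor, so that table() inside tapply() returns counts for all levels in every group (in the original post this happened implicitly, because the data frame was built from a character matrix). A self-contained sketch with the thread's toy data:

```r
# V2 as a factor: each per-group table then covers all levels 1, 3, 5
data <- data.frame(V1 = c("p1", "p1", "p3", "p3", "p5"),
                   V2 = factor(c(1, 3, 3, 5, 1)))
age <- data.frame(id = c("p1", "p2", "p3", "p4", "p5"),
                  age = c(23, 24, 23, 11, 45))

affect <- as.data.frame(do.call(rbind, tapply(data$V2, data$V1, table)))
merged <- merge(age, affect, by.x = 1, by.y = 0)  # by.y = 0: row names
merged
```

Only p1, p3 and p5 appear in the result, since merge() keeps matching rows by default; add all.x = TRUE to keep p2 and p4 with NA counts.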
Re: [R] rpart error when constructing a classification tree
Sorry, I wasn't very helpful. Let me try this again. I have attached a subsample of the data which still gives me the same error as the full data file. I am trying to make a decision tree using rpart. This is my code and output:

data <- read.table("/Users/randygriffiths/Desktop/data", header = TRUE)
attach(data)
library(rpart)
bookings.cart <- rpart(totalRev ~ ., data = data, method = "class")
bookings.cart

n= 50
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 50 15 0 (0.7 0.04 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02) *

summary(bookings.cart)
Call: rpart(formula = totalRev ~ ., data = data, method = "class")
n= 50
CP nsplit rel error
1 0 0 1
Error in yval[, 1] : incorrect number of dimensions

I have used rpart on past projects without any problems. I can do a regression tree without any error as well (by leaving off the method = "class" argument). I have used this same data with 'randomForest' with good results, and also with 'tree' without this error. I have seen that at least a few people have reported this same error with rpart, but I could not find an answer that solved the problem. Can anyone help me?

Randy

On Jan 30, 2008 2:33 AM, Uwe Ligges [EMAIL PROTECTED] wrote: Randy Griffiths wrote: I am trying to make a decision tree using rpart. The function runs very quickly considering the size of the data (1742, 163). When I call the summary command I get this:

summary(bookings.cart)
Call: rpart(formula = totalRev ~ ., data = bookings, method = "class")
n=1741 (1 observation deleted due to missingness)
CP nsplit rel error
1 0 0 1
Error in yval[, 1] : incorrect number of dimensions

And we get:

summary(bookings.cart)
Error in summary(bookings.cart) : object "bookings.cart" not found

hence we cannot reproduce and inspect your problem. Please read the posting guide.

Uwe Ligges

note:
dim(bookings)
[1] 1742 163

I have run rpart on past projects without any problems.
I have used a categorical version of the variable totalRev that was partitioned into four levels (and coded as a factor). I tried making a tree with the tree command (in the 'tree' package) and was able to construct a tree without any errors. However, I would much rather use rpart. Does anyone have any ideas that might help?
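For reference, a minimal working classification-tree call of the kind the poster describes, on toy data with a four-level factor response (all names here are hypothetical, not the poster's actual data):

```r
library(rpart)

# Toy data: numeric predictors and a 4-level factor response,
# mirroring the binned totalRev described above
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = runif(200))
d$totalRev <- cut(d$x1 + rnorm(200, sd = 0.5), breaks = 4,
                  labels = c("low", "mid", "high", "top"))

# With a factor response, method = "class" is the intended use
fit <- rpart(totalRev ~ ., data = d, method = "class")
invisible(summary(fit))  # prints splits and class probabilities without error
```

If summary() fails on real data but works on a sketch like this, comparing class(your_response) and the number of levels against the toy case is a quick way to localise the problem.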
Re: [R] How to make reference to R in the method section in a scientific article?
On Mon, 4 Feb 2008, Falco tinnunculus wrote:

Ft Dear all,
Ft
Ft How do I make reference to R in the method section in a scientific article? Should I state the web address?
Ft
Ft And, is this the proper way to report the lme test?
Ft
Ft "The relationships were assumed to be linear. The response, handling time, and the explanatory variable, prey mass, were log transformed to obtain normal error distributions. The lme function of the nlme package for R was used to fit linear mixed-effects (LME) models, using restricted maximum likelihood (REML). Model selection was performed using the Akaike Information Criterion (AIC). The overall test of the model was conducted with ANOVA."
Ft
Ft Regards Kes

Check out the citation() function. It can be used to find the proper citation for R and for packages that have included a citation. For example:

citation(package = "MASS")

Citing the URL is usually a matter of journal style. Some journals now want the date that the URL was visited as part of the citation.
[R] free float of swiss shares (streubesitzanteil)
Dear list,

I'd like to do some calculations for which I need the free float (German: Streubesitzanteil) of a number of Swiss firms' shares, but at swx.com I found only very little information. Does anyone know if and where that information is available? Best would be a website from which it is extractable via regexp. Thanks in advance!

Josuah
Re: [R] precision in seq
FAQ 7.31

On 2/4/08, Eric Elguero [EMAIL PROTECTED] wrote: Hi everybody, this is a warning more than a question. I noticed that seq produces approximate results:

seq(0,1,0.05)[19]==0.9
[1] TRUE
seq(0,1,0.05)[20]==0.95
[1] FALSE
seq(0,1,0.05)[21]==1
[1] TRUE
seq(0,1,0.05)[20]-0.95
[1] 1.110223024625157e-16

I do not understand why 0.9 and 1 are correct (within some tolerance, or strictly exact?) and 0.95 is not. This one works:

((0:20)/20)[20]==0.95
[1] TRUE

Eric Elguero

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
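As the "FAQ 7.31" pointer says, this is binary floating point: 0.05 has no exact base-2 representation, so seq() accumulates tiny rounding errors and some elements land a few units-in-the-last-place away from their decimal value. A short sketch of comparing within a tolerance instead of with ==:

```r
s <- seq(0, 1, by = 0.05)

# s[20] differs from 0.95 by roughly 1e-16 (per the output above),
# so an exact == comparison is unreliable for non-integer steps
abs(s[20] - 0.95)

# Compare within a tolerance instead
isTRUE(all.equal(s[20], 0.95))   # TRUE: default tolerance ~1.5e-8
abs(s[20] - 0.95) < 1e-9         # TRUE
```

The ((0:20)/20) version "works" because 19/20 happens to round to exactly the same double as the literal 0.95; that is luck, not a general fix.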
Re: [R] Processing dates and generating sequences of dates
On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote: Using zoo's yearmon class:

library(zoo)
my.dates[!duplicated(as.yearmon(my.dates))]

or, although you seem to disallow this in your question, this would be an option:

my.dates[!duplicated(format(my.dates, "%Y-%m"))]

Ah, actually, I spoke too soon. Your solutions return the following:

my.dates[!duplicated(format(my.dates, "%Y-%m"))]
[1] 2005-05-01 2005-06-01 2005-07-01 2005-08-01 2005-09-01
[6] 2005-10-01 2005-11-01 2005-12-01 2006-01-01 2006-02-01
[11] 2006-03-01 2006-04-01 2006-05-01 2006-06-01 2006-07-01
[16] 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
[21] 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01
[26] 2007-06-01 2007-07-01

which gives only the months sampled. What I need is a vector of length 36 covering 1st Jan 2005 through 31st Dec 2007, as in (using the seq() call in my original email):

new.dates
[1] 2005-01-01 2005-02-01 2005-03-01 2005-04-01 2005-05-01
[6] 2005-06-01 2005-07-01 2005-08-01 2005-09-01 2005-10-01
[11] 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01
[16] 2006-04-01 2006-05-01 2006-06-01 2006-07-01 2006-08-01
[21] 2006-09-01 2006-10-01 2006-11-01 2006-12-01 2007-01-01
[26] 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
[31] 2007-07-01 2007-08-01 2007-09-01 2007-10-01 2007-11-01
[36] 2007-12-01

This just seems a bit kludgy:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), "1", "1", sep = "/")),
                 as.Date(paste(format(max(my.dates), format = "%Y"), "12", "31", sep = "/")),
                 by = "months")

but perhaps there isn't a better way?
Cheers, G

On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote: Dear List,

Say I have the following sequence of dates [*]:

start <- as.Date("2005-01-05", format = "%Y-%d-%m")
end <- as.Date("2007-10-07", format = "%Y-%d-%m")
my.dates <- seq(start, end, by = "days")

What I would like to generate is a sequence of dates, by month, that goes from the first day of 2005 (the year of the start date) to the last day of 2007 (the year of the end date), so that the output is a vector of 36 dates containing all months of the three calendar years that the sampling spanned. I could do it via manipulation of dates as so:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), "1", "1", sep = "/")),
                 as.Date(paste(format(max(my.dates), format = "%Y"), "12", "31", sep = "/")),
                 by = "months")

And then manipulate that to get only the month and year parts of the 36 generated dates. This doesn't seem very elegant to me. Is there a better/easier way than converting back and forth between characters and objects of class Date?

Many thanks, G

[*] FWIW, my actual application is similar to my.dates, but not sampled every day - indeed there are months where there are no samples - and I am trying to do a levelplot of the numbers of observations per month, per year. Given the following data:

dat <- data.frame(temp = rnorm(length(my.dates)), my.dates = my.dates)
dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
dat$month <- format(dat$my.dates, format = "%b")
dat$month <- factor(dat$month,
                    levels = c("Jan","Feb","Mar","Apr","May","Jun",
                               "Jul","Aug","Sep","Oct","Nov","Dec"))

I can get a table of the number of observations per month per year via:

(obs.yearmon <- with(dat, table(year, month)))

which, when converted to a vector, provides what I need for levelplot()'s formula method. Now I just need to generate the sequence of 36 months (Jan, Feb, etc.) and years to go with it.
The matrix method of levelplot works, but I am having to hard code a lot of details that I believe the formula method will handle automagically for me.

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] make dataframe from table
You need to create the data.frame in a different way, is all. Try this:

df1 <- data.frame(rownames(affect), matrix(affect, nrow = 3))
merge(age, df1, by.x = 1, by.y = 1)

--- Boks, M.P.M. [EMAIL PROTECTED] wrote: [original question quoted above]
Re: [R] How to search for packages
On Feb 4, 2008 8:34 AM, Monica Pisica [EMAIL PROTECTED] wrote: Hi everybody, I know this might be very off topic and it took me quite a while to work up my courage to post this… But I remember a thread some time ago about how we can find the packages we need to do specific tasks in R if we don't know beforehand which ones actually do it. Now all the packages are listed alphabetically on the web site. Since I am not very advanced in writing my own functions I rely heavily on work already done, and only when I have no other choice do I modify existing functions. Usually my modifications are only cosmetic. But sometimes I spend a lot of time just reading the descriptions of packages until I decide that maybe one will do close to what I want. I wonder if there is any way to improve how these packages are displayed on the site to help with this decision. I wonder if the community as a whole can come up with some broader categories such as Bayesian, spatial statistics, bootstrap, vegetation analysis, circular statistics, robust statistics, etc., and the authors of a package can choose one, two, or however many categories they think their package fits best. On the web page we could have a list of those very broad categories, and within each category the packages themselves in alphabetical order with their descriptions, as it is now. So if I am interested in vegetation analysis or environmental analysis but have never done it before, I go to that category and see which packages are more geared towards that particular subject. For example, it was by chance alone and some Google searching that I discovered that the package labdsv has anything to do with vegetation analysis, since first, of course, I looked at any package with "veg" or "env" in the title.

Before Christmas I started working on a solution for this - http://crantastic.org - a site for searching, reviewing and tagging R packages.
Unfortunately I've run out of steam lately (and the lack of a 64-bit Ubuntu package for R means it's a bit out of date), but the basic ideas are there. If you like how the site is looking so far please let me know, as it will be motivation for me to get the site finished.

Hadley

--
http://had.co.nz/
[R] adding the mean and standard deviation to boxplots
Dear list,

How can I add the mean and standard deviation to each of the boxplots using the example provided in the boxplot function?

boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length",
        ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

Thanks for any help,
Tom
[R] Choosing hardware
I'm ordering a new computer to increase my ability to handle large data sets. I've tried the dual-core type and also the dual processor with dual cores each, and have not been satisfied. This seems to agree with all the other postings on the help list. I don't want to do any simulations; I just want to deal with a large data set and do things such as resampling time series data from 50,000 samples per second to 2,000 samples per second, and compute other things that are not easily done on my PC. What type of hardware do I want? I am leaning toward a single processor, since I have had a slow-down with the dual-core processors. It seems to me that threading and clustering will not really help my situation. Does anyone have any suggestions?

Thanks,
Todd Remund
Re: [R] Choosing hardware
On Mon, 04-Feb-2008 at 10:42AM -0700, Todd Remund wrote:
|
| I'm ordering a new computer to increase my ability to handle large
| data sets. I've tried the dual-core type and also the dual
| processor with dual cores each, and have not been satisfied. This
| seems to agree with all the other postings on the help list. I
| don't want to do any simulations, I just want to deal with a large
| data set and do things such as resampling time series data from
| 50,000 samples per second to 2,000 samples per second. And compute
| other things that are not easily done on my PC. What type of
| hardware do I want? I am leaning toward a single processor, since
| I have had a slow-down with the dual-core processors. It seems to
| me that threading and clustering will not really help my
| situation. Does anyone have any suggestions?

Sounds as though it's a memory limitation. If that's the case, you might do best with a 64-bit machine and a decent amount of memory. If you're using Linux, it's easy, since 64-bit has been working for years. If you have to use Windows (you didn't say), you might have to wait a bit longer.

best
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Middle minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Anon
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
Re: [R] How to search for packages
hadley wickham h.wickham at gmail.com writes: "Also it is staggering that there are over 1200 packages for R; I was suspecting close to 1000." And the site is a couple of weeks out of date, so there are probably even more.

The real answer was Task Views on CRAN (most of the original question's topics *are* already Task Views), so crantastic is very partial. If you have a little time and want to really draw in the masses, try doing clickable image maps from the Bioconductor pkgDepTools: http://bioconductor.org/packages/2.1/bioc/html/pkgDepTools.html because some of the unobtrusive short-name packages are key nodes in package dependency graphs. The dependency trees are very illuminating. Automating the updates would be a plus. If you could also run against the Task View listings, which are readily parsable, that would be very helpful.

Roger

"The intent isn't to provide large in-depth reviews, but to make it easy to add comments like yours above. I agree it would also be nice to have some central place that lists all articles which cite a given R package, but that would be a lot of work. Hadley"
Re: [R] How to search for packages
Your site is interesting; the little example with the graphics is actually what I was thinking about. Now I've tagged 2 packages as robust analysis, but if I press the Tags button I see only my tags - I don't see the graphics tags anymore. I would be extremely happy if you get enough motivation to continue this effort.

Oops, that was a dumb bug in my code, now fixed.

Also, it is staggering that there are over 1200 packages for R; I was suspecting close to 1000.

And the site is a couple of weeks out of date, so there are probably even more.

About reviewing packages - I am not sure I am versed enough for that, but at least it might be of interest that some packages do not like to be loaded together for some of their functions to work properly. The example I know of is robust and mvoutlier. I like both packages and I use them quite often, but now I know when I can have both loaded together and when I have to detach one or the other. On the other hand, I've published some scientific articles in which I've used R and different packages. It might be of interest to have a place to cite these articles, even if their main focus is not R itself, but R was used to perform the analysis.

The intent isn't to provide large in-depth reviews, but to make it easy to add comments like yours above. I agree it would also be nice to have some central place that lists all articles which cite a given R package, but that would be a lot of work.

Hadley

--
http://had.co.nz/
Re: [R] adding the mean and standard deviation to boxplots
Not precisely what you asked for, but see the notch= argument to boxplot for a graphical measure of variability. If you simply wish to print certain statistics below the numbers already on the X axis, then see:
https://stat.ethz.ch/pipermail/r-help/2008-January/152994.html

On Feb 4, 2008 10:41 AM, Tom Cohen [EMAIL PROTECTED] wrote: [original question quoted above]
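Beyond notches, the means and standard deviations can also be drawn directly on top of the boxes with points() and arrows(). A sketch using the VC half of the poster's ToothGrowth example (the +/- 1 sd error bars are one possible convention, not the only one):

```r
vc <- subset(ToothGrowth, supp == "VC")
boxplot(len ~ dose, data = vc, boxwex = 0.25, at = 1:3 - 0.2,
        col = "yellow", ylim = c(0, 35), yaxs = "i")

m <- tapply(vc$len, vc$dose, mean)   # group means, one per dose level
s <- tapply(vc$len, vc$dose, sd)     # group standard deviations

points(1:3 - 0.2, m, pch = 18, col = "red")   # mark each mean
arrows(1:3 - 0.2, m - s, 1:3 - 0.2, m + s,    # +/- 1 sd error bars
       angle = 90, code = 3, length = 0.05)
```

The same two lines can be repeated with at = 1:3 + 0.2 for the OJ boxes of the overlaid plot.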
[R] counting identical data in a column
Hi Peter,

I have the following data frame with chromosome name, start and end positions:

  chrN     start       end
1 chr1  11122333  11122633
2 chr1  11122333  11122633
3 chr3  11122333  11122633
8 chr3 111273334 111273634
7 chr2  12122334  12122634
4 chr1  21122377  21122677
5 chr2  33122355  33122655
6 chr2  33122355  33122655

I would like to count the positions that have the same start and add a new column with the count number; the new data frame should look like this:

  chrN     start       end count
1 chr1  11122333  11122633     3
2 chr1  11122333  11122633     3
3 chr3  11122333  11122633     3
8 chr3 111273334 111273634     1
7 chr2  12122334  12122634     1
4 chr1  21122377  21122677     1
5 chr2  33122355  33122655     2
6 chr2  33122355  33122655     2

Can you please show me how to achieve this?

Thanks,
Joseph
Re: [R] adding the mean and standard deviation to boxplots
"As for the standard deviation, are you sure you want this? Standard deviation only makes sense if the data are normally distributed..."

"This is false, of course. What you probably meant to say is something like: The sample standard deviation may not tell you what you think it tells you if the underlying data are not distributed something like a normal distribution."

Yes. How very sloppy of me. Consider my wrist slapped.

Regards,
Richie.

Mathematical Sciences Unit
HSL
Re: [R] ARCH LM test for univariant time series
Thanks Bernhard for the beautiful code!!

On 2/4/08, Pfaff, Bernhard Dr. [EMAIL PROTECTED] wrote: Dear All, one can visually inspect ARCH effects by plotting the acf/pacf of the squared residuals from an OLS estimation. This can be as simple as a demeaned series. Further, one can run an auxiliary regression by regressing q lagged squared values and a constant on the squared series itself. The test statistic (N-q)*R^2 is distributed as chi-squared with q degrees of freedom. Something along the lines of:

archlmtest <- function (x, lags, demean = FALSE)
{
  x <- as.vector(x)
  if (demean)
    x <- scale(x, center = TRUE, scale = FALSE)
  lags <- lags + 1
  mat <- embed(x^2, lags)
  arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
  STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
  names(STATISTIC) <- "Chi-squared"
  PARAMETER <- lags - 1
  names(PARAMETER) <- "df"
  PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
  METHOD <- "ARCH LM-test"
  result <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD,
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"
  return(result)
}

should work and yield equal results as mentioned earlier in this thread.

Best, Bernhard

Spencer, the warning message is sent from VAR; it basically lets you know that the data it used had no column names, so it had to supply them as y1, y2, y3, etc. It can be suppressed by including options(warn=-1) in the function. Anyway, it seems that the p value from my function does not match FinMetrics'. I guess the function doesn't work... hmm...

On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote: Dear Tom: Your revised function eliminates the discrepancy in the degrees of freedom but is still very different from the numbers reported in Tsay, p.
102:

archTest(log(1+as.numeric(m.intc7303)), lag=12)

ARCH test (univariate)
data: Residual of y1 equation
Chi-squared = 13.1483, df = 12, p-value = 0.3584
Warning message:
In VAR(s, p = 1, type = "const") : No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, instead.

TOM: What can you tell me about the warning message? Thanks for your help with this.

Spencer Graves

tom soyer wrote: Spencer, sorry, I forgot that the default lag in arch is 16. Here is the fix. Can you try it again and see if it gives the correct (or at least similar, compared to a true LM test) result?

archTest <- function(x, lags = 12) {
  # x is a vector
  require(vars)
  s <- embed(x, lags)
  y <- VAR(s, p = 1, type = "const")
  result <- arch(y, lags.single = lags, multi = FALSE)$arch.uni[[1]]
  return(result)
}

Thanks, and sorry about the bug.

On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote: Dear Tom, Bernhard, Ruey: I can't get that to match Tsay's example, but I have other questions about that.

1. I got the following using Tom's 'archTest' function (below):

archTest(log(1+as.numeric(m.intc7303)), lags=12)

ARCH test (univariate)
data: Residual of y1 equation
Chi-squared = 10.8562, df = 16, p-value = 0.8183
Warning message:
In VAR(s, p = 1, type = "const") : No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, instead.

First note that the answer has df = 16, even though I supplied lags = 12.

2. For (apparently) this example, the S-PLUS FinMetrics 'archTest' function returned:

Test for ARCH Effects: LM Test
Null Hypothesis: no ARCH effects
Test Stat 43.5041, p.value 0..
Dist. under Null: chi-square with 12 degrees of freedom

3. Starting on p. 101, Ruey mentioned the Lagrange multiplier test of Engle (1982), saying "This test is equivalent to the usual F test for no regression", but refers it to a chi-square, not an F distribution.
Clearly, there is a gap here, because the expected value of the F distribution is close to 1 [d2/(d2-2), where d2 = denominator degrees of freedom; http://en.wikipedia.org/wiki/F-distribution], while the expected value of a chi-square is its number of degrees of freedom. Unfortunately, I don't feel I can afford the time to dig into this further right now. Thanks for your help.

Spencer Graves

tom soyer wrote: Spencer, how about something like this:

archTest <- function (x, lags = 16) {
  # x is a vector
  require(vars)
  s <- embed(x, lags)
  y <- VAR(s, p = 1, type = "const")
  result <- arch(y, multi = FALSE)$arch.uni[[1]]
  return(result)
}

Can you, or maybe Bernhard, check and see whether this function gives the correct result? Thanks,

On 2/1/08, Spencer Graves [EMAIL PROTECTED] wrote: Hi, Tom: The 'arch' function in the 'vars' package is supposed to be able to do that.
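For completeness, a usage sketch of Bernhard's archlmtest() from earlier in the thread (the definition is repeated here so the snippet is self-contained). On an i.i.d. series the test should not reject; the statistic grows when volatility clusters:

```r
archlmtest <- function(x, lags, demean = FALSE) {
  # Engle's LM test: regress x^2 on its own lags, refer n*R^2 to chisq(lags)
  x <- as.vector(x)
  if (demean) x <- scale(x, center = TRUE, scale = FALSE)
  lags <- lags + 1
  mat <- embed(x^2, lags)
  arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
  STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
  names(STATISTIC) <- "Chi-squared"
  PARAMETER <- lags - 1
  names(PARAMETER) <- "df"
  PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
  result <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = "ARCH LM-test",
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"
  result
}

set.seed(42)
tst <- archlmtest(rnorm(1000), lags = 12)  # white noise: no ARCH effects
tst$parameter                               # df = 12
tst$p.value                                 # p-value under the chisq(12) null
```

Because the result has class "htest", printing tst gives the familiar one-block test output, like t.test() and friends.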
Re: [R] counting identical data in a column
Is this what you want?

x <- read.table(textConnection("chrN start end
1 chr1 11122333 11122633
2 chr1 11122333 11122633
3 chr3 11122333 11122633
8 chr3 111273334 111273634
7 chr2 12122334 12122634
4 chr1 21122377 21122677
5 chr2 33122355 33122655
6 chr2 33122355 33122655"), header = TRUE)
x$count <- ave(x$start, x$start, FUN = length)
x

  chrN     start       end count
1 chr1  11122333  11122633     3
2 chr1  11122333  11122633     3
3 chr3  11122333  11122633     3
8 chr3 111273334 111273634     1
7 chr2  12122334  12122634     1
4 chr1  21122377  21122677     1
5 chr2  33122355  33122655     2
6 chr2  33122355  33122655     2

On 2/4/08, joseph [EMAIL PROTECTED] wrote: [original question quoted above]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
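An equivalent approach (a sketch on the same toy data, built inline rather than via read.table) uses table() to count each distinct start value and then indexes the counts by row:

```r
x <- data.frame(
  chrN  = c("chr1", "chr1", "chr3", "chr3", "chr2", "chr1", "chr2", "chr2"),
  start = c(11122333, 11122333, 11122333, 111273334,
            12122334, 21122377, 33122355, 33122355),
  end   = c(11122633, 11122633, 11122633, 111273634,
            12122634, 21122677, 33122655, 33122655))

# table() gives one count per distinct start; indexing that table by each
# row's start (as a name) repeats the right count on every row
x$count <- as.vector(table(x$start)[as.character(x$start)])
x$count   # 3 3 3 1 1 1 2 2
```

The ave() one-liner above is shorter; the table() form generalizes easily when you want the counts kept separately from the data frame as well.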
Re: [R] How to search for packages
Hi,

I think this is a good suggestion. And I would like to add the associated problem of deciding, between packages that do the same thing, which one is better. Similarly, packages are often superseded. I find I have to spend a lot of time learning how to use packages to decide which one is most suitable, or I spend time with a package only to eventually find out it was superseded some time ago or has been incorporated into another package. It is also difficult to know how packages fit together (or not). Functionality often overlaps or is duplicated. I have had some difficulty with microarray packages - not being sure if they complement each other or are exclusive of each other. (I know that is more Bioconductor, but the same principle applies to pure R packages, though perhaps the problem is not as pronounced.)

Regards,
JS

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Monica Pisica
Sent: 04 February 2008 14:34
To: r-help@r-project.org
Subject: [R] How to search for packages

Hi everybody, I know this might be very off topic and it took me quite a while to work up my courage to post this. But I remember a thread some time ago about how we can find the packages we need to do specific tasks in R if we don't know beforehand which ones actually do it. Now all the packages are listed alphabetically on the web site. Since I am not very advanced in writing my own functions I rely heavily on work already done, and only when I have no other choice do I modify existing functions. Usually my modifications are only cosmetic. But sometimes I spend a lot of time just reading the descriptions of packages until I decide that maybe one will do close to what I want. I wonder if there is any way to improve how these packages are displayed on the site to help with this decision.
I wonder if the community as a whole can come up with some broader categories such as Bayesian, spatial statistics, bootstrap, vegetation analysis, circular statistics, robust statistics, etc., and the authors of each package can choose 1 or 2 or however many categories they think their package fits best. On the web page we could have a list of those very broad categories, and within each category the packages themselves in alphabetical order, with their descriptions as they appear now. So if I am interested in vegetation analysis or environmental analysis but have never done it before, I go to that category and see which packages are most geared towards that particular subject. For example, it was by chance alone and some Google searching that I discovered that the package labdsv has anything to do with vegetation analysis, since at first, of course, I looked at any package with veg or env in the title. I also realize that this might mean a lot of work, but R develops so rapidly that soon, I think, it will be unmanageable to just peruse the list of packages and read descriptions in order to choose which package to install, when you are not familiar with all of them. I hope I didn't offend the community with this; I would be very sorry, since I actually get lots of help here and I have learnt a lot from you. I will remain forever grateful. Thanks, Monica __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Processing dates and generating sequences of dates
On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote:

Using zoo's yearmon class:

library(zoo)
my.dates[!duplicated(as.yearmon(my.dates))]

or, although you seem to disallow this in your question, this would be an option:

my.dates[!duplicated(format(my.dates, "%Y-%m"))]

Gabor, Many thanks. Both are exactly what I was looking for. My head was stuck very firmly in a rut of using tools that work with Date objects and I didn't think about using duplicated()! The latter is especially fine because: 1. I didn't mean to imply that I didn't want to convert to/from character - just that doing so repeatedly in the example didn't seem very elegant. 2. This whole thing may end up being wrapped into a function for use by people who have restricted computing environments, so a pure-R solution is advantageous. All the best, G

On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote:

Dear List, Say I have the following sequence of dates [*]:

start <- as.Date("2005-01-05", format = "%Y-%d-%m")
end <- as.Date("2007-10-07", format = "%Y-%d-%m")
my.dates <- seq(start, end, by = "days")

What I would like to generate is a sequence of dates, by month, that goes from the first day of 2005 (the year of the start date) to the last day of 2007 (the year of the end date), so that the output is a vector of 36 dates containing all months of the three calendar years that the sampling spanned. I could do it via manipulation of dates as so:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), 1, 1, sep = "/")),
                 as.Date(paste(format(max(my.dates), format = "%Y"), 12, 31, sep = "/")),
                 by = "months")

And then manipulate that to get only the month and year parts of the 36 generated dates. This doesn't seem very elegant to me. Is there a better/easier way than converting back and forth between characters and objects of class Date?
Many thanks, G

[*] FWIW, my actual application is similar to my.dates, but not sampled every day - indeed there are months where there are no samples - and I am trying to do a levelplot of the numbers of observations per month, per year. Given the following data

dat <- data.frame(temp = rnorm(length(my.dates)), my.dates = my.dates)
dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
dat$month <- format(dat$my.dates, format = "%b")
dat$month <- factor(dat$month, levels = c("Jan","Feb","Mar","Apr",
                                          "May","Jun","Jul","Aug",
                                          "Sep","Oct","Nov","Dec"))

I can get a table of the number of observations per month per year via

(obs.yearmon <- with(dat, table(year, month)))

which, when converted to a vector, provides what I need for levelplot()'s formula method. Now I just need to generate the sequence of 36 months (Jan, Feb, etc.) and years to go with it. The matrix method of levelplot works, but I am having to hard-code a lot of details that I believe the formula method will handle automagically for me.

-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make reference to R in the method section in a scientific article? (fwd)
On Mon, 4 Feb 2008, Falco tinnunculus wrote: Ft Dear all, Ft Ft How do I make reference to R in the methods section of a scientific article? Ft Should I state the web address? Ft Ft And, is this the proper way to report the lme test? Ft Ft The relationships were assumed to be linear. The response, handling time, Ft and the explanatory variable, prey mass, were log transformed to obtain Ft normal error distributions. The lme function of the nlme package for R was Ft used to fit linear mixed-effects (LME) models, using restricted maximum Ft likelihood (REML). Model selection was performed using the Akaike Ft Information Criterion (AIC). The overall test of the model was conducted with Ft ANOVA. Ft Ft Regards, Kes Ft

Check out the citation() function. This can be used to find the proper citation for R, and for packages that have included a citation. For example:

citation(package = "MASS")

Citing the URL is usually a matter of journal style. Some journals now want the date the URL was visited as part of the citation. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
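The suggestion above can be tried directly; a short sketch (exact output varies by R version):

```r
# citation() with no arguments gives the reference for R itself;
# with a package name it gives that package's citation, if one is provided.
ct <- citation()
print(ct)            # human-readable citation for R
bib <- toBibtex(ct)  # BibTeX form, handy for reference managers
class(bib)           # "Bibtex"
```

toBibtex() is convenient when the journal wants the reference in a BibTeX database rather than typed by hand.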
Re: [R] How to search for packages
On Feb 4, 2008 12:17 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 4 February 2008 at 10:03, hadley wickham wrote: | Before Christmas I started working on a solution for this - | http://crantastic.org - a site for searching, reviewing and tagging R | packages. Unfortunately I've run out of steam lately (and the lack of | a 64-bit ubuntu package for R means it's a bit out of date), but the Err, that got fixed early (or mid-) January. See http://cran.r-project.org/bin/linux/ubuntu/ and/or the README therein: UBUNTU PACKAGES FOR R R packages for Ubuntu on i386 and amd64 are available. The plans are to support at least the latest two Ubuntu releases and the latest LTS release. As of January 2008, these are Gutsy Gibbon (7.10), Feisty Fawn (7.04) and Dapper Drake (6.06), respectively. Kudos goes to Vincent and Michael for their work on Ubuntu packages based on our Debian packages. Maybe that can help crantastic get back on track. If on the other hand the dreadful winter sapped your energy, well then I'm out of options for help. :) Ooops, I must have missed that. I'll try again later today. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to search for packages
The real answer was Task Views on CRAN (most of the OQ's topics *are* already Task Views), so crantastic is very partial. I think crantastic and task views solve somewhat different problems (although I agree that crantastic should mirror the task views too). Task views take considerable effort to set up and are often written by one person. Tags on crantastic are dead simple to set up and can be contributed to by multiple people (although I'm not yet sure what to do about potential conflicts). If you have a little time and want to really draw in the masses, try doing clickable image maps from the Bioconductor pkgDepTools: http://bioconductor.org/packages/2.1/bioc/html/pkgDepTools.html because some of the unobtrusive short-named packages are key nodes in package dependency graphs. The dependency trees are very illuminating. Automating the updates would be positive. If you could also run against the Task View listings, That's a good idea. I do mean to add dependency links in both directions, and to automate the updates I just need to get a cron job set up. Hadley Roger The intent isn't to provide large in-depth reviews, but to make it easy to add comments like yours above. I agree it would also be nice to have some central place that lists all articles which cite a given R package, but that would be a lot of work. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding the mean and standard deviation to boxplots
There are many ways to do it. The following will place a blue point on the boxplot at the mean, then print the mean at the bottom of the plot. In some plots I've gone too far and included median points and values as well. You could also put 95% CIs on the same plot, but it would perhaps get too busy.

# Plot boxplot of vitamin C subset
bx <- boxplot(len ~ dose, data = ToothGrowth,
              boxwex = 0.25, at = 1:3 - 0.2,
              subset = supp == "VC", col = "yellow",
              main = "Guinea Pigs' Tooth Growth",
              xlab = "Vitamin C dose mg", ylab = "tooth length",
              ylim = c(0, 35), yaxs = "i")
# keep location
at <- 1:length(bx$names)
# find means, plot as points
SubTc <- subset(ToothGrowth, supp == "VC")
means <- by(SubTc$len, SubTc$dose, mean, na.rm = TRUE)
points(at - 0.2, means, pch = 19, col = "blue")
# write mean values
text(at - 0.1, 1, labels = formatC(means, format = "f", digits = 1),
     pos = 2, cex = 1, col = "red")
# Orange juice subset
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
# find means, plot as points
SubTo <- subset(ToothGrowth, supp == "OJ")
meano <- by(SubTo$len, SubTo$dose, mean, na.rm = TRUE)
points(at + 0.2, meano, pch = 19, col = "blue")
# write mean values
text(at + 0.3, 1, labels = formatC(meano, format = "f", digits = 1),
     pos = 2, cex = 1, col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"), fill = c("yellow", "orange"))

-- THT, I'm sure my R code is not as efficient as it could be. Harold Baize, Butte County Department of Behavioral Health

Tom Cohen wrote: Dear list, How can I add the mean and standard deviation to each of the boxplots using the example provided in the boxplot function?
boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg", ylab = "tooth length",
        ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"), fill = c("yellow", "orange"))

Thanks for any help, Tom

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
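The reply above only adds the means; since the original question also asked for the standard deviation, here is a minimal sketch (an addition, not from the thread) overlaying both on a plain boxplot, using arrows() for +/- 1 SD bars:

```r
# Boxplot of tooth length by dose, with group means (blue points)
# and +/- 1 standard deviation bars drawn on top.
bp <- boxplot(len ~ dose, data = ToothGrowth,
              xlab = "dose", ylab = "tooth length")
at <- seq_along(bp$names)  # x positions of the boxes
m  <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
s  <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
points(at, m, pch = 19, col = "blue")
arrows(at, m - s, at, m + s, angle = 90, code = 3,
       length = 0.05, col = "blue")
```

arrows() with angle = 90 and code = 3 draws a flat cap at both ends, which is the usual error-bar look in base graphics.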
[R] extracting AIC scores from lmer and other objects
I have a slight conundrum. I'm attempting to write a script that will take a number of objects (lm, glm, and lmer) and return AIC scores and weights. I've run into 3 problems, and was wondering if anyone had any pointers.

1) Is there any convenient way to extract the names of the objects? Simply, if I have a vector of objects c(my.lm, my.lmer) and I want to get a character vector c("my.lm", "my.lmer"), how would one do this? I see this as far simpler than jumping into the ugly details of each object type and extracting coefficient lists and such - not to mention tidier.

2) I'm repeatedly getting the error

Error in UseMethod("logLik") : no applicable method for "logLik"

in a variety of different contexts. The first is when I try to get an AIC for an lmer object: AIC(my.lmer) gives me the error above. However, I can circumvent this with a very silly solution:

myAIC <- function(object) {
  a <- logLik(object)
  return(-2 * a[1] + 2 * attr(a, "df"))
}

I use this, and I do not get an error.

3) I do, however, get the above error if I have a vector of model objects. So, again, if I have something like

model.list <- c(my.lm, my.lmer)

or even just c(my.lm, my.lm2), and then call the following on the vector of models

aiclist <- vector()
for (index in 1:length(model.list)) {
  aiclist <- c(aiclist, myAIC(model.list[index]))
}

it again yields the Error in UseMethod("logLik"). Given that this is true for lm, glm, or lmer objects, I'm guessing there's a more general issue here that I'm missing. Any pointers? Thanks! -Jarrett

Jarrett Byrnes Population Biology Graduate Group, UC Davis Bodega Marine Lab 707-875-1969 http://www-eve.ucdavis.edu/stachowicz/byrnes.shtml __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
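For what it's worth (this is an addition, not a reply from the thread): problem 3 typically comes from building the collection with c(), which flattens each fitted model into its components, so method dispatch on logLik()/AIC() has nothing of the right class to work with. Storing the fits in a named list() keeps each object intact, and the names answer problem 1 as well; a sketch with two lm() fits:

```r
# Keep fitted models in a list() so each retains its class; sapply()
# then dispatches AIC() correctly, and the list names double as labels.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
models <- list(fit1 = fit1, fit2 = fit2)
aics <- sapply(models, AIC)
names(aics)  # "fit1" "fit2"
aics         # one AIC per model
```

Indexing a list with [[ ]] (not [ ]) is what hands the whole model object to a generic like AIC() or logLik().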
Re: [R] How to search for packages
Hadley, On Feb 4, 2008 5:03 PM, hadley wickham [EMAIL PROTECTED] wrote: [...] Before Christmas I started working on a solution for this - http://crantastic.org - a site for searching, reviewing and tagging R packages. Unfortunately I've run out of steam lately (and the lack of a 64-bit ubuntu package for R means it's a bit out of date), but the basic ideas are there. If you like how the site is looking so far please let me know, as it will be motivation for me to get the site finished. it's just amazing how you still find some time for things besides ggplot2 et al... Appreciating all your work so far, it'd be great to keep you motivated for crantastic as well. I think crantastic would be a good complement to task views: - more likely to be up-to-date when used by many people - reflecting the opinion/experiences of various users rather than that of a single task view maintainer Some additional benefit could be gained by adding a rating system and by allowing packages to be sorted by number of comments a package has received, average rating, etc. Such sorting options could even be useful within a view for certain selected tags, because I guess some tags - e.g. 'graphics' - might ultimately be given to a large number of packages). Cheers, Tobias __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] C-index
Hi I am using Cox regression to identify at risk groups. How can I get the C-index in R? Thanks, Bereket [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] using image to show RGB image data ?
Hello all, I'm now using image() to show image data (in my case dumps of SOM weights) but would like to show RGB colour data, not just single z colour values. I've currently been using seq() to skip 4 values, so I can show the R, G or B channels separately as z. But is there a way I can show all three channels simultaneously as a proper colour image? Thanks, B. Bogart __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gc() and memory efficiency
I have a program which reads in a very large data set, performs some analyses, and then repeats this process with another data set. As soon as the first set of analyses is complete, I remove the very large object and clean up, to try and make memory available in order to run the second set of analyses. The process looks something like this:

1) read in data set 1 and perform analyses
rm(list = ls())
gc()
2) read in data set 2 and perform analyses
rm(list = ls())
gc()
...

But it appears that I am not making the memory consumed in step 1 available back to the OS, as R complains that it cannot allocate a vector of size X when the process tries to repeat in step 2. So, I close and reopen R and then drop in the code to run the second analysis. When this is done, I close and reopen R and run the third analysis. This is terribly inefficient. Instead I would rather just source in the R code and let the analyses run overnight. Is there a way that I can use gc() or some other function more efficiently, rather than having to close and reopen R at each iteration? I'm using Windows XP and R 2.6.1. Harold __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gc() and memory efficiency
On 4 February 2008 at 20:45, Doran, Harold wrote: | I have a program which reads in a very large data set, performs some analyses, and then repeats this process with another data set. As soon as the first set of analyses are complete, I remove the very large object and clean up to try and make memory available in order to run the second set of analyses. The process looks something like this: | | 1) read in data set 1 and perform analyses | rm(list=ls()) | gc() | 2) read in data set 2 and perform analyses | rm(list=ls()) | gc() | ... | | But, it appears that I am not making the memory that was consumed in step 1 available back to the OS as R complains that it cannot allocate a vector of size X as the process tries to repeat in step 2. | | So, I close and reopen R and then drop in the code to run the second analysis. When this is done, I close and reopen R and run the third analysis. | | This is terribly inefficient. Instead I would rather just source in the R code and let the analyses run over night. | | Is there a way that I can use gc() or some other function more efficiently rather than having to close and reopen R at each iteration?

I haven't found one. Every (trading) day I process batches of data with R, and the only reliable way I have found is to use fresh R sessions. Otherwise, the fragmented memory will eventually result in the all-too-familiar 'cannot allocate X mb' for rather small values of X relative to my total RAM. C'est la vie. As gc() seems to help somewhat yet not 'sufficiently', fresh starts are an alternative help, and Rscript starts faster than the main R. Now, I happen to be partial to littler [1], which starts even faster, so I use that (on Linux; I am not sure if it can be built on Windows, as we embed R directly and hence start faster than Rscript). But either one should help you with some batch files -- giving you a way to run overnight.
And once you start batching things, it is only a small step to regaining efficiency by parallel execution using something like MPI or NWS. Hth, Dirk [1] littler is the predecessor to Rscript, by Jeff and myself. See either http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/LittleR or http://dirk.eddelbuettel.com/code/littler.html for more on littler, and feel free to email us. -- Three out of two people have difficulties with fractions. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using image to show RGB image data ?
I'm now using image() to show image data (in my case dumps of SOM weights) but would like to show RGB colour data, not just single z colour values.

You can do this fairly readily with ggplot2:

install.packages("ggplot2")
library(ggplot2)
qplot(x, y, data = mydata, fill = rgb, geom = "tile") + scale_fill_identity()

(assuming that your variable containing the rgb colour is called rgb). If your data is originally in the matrix form used by image(), see the examples on http://had.co.nz/ggplot2/geom_tile.html for how to change to the data.frame form used by ggplot. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
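A base-graphics alternative (an addition, not from the thread): give each pixel a unique z value and pass the per-pixel colours as the palette, so image() effectively draws true RGB data. With n distinct values 1..n and n colours, value i falls in the i-th colour bin:

```r
# Build a small random RGB image: one matrix per channel, values in [0, 1].
set.seed(1)
h <- 10; w <- 10
r <- matrix(runif(h * w), h, w)
g <- matrix(runif(h * w), h, w)
b <- matrix(runif(h * w), h, w)
cols <- rgb(r, g, b)                 # one "#RRGGBB" string per pixel
idx  <- matrix(seq_len(h * w), h, w) # unique z value per pixel
image(idx, col = cols, axes = FALSE) # value i is drawn in cols[i]
```

For real SOM weight dumps, r, g and b would be the three interleaved channels the post describes extracting with seq().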
Re: [R] Processing dates and generating sequences of dates
Here is another solution. It uses only core R functions. Note that as.Date(cut(x, "years")) gives Jan 1 of x's year, where x is of class Date:

y <- as.Date(cut(range(my.dates), "years")) + c(0, 364)
seq(y[1], y[2], "months")

On Mon, Feb 4, 2008 at 1:00 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:

OK. Here are two zoo solutions and one using only core functions. In the first, dd holds the first of January of the first year and the first of December of the last year, and we apply seq.Date to that. In the second, yr is Jan of the first and Jan of the last year as yearmon class, and we adjust, apply seq, and convert that to Date. In the last we convert to POSIXlt, update the internals of that object, convert back to Date, and use seq.

library(zoo)
dd <- as.Date(floor(as.yearmon(range(my.dates))) + c(0, 11/12))
seq(dd[1], dd[2], by = "month")

library(zoo)
yr <- floor(as.yearmon(range(my.dates)))
as.Date(seq(yr[1], yr[2] + 11/12, by = 1/12))

dd <- as.POSIXlt(range(my.dates))
dd$mon <- c(0, 11)
dd$mday <- c(1, 1)
dd <- as.Date(dd)
seq(dd[1], dd[2], by = "month")

On Feb 4, 2008 12:09 PM, Gavin Simpson [EMAIL PROTECTED] wrote: On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote: Using zoo's yearmon class: library(zoo); my.dates[!duplicated(as.yearmon(my.dates))] or, although you seem to disallow this in your question, this would be an option: my.dates[!duplicated(format(my.dates, "%Y-%m"))] Ah, actually, I spoke too soon. Your solutions return the following:

my.dates[!duplicated(format(my.dates, "%Y-%m"))]
[1] 2005-05-01 2005-06-01 2005-07-01 2005-08-01 2005-09-01
[6] 2005-10-01 2005-11-01 2005-12-01 2006-01-01 2006-02-01
[11] 2006-03-01 2006-04-01 2006-05-01 2006-06-01 2006-07-01
[16] 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
[21] 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01
[26] 2007-06-01 2007-07-01

which gives only the months sampled.
What I need is a vector of length 36 covering 1st Jan 2005 through 31st Dec 2007, as in (using the seq() call in my original email):

new.dates
[1] 2005-01-01 2005-02-01 2005-03-01 2005-04-01 2005-05-01
[6] 2005-06-01 2005-07-01 2005-08-01 2005-09-01 2005-10-01
[11] 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01
[16] 2006-04-01 2006-05-01 2006-06-01 2006-07-01 2006-08-01
[21] 2006-09-01 2006-10-01 2006-11-01 2006-12-01 2007-01-01
[26] 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
[31] 2007-07-01 2007-08-01 2007-09-01 2007-10-01 2007-11-01
[36] 2007-12-01

This just seems a bit kludgy:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), 1, 1, sep = "/")),
                 as.Date(paste(format(max(my.dates), format = "%Y"), 12, 31, sep = "/")),
                 by = "months")

but perhaps there isn't a better way? Cheers, G

On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote:

Dear List, Say I have the following sequence of dates [*]:

start <- as.Date("2005-01-05", format = "%Y-%d-%m")
end <- as.Date("2007-10-07", format = "%Y-%d-%m")
my.dates <- seq(start, end, by = "days")

What I would like to generate is a sequence of dates, by month, that goes from the first day of 2005 (the year of the start date) to the last day of 2007 (the year of the end date), so that the output is a vector of 36 dates containing all months of the three calendar years that the sampling spanned. I could do it via manipulation of dates as so:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), 1, 1, sep = "/")),
                 as.Date(paste(format(max(my.dates), format = "%Y"), 12, 31, sep = "/")),
                 by = "months")

And then manipulate that to get only the month and year parts of the 36 generated dates. This doesn't seem very elegant to me. Is there a better/easier way than converting back and forth between characters and objects of class Date?
Many thanks, G

[*] FWIW, my actual application is similar to my.dates, but not sampled every day - indeed there are months where there are no samples - and I am trying to do a levelplot of the numbers of observations per month, per year. Given the following data

dat <- data.frame(temp = rnorm(length(my.dates)), my.dates = my.dates)
dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
dat$month <- format(dat$my.dates, format = "%b")
dat$month <- factor(dat$month, levels = c("Jan","Feb","Mar","Apr",
                                          "May","Jun","Jul","Aug",
                                          "Sep","Oct","Nov","Dec"))

I can get a table of the number of observations per month per year via
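Pulling the pieces of the thread together, the core-R recipe can be checked end-to-end (dates as in the question; note that with format "%Y-%d-%m" the start string parses as 1 May 2005, and that "+ 364" lands on Dec 30 in a leap year, which still covers the December step):

```r
start <- as.Date("2005-01-05", format = "%Y-%d-%m")
end   <- as.Date("2007-10-07", format = "%Y-%d-%m")
my.dates <- seq(start, end, by = "days")

# Jan 1 of the first year through (about) Dec 31 of the last year,
# then step through it month by month.
y <- as.Date(cut(range(my.dates), "years")) + c(0, 364)
months36 <- seq(y[1], y[2], by = "months")
length(months36)  # 36
```

The 36 first-of-month dates are exactly the grid Gavin needs for the levelplot() formula method.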
Re: [R] gc() and memory efficiency
1) See ?Memory-limits: it is almost certainly memory fragmentation. You don't need to give the memory back to the OS (and few OSes actually do so).

2) I've never seen this running a 64-bit version of R.

3) You can easily write a script to do this. Indeed, you could write an R script to run multiple R scripts in separate processes in turn (via system("Rscript fileN.R")). For example, Uwe Ligges uses R to script the building and testing of packages on Windows.

On Mon, 4 Feb 2008, Doran, Harold wrote: I have a program which reads in a very large data set, performs some analyses, and then repeats this process with another data set. As soon as the first set of analyses are complete, I remove the very large object and clean up to try and make memory available in order to run the second set of analyses. The process looks something like this: 1) read in data set 1 and perform analyses rm(list=ls()) gc() 2) read in data set 2 and perform analyses rm(list=ls()) gc() ... But, it appears that I am not making the memory that was consumed in step 1 available back to the OS as R complains that it cannot allocate a vector of size X as the process tries to repeat in step 2. So, I close and reopen R and then drop in the code to run the second analysis. When this is done, I close and reopen R and run the third analysis. This is terribly inefficient. Instead I would rather just source in the R code and let the analyses run over night. Is there a way that I can use gc() or some other function more efficiently rather than having to close and reopen R at each iteration? I'm using Windows XP and R 2.6.1 Harold __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
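The fresh-process approach both replies recommend can be driven from R itself; a minimal sketch (the analysis script here is a stand-in written to a temp file; in practice you would loop over your real analysis files):

```r
# Run an analysis in its own Rscript process, so all of its memory is
# returned to the OS when that process exits.
f <- tempfile(fileext = ".R")
writeLines('x <- rnorm(1e5); cat("mean:", mean(x), "\\n")', f)  # stand-in analysis
rscript <- file.path(R.home("bin"), "Rscript")
status <- system(paste(shQuote(rscript), shQuote(f)))
status  # 0 on success
```

Looping over c("analysis1.R", "analysis2.R", ...) in this way gives the overnight batch run Harold asked for, with no fragmentation carried over between data sets.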
[R] Warnings from apply longer object length is not a multiple of shorter object length
Hi, What was I doing wrong, such that B) gave warning messages? I want the computation of thr to be outside the apply function.

A) Uses a simple 3 x 3 matrix. No warning messages.

data2_1 <- matrix(c(1, 2, 3, NA, 4, 5, NA, NA, 6), 3, 3)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)
for (i in 1:2) {
  thr <- mean + i * sd
  num <- apply(data2_1, 2, function(x) {
    temp <- thr
    sum(x > (temp), na.rm = TRUE)
  })
}

B) Replacing data2_1 with a larger (3 x 10) matrix gives 20 warning messages, i.e. In x > (temp) : longer object length is not a multiple of shorter object length

data2_1 <- matrix(c(0.9584190, 0.2710325, -0.9495618, -0.1301772, -0.7539687,
                    0.5344464, -0.8205933, 0.1581723, -0.5351588, 0.04448065,
                    0.9936430, 0.2278786, -0.8160700, -0.3314779, -0.4047975,
                    0.1168152, -0.7458182, -0.2231588, -0.5051651, -0.74871174,
                    0.9450363, 0.4797723, -0.9033313, -0.5825065, 0.8523742,
                    0.7402795, -0.7134312, -0.8162558, 0.6345438, -0.05704138),
                  3, 10)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)
for (i in 1:2) {
  thr <- mean + i * sd
  num <- apply(data2_1, 2, function(x) {
    temp <- thr
    sum(x > (temp), na.rm = TRUE)
  })
}

C) To remove the 20 warning messages, I have to replace temp <- thr with temp <- mean + i * sd

data2_1 <- matrix(c(0.9584190, 0.2710325, -0.9495618, -0.1301772, -0.7539687,
                    0.5344464, -0.8205933, 0.1581723, -0.5351588, 0.04448065,
                    0.9936430, 0.2278786, -0.8160700, -0.3314779, -0.4047975,
                    0.1168152, -0.7458182, -0.2231588, -0.5051651, -0.74871174,
                    0.9450363, 0.4797723, -0.9033313, -0.5825065, 0.8523742,
                    0.7402795, -0.7134312, -0.8162558, 0.6345438, -0.05704138),
                  3, 10)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)
for (i in 1:2) {
  thr <- mean + i * sd
  num <- apply(data2_1, 2, function(x) {
    temp <- mean + i * sd
    sum(x > (temp), na.rm = TRUE)
  })
}

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
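For reference (an addition, not a reply from the thread): the warning in B) is R's vector recycling at work. Each column passed to the apply function has length 3, while thr holds one threshold per column (length 10), and 10 is not a multiple of 3, so `x > thr` recycles with a warning. If the intent is to compare each column only against its own threshold, iterating over column indices avoids the recycling entirely; a sketch:

```r
# One threshold per column; compare column j only against thr[j].
set.seed(2)
m   <- matrix(rnorm(30), nrow = 3, ncol = 10)
thr <- colMeans(m) + apply(m, 2, sd)  # per-column mean + 1 SD
num <- vapply(seq_len(ncol(m)),
              function(j) sum(m[, j] > thr[j]),
              numeric(1))
num  # one count of above-threshold values per column
```

Equivalently, `colSums(sweep(m, 2, thr, ">"))` compares the whole matrix against the per-column thresholds in one step.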