Re: [R] Interactive/Dynamic plots with R
Hi Greg Ability to zoom in/out on x/y axis to begin with and then adding/removing data sets. Thanks! -Abhi On Fri, Feb 25, 2011 at 12:52 PM, Greg Snow greg.s...@imail.org wrote: What types of interaction do you want? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Abhishek Pratap Sent: Friday, February 25, 2011 12:37 AM To: r-help@r-project.org Subject: [R] Interactive/Dynamic plots with R Hi Guys In order to look at a dense plot I would like to have the capability to plot dynamic/interactive. Before I try rgobi which I heard can help me; I would like to take your opinion. Thanks! -Abhi __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using uniroot() with output from deriv() or D()
I need to find the root of the second derivative of many curves and do not want to cut and paste the expression results from the deriv() or D() functions every time. Below is an example. What I need to do is refer to fn2nd in the uniroot() function, but when I try something like uniroot(fn2nd,c(0,1)) I get an error, so I have resorted to pasting in the expression, but this is highly inefficient. Thanks, J y - c(9.9,10,10,9.5,9,6,3,1,0,0,0) b - seq(0,1,by=0.1) dat - data.frame(y = y, b = b) plot(y ~ b, xlim = c(0,1), ylim= c(-12,12)) fn - nls(y ~ asym/(1 + exp(p * b - q)),data = dat, list(asym=10,p=16,q=7), trace=T) curve(10.001/(1 + exp(14.094 * x - 7.551)), from = 0, to = 1, add = T) fn2 - expression(10.001/(1 + exp(14.094 * x - 7.551))) fn2nd - D(D(fn2, x), x) ex - seq.int(from=0, to=1, length.out = 1000) y1 - eval(fn2nd, envir = list(x = ex), enclos = parent.frame()) lines(ex, y1, type = l) r - uniroot(function(x) -(10.001 * (exp(14.094 * x - 7.551) * 14.094 * 14.094)/(1 + exp(14.094 * x - 7.551))^2 - 10.001 * (exp(14.094 * x - 7.551) * 14.094) * (2 * (exp(14.094 * x - 7.551) * 14.094 * (1 + exp(14.094 * x - 7.551/((1 + exp(14.094 * x - 7.551))^2)^2),interval=c(0,1),tol = 0.0001) r$root abline(h=0, col = red) abline(v=r$root, col = green) arrows(0.6, -2, r$root,0, length = 0.1, angle = 30, code = 2, col = red) text(0.765,-2.3,paste(b = ,r$root,sep=)) -- View this message in context: http://r.789695.n4.nabble.com/Using-uniroot-with-output-from-deriv-or-D-tp3325635p3325635.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Boxplot for X Vs Y variable grouped by ID
Dear All, I am new to R. I amazed by this software. I have a question regarding boxplot. I have three columns say X, Y,ID (X is time from 0 to 12 hrs, Y is a variable dependent on X) I can plot a simple boxplot if it is just one group. But I have 10 groups and I want to plot all of them in one graphs. Something like multiple boxplots in one graph. I am not in R mailing list please reply to me directly. Thank you very much for your help. -- Regards, Shankar Lanke Ph.D. University at Buffalo Office # 716-645-4853 Fax # 716-645-2886 Cell # 678-232-3567 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] A problem about realized garch model
Hi, I am trying to write the Realized GARCH model with order (1,1) The model can be describe bellow: r_t = sqrt( h_t) * z_t logh_t = w + b*logh_(t-1) + r*logx_(t-1) logx_t = c + q*logh_t + t1*z_t +t2*(z_t ^2 -1) + u_t and z follow N(0,1) , u follow N(0, sigma.u^2) But I'm troubled with the simulation check for my code. After I simulate data from the model and estimate the data, I can't get precise estimation for my setting parameters. This is my simulation code: === sim-function(theta) { omega = theta[1] bet = theta[2] gam = theta[3] xi = theta[4] phi = theta[5] tau1 = theta[6] tau2 = theta[7] sigma.u = theta[8] n = theta[9] n.warm = 500 n = n+n.warm z = rnorm(n+1) u = rnorm((n+1),0,sigma.u) logh.pre = 1 logx.pre = xi+phi*logh.pre+tau1*z[1]+tau2*(z[1]^2-1)+u[1] logh = c(logh.pre,rep(0,n)) r = c(rep(0,(n+1))) logx = c(logx.pre,rep(0,n)) for(i in 2:(n+1)) { logh[i] = omega+bet*logh[(i-1)]+gam*logx[(i-1)] logx[i] = xi+phi*logh[i]+tau1*z[i]+tau2*(z[i]^2-1)+u[i] r[i] = sqrt(exp(logh[i]))*z[i] } gdata = rbind(r,exp(logx)) ans = gdata[,-(1:(n.warm+1))] ans } === This is my estimation code: LLH-function(data,theta) { r = data[1,] x = data[2,] n = dim(data)[2] logx = log(x) logh = rep(0,n) u = rep(0,n) garchDist = function(r, hh) { dnorm(x = r/hh)/hh } measureDist = function(u,sigma){dnorm(x = u/sigma)/sigma} omega = theta[1] bet = theta[2] gam = theta[3] xi = theta[4] phi = theta[5] tau1 = theta[6] tau2 = theta[7] sigma.u = theta[8] logh.pre = 0.1 logx.pre = 0.1 logh[1]-omega+bet*logh.pre+gam*logx.pre u[1]-logx[1]-xi-phi*logh[1]-tau1*(r[1]/sqrt(exp(logh[1])))-tau2*((r[1]/sqrt(exp(logh[1])))^2-1) for(i in 2:n) { logh[i] = omega+bet*logh[(i-1)]+gam*logx[(i-1)] u[i] = logx[i]-xi-phi*logh[i]-tau1*(r[i]/sqrt(exp(logh[i])))-tau2*((r[i]/sqrt(exp(logh[i])))^2-1) } h = exp(logh) hh = sqrt(h) llh = -sum(log(garchDist(r, hh)))-sum(log(measureDist(u,sigma.u))) llh } fitting-function(data) { LLH1-function(theta) {LLH(data,theta)} ini = c(0.1,0.6,0.3,0.07,1,-0.03,0.1,0.4) for(i in 1:10) { fit = optim(ini,LLH1,control=list(trace=2),hessian=T) ini = fit$par } hessian = fit$hessian std = sqrt(diag(solve(hessian))) A = rbind(fit$par,std) A } = Does anyone can point out that where I am wrong? Thank you very much! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Autocorrelation Rho Greater Than One
Dear R Users, Kindly advice me what's wrong in my programming. I'm using the Cochrane-Orcutt two stage procedure with Prais Wisten transformation, below is my R programming : Y-c(60.8,62.5,64.6,66.1,67.7,69.1,71.7,73.5,76.2,77.3,78.8,80.2,82.6,84.3,83.3,84.1,86.4,87.6,89.1,89.3,89.1, , + 89.3,90.4,90.3,90.7,92.0,94.9,95.2,96.5,95.0,96.2,97.4,100.0,99.7,99.0,98.7,99.4,100.5,105.2,108.0,112.0,113.5, + 115.7,117.7,119.0,120.2) X-c(48.9,50.6,52.9,55.0,56.8,58.8,61.2,62.5,64.7,65.0,66.3,69.0,71.2,73.4,72.3,74.8,77.1,78.5,79.3,79.3,79.2, , + 80.8,80.1,83.0,85.2,87.1,89.7,90.1,91.5,92.4,94.4,95.9,100.0,100.4,101.3,101.5,104.5,106.5,109.5,112.8,116.1, + 119.1,124.0,128.7,132.7,135.7) model-lm(Y~X) e-resid(model) mylag-function(e,d=1) { + n-length(e) + c(rep(NA,d),e)[1:n] + } n-length(e) e1-mylag(e) modele-lm(e~e1-1) rho-coef(modele) rho e1 0.8875926 n-length(e) xstar-c(X[1]*(1-rho^2)^0.5,X[2:n]-rho*X[1:(n-1)]) ystar-c(Y[1]*(1-rho^2)^0.5,Y[2:n]-rho*Y[1:(n-1)]) modelb-lm(ystar~xstar) bstar-coef(modelb) a-(bstar[[1]][[1]])/(1-rho) a[1:n]-a b-bstar[[2]][[1]] u-Y-(a+X*b) u-u myu-function(u,d=1) { + n-length(u) + c(rep(NA,d),u)[1:n] + } u1-myu(u) modelu-lm(u~u1-1) Rho-coef(modelu) Rho u1 1.029970 The answer should be less than one but I got 1.029970. Any correction in the programming part? Any preventive action on this matter? Thank you. Regards, Lim Hock Ann [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Boxplot for X Vs Y variable grouped by ID
On Feb 26, 2011, at 1:07 AM, Shankar Lanke wrote: Dear All, I am new to R. I amazed by this software. I have a question regarding boxplot. I have three columns say X, Y,ID (X is time from 0 to 12 hrs, Y is a variable dependent on X) I can plot a simple boxplot if it is just one group. But I have 10 groups and I want to plot all of them in one graphs. Something like multiple boxplots in one graph. I am not in R mailing list please reply to me directly. Work through the examples on the boxplot help page. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with RODBC sqlSave
Hi, I'm able to establish a successful odbc connection using RODBC 1.3-2 on Win 7 and R 2.12.0. But I'm getting the following error message when I try to save a data frame into the debase as shown below. library(RODBC) bbdb - odbcConnect(bbdb) odbcGetInfo(bbdb) # returns ok sqlSave(bbdb, USArrests, rownames = state, addPK=TRUE) # example from the RODBC manual Error in sqlSave(bbdb, USArrests, rownames = state, addPK = TRUE) : [RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE USArrests (state varchar(255) NOT NULL PRIMARY KEY, Murder double, Assault integer, UrbanPop integer, Rape double)' Thanks in advance, Lars. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sourcing a linux file with system
Dear R community, I would like to source a file in my linux system to set a few environment variables. I have tried: system(source /home/james/build//ccp4-6.1.13/include/ccp4.setup) and got: sh: source: not found In fact, using Sys.which(source) I get an empty string. How can I source that ccp4.setup file from within an R session? J Dr James Foadi PhD Membrane Protein Laboratory (MPL) Diamond Light Source Ltd Diamond House Harewell Science and Innovation Campus Chilton, Didcot Oxfordshire OX11 0DE Email: james.fo...@diamond.ac.uk Alt Email: j.fo...@imperial.ac.uk Personal web page: http://www.jfoadi.me.uk -- This e-mail and any attachments may contain confidential...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in different OS
Not sure exactly what the original poster was after, but for distinguishing when I am working on different machines with different OS, I use something like this: ### Set some state variables opSys - Sys.info()[sysname] if (opSys == Windows){ linux - FALSE } else { linux - TRUE } David Scott On 26/02/2011 10:00 a.m., Ista Zahn wrote: Hi, see ?R.version Something like if(version$os == mingw32) { path = /ABC} else { path = /DEF } might do it, but I'm not sure exactly what possible values version$os can take or what determines the value exactly. Best, Ista On Fri, Feb 25, 2011 at 1:23 PM, Hui Duhui...@dataventures.com wrote: Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like if it is R under Windows path = /ABC else if it is R under UNIX, path = /DEF Any idea? Thanks. Best Regards, HXD [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sourcing a linux file with system
And what do you think 'source' does? Or system() does (check the help page)? Hint: source is not an external command on Linux, but a builtin command of some shells (e.g. csh, bash), but not of sh -- in sh the equivalent is '.'. On Sat, 26 Feb 2011, james.fo...@diamond.ac.uk wrote: Dear R community, I would like to source a file in my linux system to set a few environment variables. Hmm, how is that going to work? That will set environment variables in the shell launched by system(), not for the R process. The only way to set environment variables for the running R process is to call Sys.setenv. I have tried: system(source /home/james/build//ccp4-6.1.13/include/ccp4.setup) and got: sh: source: not found In fact, using Sys.which(source) I get an empty string. How can I source that ccp4.setup file from within an R session? J Dr James Foadi PhD Membrane Protein Laboratory (MPL) Diamond Light Source Ltd Diamond House Harewell Science and Innovation Campus Chilton, Didcot Oxfordshire OX11 0DE Email: james.fo...@diamond.ac.uk Alt Email: j.fo...@imperial.ac.uk Personal web page: http://www.jfoadi.me.uk -- This e-mail and any attachments may contain confidential...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in different OS
It is less clear what you are after, but the canonical way to decide if your R session is on Windows is .Platform$OS.type == windows Unlike {R.}version$os and Sys.info()[sysname], the set of values here is known and documented. As ?R.version does say: Do _not_ use ‘R.version$os’ to test the platform the code is running on: use ‘.Platform$OS.type’ instead. Slightly different versions of the OS may report different values of ‘R.version$os’, as may different versions of R. On Sun, 27 Feb 2011, David Scott wrote: Not sure exactly what the original poster was after, but for distinguishing when I am working on different machines with different OS, I use something like this: ### Set some state variables opSys - Sys.info()[sysname] if (opSys == Windows){ linux - FALSE } else { linux - TRUE } David Scott On 26/02/2011 10:00 a.m., Ista Zahn wrote: Hi, see ?R.version Something like if(version$os == mingw32) { path = /ABC} else { path = /DEF } might do it, but I'm not sure exactly what possible values version$os can take or what determines the value exactly. Best, Ista On Fri, Feb 25, 2011 at 1:23 PM, Hui Duhui...@dataventures.com wrote: Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like if it is R under Windows path = /ABC else if it is R under UNIX, path = /DEF Any idea? Thanks. Best Regards, HXD [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding pairs with least magnitude difference from mean
I have what I think is some kind of linear programming question. Basically, what I want to figure out is if I have a vector of numbers, x - rnorm(10) x [1] -0.44305959 -0.26707077 0.07121266 0.44123714 -1.10323616 -0.19712807 0.20679494 -0.98629992 0.97191659 -0.77561593 mean(x) [1] -0.2081249 Using each number only once, I want to find the set of five pairs where the magnitude of the differences between the mean(x) and each pairs sum is least. y - outer(x, x, +) - (2 * mean(x)) With this matrix, if I put together a combination of pairs which uses each number only once, the sum of the corresponding numbers is 0. For example, compare the SD between this set of 5 pairs sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5])) [1] 1.007960 versus this hand-selected, possibly lowest SD combination of pairs sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7])) [1] 0.2367030 Your selection is not bad, as only about 0.4% of all possible distinct combinations have a smaller value -- the minimum is 0.1770076, for example [10 7 9 5 8 4 6 2 3 1]. (1) combinat() from the 'combinations' package seems slow, try instead the permutations() function from 'e1071'. (2) Yes, except your vector is getting much larger in which case brute force is no longer feasible. (3) This is not a linear programming, but a combinatorial optimization task. You could try optim() with the SANN method, or some mixed-integer linear program (e.g., lpSolve, Rglpk, Rsymphony) by intelligently using binary variables to define the sets. This does not mean that some specialized approach might not be more appropriate. --Hans Werner I believe that if I could test all the various five pair combinations, the combination with the lowest SD of values from the table would give me my answer. I believe I have 3 questions regarding my problem. 1) How can I find all the 5 pair combinations of my 10 numbers so that I can perform a brute force test of each set of combinations? I believe there are 45 different pairs (i.e. choose(10,2)). I found combinations from the {Combinations} package but I can't figure out how to get it to provide pairs. 2) Will my brute force strategy of testing the SD of each of these 5 pair combinations actually give me the answer I'm searching for? 3) Is there a better way of doing this? Probably something to do with real linear programming, rather than this method I've concocted. Thanks for any help you can provide regarding my question. Best regards, James __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in different OS
Thanks Brian, I stand corrected. David Scott On 27/02/2011 12:32 a.m., Prof Brian Ripley wrote: It is less clear what you are after, but the canonical way to decide if your R session is on Windows is .Platform$OS.type == windows Unlike {R.}version$os and Sys.info()[sysname], the set of values here is known and documented. As ?R.version does say: Do _not_ use ‘R.version$os’ to test the platform the code is running on: use ‘.Platform$OS.type’ instead. Slightly different versions of the OS may report different values of ‘R.version$os’, as may different versions of R. On Sun, 27 Feb 2011, David Scott wrote: Not sure exactly what the original poster was after, but for distinguishing when I am working on different machines with different OS, I use something like this: ### Set some state variables opSys- Sys.info()[sysname] if (opSys == Windows){ linux- FALSE } else { linux- TRUE } David Scott On 26/02/2011 10:00 a.m., Ista Zahn wrote: Hi, see ?R.version Something like if(version$os == mingw32) { path = /ABC} else { path = /DEF } might do it, but I'm not sure exactly what possible values version$os can take or what determines the value exactly. Best, Ista On Fri, Feb 25, 2011 at 1:23 PM, Hui Duhui...@dataventures.com wrote: Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like if it is R under Windows path = /ABC else if it is R under UNIX, path = /DEF Any idea? Thanks. Best Regards, HXD [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] invalid type (list) for variable
Hi, I have a problem with an invalid variable type. My sample dataset is looking like this: str(factivaTDM.train) 'data.frame': 4 obs. of 14 variables: $ access: num 2 3 2 1 $ billion : num 2 1 2 2 $ compani : num 7 6 1 2 $ data : num 6 2 3 4 $ inc : num 5 3 3 2 $ local : num 3 2 1 1 $ mcleodusa : num 15 6 2 10 $ network : num 9 1 2 1 $ provid: num 4 3 1 2 $ servic: num 11 4 3 6 $ share : num 9 5 3 5 $ splitrock : num 18 6 4 5 $ stock : num 3 3 1 2 $ classifiedPositive:List of 4 ..$ : chr yes ..$ : chr yes ..$ : chr yes ..$ : chr yes ..- attr(*, class)= chr AsIs and the command: NaiveBayes.model - NB(classifiedPositive ~ ., data = factivaTDM.train) generates the error: invalid type (list) for variable 'classifiedPositive' Here is another dataset which causes no errors: str(weatherNominal.train) 'data.frame': 13 obs. of 5 variables: $ outlook: Factor w/ 3 levels overcast,rainy,..: 3 3 1 2 2 2 1 3 3 2 ... $ temperature: Factor w/ 3 levels cool,hot,mild: 2 2 2 3 1 1 1 3 1 3 ... $ humidity : Factor w/ 2 levels high,normal: 1 1 1 1 2 2 2 1 2 2 ... $ windy : logi FALSE TRUE FALSE FALSE FALSE TRUE ... $ play : Factor w/ 3 levels ?,no,yes: 2 2 3 3 3 2 3 2 3 3 ... So I suppose, I have to convert classifiedPositive:List to classifiedPositive:Factor How can I do that? I will send you more information about my script if you need it. Thank you very much, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Pulling p values
Hi folks, I'm doing ANOVA with multiple comparisons. I've used both the TukeyHSD function and the multcomp procedure. In both cases, I get some tantalizing results such as this... e = aov(lm(d.all[,(n+4)] ~ d.all[,4]) TukeyHSD(e) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = lm(d.all[, (n + 4)] ~ d.all[, 4], contrast = C)) $`d.all[, 4]` diff lwr upr p adj 10 mpk-0 mpk-0.677375 -1.1779417 -0.1768083 0.0071092 30 mpk-0 mpk -1.074500 -1.5750667 -0.5739333 0.656 30 mpk-10 mpk -0.397125 -0.8976917 0.1034417 0.1369760 I ask for the pvalues and get... i = h$pvalues i 10 mpk - 0 mpk 30 mpk - 0 mpk 30 mpk - 10 mpk 2.630022e-03 2.288575e-05 5.863517e-02 It looks like a matrix, but it is a vector of length 3 and the third element, for example, is length(i) [1] 3 q = i[3] q 30 mpk - 10 mpk 0.05863517 The contrast label is bound to the p-value as a single element. Does anyone know how to extract just the p-value from this procedure? Or a simple way to generate these comparisons? Thanks, Bob Gallavan An obvious newbie --This message is the property of i3Global and may include confidential and/or proprietary information, and may be used only by the person or entity to which it is addressed. If the reader of this e-mail is not the intended recipient or his or her authorized agent, the reader is hereby notified that any dissemination, distribution or copying of this e-mail is prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and delete this e-mail immediately.== [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sourcing a linux file with system
On 26 February 2011 11:00, james.fo...@diamond.ac.uk wrote: Dear R community, I would like to source a file in my linux system to set a few environment variables. I have tried: system(source /home/james/build//ccp4-6.1.13/include/ccp4.setup) and got: sh: source: not found Someone has been writing unportable code. Instead change it to system(. /home/james/build//ccp4-6.1.13/include/ccp4.setup) A dot will source a file, in a manner which avoids the need of a command that is not defined by POSIX. Dave __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pulling p values
On 2011-02-26 04:12, Gallavan, Robert wrote: Hi folks, I'm doing ANOVA with multiple comparisons. I've used both the TukeyHSD function and the multcomp procedure. In both cases, I get some tantalizing results such as this... e = aov(lm(d.all[,(n+4)] ~ d.all[,4]) TukeyHSD(e) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = lm(d.all[, (n + 4)] ~ d.all[, 4], contrast = C)) $`d.all[, 4]` diff lwr upr p adj 10 mpk-0 mpk-0.677375 -1.1779417 -0.1768083 0.0071092 30 mpk-0 mpk -1.074500 -1.5750667 -0.5739333 0.656 30 mpk-10 mpk -0.397125 -0.8976917 0.1034417 0.1369760 I ask for the pvalues and get... i = h$pvalues i 10 mpk - 0 mpk 30 mpk - 0 mpk 30 mpk - 10 mpk 2.630022e-03 2.288575e-05 5.863517e-02 It looks like a matrix, but it is a vector of length 3 and the third element, for example, is length(i) [1] 3 q = i[3] q 30 mpk - 10 mpk 0.05863517 The contrast label is bound to the p-value as a single element. Well, 'bound to' just means that you have a vector with names. So you can just unname() it. But do let us in on your secret: what's 'h'? Does anyone know how to extract just the p-value from this procedure? Or a simple way to generate these comparisons? Thanks, Bob Gallavan An obvious newbie Newbies need to peruse the posting guide; In particular, note the comment about HTML mail. Peter Ehlers --This message is the property of i3Global and may include confidential and/or proprietary information, and may be used only by the person or entity to which it is addressed. If the reader of this e-mail is not the intended recipient or his or her authorized agent, the reader is hereby notified that any dissemination, distribution or copying of this e-mail is prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and delete this e-mail immediately.== [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using uniroot() with output from deriv() or D()
I need to find the root of the second derivative of many curves and do not want to cut and paste the expression results from the deriv() or D() functions every time. Below is an example. What I need to do is refer to fn2nd in the uniroot() function, but when I try something like uniroot(fn2nd,c(0,1)) I get an error, so I have resorted to pasting in the expression, but this is highly inefficient. Thanks, J [...] What is so wrong with using r - uniroot(function(x) eval(fn2nd, list(x=x)), interval=c(0, 1), tol=0.0001) (I thought you were almost there) or even fn2nd_fun - function(x) eval(fn2nd, list(x=x)) ex - seq(from=0, to=1, length.out = 1000) y1 - fn2nd_fun(ex) ... r - uniroot(fn2nd_fun, interval=c(0, 1), tol=0.0001) --Hans Werner r$root abline(h=0, col = red) abline(v=r$root, col = green) arrows(0.6, -2, r$root,0, length = 0.1, angle = 30, code = 2, col = red) text(0.765,-2.3,paste(b = ,r$root,sep=)) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] BFGS versus L-BFGS-B
Bert has put a finger on one of the key questions: Are we there yet? Global optima are very difficult to find in many, possibly most problems. We can check for local optima. In package optimx (currently resetting it on CRAN due to a variable naming glitch with Rvmmin -- should be there in a couple of days) we carry out the Kuhn Karush Tucker tests, but even these have scale dependencies. On R-forge there's a new package under our optimization and solving packages project called kktc that separates off the tests. However, that's only a start. Constraints require us to do more work, too. A couple of very large scale problems have been noted. The minimization of energy for collections of atoms or molecules was suggested, but my experience with that type of problem is that it has a huge number of local minima and is not amenable to the techniques that the original poster had in mind, and certainly optim(BFGS) nor Rcgmin nor optim(L-BFGS-B) are appropriate as they stand, though they may be helpful in some sort of polyalgorithm. ALso I did not post, but mentioned in private emails, that L-BFGS-B and Rcgmin should offer much lower memory demands for problems in large numbers of parameters than optim(BFGS) which has an explicit inverse Hessian approximation i.e., n*n doubles to store. BFGS and L-BFGS-B have very different internals. JN On 02/25/2011 08:46 PM, Bert Gunter wrote: Thanks to all for clarifications. So I'm off base, but whether waaay off base depends on whether there is a reasonably well defined optimum to converge to. Which begs the question, I suppose: How does one know whether one has converged to such an optimum? This is always an issue, of course, even for a few parameters; but maybe more so with so many. -- Bert On Fri, Feb 25, 2011 at 3:35 PM, Ravi Varadhan rvarad...@jhmi.edu wrote: I have worked on a 2D image reconstruction problem in PET (positron emission tomography) using a Poisson model. Here, each pixel intensity is an unknown parameter. I have solved problems of size 128 x 128 using an accelerated EM algorithm. Ken Lange has shown that you can achieve term by term separation using a minorization inequality, and hence the problem simplifies greatly. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Prof. John C Nash Sent: Friday, February 25, 2011 5:55 PM To: Bert Gunter Cc: r-help@r-project.org Subject: Re: [R] BFGS versus L-BFGS-B For functions that have a reasonable structure i.e., 1 or at most a few optima, it is certainly a sensible task. Separable functions are certainly nicer (10K 1D minimizations), but it is pretty easy to devise functions e.g., generalizations of Rosenbrock, Chebyquad and other functions that are high dimension but not separable. Admittedly, there are not a lot of real-world examples that are publicly available. More would be useful. JN On 02/25/2011 05:06 PM, Bert Gunter wrote: On Fri, Feb 25, 2011 at 12:00 PM, Brian Tsai btsa...@gmail.com wrote: Hi John, Thanks so much for the informative reply! I'm currently trying to optimize ~10,000 parameters simultaneously - for some reason, -- Some expert (Ravi, John ?) please correct me, but: Is the above not complete nonsense? I can't imagine poking around usefully in 10K dimensional space for an extremum unless maybe one can find the extremum by 10K separate 1-dim optimizations. And maybe not then either. Am I way offbase here, or has Brian merely described just another inefficient way to produce random numbers? -- Bert __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tansformation of variables in cph from rms package
Yao, I wonder how likely it is that log transformations fit the data. More often I find that the flexibility of restricted cubic splines is needed. One of the things that people take for granted when using log() is that they are assuming that k=0 in making the transformation log(x + k). It is often the case the the proper origin is not zero. This also impacts the problem you encountered. Besides all that, you can specify limits to predictors to nomogram() in a variety of ways as listed in the help file. Try that. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/tansformation-of-variables-in-cph-from-rms-package-tp3325545p3325911.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using uniroot() with output from deriv() or D()
On Sat, Feb 26, 2011 at 2:16 AM, jjheath heath_jer...@hotmail.com wrote: I need to find the root of the second derivative of many curves and do not want to cut and paste the expression results from the deriv() or D() functions every time. Below is an example. What I need to do is refer to fn2nd in the uniroot() function, but when I try something like uniroot(fn2nd,c(0,1)) I get an error, so I have resorted to pasting in the expression, but this is highly inefficient. Thanks, J y - c(9.9,10,10,9.5,9,6,3,1,0,0,0) b - seq(0,1,by=0.1) dat - data.frame(y = y, b = b) plot(y ~ b, xlim = c(0,1), ylim= c(-12,12)) fn - nls(y ~ asym/(1 + exp(p * b - q)),data = dat, list(asym=10,p=16,q=7), trace=T) curve(10.001/(1 + exp(14.094 * x - 7.551)), from = 0, to = 1, add = T) fn2 - expression(10.001/(1 + exp(14.094 * x - 7.551))) fn2nd - D(D(fn2, x), x) ex - seq.int(from=0, to=1, length.out = 1000) y1 - eval(fn2nd, envir = list(x = ex), enclos = parent.frame()) lines(ex, y1, type = l) r - uniroot(function(x) -(10.001 * (exp(14.094 * x - 7.551) * 14.094 * 14.094)/(1 + exp(14.094 * x - 7.551))^2 - 10.001 * (exp(14.094 * x - 7.551) * 14.094) * (2 * (exp(14.094 * x - 7.551) * 14.094 * (1 + exp(14.094 * x - 7.551/((1 + exp(14.094 * x - 7.551))^2)^2),interval=c(0,1),tol = 0.0001) r$root abline(h=0, col = red) abline(v=r$root, col = green) arrows(0.6, -2, r$root,0, length = 0.1, angle = 30, code = 2, col = red) text(0.765,-2.3,paste(b = ,r$root,sep=)) Try this: f - function(x) {} body(f) - fn2nd -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unable to access index for repository
On 11-02-25 6:15 PM, Kalicin, Sarah wrote: Hi, I have two questions: 1) Since I switched to Windows 2007 and downloaded the current R version (2.12.1; 2010-12-16) for Windows 7 a month ago, I cannot down load packages through the GUI drop down menu. I get: Warning: unable to access index for repository http://cran.cnr.Berkeley.edu/bin/windows/contrib/2.12 Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.12 Warning messages: 1: In open.connection(con, r) : unable to connect to 'cran.r-project.org' on port 80. In fact, just to assign the CRAN location in the drop down menu takes forever. From web searches, the first check is to see if you have internet access. I do and can download the packages directly to my hard drive and load them indirectly into R by using library(MASS, lib.loc=C:\\Users\\R\\R packages). This is starting to become a pain. Any suggestions on looking at potential security settings that I might need to deactiviate or any other issues? You've probably got a firewall or proxy getting in the way. You could re-install R with the internet2 option, or run setInternet2(TRUE) in a session to fix this. (Or maybe you already did that, in which case setInternet2(FALSE) will turn it off.) 2) Any suggestions how I can connect to a MySQL database through windows using R. I found this example on the web, but there is not a package for RMYSQL for windows. Bugger!!! library(RMySQL) drv = dbDriver(MySQL) con = dbConnect(drv,host=mars,dbname=sat133,user=sat133,pass=T0pSecr3t) album = dbGetQuery(con,statement=select * from tasks) Install an ODBC driver for MySQL, and use RODBC. Duncan Murdoch Thanks for any input!! Sarah [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Forced inclusion of varaibles in validate command as well as step
Jon, Version 3.3-0 of rms will be released within 2-3 days. It has a new option force for fastbw, validate, calibrate. force is an integer vector of the parameter numbers to force into every model. It is meant to work with type='individual' and its performance with type='residual' needs to be studied (I doubt if it works as you want). Suppose you have a Cox model (which does not have an intercept to include in the sequential numbering for force) like this: f - cph(S ~ sex + pol(age,2) + rcs(height,4)) and you want to force age into every model. You would specify force=2:3. Typing coef(f) will show you the sequential parameters in the model. Someday I'll extend this to force='age' but that's not in the current version. Note that fastbw always pools age and age^2 effects when judging significance. As always use stepwise methods at your own risk. They are dangerous to your health. If you are using Linux I can send the new version by e-mail. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Forced-inclusion-of-varaibles-in-validate-command-as-well-as-step-tp3324901p3325922.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hierarchical Power Analysis
Downey, Patrick PDowney at urban.org writes: Hello, A colleague is trying to do a fairly complicated power analysis for a project. The project would be evaluating random assignment to one of three conditions within each of 8 sites. The dependent variable would be binary (we do not care at this point whether it would be analyzed with logit or probit). We can simulate the data in a bunch of iterations and do analyses in each iteration, but it seems like there might be an easier way. Are there any packages in R with the capability to do such power analyses? I vaguely recall someone setting up a package to do power analysis of hierarchical designs, but ... there are unlikely to be any closed-form solutions for the binary GLMM case (since there are so few closed-form solutions even for the null distributions of the test statistics!) ... so I think if you have a working solution with simulations that's probably the best you will do. I would try library(sos); findFn(power hierarchical) or findFn(power mixed) just in case ... Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] MCMCpack combining chains
On 24.02.2011 17:48, Alan Kelly wrote: Deal all, as MCMClogit does not allow for the specification of several chains, I have run my model 3 times with different random number seeds and differently dispersed multivariate normal priors. For example: res1 = MCMClogit(y~x,b0=0,B0=0.001,data=mydat, burnin=500, mcmc=5500, seed=1234, thin=5) res2 = MCMClogit(y~x,b0=1,B0=0.01,data=mydat, burnin=500, mcmc=5500, seed=5678, thin=5) res3 = MCMClogit(y~x,b0=5,B0=0.0001,data=mydat, burnin=500, mcmc=5500, seed=91011, thin=5) Each result produces an object of class mcmc. In order to use the Gelman-Rubin diagnostic test via coda, I need to combine these 3 mcmc objects appropriately. I thought that this would be possible using as.mcmc.list() as the function description says: The function ‘mcmc.list’ is used to represent parallel runs of the same chain, with different starting values and random seeds. The list must be balanced: each chain in the list must have the same iterations and the same variables. So I try the following: res123=as.mcmc.list(res1, res2, res3) Use mcmc.list() rather than as.mcmc.list(). The latter is for conversion, not a constructor from separate mcmc objects. Uwe Ligges class(res123) [1] mcmc.list Then I try gelman.diag() from coda - gelman.diag(temp123) Error in gelman.diag(temp123) : You need at least two chains So, how may I combine the separate mcmc objects so that I have multiple chains? Many thanks, Alan Kelly [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotmath: how to create a vector of expressions?
On 2011-02-25 15:57, Marius Hofert wrote: Dear expeRts, I would like to use LaTeX-like symbols in keys via plotmath. Below is a minimal example. I get: Error in fun(key = list(x = 0.5, y = 0.8, text = list(list(expression(paste((, : first component of text must be vector of labels What's wrong? How can I create the required vector (of expressions)? Cheers, Marius library(lattice) val- matrix(1, nrow = 10, ncol = 3) nplot- 10 labs- sapply(1:3, function(i) substitute(expression(paste((,i,), ,l==len,mm,sep=)), list(len=2*i))) myplot- xyplot(0~0, panel=function(...){ for(i in 1:3){ panel.xyplot(1,val[,i],type=l) panel.text(2,val[nplot,i],label=paste((,i,),sep=)) } }, key = list(x=0.5,y=0.8,text=list(labs), align=TRUE,transparent=TRUE)) myplot Try this: labs - vector(expression, 3) for(i in 1:3) labs[i] - substitute(expression( group((, k, )) * , ~~ l == len ~ italic(mm) ), list(len = 2*i, k=i) )[2] The idea is to create an expression vector and then fill its components appropriately. We need the second component of the substitute call. (I put in the 'italic' just for fun.) Maybe there's a better way, but I don't know it. Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Boxplot not doing what I think it should
On 24.02.2011 19:15, Greg Snow wrote: Look at ?quantile, especially the detail section on type and the second reference (and the see also section). There are many different definitions of quantiles/quartiles and different functions use different versions. Particularly ?boxplots points to ?boxplot.stats and that points to Tukey's original ?fivenum. Uwe Ligges -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Lewis Berman Sent: Thursday, February 24, 2011 8:38 AM To: r-help@R-project.org Subject: [R] Boxplot not doing what I think it should My box plot below is drawing its upper whisker all the way to the last point, instead of showing the point as an outlier. Am I misunderstanding, or is it a bug? Help(boxplot) states for the parameter range that this determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes. To my understanding, the code below, which drew the boxplot, should have drawn the right-hand whisker only to 387 and shown 496 as an outlier point. But it didn't. Using a range of 1.3 does. Horizontal versus vertical does not matter. Also, the right edge of the box does not show the same 3rd Quartile as does summary. Explain, please. sort(t5g24times) [1] 111 135 201 234 283 283 284 285 300 370 387 496 summary(t5g24times) Min. 1st Qu. MedianMean 3rd Qu.Max. 111.0 225.8 283.5 280.8 317.5 496.0 iqr - 317.5 - 225.8 iqr [1] 91.7 317.5 + 1.5 * iqr [1] 455.05 pdf() boxplot(c(t5g24times), horizontal=1, range=1.5) R version info: platform x86_64-apple-darwin9.8.0 arch x86_64 os darwin9.8.0 system x86_64, darwin9.8.0 status major 2 minor 12.0 year 2010 month 10 day15 svn rev53317 language R version.string R version 2.12.0 (2010-10-15) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Wired behavior of a 2-by-2 matrix indicies
Dear R, I found a very wired behavior for a 2-by-2 matrix, see this example A - matrix(1:4, 2) idx4A - matrix(1:4, 2) A[idx4A] Error in A[idx4A] : subscript out of bounds But other matrices are fine, B - matrix(1:9, 3) idx4B - matrix(1:9, 3) B[idx4B] [1] 1 2 3 4 5 6 7 8 9 I can reproduce this for both 32bit windows and 64bit linux with R 2.12.1 and some earlier versions. Can someone tell me whether this a bug or a feature or something I don't know? Thanks! Feng -- Feng Li Department of Statistics Stockholm University 106 91 Stockholm, Sweden http://feng.li/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interactive/Dynamic plots with R
There is an rggobi package that makes transferring data between R and ggobi easy. But I would probably start with the playwith package which I think will do most of what you want. Some other options are the iplots package, rpanel package, the fgui package and the tkexamp function in the TeachingDemos package (the last requires the most work on your part, but gives the most control). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Abhishek Pratap [mailto:abhishek@gmail.com] Sent: Saturday, February 26, 2011 1:09 AM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] Interactive/Dynamic plots with R Hi Greg Ability to zoom in/out on x/y axis to begin with and then adding/removing data sets. Thanks! -Abhi On Fri, Feb 25, 2011 at 12:52 PM, Greg Snow greg.s...@imail.org wrote: What types of interaction do you want? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Abhishek Pratap Sent: Friday, February 25, 2011 12:37 AM To: r-help@r-project.org Subject: [R] Interactive/Dynamic plots with R Hi Guys In order to look at a dense plot I would like to have the capability to plot dynamic/interactive. Before I try rgobi which I heard can help me; I would like to take your opinion. Thanks! -Abhi __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wired behavior of a 2-by-2 matrix indicies
Hello, On Sat, Feb 26, 2011 at 12:11 PM, Feng Li m...@feng.li wrote: Dear R, I found a very wired behavior for a 2-by-2 matrix, see this example A - matrix(1:4, 2) idx4A - matrix(1:4, 2) A[idx4A] Error in A[idx4A] : subscript out of bounds This shouldn't work. From the help for [, When indexing arrays by ‘[’ a single argument ‘i’ can be a matrix with as many columns as there are dimensions of ‘x’; the result is then a vector with elements corresponding to the sets of indices in each row of ‘i’. Since there are two dimensions to A, and two rows to idx4A, then this form is used. The first row of idx4A is c(1, 3) so the first value extracted is A[1, 3] which doesn't exist, thus the error message. But other matrices are fine, B - matrix(1:9, 3) idx4B - matrix(1:9, 3) B[idx4B] [1] 1 2 3 4 5 6 7 8 9 The part that surprises me is that this works. Apparently there is an implicit conversion to vector here because dim(B) != ncol(idx4B) What you seem to want is A[as.vector(idx4A)] and B[as.vector(idx4B)] but you neglected to tell us what you expected the result to be. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weird behavior of a 2-by-2 matrix indicies
The real weird thing is I wrote weird and it comes out wired... The background is that I did a lot of sparse matrices calculations. Therefore need some indices tweaks, e.g subtract a block from a big matrix. Then I just create an indicies matrix and use it directly as a vector since a matrix is just a vector with dimensions attributed in R. I know I can convert them to a vector but I did not do it anyway. Thanks for this information! Feng On Sat, Feb 26, 2011 at 6:29 PM, Sarah Goslee sarah.gos...@gmail.comwrote: Hello, On Sat, Feb 26, 2011 at 12:11 PM, Feng Li m...@feng.li wrote: Dear R, I found a very wired behavior for a 2-by-2 matrix, see this example A - matrix(1:4, 2) idx4A - matrix(1:4, 2) A[idx4A] Error in A[idx4A] : subscript out of bounds This shouldn't work. From the help for [, When indexing arrays by â[â a single argument âiâ can be a matrix with as many columns as there are dimensions of âxâ; the result is then a vector with elements corresponding to the sets of indices in each row of âiâ. Since there are two dimensions to A, and two rows to idx4A, then this form is used. The first row of idx4A is c(1, 3) so the first value extracted is A[1, 3] which doesn't exist, thus the error message. But other matrices are fine, B - matrix(1:9, 3) idx4B - matrix(1:9, 3) B[idx4B] [1] 1 2 3 4 5 6 7 8 9 The part that surprises me is that this works. Apparently there is an implicit conversion to vector here because dim(B) != ncol(idx4B) What you seem to want is A[as.vector(idx4A)] and B[as.vector(idx4B)] but you neglected to tell us what you expected the result to be. Sarah -- Sarah Goslee http://www.functionaldiversity.org -- Feng Li Department of Statistics Stockholm University 106 91 Stockholm, Sweden http://feng.li/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] select element from vector
On 25.02.2011 12:37, zem wrote: Hi Jessica, try this: Q[k:c(k+3)] 1. You have written to the R-help mailing list rather than to Jessica. Please answer also directly: not all posters are on the mailing list. 2. Please quote the original question. Uwe Ligges __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] J48 - contrasts can be applied only to factors with 2 or more levels
On 24.02.2011 15:29, Patrick Skorupka wrote: Hi everybody, I recently sent out a first message using this tool and was astonished about the quick and helpful replys. So, thank you in advance for your help, I really appreciate that! As already recently mentioned I am new to using R as I have to implement some decision trees with R respectively RWeka for my master thesis. I have tried so built some J48 trees and could accomplish first, very basic results. But when I increased the number of independent variables the tree in fact is getting created, but when I try to plot it, R displays âError in `contrasts-`(`*tmp*`, value = contr.treatment) : contrasts can be applied only to factors with 2 or more levels. I found out that this was apparently due to IVs with only one value and so excluded all these variables from the dataset and double-checked it by using the summary function. Anyway the same error occurs. Anyone any idea? Is there a specific function to test my dataset in this way instead of doing it manually or with the summary function? FYI, the dataset contains 54 attributes, all kind of scales and about 5,000 rows. You may want to supply a reproducible example of that behaviour as the posting guide asks you to do. Best, Uwe Ligges __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] kohonen: Argument data should be numeric
On 25.02.2011 15:33, Jay wrote: Hi, I'm trying to utilize the kohonen package to build SOM's. However, trying this on my data I get the error: Argument data should be numeric when running the som(data.train, grid = somgrid(6, 6, hexagonal)) function. As you see, there is a problem with the data type of data.train which is a list. When I try to convert it to numeric I get the error: (list) object cannot be coerced to type 'double' What should I do? I can convert the data.train if I take only one column of the list: data.train[[1]], but that is naturally not what I want. How did I end up with this data format? What I did: data1- read.csv(data1.txt, sep = ;) training- sample(nrow(data1), 1000) data.train- data1[training,2:20] I tried to use scan as the import method (read about this somewhere) and unlist, but I'm not really sure how I should get it to numeric/ working. We do not know what data types are present in data1. If all columns are numeric, just convert to a matrix (using as.matrix()). Uwe Ligges Thanks, Jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate with cumsum
On 25.02.2011 16:16, stephenb wrote: Bill, what will be the fastest way to output not just single lines but small data frames of about 60 rows? I prefer writing to a text file because the final output is large 47k times 60 rows and since I do not know the size of it I have to use rbind to build the object which creates the memory problems described here: http://www.matthewckeller.com/html/memory.html look at the swiss cheese paragraph. kind regards Stephen Stephen Bond, 1. You have written to the R-help mailing list rather than to Bill. Please answer also directly: not all posters are on the mailing list. 2. Please quote the original question. Uwe Ligges __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tansformation of variables in cph from rms package
On 26.02.2011 15:43, Frank Harrell wrote: Yao, I wonder how likely it is that log transformations fit the data. More often I find that the flexibility of restricted cubic splines is needed. One of the things that people take for granted when using log() is that they are assuming that k=0 in making the transformation log(x + k). It is often the case the the proper origin is not zero. This also impacts the problem you encountered. Besides all that, you can specify limits to predictors to nomogram() in a variety of ways as listed in the help file. Try that. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University Frank, looks like you are also using this Nabble interface now: 1. You have written to the R-help mailing list only rather than to Yao. Please answer also directly: not all posters are on the mailing list. 2. Please quote the original question, otherwise readers of the list cannot remember and do not learn too much without having the context apparent. Best, Uwe __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Transform a dataset from long to wide using reshape2
I seem to be running into the same problem reported in https://stat.ethz.ch/pipermail/r-help/2010-November/258265.html I cannot seem to transform a dataset from long to wide using reshape2. Clearly I am missing something very simple but a look at the manual and the reshape paper in JSS does not suggest anything. Any advice would be welcome ===load rehape2 library(reshape2) Working example=== # Example from above URL dat - read.table(textConnection('Subject Item Score Subject 1 Item 1 1 Subject 1 Item 2 0 Subject 1 Item 3 1 Subject 2 Item 1 1 Subject 2 Item 2 1 Subject 2 Item 3 0'), header=TRUE) closeAllConnections() acast(dat, Subject~Item) # Seems fine =My attemp=== # set.seed(1) mydata - data.frame( x=sample(LETTERS[23:26],100, replace=TRUE), y=rnorm(100, mean=2), id=sample(letters[1:4], 100, replace=TRUE)) acast(mydata , id ~ x) # not so fine = produces this : Using id as value column: use value_var to override. Aggregation function missing: defaulting to length W X Y Z a 7 5 5 7 b 6 9 6 12 c 1 4 5 6 d 6 14 5 2 Why does reshape2 think it needs an aggregation function here Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transform a dataset from long to wide using reshape2
John - In the dat data frame, there is exactly one observation for each Subject/Item combination, so it's clear how to rearrange the values of Score. In the mydata data frame there are multiple values for most combinations of x and id, so reshape2 needs to aggregate y before it can rearrange the values. Hope this helps. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Sat, 26 Feb 2011, John Kane wrote: I seem to be running into the same problem reported in https://stat.ethz.ch/pipermail/r-help/2010-November/258265.html I cannot seem to transform a dataset from long to wide using reshape2. Clearly I am missing something very simple but a look at the manual and the reshape paper in JSS does not suggest anything. Any advice would be welcome ===load rehape2 library(reshape2) Working example=== # Example from above URL dat - read.table(textConnection('Subject Item Score Subject 1 Item 1 1 Subject 1 Item 2 0 Subject 1 Item 3 1 Subject 2 Item 1 1 Subject 2 Item 2 1 Subject 2 Item 3 0'), header=TRUE) closeAllConnections() acast(dat, Subject~Item) # Seems fine =My attemp=== # set.seed(1) mydata - data.frame( x=sample(LETTERS[23:26],100, replace=TRUE), y=rnorm(100, mean=2), id=sample(letters[1:4], 100, replace=TRUE)) acast(mydata , id ~ x) # not so fine = produces this : Using id as value column: use value_var to override. Aggregation function missing: defaulting to length W X Y Z a 7 5 5 7 b 6 9 6 12 c 1 4 5 6 d 6 14 5 2 Why does reshape2 think it needs an aggregation function here Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] invalid type (list) for variable
Michael, i have no experience with the Bayesian models you use, but I could imagine that the problem is that this classifiedPositive variable is a list (as you suggest as well). With unlist() as.character() as.factor() you can convert the variable to anything you may need as an input. HTH Jannis On 02/26/2011 12:40 PM, Michael Hecker wrote: Hi, I have a problem with an invalid variable type. My sample dataset is looking like this: str(factivaTDM.train) 'data.frame': 4 obs. of 14 variables: $ access: num 2 3 2 1 $ billion : num 2 1 2 2 $ compani : num 7 6 1 2 $ data : num 6 2 3 4 $ inc : num 5 3 3 2 $ local : num 3 2 1 1 $ mcleodusa : num 15 6 2 10 $ network : num 9 1 2 1 $ provid: num 4 3 1 2 $ servic: num 11 4 3 6 $ share : num 9 5 3 5 $ splitrock : num 18 6 4 5 $ stock : num 3 3 1 2 $ classifiedPositive:List of 4 ..$ : chr yes ..$ : chr yes ..$ : chr yes ..$ : chr yes ..- attr(*, class)= chr AsIs and the command: NaiveBayes.model- NB(classifiedPositive ~ ., data = factivaTDM.train) generates the error: invalid type (list) for variable 'classifiedPositive' Here is another dataset which causes no errors: str(weatherNominal.train) 'data.frame': 13 obs. of 5 variables: $ outlook: Factor w/ 3 levels overcast,rainy,..: 3 3 1 2 2 2 1 3 3 2 ... $ temperature: Factor w/ 3 levels cool,hot,mild: 2 2 2 3 1 1 1 3 1 3 ... $ humidity : Factor w/ 2 levels high,normal: 1 1 1 1 2 2 2 1 2 2 ... $ windy : logi FALSE TRUE FALSE FALSE FALSE TRUE ... $ play : Factor w/ 3 levels ?,no,yes: 2 2 3 3 3 2 3 2 3 3 ... So I suppose, I have to convert classifiedPositive:List to classifiedPositive:Factor How can I do that? I will send you more information about my script if you need it. Thank you very much, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] preventing repeat in paste
This is great, thanks a lot, Gabor! On Fri, Feb 25, 2011 at 6:18 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Fri, Feb 25, 2011 at 5:21 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Hello! s-start; e-end middle-as.character(c(1,2,3)) I would like to get the following result: start 123 end or start 1 2 3 end or start 1,2,3 end How can I avoide this (undesired) result: paste(s,middle,e,sep= ) Try this: paste(s, toString(middle), e) [1] start 1, 2, 3 end -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com -- Dimitri Liakhovitski Ninah Consulting www.ninah.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2D Convolution in R
Dear R-Helpers, I want to try the 2D (two-dimensional) convolution in R. For example, let us we have the following kernel and data. kernel - (1,2,3,2,1) data - array(1:100, dim=c(10,10)) I know the function 'convolve' only for one-dimensional convolution, but it is just for a 1D sequence. Is there any function for 2D convolution? For theory, please refer to the following link: http://www.songho.ca/dsp/convolution/convolution2d_example.html Thank you for your help in advance. Best Regards, Wonsang [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interactive/Dynamic plots with R
Hello Abhishek, Also notice that if you are on windows, then rggobi doesn't work for the latest R (with ggobi 2.1.8) because of GTK/dll issues. In such a case, Greg's other suggestions are probably the way to go for the time being. Best, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sat, Feb 26, 2011 at 7:15 PM, Greg Snow greg.s...@imail.org wrote: n rggobi package that makes transferring data between R and ggobi easy. But I would probably start with the playwith pac [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to remove rows in which 2 or more observations are smaller than a given threshold?
Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples(columns). I have been trying to generate a dataset that filters out observations that are negligible The definition of negligible is absolute value less or equal to 1.58. The rule that I would like to adopt to create a new data is: drop rows in which 2 or more observations have absolute values = 1.58. Since I have unique identifier per row, I have tried to reshape the data so I could create a new variable using an ifelse statement that would flag observations =1.58 but I am not getting anywhere with this approach I could not come up with an apply function that counts the number of observations for which the absolute values are below the cutoff I've specified. All observations are numerical and I don't have missing values. Thank you in advance for the help, Hind __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to remove rows in which 2 or more observations are smaller than a given threshold?
If the matrix in question is named mymat, then mymat[apply(mymat,1,function(x)sum(abs(x) = 1.58) 2),] (untested due to a lack of a reproducible example) should give you a matrix without any rows containing two or more values with absolute value less than 1.58. I'm not sure ifelse would be of much use here. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Sat, 26 Feb 2011, hind lazrak wrote: Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples(columns). I have been trying to generate a dataset that filters out observations that are negligible The definition of negligible is absolute value less or equal to 1.58. The rule that I would like to adopt to create a new data is: drop rows in which 2 or more observations have absolute values = 1.58. Since I have unique identifier per row, I have tried to reshape the data so I could create a new variable using an ifelse statement that would flag observations =1.58 but I am not getting anywhere with this approach I could not come up with an apply function that counts the number of observations for which the absolute values are below the cutoff I've specified. All observations are numerical and I don't have missing values. Thank you in advance for the help, Hind __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to remove rows in which 2 or more observations are smaller than a given threshold?
You didn't say if your data set was a matrix or data.frame. Here are 2 functions that do the job on either and one that only works with data.frames, but is faster (a similar speedup is available for matrices as well). They all compute the number of small values in each row, nSmall, and extract the rows for which nSmall is less than 2. f0 - function (x) { nSmall - apply(x, 1, function(row) sum(abs(row) = 1.58) x[nSmall2, , drop = FALSE] } f1 - function (x) { nSmall- rowSums(abs(x) 1.58) x[nSmall2, , drop = FALSE] } f2 - function (x) { stopifnot(is.data.frame(x)) nSmall - 0 for (column in x) { nSmall - nSmall + (abs(column) 1.58) } x[nSmall 2, , drop = FALSE] } For a 10^5 row by 50 column data.frame I got the following times: system.time(r0 - f0(z)) user system elapsed 2.390.042.51 system.time(r1 - f1(z)) user system elapsed 0.420.080.51 system.time(r2 - f2(z)) user system elapsed 0.210.050.24 identical(r0, r1) identical(r0, r2) [1] TRUE Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of hind lazrak Sent: Saturday, February 26, 2011 3:37 PM To: r-help@r-project.org Subject: [R] how to remove rows in which 2 or more observations are smaller than a given threshold? Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples(columns). I have been trying to generate a dataset that filters out observations that are negligible The definition of negligible is absolute value less or equal to 1.58. The rule that I would like to adopt to create a new data is: drop rows in which 2 or more observations have absolute values = 1.58. Since I have unique identifier per row, I have tried to reshape the data so I could create a new variable using an ifelse statement that would flag observations =1.58 but I am not getting anywhere with this approach I could not come up with an apply function that counts the number of observations for which the absolute values are below the cutoff I've specified. All observations are numerical and I don't have missing values. Thank you in advance for the help, Hind __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to remove rows in which 2 or more observations are smaller than a given threshold?
Dear Bill and Phil Many thanks for your help, your solutions worked perfectly. Bill: I did not specify whether the data was a matrix or dataframe because it is in fact the Expression file in an eset object (bioBase). Thank you so much again! Hind On Sat, Feb 26, 2011 at 4:34 PM, William Dunlap wdun...@tibco.com wrote: You didn't say if your data set was a matrix or data.frame. Here are 2 functions that do the job on either and one that only works with data.frames, but is faster (a similar speedup is available for matrices as well). They all compute the number of small values in each row, nSmall, and extract the rows for which nSmall is less than 2. f0 - function (x) { nSmall - apply(x, 1, function(row) sum(abs(row) = 1.58) x[nSmall2, , drop = FALSE] } f1 - function (x) { nSmall- rowSums(abs(x) 1.58) x[nSmall2, , drop = FALSE] } f2 - function (x) { stopifnot(is.data.frame(x)) nSmall - 0 for (column in x) { nSmall - nSmall + (abs(column) 1.58) } x[nSmall 2, , drop = FALSE] } For a 10^5 row by 50 column data.frame I got the following times: system.time(r0 - f0(z)) user system elapsed 2.39 0.04 2.51 system.time(r1 - f1(z)) user system elapsed 0.42 0.08 0.51 system.time(r2 - f2(z)) user system elapsed 0.21 0.05 0.24 identical(r0, r1) identical(r0, r2) [1] TRUE Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of hind lazrak Sent: Saturday, February 26, 2011 3:37 PM To: r-help@r-project.org Subject: [R] how to remove rows in which 2 or more observations are smaller than a given threshold? Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples(columns). I have been trying to generate a dataset that filters out observations that are negligible The definition of negligible is absolute value less or equal to 1.58. The rule that I would like to adopt to create a new data is: drop rows in which 2 or more observations have absolute values = 1.58. Since I have unique identifier per row, I have tried to reshape the data so I could create a new variable using an ifelse statement that would flag observations =1.58 but I am not getting anywhere with this approach I could not come up with an apply function that counts the number of observations for which the absolute values are below the cutoff I've specified. All observations are numerical and I don't have missing values. Thank you in advance for the help, Hind __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
I have heard about this before happening on other platforms. Frankly I'm not positive how this happens. My best guess is that there's a tiny bit of numeric instability in the 9+ decimal place so that on a given iteration a one variable choice at random looks better than the other. Any other ideas? Greg - Original Message - From: Joshua Wiley jwiley.ps...@gmail.com To: Axel Urbiz axel.ur...@gmail.com Cc: R-help@r-project.org R-help@r-project.org; Ridgeway, Greg Sent: Fri Feb 25 22:16:02 2011 Subject: Re: [R] Reproducibility issue in gbm (32 vs 64 bit) Hi Axel, I do not have a nice explanation why the results differ off the top of my head. I can say I can replicate what you get on 32/64 (both Windows 7) bit with the development version of R and gbm_1.6-3.1. Here is an even simpler example that shows the difference: gbmfit - gbm(1:50 ~ I(50:1) + I(60:11), distribution = gaussian) summary(gbmfit) I copied that package maintainer. Cheers, Josh On Fri, Feb 25, 2011 at 7:29 PM, Axel Urbiz axel.ur...@gmail.com wrote: Dear List, The gbm package on Win 7 produces different results for the relative importance of input variables in R 32-bit relative to R 64-bit. Any idea why? Any idea which one is correct? Based on this example, it looks like the relative importance of 2 perfectly correlated predictors is diluted by half in 32-bit, whereas in 64-bit, one of these predictors gets all the importance and the other gets none. I found this interesting. ### Sample code library(gbm) set.seed(12345) xc=matrix(rnorm(100*20),100,20) y=sample(1:2,100,replace=TRUE) xc[,2] - xc[,1] gbmfit - gbm(y~xc[,1]+xc[,2] +xc[,3], distribution=gaussian) summary(gbmfit) ### Results on R 2.12.0 (32-bit) var rel.inf 1 xc[, 3] 49.76143 2 xc[, 1] 27.27432 3 xc[, 2] 22.96425 ### Results on R 2.12.0 (64-bit) summary(gbmfit) var rel.inf 1 xc[, 1] 50.23857 2 xc[, 3] 49.76143 3 xc[, 2] 0.0 Thanks, Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-sig-ME] Fwd: Re: ANOVA and Pseudoreplication in R
On 25/02/2011 21:22, Ben Ward wrote: Original Message Subject:Re: [R] ANOVA and Pseudoreplication in R Date: Fri, 25 Feb 2011 12:10:14 -0800 From: Bert Guntergunter.ber...@gene.com To: Ben Wardbenjamin.w...@bathspa.org CC: r-helpr-help@r-project.org I can hopefully save bandwidth here by suggesting that this belongs on the R-sig-mixed-models list. -- Bert As an aside, shouldn't you be figuring this out yourself or seeking local consulting expertise? I did consult with the lecturer at university that knows most about stats, and he advised me: Pseudo replication is really about a lack of independence between measurements, So you need to work backwards and see where you are building in a known lack of independence. And where that is the case you need to use means of all the values. And I have done this and came to the conclusion I mentioned as to where I thought Pseudoreplicaton was comming from, however, I do not know about the one other 'potential' source as it really is for me at least, a grey area. I've consulted a few forums that deal with the theory more and await any response. Until then I'll have to try and get as many opinions on it as possible. -Ben W. On Fri, Feb 25, 2011 at 9:08 AM, Ben Wardbenjamin.w...@bathspa.org wrote: Hi, As part of my dissertation, I'm going to be doing an Anova, comparing the dead zone diameters on plates of microbial growth with little paper disks loaded with antimicrobial, a clear zone appears where death occurs, the size depending on the strength and succeptibility. So it's basically 4 different treatments, and I'm comparing the diameters (in mm) of circles. I'm concerned however, about Pseudoreplication and how to deal with it in R, (I thought of using the Error() term. I have four levels of one factor(called Treatment): NE.Dettol, EV.Dettol, NE.Garlic, EV.Garlic. (NE.Dettol is E.coli not evolved to dettol, exposed to dettol to get dead zones. And the same for NE.Garlic, but with garlic, not dettol. EV.Dettol is E.coli that has been evolved against dettol, and then tested afterwards against dettol to get the dead zones. Same applies for EV.Garlic but with garlic). You see from the four levels (or treatments) there are two chemicals involved. So my first concern is whether they should be analysed using two seperate ANOVA's. NE.Dettol and NE.Garlic are both the same organism - a lab stock E.coli, just exposed to two different chemicals. EV.Dettol and EV.Garlic, are in principle, likely to be two different forms of the organism after the many experimental doses of their respective chemical. For NE.Garlic and NE.Dettol I have 5, what I've called Lineages, basically seperate bottles of them (10 in total). Then I have 5 Bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. - This was done because there was the possiblity that, whilst I'm expecting them all to respond in a similar manner, there are many evolutionary paths to the same result, and previous research and reading shows that occasionally one or two react differently to the rest through random chance. The point I observed above (NE.Dettol and NE.Garlic are both the same organism...) is also applicable to the 5 bottles: The 5 bottles each of NE.Garlic and NE.Dettol are supposed to be all the same organism - from a stock one kept in store in the lab. There is potential though for the 5 of EV.Garlic, to be different from one another, and potential for the 5 EV.Dettol to be different from one another. The Lineage (bottle) is also a factor then, with 5 levels (1,2,3,4,5). Because they may be different. To get the measurements of the diamter of the zones. I take out a small amount from a tube and spread it on a plate, then take three paper disks, soaked in their respective chemical, either Dettol or Garlic. and press them and and incubate them. Then when the zones have appeared after a day or 2. I take 4 diameter measurements from each zone, across the zone at different angles, to take account for the fact, that there may be a weird shape, or not quite circular. I'm concerned about pseudoreplication, such as the multiple readings from one disk, and the 5 lineages - which might be different from one another in each of the Two EV. treatments, but not with NE. treatments. I read that I can remove pseudoreplication from the multiple readings from each disk, by using the 4 readings on each disk, to produce a mean for the disks, and analyse those means - Exerciseing caution where there are extreme values. I think the 3 disks for each lineage themselves are not pseudoreplication, because they are genuinley 3 disks on a plate: the Disk Diffusion Test replicated 3 times - but the multiple readings from one disk if eel, is pseudoreplication. I've also read about including Error() terms in a formula. I'm unsure of the two NE. Treatments comming
Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
Thank you very very very much Prof Harrell!! You've made my day!! -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3325844.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
Greetings, I have been trying to use the RInside package with C++ however I am running into an error. I use Ubuntu Linux 10.04 (64-bit) and my R version is 2.12.2. I compiled R from source to give me a shared R library, shared BLAS (ATLAS). When I try to compile one of the examples given in the RInside package it compiles perfectly fine however when I run the program it gives me an error: error while loading shared libraries: libR.so: cannot open shared object file: No such file or directory This is strange because libR.so does indeed exist and I have all the correct paths leading to R. I compile the program using: g++ -I/usr/local/lib64/R/include -I/usr/local/lib64/R/library/Rcpp/include -I/usr/local/lib64/R/library/RInside/include -g -O3 -Wall -I/usr/local/include rinside.cpp -L/usr/local/lib64/R/lib -lR -L/usr/local/lib64/R/lib -lRblas -L/usr/local/lib64/R/lib -lRlapack -L/usr/local/lib64/R/library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib64/R/library/Rcpp/lib -L/usr/local/lib64/R/library/RInside/lib -lRInside -Wl,-rpath,/usr/local/lib64/R/library/RInside/lib -o rinside I know RInside works on my system as I have used it before successfully. If somebody can help me out here I would appreciate it. Thanks, Hamaad Musharaf Shah. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help: Error en model.frame.defaul
Hi and thanks to all R developers and helpers, I wanrt to take the 1 and 3 dimensions of this object to create a new one without the years and with the next functions to do some specific calculations: str(PMpa) num [1:499105, 1:60, 1:12] 29.8 55.8 29.7 25.1 25 ... - attr(*, dimnames)=List of 3 ..$ punt: NULL ..$ any : chr [1:60] 1950 1951 1952 1953 ... ..$ mes : chr [1:12] Gener Febrer Març Abril ... I'm writing a function where appears the next error message: Error en model.frame.default(formula = x ~ c(1950:2009), drop.unused.levels = TRUE) : variable lengths differ (found for 'c (1950:2009)') Script: library(robustbase) load(PMpa.Rdat) R2 - function(x) var(predict(x))/var(predict(x)+resid(x)) TPMEN-t(apply(PMpa,c(1,3),function(x){ kk-lmrob(x~c(1950:2009)); c(kk$coefficients[2],R2(kk), (1-summary(kk)$coefficients[2,4])*100, (confint(kk,level=.95)*10)[2,])})) dimnames(TPMEN)[[3]][1:3]-c(Pendent,R2,Significancia) save(TPMEN,file=TPMEN.Rdat) working: TPMEN-t(apply(PMpa,c(1,3),function(x){ + kk-lmrob(x~c(1950:2009)); + c(kk$coefficients[2],R2(kk),(1-summary(kk)$coefficients[2,4])*100,(confint(kk,level=.95)*10)[2,])})) Error en model.frame.default(formula = x ~ c(1950:2009), drop.unused.levels = TRUE) : las longitudes de las variables son diferentes (encontrada para 'c(1950:2009)') Thanks to all for your time and help. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help please ..simple question regarding output the p-value inside a function and lm
Hi Jorge and R users Thank you so much for the responses. You input helped me alot and potentially can help me to solve one more problem, but I got error message. I am sorry to ask you again but if you can find my problem in quick look that will be great. I hope this will not cost alot of your time as this is based on your idea. # Just data X1 - c(1,3,4,2,2) X2 - c(2,1,3,1,2) X3 - c(4,3,2,1,1) X4- c(1,1,1,2,3) X5 - c(3,2,1,1,2) X6 - c(1,1,2,2,3) odataframe - data.frame(X1,X2,X3,X4,X5,X6) My objective here is sort the value of the pair of variables (X1 and X2, X3 and X4, X5 and X6 and so on.) in such way that the second column in pair is always higher than the first one (X2 X1, X4 X3, X6 X5 and so on...). Here is my attempt: nmrk - 3 nvar - 2*nmrk lapply(1:nvar, function(ind){ # indices for the variables we need a - seq(1, nvar, by = 2) b - seq(2, nvar, by = 2) # shorting column tx[, a[ind]] = ifelse(odataframe[, a[ind]] odataframe[,b[ind]], odataframe[, a[ind]], odataframe[, b[ind]]) tx[, b[ind]] = ifelse(odataframe[, b[ind]] dataframe[,a[ind]], odataframe[,b[ind]], odataframe[,a[ind]]) df1 - transform( odataframe, odataframe[, a[ind]]= tx[, a[ind]], odataframe[, b[ind]]= tx[, b[ind]])) } I got the following error: Error: Error: unexpected '=' in: tx[, b[ind]] = ifelse(odataframe[, b[ind]] dataframe[,a[ind]], odataframe[,b[ind]], odataframe[,a[ind]]) df1 - transform( odataframe, odataframe[, a[ind]]= Thanks; Umesh R [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error
Mean doesn't work either... I understand that the message replacement has 0 items, need 37597770 implies that the function is not returning any values, but I don't understand why then this is not the case in the example. DF = data.frame(read.table(textConnection(A B C D E 1 1 a 1999 1 0 2 1 b 1999 0 1 3 1 c 1999 0 1 4 1 d 1999 1 0 5 2 c 2001 1 0 6 2 d 2001 0 1 7 3 a 2004 0 1 8 3 b 2004 0 1 9 3 d 2004 0 1 10 4 b 2001 1 0 11 4 c 2001 1 0 12 4 d 2001 0 1),head=TRUE,stringsAsFactors=FALSE)) DF = DF[order(DF$B,DF$C),] #first option - works fine in example and my target data frame DF$F = ave(DF$D,DF$B, FUN = function(x) cumsum(x)-x) DF$G = ave(DF$E,DF$B, FUN = function(x) cumsum(x)-x) #second option - works fine in example but not in my target data frame foo - function(x) { unlist(lapply(x, FUN = function(z) cumsum(z) - z)) } n-ave(DF[,c(4:5)],DF$B,FUN = foo) Why is this second option not working in my target data frame (which is much bigger than the example)? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Error-tp3324531p3325857.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2D Convolution Function
Dear R-Helpers, I want to try the 2D (two-dimensional) convolution in R. For example, let us we have the following kernel and data. kernel - (1,2,3,2,1) data - array(1:100, dim=c(10,10)) I know the function 'convolve' only for one-dimensional convolution, but it is just for a 1D sequence. Is there any function for 2D convolution? For theory, please refer to the following link: http://www.songho.ca/dsp/convolution/convolution2d_example.html Thank you for your help in advance. Best Regards, Wonsang - Wonsang You Leibniz Institute for Neurobiology -- View this message in context: http://r.789695.n4.nabble.com/2D-Convolution-Function-tp3326198p3326198.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grouping and counting in dataframe
sry, new try: tm-c(12345,42352,12435,67546,24234,76543,31243,13334,64562,64123) gr-c(1,3,1,2,2,4,2,3,3,3) d-data.frame(cbind(time,gr)) where tm are unix times and gr the factor grouping by i have a skalar for example k=500 now i need to calculate in for every row how much examples in the group are in the interval [i-500;i+500] and i is the active tm-element, like this: d time gr ct 1 12345 1 2 2 42352 3 0 3 12435 1 2 4 67546 2 0 5 24234 2 0 6 76543 4 0 7 31243 2 0 8 13334 3 0 9 64562 3 2 10 64123 3 2 i hope that was a better illustration of my problem -- View this message in context: http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hello!
Hi,i would like to know what is the best way to write a procedure in R,it seems that when i run a script it doesn't wait for previouse code lines to be excuted,i would like to know how to force R to hold on a code line until it is processed,for example:i want to choose from a list the name of the sheet in which to do an analisys,then i want to run a goodness of fit test for the data on that chosen sheet,how can i do that?and if it is not too much i have couple more questions,1. how to retrieve the names of excel's file sheets to an array ?2. how to retrieve the names of excel's Sheet columns names to an array ? thanks a lot,Tsvi Sabo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2D Convolution Function
Dear R-Helpers, I want to try the 2D (two-dimensional) convolution in R. For example, let us we have the following kernel and data. kernel - (1,2,3,2,1) data - array(1:100, dim=c(10,10)) I know the function 'convolve' only for one-dimensional convolution, but it is just for a 1D sequence. Is there any function for 2D convolution? For theory, please refer to the following link: http://www.songho.ca/dsp/convolution/convolution2d_example.html Thank you for your help in advance. Best Regards, Wonsang - Wonsang You Leibniz Institute for Neurobiology -- View this message in context: http://r.789695.n4.nabble.com/2D-Convolution-Function-tp3326158p3326158.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] wireframe() display a graph with two colors, not a gradient.
Hi all, I'm quite new to wireframe, essentially what I want to do is display a graph, and z-values 1 would be yellow and those 1 would be blue. This is a bit of my data. 0.334643563 0.350913807 0.383652307 0.370325283 0.38779016 0.42387392 0.39861579 0.418389687 0.460692165 0.43888516 0.468015843 0.520560489 0.499544084 0.535099422 0.60982153 0.569888047 0.634351734 0.717646392 0.717202578 0.810887467 0.935152498 0.901982916 1.044895388 1.228306176 1.12856184 1.314210456 1.462055626 1.377314404 1.540372345 1.6206216 1.494044604 1.618244219 1.631295797 data.m = as.matrix(read.table(/Users/James/Desktop/c.txt, sep='\t')) library(lattice) wireframe(data.m,aspect = c(0.3), shade=TRUE, screen = list(z = 0, x = -45), light.source = c(0,0,10), distance = 0.2) i know I probably need to use the col.regions or level.colors argument, but I'm not sure what I actually need to supply to the arguments to get it to work. All the examples I've seen also have a gradient of color between two or more colors, I want to color half the graph with z values 1 yellow, and 1 blue. Thanks for any help. James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using uniroot() with output from deriv() or D()
Hans, Both your methods worked great! They was exactly what I was looking for. I ended up using the second method as it is a little more efficient: fn2nd_fun - function(x) eval(fn2nd, list(x=x)) ex - seq(from=0, to=1, length.out = 1000) y1 - fn2nd_fun(ex) ... r - uniroot(fn2nd_fun, interval=c(0, 1), tol=0.0001) Thanks, J -- View this message in context: http://r.789695.n4.nabble.com/Using-uniroot-with-output-from-deriv-or-D-tp3325635p3326296.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-sig-ME] Fwd: Re: ANOVA and Pseudoreplication in R
Hi Ben: 1) Confession: I did not and have not read your post in detail. 2) IMHO, the following advice: Pseudo replication is really about a lack of independence between measurements, So you need to work backwards and see where you are building in a known lack of independence. And where that is the case you need to use means of all the values. is baloney. The lack of independence part is correct. The baloney part is take the mean. That is exactly where mixed models -- or hierarchical modeling of some sort -- is required. That's why I referred you to R-sig-mixed-models. Again, please note: IMHO. Maybe I'm the one full of baloney. It's free advice, after all, so beware of what you get. Cheers, Bert On Sat, Feb 26, 2011 at 7:44 AM, Ben Ward benjamin.w...@bathspa.org wrote: On 25/02/2011 21:22, Ben Ward wrote: Original Message Subject: Re: [R] ANOVA and Pseudoreplication in R Date: Fri, 25 Feb 2011 12:10:14 -0800 From: Bert Guntergunter.ber...@gene.com To: Ben Wardbenjamin.w...@bathspa.org CC: r-helpr-help@r-project.org I can hopefully save bandwidth here by suggesting that this belongs on the R-sig-mixed-models list. -- Bert As an aside, shouldn't you be figuring this out yourself or seeking local consulting expertise? I did consult with the lecturer at university that knows most about stats, and he advised me: Pseudo replication is really about a lack of independence between measurements, So you need to work backwards and see where you are building in a known lack of independence. And where that is the case you need to use means of all the values. And I have done this and came to the conclusion I mentioned as to where I thought Pseudoreplicaton was comming from, however, I do not know about the one other 'potential' source as it really is for me at least, a grey area. I've consulted a few forums that deal with the theory more and await any response. Until then I'll have to try and get as many opinions on it as possible. -Ben W. On Fri, Feb 25, 2011 at 9:08 AM, Ben Wardbenjamin.w...@bathspa.org wrote: Hi, As part of my dissertation, I'm going to be doing an Anova, comparing the dead zone diameters on plates of microbial growth with little paper disks loaded with antimicrobial, a clear zone appears where death occurs, the size depending on the strength and succeptibility. So it's basically 4 different treatments, and I'm comparing the diameters (in mm) of circles. I'm concerned however, about Pseudoreplication and how to deal with it in R, (I thought of using the Error() term. I have four levels of one factor(called Treatment): NE.Dettol, EV.Dettol, NE.Garlic, EV.Garlic. (NE.Dettol is E.coli not evolved to dettol, exposed to dettol to get dead zones. And the same for NE.Garlic, but with garlic, not dettol. EV.Dettol is E.coli that has been evolved against dettol, and then tested afterwards against dettol to get the dead zones. Same applies for EV.Garlic but with garlic). You see from the four levels (or treatments) there are two chemicals involved. So my first concern is whether they should be analysed using two seperate ANOVA's. NE.Dettol and NE.Garlic are both the same organism - a lab stock E.coli, just exposed to two different chemicals. EV.Dettol and EV.Garlic, are in principle, likely to be two different forms of the organism after the many experimental doses of their respective chemical. For NE.Garlic and NE.Dettol I have 5, what I've called Lineages, basically seperate bottles of them (10 in total). Then I have 5 Bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. - This was done because there was the possiblity that, whilst I'm expecting them all to respond in a similar manner, there are many evolutionary paths to the same result, and previous research and reading shows that occasionally one or two react differently to the rest through random chance. The point I observed above (NE.Dettol and NE.Garlic are both the same organism...) is also applicable to the 5 bottles: The 5 bottles each of NE.Garlic and NE.Dettol are supposed to be all the same organism - from a stock one kept in store in the lab. There is potential though for the 5 of EV.Garlic, to be different from one another, and potential for the 5 EV.Dettol to be different from one another. The Lineage (bottle) is also a factor then, with 5 levels (1,2,3,4,5). Because they may be different. To get the measurements of the diamter of the zones. I take out a small amount from a tube and spread it on a plate, then take three paper disks, soaked in their respective chemical, either Dettol or Garlic. and press them and and incubate them. Then when the zones have appeared after a day or 2. I take 4 diameter measurements from each zone, across the zone at different angles, to take account for the fact, that there may be a weird shape, or not quite
[R] Weighting of variables
Dear All, I have run PCA using prcomp() and generated a loading plot using biplot(). In addition, I would like to extract the weighting for each of the variables in the loading plot so I can see which of the variables have the most influence on the distribution of the points in a 2D plot of PC1 and PC2. Is there a simple function for this? Thanks! -Chris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hello!
Are you a fan of James Joyce? Is the Caps key on your keyboard broken? -- Bert On Sat, Feb 26, 2011 at 10:09 AM, tsvi sabo tsvis...@yahoo.com wrote: Hi,i would like to know what is the best way to write a procedure in R,it seems that when i run a script it doesn't wait for previouse code lines to be excuted,i would like to know how to force R to hold on a code line until it is processed,for example:i want to choose from a list the name of the sheet in which to do an analisys,then i want to run a goodness of fit test for the data on that chosen sheet,how can i do that?and if it is not too much i have couple more questions,1. how to retrieve the names of excel's file sheets to an array ?2. how to retrieve the names of excel's Sheet columns names to an array ? thanks a lot,Tsvi Sabo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transform a dataset from long to wide using reshape2
Thanks. I suddenly realised this as I was cycling down to the local grocery store, about 10 minutes after posting the problem. I felt like an idiot. Thanks for confirming my stupidity. john --- On Sat, 2/26/11, Phil Spector spec...@stat.berkeley.edu wrote: From: Phil Spector spec...@stat.berkeley.edu Subject: Re: [R] Transform a dataset from long to wide using reshape2 To: John Kane jrkrid...@yahoo.ca Cc: R R-help r-h...@stat.math.ethz.ch Received: Saturday, February 26, 2011, 2:24 PM John - In the dat data frame, there is exactly one observation for each Subject/Item combination, so it's clear how to rearrange the values of Score. In the mydata data frame there are multiple values for most combinations of x and id, so reshape2 needs to aggregate y before it can rearrange the values. Hope this helps. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Sat, 26 Feb 2011, John Kane wrote: I seem to be running into the same problem reported in https://stat.ethz.ch/pipermail/r-help/2010-November/258265.html I cannot seem to transform a dataset from long to wide using reshape2. Clearly I am missing something very simple but a look at the manual and the reshape paper in JSS does not suggest anything. Any advice would be welcome ===load rehape2 library(reshape2) Working example=== # Example from above URL dat - read.table(textConnection('Subject Item Score Subject 1 Item 1 1 Subject 1 Item 2 0 Subject 1 Item 3 1 Subject 2 Item 1 1 Subject 2 Item 2 1 Subject 2 Item 3 0'), header=TRUE) closeAllConnections() acast(dat, Subject~Item) # Seems fine =My attemp=== # set.seed(1) mydata - data.frame( x=sample(LETTERS[23:26],100, replace=TRUE), y=rnorm(100, mean=2), id=sample(letters[1:4], 100, replace=TRUE)) acast(mydata , id ~ x) # not so fine = produces this : Using id as value column: use value_var to override. Aggregation function missing: defaulting to length W X Y Z a 7 5 5 7 b 6 9 6 12 c 1 4 5 6 d 6 14 5 2 Why does reshape2 think it needs an aggregation function here Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] nested case-control study
Hi, I am wondering if there is a package for doing conditional logistic regression for nested case-control study as described in Estimation of absolute risk from nested case-control data by Langholz and Borgan (1997) where Horvitz-Thompson sampling weight (log of (number in the risk set divided by the number sampled)) is used with regression. In SAS Proc Phreg, this is implemented as an offset (offset=logweight). I checked clogistic() in Epi package and clogit() in survival package, but couldn't figure out how to incorporate this weighting with either. Also when considering nested case-control sampling for Cox proportional hazards model, the above method can estimate absolute risk of developing disease over a specified time interval. Appreciate if anyone has any suggestion on how to do this in R. Thanks very much! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-sig-ME] Fwd: Re: ANOVA and Pseudoreplication in R
I think you are over-concerned with the term pseudo-replication. All this means is that the error source is nested, and not a full replicate. What you haven't done, and need to do, is to describe your experiment in terms of the real variables and error sources. Then code them as fixed or random and write the model form. You description is very complex and confusing. You need to identify: 1. Exactly what culture is used in each trial. 2. All subcultures: NE vs EV, which chemical, etc. 3. Which plate. 4. Which circle on which plate. 5. Which measurement on which circle. After you have identified all descriptors that identify each individual measurement, put them in a file with columns for factors and the outcome. At this point it should be clear that you have a number of random factors: plate, circle, measurement, plus any culture replicates, if you have them. One you successfully do this, you can write the model equation with the proper nesting. The time to do all this is BEFORE you conduct the experiment. At 10:44 AM 2/26/2011, Ben Ward wrote: On 25/02/2011 21:22, Ben Ward wrote: Original Message Subject:Re: [R] ANOVA and Pseudoreplication in R Date: Fri, 25 Feb 2011 12:10:14 -0800 From: Bert Guntergunter.ber...@gene.com To: Ben Wardbenjamin.w...@bathspa.org CC: r-helpr-help@r-project.org I can hopefully save bandwidth here by suggesting that this belongs on the R-sig-mixed-models list. -- Bert As an aside, shouldn't you be figuring this out yourself or seeking local consulting expertise? I did consult with the lecturer at university that knows most about stats, and he advised me: Pseudo replication is really about a lack of independence between measurements, So you need to work backwards and see where you are building in a known lack of independence. And where that is the case you need to use means of all the values. And I have done this and came to the conclusion I mentioned as to where I thought Pseudoreplicaton was comming from, however, I do not know about the one other 'potential' source as it really is for me at least, a grey area. I've consulted a few forums that deal with the theory more and await any response. Until then I'll have to try and get as many opinions on it as possible. -Ben W. On Fri, Feb 25, 2011 at 9:08 AM, Ben Wardbenjamin.w...@bathspa.org wrote: Hi, As part of my dissertation, I'm going to be doing an Anova, comparing the dead zone diameters on plates of microbial growth with little paper disks loaded with antimicrobial, a clear zone appears where death occurs, the size depending on the strength and succeptibility. So it's basically 4 different treatments, and I'm comparing the diameters (in mm) of circles. I'm concerned however, about Pseudoreplication and how to deal with it in R, (I thought of using the Error() term. I have four levels of one factor(called Treatment): NE.Dettol, EV.Dettol, NE.Garlic, EV.Garlic. (NE.Dettol is E.coli not evolved to dettol, exposed to dettol to get dead zones. And the same for NE.Garlic, but with garlic, not dettol. EV.Dettol is E.coli that has been evolved against dettol, and then tested afterwards against dettol to get the dead zones. Same applies for EV.Garlic but with garlic). You see from the four levels (or treatments) there are two chemicals involved. So my first concern is whether they should be analysed using two seperate ANOVA's. NE.Dettol and NE.Garlic are both the same organism - a lab stock E.coli, just exposed to two different chemicals. EV.Dettol and EV.Garlic, are in principle, likely to be two different forms of the organism after the many experimental doses of their respective chemical. For NE.Garlic and NE.Dettol I have 5, what I've called Lineages, basically seperate bottles of them (10 in total). Then I have 5 Bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. - This was done because there was the possiblity that, whilst I'm expecting them all to respond in a similar manner, there are many evolutionary paths to the same result, and previous research and reading shows that occasionally one or two react differently to the rest through random chance. The point I observed above (NE.Dettol and NE.Garlic are both the same organism...) is also applicable to the 5 bottles: The 5 bottles each of NE.Garlic and NE.Dettol are supposed to be all the same organism - from a stock one kept in store in the lab. There is potential though for the 5 of EV.Garlic, to be different from one another, and potential for the 5 EV.Dettol to be different from one another. The Lineage (bottle) is also a factor then, with 5 levels (1,2,3,4,5). Because they may be different. To get the measurements of the diamter of the zones. I take out a small amount from a tube and spread it on a plate, then take three paper disks, soaked in their respective chemical,