[R] Need help to split a given matrix in a sequential way
I need to split a given matrix in sequential order. Let my matrix be:

    dat <- cbind(sample(c(100, 200), 10, TRUE),
                 sample(c(50, 100, 150, 180), 10, TRUE),
                 sample(seq(20, 200, by = 20), 10, TRUE))
    dat
          [,1] [,2] [,3]
     [1,]  200  100   80
     [2,]  100  180   80
     [3,]  200  150  180
     [4,]  200   50  140
     [5,]  100  150   60
     [6,]  100   50   60
     [7,]  100  100  100
     [8,]  200  150  100
     [9,]  100   50  120
    [10,]  200   50  180

Now I need to split the above matrix according to the unique values in the 1st column. So I have the following:

    dat1 <- dat[which(dat[,1] == unique(dat[,1])[1]),]
    dat2 <- dat[-which(dat[,1] == unique(dat[,1])[1]),]
    dat1; dat2
         [,1] [,2] [,3]
    [1,]  200  100   80
    [2,]  200  150  180
    [3,]  200   50  140
    [4,]  200  150  100
    [5,]  200   50  180
         [,1] [,2] [,3]
    [1,]  100  180   80
    [2,]  100  150   60
    [3,]  100   50   60
    [4,]  100  100  100
    [5,]  100   50  120

Now each of dat1 and dat2 needs to be split according to its 2nd column, i.e.

    dat11 <- dat1[which(dat1[,2] == unique(dat1[,2])[1]),]
    dat12 <- dat1[which(dat1[,2] == unique(dat1[,2])[2]),]
    dat13 <- dat1[which(dat1[,2] == unique(dat1[,2])[3]),]
    dat11; dat12; dat13
    [1] 200 100  80
         [,1] [,2] [,3]
    [1,]  200  150  180
    [2,]  200  150  100
         [,1] [,2] [,3]
    [1,]  200   50  140
    [2,]  200   50  180

and similarly for dat2. This kind of sequential splitting would continue (number of columns of the original matrix - 1) times. It would be great if I could then put all those matrices into a list object for further calculations. If the original matrix is small, this can be handled manually. For a moderately large matrix, however, the task would be very cumbersome, so I am looking for some mechanized way to do this for an arbitrary matrix. Can anyone here help me in this regard?

Thank you so much for your kind attention.

--
View this message in context: http://n4.nabble.com/Need-help-to-split-a-given-matrix-is-a-sequential-way-tp1744803p1744803.html
Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need help to split a given matrix in a sequential way
Hi,

Not sure exactly how, but I think a combination of unique() and split() could do what you're looking for.

I hope this helps,
Ivan

On 3/30/2010 09:20, Megh wrote:
> I need to split a given matrix in a sequential order. [...]

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] Need help to split a given matrix in a sequential way
Hi:

Does this work for you?

    dat <- as.data.frame(dat)
    lapply(split(dat, dat$V1), function(x) split(x, x$V2))

The result contains two components, one for 100 and one for 200, and subcomponents within each component.

HTH,
Dennis

On Tue, Mar 30, 2010 at 12:20 AM, Megh megh700...@yahoo.com wrote:
> I need to split a given matrix in a sequential order. [...]
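The nested lapply()/split() idea generalises to an arbitrary number of columns with a small recursive helper. The following is only a sketch: `split_seq` is a hypothetical name, not a function from any package, and it orders the groups by sorted value rather than by order of first appearance as unique() would. It uses the matrix printed in the original question.

```r
# Hypothetical helper: split the rows of a matrix by column `col`, then split
# each piece by the next column, and so on up to column `last`
# (by default ncol(m) - 1, as asked in the question).
split_seq <- function(m, col = 1, last = ncol(m) - 1) {
  idx <- split(seq_len(nrow(m)), m[, col])                 # row indices per value
  pieces <- lapply(idx, function(i) m[i, , drop = FALSE])  # keep matrix shape
  if (col >= last) return(pieces)
  lapply(pieces, split_seq, col = col + 1, last = last)
}

# The matrix shown in the question, entered row by row
dat <- matrix(c(200, 100,  80,
                100, 180,  80,
                200, 150, 180,
                200,  50, 140,
                100, 150,  60,
                100,  50,  60,
                100, 100, 100,
                200, 150, 100,
                100,  50, 120,
                200,  50, 180), ncol = 3, byrow = TRUE)

res <- split_seq(dat)
res[["200"]][["150"]]   # rows with 200 in column 1 and 150 in column 2
```

The result is a nested list, so a single sub-matrix is reached by indexing with the values as character strings. The `drop = FALSE` keeps one-row groups as matrices instead of collapsing them to vectors.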
[R] A link to a collection of tutorials and videos on R.
A link to a collection of tutorials and videos on R.

Tutorials: http://www.dataminingtools.net/browsetutorials.php?tag=rdmt
Videos: http://www.dataminingtools.net/videos.php?id=8
Re: [R] error bars
Dear friends,

I have a statistical question. Sometimes, when I compare boys to girls on a specific variable, the error bars (confidence intervals of the means) seem to overlap slightly. Still, when I run a t-test, I find a statistically significant difference. The rule is clear: if the confidence intervals do not overlap, then there is a statistically significant difference. But if they overlap slightly, we have to use a t-test to know for sure whether the two means differ significantly. The point is: is there a rule of thumb to say, for example, that if the overlap is less than 20% of the length of the standard error, then a t-test would give significant results?

Thank you for your time.

P.S.1 Is there an easy way to plot error bars in R?
P.S.2 An interesting discussion about this, highly recommended reading, can be found at http://scienceblogs.com/cognitivedaily/2007/03/ill_bet_you_dont_understand_er.php

jason

Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006, 1516 Nicosia, Cyprus
Tel.: +357-22-713178 Fax: +357-22-590539

Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485
iasonas.lampria...@manchester.ac.uk
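To illustrate the situation described in this question, here is a minimal sketch with made-up data (none of it comes from the thread). Two groups of 50 values each have standard deviation 1 and means that differ by 0.5; their 95% confidence intervals overlap, yet Welch's t-test is significant at the 5% level.

```r
# Made-up data: two groups with identical spread, means 0 and 0.5
x <- as.numeric(scale(1:50))   # mean 0, sd 1
y <- x + 0.5                   # mean 0.5, sd 1

# 95% confidence interval for a mean, based on the t distribution
ci <- function(v) {
  mean(v) + c(-1, 1) * qt(0.975, length(v) - 1) * sd(v) / sqrt(length(v))
}

ci_x <- ci(x)
ci_y <- ci(y)
overlap <- ci_x[2] > ci_y[1]        # upper end of x's CI exceeds lower end of y's
p_value <- t.test(x, y)$p.value     # Welch two-sample t-test

overlap
p_value
```

The reason is that the standard error of a difference between two independent means is smaller than the sum of the two individual standard errors, so comparing intervals by eye is more conservative than the test itself.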
Re: [R] use logical in cor.test
Thanks for the replies.

In response to Erik:

What does Both[,1] show you?

    Both[,1]
    [1] 3.36 NA NA NA NA NA NA 3.92 3.50 NA NA NA NA 3.76 3.19 3.83 NA 3.66..

What does Both[,1] > 2.5 show you?

    Both[,1] > 2.5
    [1] TRUE NA NA NA NA NA NA TRUE TRUE NA NA NA NA TRUE TRUE

I understand that a logical variable is binary, but I don't know how to select a subset of the data (I have tried the subset function, but can't seem to get it to work).

Bill, when I run what you suggested, I get:

    tBoth <- Both
    is.na(tBoth[tBoth < 2.5]) <- TRUE
    Error in is.na(tBoth[tBoth < 2.5]) <- TRUE :
      NAs are not allowed in subscripted assignments
    R <- cor(tBoth, use = "complete.obs")
    R[1,2]
    [1] 0.7750889

Any idea about the error message?

Thanks again,
Paul
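The error quoted above is what R reports when a logical index that itself contains NA is used in an assignment with more than one replacement value. One possible way around it, sketched here with made-up numbers standing in for `Both` (the real data are not shown in the thread), is to wrap the condition in which(), which silently drops the NA positions.

```r
# Made-up stand-in for the poster's two-column data
Both <- cbind(a = c(3.36, NA, 1.20, 3.92, 3.50, 3.76, 3.19, 3.83),
              b = c(3.10, 2.90, 3.00, NA, 3.40, 3.60, 2.10, 3.70))

tBoth <- Both
# which() returns only the positions where the comparison is TRUE,
# so NA entries in the comparison no longer reach the subscript
tBoth[which(tBoth < 2.5)] <- NA

# Correlation using only the rows where both columns are still present
R <- cor(tBoth, use = "complete.obs")
R[1, 2]
```

An equivalent index would be `!is.na(tBoth) & tBoth < 2.5`; both forms leave the values that were already missing untouched.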
Re: [R] getting CI's for certain y of nls fitted curve
...it's of course simply a matter of using the desired x in the predict function, in this case:

    predict(mod1, data.frame(press = x_tenth[1]))

It must have been a trivial syntax error that kept this from working in the first place.

kay
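For readers who have not seen the earlier messages, the call pattern looks like this in a self-contained form. The data, the model formula and the values in `x_tenth` are invented for illustration; only the names `mod1`, `press` and `x_tenth` are taken from the message. The key point is that the column in the new data frame must carry the same name as the predictor used in the fit.

```r
# Invented data: roughly exponential response with a small deterministic wiggle
press <- 1:10
resp  <- 2 * exp(0.3 * press) * (1 + 0.01 * sin(press))
d <- data.frame(press = press, resp = resp)

# Nonlinear least-squares fit (the model form is an assumption for this sketch)
mod1 <- nls(resp ~ a * exp(b * press), data = d, start = list(a = 2, b = 0.3))

# Hypothetical x values at which the fitted curve should be evaluated
x_tenth <- c(5.5, 7)

# newdata must be a data frame whose column name matches the predictor
pred <- predict(mod1, newdata = data.frame(press = x_tenth[1]))
pred
```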
Re: [R] Confusing concept of vector and matrix in R
On Tue, Mar 30, 2010 at 2:42 AM, Rolf Turner r.tur...@auckland.ac.nz wrote:
> Well then, why don't you go away and design and build your own statistics and data analysis language/package to replace R? You can then make whatever design decisions you like, and you won't have to live with the design decisions made by such silly and inept people as John Chambers and Rick Becker and their ilk.

Aah, argument by (ironic) reference to learned authority! Even Einstein was wrong ("God does not play dice"). He was also right, thought he was wrong, and then we've discovered he may have been right all along (the Cosmological Constant, Dark Energy etc).

How many of us have _never_ interfaced our foreheads with the keyboard when something breaks because we didn't put ",drop=FALSE" in a matrix subscript?

There is no doubt that R plays fast and loose with many concepts of type and structure that Computer Scientists would turn their nose up at. I would love to go away and redesign it, but I'd just end up with python.

Truth is that R's statistical power is what makes it great because of the vast wealth of CRAN, not the R language per se with its features that so fluster my comp-sci friends. And many a beginner. We work round them by bashing our heads on the keyboards, typing ",drop=FALSE", and vowing never to do it again. And writing more unit tests.

Barry
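For anyone who has not yet met the behaviour being discussed, a minimal illustration: selecting a single row or column of a matrix returns a plain vector unless `drop = FALSE` is given.

```r
m <- matrix(1:6, nrow = 2)

r1 <- m[1, ]                 # dimensions are dropped: a plain vector
r2 <- m[1, , drop = FALSE]   # dimensions are kept: a 1 x 3 matrix

dim(r1)   # NULL
dim(r2)   # 1 3
```

Code that later calls nrow(), ncol() or apply() on the result only works reliably in the second form, which is why the problem tends to surface when a selection happens to match exactly one row.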
Re: [R] help in matlab - r code
Dear Susanne,

Thank you for your answer :-) and thanks to the other people who helped privately.

I have been running the code with a friend, and we reached a similar conclusion. Matlab apparently converts the matrices to vectors automatically and computes the correlation between the vectors, thus returning a single value. This also differs from Octave, which, like R, computes the correlation of matrix against matrix. I managed to sort this out by converting the matrices into vectors, simply by wrapping them in c().

We also found this aspect of 0-1 and 1-255 intriguing. In the end, I used a formula from a friend to do the transformation, getting similar values in both programs.

Here's the code:

    ImageWidth = dim(x)[2]                         # number of columns
    MaxOffset = 99                                 # defined variable
    ImageWidthToProcess = ImageWidth - MaxOffset   # columns minus defined variable
    AutoCData = 0   # unlike Matlab: in R we first create the object that will store the data produced by the loop
    for (Offset in 1:MaxOffset) {
      OffsetPlaquette = x[, (1 + Offset):(ImageWidthToProcess + Offset)]
      AutoCData[Offset] = cor(c(x[, 1:ImageWidthToProcess]), c(OffsetPlaquette))
      print(Offset)
    }
    AutoCData
    plot(AutoCData)   ### COOL :-)

The results were very similar to Matlab's. I still have many lines to go :-)

Compared with the function you produced (thank you so much), the difference between the two sets of results is small:

    summary(AutoCData2 - AutoCData)
          Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
    -2.581e-15 -2.741e-16 -2.082e-17  5.723e-17  2.637e-16  5.329e-15

which is good!

All the best,
Marta

2010/3/29 Susanne Schmidt s.schm...@bham.ac.uk
> Dear Marta,
>
> I did it in Matlab, and fiddled around with R code until I had *almost* the same result. The "almost" is probably due to R handling the picture values (ranging from 0 to 1) differently than Matlab (ranging from 0 to 255), and simply multiplying the R picture values by 255 did NOT result in exactly the same values as the Matlab values. [What seems white in the picture is 245 in Matlab, although values potentially range to 255, and white is 0.9642549 in R, which multiplied by 255 gives 245.12, e.g.] But maybe the precision of this solution is good enough for you.
>
> The corr2 command from Matlab is a 2D correlation coefficient; the R command cor works elementwise, and is not the solution here. Below I tried to implement the formula given on the following Matlab page:
> http://www.mathworks.com/access/helpdesk/help/toolbox/images/corr2.html
> Maybe somebody on the list has a nice idea how to make the code more elegant.
>
> This is the complete code in R:
>
>     setwd("D:/ wherever ")
>     library(ReadImages)
>     x <- read.jpeg(" whichever .jpg")   # open image
>     plot(x)                             # plot image
>     x <- rgb2grey(x)                    # convert to greyscale
>     plot(x)                             # check ;-) the image is in grey scale
>     ImageWidth = dim(x)[2]                         # number of columns
>     MaxOffset = 99                                 # defined variable
>     ImageWidthToProcess = ImageWidth - MaxOffset   # columns minus defined variable
>
>     ## this one does NOT work because matrices not square:
>     # for (k in 1:MaxOffset) {
>     #   OffsetPlaquette <- x[, c((1 + k):(ImageWidthToProcess + k))]
>     #   dataToProcess <- x[, c(1:ImageWidthToProcess)]
>     #   AutoCData[k] <- mantel(OffsetPlaquette, dataToProcess)
>     # }
>     # AutoCData
>     ## END this one does not work because matrices not square
>
>     AutoCData <- rep(0, MaxOffset)
>     sumBothM  <- rep(0, MaxOffset)
>     sum1stMsq <- rep(0, MaxOffset)
>     sum2ndMsq <- rep(0, MaxOffset)
>     for (k in 1:MaxOffset) {
>       OffsetPlaquette <- x[, (1 + k):(ImageWidthToProcess + k)]
>       dataToProcess <- x[, c(1:ImageWidthToProcess)]
>       meanM <- mean(OffsetPlaquette); meanM2 <- mean(dataToProcess)
>       for (j in 1:dim(dataToProcess)[2]) {
>         for (i in 1:dim(OffsetPlaquette)[1]) {
>           sumBothM[k]  <- sumBothM[k] + (OffsetPlaquette[i,j] - meanM) * (dataToProcess[i,j] - meanM2)
>           sum1stMsq[k] <- sum1stMsq[k] + (OffsetPlaquette[i,j] - meanM)^2
>           sum2ndMsq[k] <- sum2ndMsq[k] + (dataToProcess[i,j] - meanM2)^2
>         }
>       }
>       AutoCData[k] <- sumBothM[k] / (sqrt(sum1stMsq[k] * sum2ndMsq[k]))
>     }
>     AutoCData
>
> Best wishes,
> Susanne
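The agreement between the two approaches in this exchange (differences of the order of 1e-15) is expected: the 2D correlation coefficient written out with sums is algebraically the same as the ordinary Pearson correlation of the two matrices flattened to vectors. A small sketch with made-up matrices, where `corr2` is a hypothetical R function named after the Matlab command rather than anything supplied by R:

```r
# Hypothetical R counterpart of the 2D correlation formula, without loops
corr2 <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  a <- A - mean(A)   # deviations from the overall mean of A
  b <- B - mean(B)   # deviations from the overall mean of B
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Made-up matrices standing in for the two image sections
A <- matrix(c(1, 4, 2, 8, 5, 7), nrow = 2)
B <- matrix(c(2, 3, 1, 9, 4, 8), nrow = 2)

r_formula <- corr2(A, B)
r_vector  <- cor(c(A), c(B))   # flatten to vectors, then correlate

all.equal(r_formula, r_vector)
```

Either form avoids the explicit double loop over rows and columns, which matters for images of realistic size.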
[R] S3 vs S4
Dear R users,

I'm still a beginner and I'm wondering whether S3 and S4 methods really differ for my use. I understand more or less the distinction between the two class systems from the documentation I've read, but the big question is: does it make a difference in practice? Up to now I've worked without noticing anything, but it might be important to differentiate and to know which one to use and how.

Thank you for your help.

Regards,
Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
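As a rough illustration of where the two systems differ for everyday use, here is a minimal sketch with invented class and function names (`person3`, `Person4`, `describe`, `describe4` are all hypothetical). In S3 a class is just an attribute on an ordinary object and nothing is checked; in S4 the class is declared formally and the slot types are checked when the object is created.

```r
library(methods)

## S3: the class is only a label; methods are found by naming convention
p3 <- structure(list(name = "Ann", age = 30), class = "person3")
describe <- function(x, ...) UseMethod("describe")
describe.person3 <- function(x, ...) paste(x$name, "is", x$age)

## S4: formal class definition with typed slots, explicit generic and method
setClass("Person4", representation(name = "character", age = "numeric"))
setGeneric("describe4", function(x, ...) standardGeneric("describe4"))
setMethod("describe4", "Person4", function(x, ...) paste(x@name, "is", x@age))
p4 <- new("Person4", name = "Ann", age = 30)

describe(p3)
describe4(p4)

# S4 refuses an object whose slot has the wrong type; S3 would accept it
bad <- tryCatch(new("Person4", name = "Ann", age = "thirty"),
                error = function(e) "rejected")
bad
```

For someone who mainly calls existing functions, the difference is mostly visible in how components are accessed (`$` on S3 lists, `@` or accessor functions on S4 objects).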
Re: [R] Confusing concept of vector and matrix in R
Reframe the problem. Rethink why you need to keep dimensions. I never ever had to use drop.

My .02 something
mario

Barry Rowlingson wrote:
> How many of us have _never_ interfaced our foreheads with the keyboard when something breaks because we didn't put ",drop=FALSE" in a matrix subscript? [...]

--
Ing. Mario Valle
Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
[R] R package licences
Hi,

please, can SOMEBODY help me find the right characters to put into the field

    \details{ ... ... License: \tab \cr }

of the description file for a new R package? When building the package I always get:

    * checking DESCRIPTION meta-information ... WARNING
    Non-standard license specification:
      What license is it under?

Last time I just used GPL and it worked; this time it doesn't. I tried the following character strings:

    GPL  GPL-2  GPL-3  LGPL-2  LGPL-2.1  LGPL-3  AGPL-3  Artistic-1.0  Artistic-2.0

all with the same result.

Thanks in advance,
Ove
[R] remove from the R mailing list
I would like to be removed from the R mailing list. Thanks.
Re: [R] error bars
You can write a function yourself using arrows():

    # x, y:   data points as vectors
    # xe, ye: errors per entry as vectors, if the errors are symmetric
    #         (each bar extends xe/2 or ye/2 to either side of the point)
    arrbar <- function(x, y, xe, ye) {
      for (i in seq_along(x)) {
        arrows(x[i], y[i], x[i] - xe[i]/2, y[i], angle = 90, length = 0.05)
        arrows(x[i], y[i], x[i] + xe[i]/2, y[i], angle = 90, length = 0.05)
        arrows(x[i], y[i], x[i], y[i] - ye[i]/2, angle = 90, length = 0.05)
        arrows(x[i], y[i], x[i], y[i] + ye[i]/2, angle = 90, length = 0.05)
      }
    }

Iasonas Lamprianou wrote:
> P.S.1 is there an easy way to plot error bars in R? [...]

--
Dipl.-Phys. Markus Schmotz
Universität Konstanz
Fachbereich Physik, Lehrstuhl Leiderer
Postfach M 676
D-78457 Konstanz
Tel.: +49 7531 88 3803, Fax: 3127
Mail: markus.schm...@uni-konstanz.de
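Since arrows() accepts vectors and can draw a head at both ends (`code = 3`), the same bars can be drawn without a loop. This is only a sketch of a variant, not the function posted above: `errbar` is a hypothetical name, the data are made up, and it keeps the convention that each bar extends half of `xe` or `ye` to either side of the point.

```r
# Hypothetical vectorised variant: one arrows() call per direction
errbar <- function(x, y, xe, ye, cap = 0.05) {
  # horizontal bars, flat caps at both ends (angle = 90, code = 3)
  arrows(x - xe / 2, y, x + xe / 2, y, angle = 90, code = 3, length = cap)
  # vertical bars
  arrows(x, y - ye / 2, x, y + ye / 2, angle = 90, code = 3, length = cap)
  invisible(NULL)
}

# Made-up example data
x  <- 1:5
y  <- c(2.1, 3.9, 6.2, 8.1, 9.8)
xe <- rep(0.4, 5)
ye <- c(0.5, 0.8, 0.6, 1.0, 0.7)

pdf(NULL)   # null device so the sketch also runs non-interactively; omit at the console
plot(x, y, xlim = c(0, 6), ylim = c(0, 12))
errbar(x, y, xe, ye)
dev.off()
```

A bar of length zero makes arrows() emit a warning, so points without an error value are best filtered out before the call.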
Re: [R] remove from the R mailing list
-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of zoe zhang
Sent: Tuesday, 30 March 2010 12:18
To: r-help@r-project.org
Subject: [R] remove from the R mailing list

I would like to be removed from the R mailing list. Thanks.

---

Hi,

Would you like me to remove you?

Rubén

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN
Re: [R] R package licences
On Tue, Mar 30, 2010 at 10:15 AM, Uwe Behrens ubehre...@gmail.com wrote:
> Hi,
>
> please, can SOMEBODY help me to find the right characters to fit into the field
>
>     \details{ ... ... License: \tab \cr }

What file is this? \details belongs in a .Rd documentation file, but the license is specified in the DESCRIPTION file, which doesn't have \details. Are you editing a .Rd file and not the DESCRIPTION file?

> of the description file for a new R-package? When building the package I always get:
>
>     * checking DESCRIPTION meta-information ... WARNING
>     Non-standard license specification:
>       What license is it under?
>
> Last time I just used GPL and it worked, this time it doesn't ... I tried the following character strings: GPL GPL-2 GPL-3 LGPL-2 LGPL-2.1 LGPL-3 AGPL-3 Artistic-1.0 Artistic-2.0 all with the same results.

Most of those (if not all) should be valid.

Barry
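To see which licence string a package actually declares, the DESCRIPTION file can be read from R with read.dcf(). The sketch below writes a made-up DESCRIPTION to a temporary file (package name, version and title are invented) and reads the License field back; pointing `desc` at the DESCRIPTION of a real package source directory works the same way.

```r
# Made-up DESCRIPTION written to a temporary file for illustration
desc <- tempfile("DESCRIPTION")
writeLines(c("Package: mypkg",
             "Version: 0.1.0",
             "Title: Example Package",
             "License: GPL-2"),
           desc)

# The License field is what the check of DESCRIPTION meta-information looks at
lic <- read.dcf(desc, fields = "License")[1, "License"]
lic
```

If this prints a placeholder sentence rather than one of the strings tried above, the edit went into a different file than the one being checked.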
Re: [R] SHLIB not working (Win Vista)
On 29/03/2010 11:50 PM, Remko Duursma wrote:
> Dear R-helpers,
>
> I tried to build a DLL as I have done so many times before, but this time on my new machine, and it gives the following error (from a cmd window):
>
>     R CMD SHLIB Boxcnt.f
>     MAKE Version 5.2 Copyright (c) 1987, 2000 Borland
>     Error c:/PROGRA~1/R/R-210~1.1/share/make/winshlib.mk 4: Command syntax error
>     *** 1 errors during make ***

You're using the wrong make, presumably because your path is wrong. You should put the Rtools/bin directory first on your path, but you have a Borland make ahead of it.

Duncan Murdoch

> The error is not in my Fortran file, because I also tried other files, or even no arguments at all (it gives the same error message regardless).
>
> System: Windows Vista, R 2.10.1, Rtools installed (version 2.11)
>
> thanks,
> Remko
>
> Remko Duursma
> Research Lecturer
> Centre for Plants and the Environment
> University of Western Sydney
> Hawkesbury Campus
> Richmond NSW 2753
>
> Dept of Biological Science
> Macquarie University
> North Ryde NSW 2109
> Australia
> Mobile: +61 (0)422 096908
> www.remkoduursma.com
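One way to check which make would be picked up is to ask from within R, assuming R is started from the same environment as the command window. This is only a diagnostic sketch; the directories it prints depend entirely on the machine.

```r
# Full path of the first "make" found on the search path ("" if none is found)
make_path <- Sys.which("make")
make_path

# The search path split into directories, in the order they are searched;
# per the advice above, the Rtools bin directory should come before any
# directory containing another make
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
head(path_dirs)
```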
Re: [R] Error when checking a package.
On 30/03/2010 1:59 AM, Jim Lemon wrote:
> On 03/30/2010 04:39 PM, Dong H. Oh wrote:
>> ...
>> * checking R code for possible problems ... NOTE
>> Found possibly global 'T' or 'F' in the following function:
>>   ar.dual.dea
>> Error in ar.dual.dea(ar.dat, noutput = 1, orientation = 1, rts = 1, ar.l = matrix(c(0, :
>>   F used instead of FALSE
>> Execution halted
>
> Hi Dong-hyun,
> It looks like the R core team is getting serious about the TRUE/FALSE business. I would suggest that you replace all occurrences of T or F in your code with TRUE and FALSE respectively and see what happens.

That test has been around at least since 2003: it applies in package testing, not to people typing in the console.

Duncan Murdoch
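The background to this check is that T and F are ordinary variables that merely start out equal to TRUE and FALSE, whereas TRUE and FALSE are reserved words. A short demonstration:

```r
T <- 0                  # legal: T is an ordinary variable and can be masked
masked <- isTRUE(T)     # FALSE, because T now holds 0

rm(T)                   # remove our copy; the original T is visible again
restored <- isTRUE(T)   # TRUE

# TRUE itself cannot be reassigned: evaluating the assignment is an error
reserved <- tryCatch({
  eval(parse(text = "TRUE <- 0"))
  "assigned"
}, error = function(e) "error")

c(masked = masked, restored = restored)
reserved
```

Package code that relies on T or F therefore breaks as soon as a user happens to have a variable of that name, which is what the package check guards against.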
[R] library(): load library from a specified location
Dear list members,

I would like to load an R library from a specified folder with library() and need help on how to call the command. The reason is that I am loading this library on a remote machine where I have no admin rights. Furthermore, a library with the same name is already installed on that machine. I have modified the source code of this library slightly and created a personal version that I now want to load instead of the standard one.

Running

    R CMD INSTALL *path*/*packagename* --library=*path*/Software/R-packages

I managed to compile my modified version on the remote machine and save the library in a folder on that machine. When I now start R and run library(packagename), R still seems to load the version installed on the remote computer, even though I added my folder to the library path of R by running:

    .libPaths("*path*/Software/R-Packages")

Probably this is due to the fact that the package is also available in the standard library of R. Is there any way of loading the package from only one specified path? I read the help for library() and could imagine that lib.loc could be the key to success, but I am not sure which argument it needs.

Thanks a lot
Jannis
[R] Error singular gradient matrix at initial parameter estimates in nls
I am using nls to fit a nonlinear function to some data. The nonlinear function is:

    y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen algorithm "port", with a lower bound of 0 for all of the ki parameters, and I have tried many start values for the parameters ki (including generating them at random).

If I fit the nonlinear function to the same data using an external algorithm, it fits perfectly and finds the parameters. As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10, 64 bit), I keep getting the error:

    Error in nlsModel(formula, mf, start, wts, upper) :
      singular gradient matrix at initial parameter estimates

I have read all the previous postings and the documentation, but to no avail: the error is there to stay. I am sure the problem is with nls, because the external fitting algorithm fits it perfectly in less than a second. Also, if my n is 4, then nls works perfectly (but that excludes all of k5 ... kn).

Can anyone help me with suggestions? Thanks in advance.

Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim?

Regards
--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
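Two things from this question can be sketched in a few lines. First, the model is linear on a transformed scale: since y = 1 - exp(-eta) with eta = k0 + k1*p1 + ... + kn*pn, it follows that -log(1 - y) = eta, so an ordinary lm() fit gives data-driven starting values whenever all y are below 1. Second, the bounded least-squares fit can be done with optim() instead of nls. Everything below is an illustration under stated assumptions: the data are invented, only two predictors are used, and `sse` is a hypothetical helper name.

```r
# Invented data with two predictors and known coefficients k = (0.1, 0.5, 0.2)
p1 <- seq(0.1, 2, length.out = 20)
p2 <- rep(c(0.5, 1, 1.5, 2), 5)
y  <- 1 - exp(-(0.1 + 0.5 * p1 + 0.2 * p2)) + 0.002 * sin(seq_along(p1))

# Starting values from the linearised model -log(1 - y) = k0 + k1*p1 + k2*p2
lin   <- lm(-log(1 - y) ~ p1 + p2)
start <- pmax(coef(lin), 0)          # respect the lower bound of 0

# Sum of squared residuals of the nonlinear model (hypothetical helper)
sse <- function(k) sum((y - (1 - exp(-(k[1] + k[2] * p1 + k[3] * p2))))^2)

# Bounded minimisation; L-BFGS-B supports box constraints
fit <- optim(start, sse, method = "L-BFGS-B", lower = rep(0, 3))
fit$par
```

Whether this removes the singular-gradient error in the real problem cannot be told from the thread; strongly correlated predictors, or responses so close to 1 that the exponential term vanishes, would make the gradient nearly singular for any optimiser.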
Re: [R] library(): load library from a specified location
On 30/03/2010 7:01 AM, Jannis wrote:
> When I now start R and run library(packagename), R still seems to load the version installed on the remote computer, even though I added my folder to the library path of R by running:
>
>     .libPaths("*path*/Software/R-Packages")
>
> [...] Is there any way of loading the package from only one specified path? I read the help for library() and could imagine that lib.loc could be the key to success, but I am not sure which argument it needs.

Is the package loaded before you make the change to .libPaths? The base packages are loaded at startup, but this can be suppressed: see ?Startup.

If that's not it, then I think we need more specific information, because what you're doing should work. Show us the result of

    sessionInfo()
    .libPaths("*path*/Software/R-Packages")
    .libPaths()
    library(packagename)
    sessionInfo()

Duncan Murdoch
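On the question of what lib.loc expects: it takes a character vector of library directories, that is, the folder that contains the installed package folder rather than the package folder itself. A minimal sketch follows, in which a temporary directory stands in for the personal library and `packagename` remains the placeholder used in the thread, so the library() call is shown commented out.

```r
# Stand-in for the personal library folder (the real path is not given in the thread)
my_lib <- file.path(tempdir(), "R-packages")
dir.create(my_lib, showWarnings = FALSE)

# Put the personal library in front of the existing search path
.libPaths(c(my_lib, .libPaths()))
first_lib <- .libPaths()[1]
first_lib

# Load from the personal library only, ignoring other installed copies:
# library(packagename, lib.loc = my_lib)
```

Directory names are case-sensitive on most Linux systems, so the folder given here has to match the one passed to --library exactly.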
Re: [R] Error singular gradient matrix at initial parameter estimates in nls
You could try algorithm = "brute-force" in the nls2 package to find starting values.
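On the closing question about optim: what follows is only a minimal sketch, assuming simulated data and two made-up predictors p1 and p2 (none of the names or values below come from the original post). It fits the same model by least squares with optim() and method "L-BFGS-B", which accepts a lower bound of 0 for every ki. Collinear predictors are one common cause of a singular gradient in nls, so this may sidestep the error rather than cure its cause.

```r
# Simulated data (assumption: the poster's real data are not available)
set.seed(1)
n <- 200
P <- cbind(p1 = runif(n), p2 = runif(n))   # hypothetical predictors
k_true <- c(0.2, 0.5, 1.0)                 # hypothetical k0, k1, k2
y <- 1 - exp(-drop(cbind(1, P) %*% k_true)) + rnorm(n, sd = 0.01)

# Sum of squared errors for y = 1 - exp(-(k0 + k1*p1 + k2*p2))
sse <- function(k) {
  pred <- 1 - exp(-drop(cbind(1, P) %*% k))
  sum((y - pred)^2)
}

# Box-constrained minimisation: every parameter is kept >= 0
start <- rep(0.1, 3)
fit <- optim(par = start, fn = sse, method = "L-BFGS-B", lower = rep(0, 3))
fit$par    # estimated k0, k1, k2
fit$value  # residual sum of squares at the solution
```

Unlike nls, optim() does not report standard errors directly; passing hessian = TRUE is one way to get a starting point for them.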
[R] How to recode variables using base R
Hi, Is there an efficient way of recoding variables in a data.frame using base R? My purpose is to create new variables and attach them to the old data.frame. The basic idea is shown below, but how do I create the recoding for A, B and C and assign the results to new variables?

df <- data.frame(A = c(1:5), B = c(3,6,2,8,10), C = c(0,15,5,9,12))
df$A[df$A <= 3] <- "x"
df$A[df$A > 3 & df$A <= 8] <- "y"
df$A[df$A <= 16] <- "z"

Thanks, -J
Re: [R] library(): load library from a specified location
Sorry folks! My way worked already! I was just too blind to realize. Treat this post as solved. Anybody trying to achieve the same as me is advised to try the way I described in my earlier post! And thanks a lot for the advice I already received. Cheers Jannis
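For readers who land on this thread: a minimal sketch of the two usual ways to make R prefer a private library. The directory below is a hypothetical stand-in created under tempdir(), and packagename is the placeholder used in the thread, so the library() call is left commented out.

```r
# Hypothetical private library directory (stand-in for *path*/Software/R-packages)
my_lib <- file.path(tempdir(), "R-packages")
dir.create(my_lib, showWarnings = FALSE)

# Option 1: put the private library first on the search path
.libPaths(c(my_lib, .libPaths()))
.libPaths()[1]   # the private library is now searched first

# Option 2: point library() at one location only
# (packagename is a placeholder, so this line is not run here)
# library(packagename, lib.loc = my_lib)
```

Note that .libPaths() silently drops directories that do not exist, which is worth checking if a private library seems to be ignored.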
Re: [R] How to recode variables using base R
Dear Johannes, You can use cascading ifelse()s:

df$A <- with(df, ifelse(A <= 3, "x", ifelse(A > 3 & A <= 8, "y", "z")))
df$A
[1] "x" "x" "x" "y" "y"

This command assumes that you want all values that don't map into xs and ys to be zs, but you could adapt it if that's not what you want (and no values in your example become zs anyway). I hope this helps, John

John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology, McMaster University, Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
Re: [R] How to recode variables using base R
You could try this also:

cut(df$A, c(-Inf, 3, 8), labels = c('x', 'y'))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] Paik, et al., NEJM, 2004, Fig. 4, rate of event at 10 years as a function of covariate
Does anyone know how to make a plot like Fig. 4 of Paik, et al., New England Journal of Medicine, Dec. 30, 2004? Given survival data and a covariate, they plot a curve giving "Rate of Distant Recurrence at 10 Yr (% of patients)" on the y-axis versus the covariate on the x-axis. They also plot curves giving a 95% confidence interval. Thanks very much. -Ben
[R] Finding positions in array
Hello, I need a function to check which positions of the array are greater than y and return those positions in another array z.

x <- array(E(gaux)$weight)
x
[1] 3 8 10 6

If y = 7
z
[1] 2 3

Thanks a lot! Romild
Re: [R] error bars
Iasonas, In response to P.S.1, try error.bars in the psych package. Bill

At 1:04 AM -0700 3/30/10, Iasonas Lamprianou wrote:

Dear friends, I have a statistical question. Sometimes, if I compare boys to girls on a specific variable, the error bars (confidence interval of means) seem to overlap slightly. Still, when I run a t-test, I find statistically significant differences. The rule is clear: if the confidence intervals do not overlap, then there is a statistically significant difference. But if they overlap slightly, we have to use a t-test to know for sure if the two means differ significantly. The point is: is there a rule of thumb to say, for example, if the overlap is less than 20% of the length of the standard error, then a t-test would give significant results? Thank you for your time.

P.S.1 Is there an easy way to plot error bars in R?
P.S.2 An interesting discussion about this - highly recommended to read it - can be found at http://scienceblogs.com/cognitivedaily/2007/03/ill_bet_you_dont_understand_er.php

jason

Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation), Department of Education Sciences, European University-Cyprus, P.O. Box 22006, 1516 Nicosia, Cyprus
Tel.: +357-22-713178 Fax: +357-22-590539
Honorary Research Fellow, Department of Education, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485 iasonas.lampria...@manchester.ac.uk

--
William Revelle http://revelle.net/revelle.html
2815 Lakeside Court http://revelle.net/lakeside
Evanston, Illinois
It is 6 minutes to midnight http://www.thebulletin.org
Re: [R] error bars
thank you

Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation), Department of Education Sciences, European University-Cyprus, P.O. Box 22006, 1516 Nicosia, Cyprus
Tel.: +357-22-713178 Fax: +357-22-590539
Honorary Research Fellow, Department of Education, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485 iasonas.lampria...@manchester.ac.uk
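The situation described in the question (slightly overlapping confidence intervals, yet a significant t-test) can be reproduced with a small sketch. The scores below are constructed, not real data: both groups are built to have a standard deviation of exactly 1, and the 0.8 shift between them is an arbitrary choice made for illustration.

```r
# Constructed scores with mean 0 and sd 1 (assumption: illustrative only)
n <- 20
base_scores <- as.numeric(scale(qnorm(ppoints(n))))
boys  <- base_scores
girls <- base_scores + 0.8   # same spread, mean shifted by 0.8

# 95% confidence interval of a mean
ci <- function(x) {
  mean(x) + c(-1, 1) * qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))
}
ci_boys  <- ci(boys)
ci_girls <- ci(girls)

# The intervals overlap when the upper limit for boys exceeds the lower limit for girls
overlap <- ci_boys[2] > ci_girls[1]
p_value <- t.test(boys, girls)$p.value

overlap   # TRUE: the two intervals overlap
p_value   # below 0.05: the t-test is nevertheless significant
```

The reason is that the t-test works with the standard error of the difference, which is smaller than the sum of the two separate standard errors, so some overlap is compatible with a significant difference.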
Re: [R] Finding positions in array
Try:

which(x > y)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
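Applied to the numbers in the question, a minimal sketch looks like this. The vector is typed in directly here, since the graph object gaux from the original post is not available.

```r
x <- c(3, 8, 10, 6)   # stands in for array(E(gaux)$weight)
y <- 7

# which() returns the positions where the condition is TRUE
z <- which(x > y)
z        # 2 3

# The values themselves, if they are needed as well
x[z]     # 8 10
```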
Re: [R] How to recode variables using base R
Thanks John and Henrique, my intention is to do this for A, B and C (all at once), so I'll have to wrap your solution into lapply or a for loop? -J
[R] simple loop iteration
Hi R mailing list, probably a very basic problem here. I try to do the following:

Q <- c(1,2,3)
P <- c(4,5,6)
A <- data.frame(Q,P)
A
  Q P
1 1 4
2 2 5
3 3 6

This is my simplified data.frame (matrix). Now I try to create the following loop for subtraction of elements within the data.frame:

for(i in length(A[,"P"]-1){
  delta[i] <- A[i,"P"]-A[i+1,"P"]
}

All I get is a vector of the correct length but with no readings. Thanks for any help on this.

--
Niklaus Hürlimann
Université de Lausanne, Institut de Minéralogie et Géochimie, L'Anthropole, CH-1015 Lausanne, Suisse
E-mail: niklaus.hurlim...@unil.ch Tel: +41(0)21 692 4452
Re: [R] Paik, et al., NEJM, 2004, Fig. 4, rate of event at 10 years as a function of covariate
Such a plot is easy to do with the rms package if using a Cox or accelerated failure time model, e.g.

require(rms)
dd <- datadist(mydata); options(datadist='dd')
f <- cph(Surv(rtime, event) ~ rcs(covariate,4) + sex + ..., x=TRUE, y=TRUE)  # restricted cubic spline with 4 knots
plot(Predict(f, covariate, sex, time=10))  # separate curves for male and female; omit sex to make one curve; add age=50 to predict for a 50 year old

--
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University
Re: [R] How to recode variables using base R
Using lapply:

as.data.frame(lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c('x', 'y', 'z')))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
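Since the original question asked for new variables attached to the old data.frame, here is a minimal sketch that combines the cut() and lapply() suggestions above. The ".rec" suffix and the name df2 are arbitrary choices for illustration, not something proposed in the thread.

```r
df <- data.frame(A = 1:5, B = c(3, 6, 2, 8, 10), C = c(0, 15, 5, 9, 12))

# Recode every column with the same break points:
# (-Inf, 3] -> "x", (3, 8] -> "y", (8, 16] -> "z"
recoded <- lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c("x", "y", "z"))

# Give the recoded columns new names and attach them to the original data
names(recoded) <- paste0(names(df), ".rec")
df2 <- cbind(df, as.data.frame(recoded))
df2
```

Values above 16 would become NA with these breaks; adding Inf as a final break is one way to catch them.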
Re: [R] simple loop iteration
Try this:

Reduce("-", as.data.frame(embed(A$P, 2)))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Competing with SPSS and SAS: improving code that loops throughrows (data manipulation)
Below is the script that was based on your help - it works very fast:

### Creating the example data set:
set.seed(123)
MyData <- data.frame(group = c(rep("first", 10), rep("second", 10)),
                     a = abs(round(rnorm(20, mean = 0, sd = .55), 2)),
                     b = abs(round(rnorm(20, mean = 0, sd = .55), 2)))
MyData

### Specifying parameters used in the code below:
vars <- names(MyData)[2:3]     # names of variables to be transformed
nr.vars <- length(vars)        # number of variables to be transformed
group.var <- names(MyData)[1]  # name of the grouping variable

### For EACH subgroup: indexing variables a and b to their maximum in that subgroup;
### These indexed variables will be used to build the new ones:
system.time({
  temp <- cbind(MyData, do.call(cbind, lapply(vars, function(x) {  # x <- "b"
    unlist(by(MyData, MyData[[group.var]], function(y) y[, x] / max(y[, x])))
  })))
  colnames(temp)[(length(MyData) + 1):(length(MyData) + nr.vars)] <- paste(vars, 'IndToMax', sep = '.')
})

# Grabbing names of the newly created variables that end with "IndToMax"
indexed.vars <- names(temp)[grep("IndToMax$", names(temp))]  # variables indexed to subgroup max

# Specifying parameters used for transformation below:
old.length <- length(temp)
hl <- c(.3, .6, 1:5)
hrf <- seq(.15, .90, .15)

### Actual Transformation:
# Reduce() is part of base R, so no extra package needs to be loaded
system.time({
  constants <- expand.grid(vars = indexed.vars, HL = hl, HRF = hrf)
  results <- lapply(seq(nrow(constants)), function(x) {
    dat <- temp[, as.character(constants[x, 1])]
    D <- exp(log(0.5) / constants[x, 2])
    L <- -10 * log(1 - constants[x, 3])
    unlist(by(dat, temp[[group.var]], function(y)
      Reduce(function(u, v) 1 - ((1 - u * D) / (exp(v * L))), y, accumulate = TRUE, init = 0)[-1]))
  })
  final <- cbind(temp, do.call(cbind, results))
  colnames(final)[-(1:old.length)] <- paste(vars, constants$HL, 100 * constants$HRF, '.transformed', sep = '.')
})

Thanks again for all your help!
Dimitri

On Mon, Mar 29, 2010 at 4:16 PM, Dimitri Liakhovitski ld7...@gmail.com wrote:

Would like to thank every one once more for your great help. I was able to reduce the time from god knows how many hours to about 2 minutes! Really appreciate it! Dimitri

On Sat, Mar 27, 2010 at 11:43 AM, Martin Morgan mtmor...@fhcrc.org wrote:

On 03/26/2010 06:40 PM, Dimitri Liakhovitski wrote:

My sincere apologies if it looked large. Let me try again with less code. It's hard to do less than that. In fact - there is nothing in this code but 1 formula and many loops, which is the problem I am not sure how to solve. I also tried to be as clear as possible with the comments. Dimitri

## START OF THE CODE TO PRODUCE SMALL DATA EXAMPLE
set.seed(123)
data <- data.frame(group=c(rep("first",10),rep("second",10)), a=abs(round(rnorm(20,mean=0, sd=.55),2)), b=abs(round(rnorm(20,mean=0, sd=.55),2)))
data  # "data" is the data frame to work with
## END OF THE CODE TO PRODUCE SMALL DATA EXAMPLE. In real life "data" would contain up to 150-200 rows PER SUBGROUP

### Specifying useful parameters used in the slow code below:
vars <- names(data)[2:3]                # names of variables used in transformation; in real life - up to 50-60 variables
group.var <- names(data)[1]             # name of the grouping variable
subgroups <- levels(data[[group.var]])  # names of subgroups; in real life - up to 30 subgroups

# OBJECTIVE:
# Need to create new variables based on the old ones (a & b)
# For each new variable, the value in a given row is a function of (a) 2 constants (that have several levels each),
# (b) value of the original variable (e.g., a.ind.to.max), and the value in the previous row on the same new variable
# Plus - it has to be done by subgroup (variable "group")

# Defining 2 constants:
constant1 <- c(1:3)            # constant 1 used in transformation - has 3 levels, in real life - up to 7 levels
constant2 <- seq(.15,.45,.15)  # constant 2 used in transformation - has 3 levels, in real life - up to 7 levels

### CODE THAT IS SLOW. Reason - too many loops with the inner-most loop being very slow - as it is looping through rows:
for(var in vars){  # looping through variables
  for(c1 in 1:length(constant1)){  # looping through values of constant1
    for(c2 in 1:length(constant2)){  # looping through values of constant2
      d=log(0.5)/constant1[c1]
      l=-log(1-constant2[c2])
      name <- paste(var,constant1[c1],constant2[c2]*100,".transf",sep=".")
      data[[name]] <- NA
      for(subgroup in subgroups){  # looping through subgroups
        data[data[[group.var]] %in% subgroup, name][1] = 1-((1-0*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup, var][1]*l*10)))
        ### THIS SECTION IS THE SLOWEST - BECAUSE I AM LOOPING THROUGH ROWS:
        for(case in 2:nrow(data[data[[group.var]] %in% subgroup, ])){  # looping through rows
          data[data[[group.var]] %in% subgroup, name][case]=
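The speed-up reported in this thread comes from replacing the row-by-row loop with Reduce(..., accumulate = TRUE), which carries the previous row's result forward. A minimal sketch of that idea on a single short vector follows; the function name recur_transform and the input values are made up for illustration, while the recursion itself is the one used in the script above.

```r
# Each new value depends on the current input and on the previous new value
recur_transform <- function(x, D, L) {
  step <- function(prev, cur) 1 - (1 - prev * D) / exp(cur * L)
  # init = 0 seeds the recursion; [-1] drops that seed from the result
  Reduce(step, x, accumulate = TRUE, init = 0)[-1]
}

x <- c(0.2, 0.5, 0.1)        # hypothetical indexed values for one subgroup
D <- exp(log(0.5) / 2)       # decay constant for a half-life of 2
L <- -10 * log(1 - 0.3)      # scaling constant for a rate of 0.30

fast <- recur_transform(x, D, L)

# The same recursion written as an explicit loop, for comparison
slow <- numeric(length(x))
prev <- 0
for (i in seq_along(x)) {
  slow[i] <- 1 - (1 - prev * D) / exp(x[i] * L)
  prev <- slow[i]
}

all.equal(fast, slow)   # TRUE
```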
Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures
Ista, I have looked at the reshape package and have used 'melt' successfully on simpler tables. I tried it here, but have not been successful. I think I just need to gain experience. I am loving R and am having a difficult time with data structure issues. I am attaching the data set that I am trying to manipulate. Ultimately, I would like to be able to analyze these data with ANOVA and repeated measures, and also be able to plot growth, wt * days. I appreciate any help or guidance on references to read that will help me solve my problems. The data set represents wt over time (starting with days=0, birth day) for steers. Factors include Stockering and Finishing treatments. Regards, Bill

On 3/29/10 3:39 PM, Ista Zahn [via R] ml-node+1695531-1043721504-210...@n4.nabble.com wrote:

Hi Bill, Without an example dataset it's hard to see exactly what you need to do. But you can get started by looking at the documentation for the reshape function (?reshape), and by looking at the reshape package. The reshape package has an associated web page (http://had.co.nz/reshape/) with links to papers and other information to help you get started. Best, Ista

On Mon, Mar 29, 2010 at 3:15 PM, wclapham wrote:

I have a data frame that I created using read.table on a csv spreadsheet. The data look like the following:

Steer.ID stocker.trt Finish.trt Date Days Wt ..

Steer.Id, stocker.trt, Finish.trt are factors-- Date, Days, Wt are data that are repeated 23 times (wide format). I want to reshape the data such that I have the correct Steer.ID, stocker.trt, Finish.trt identifying all of the repeated measures data in a "long" format. I am a newbie at R and need to develop the skill in reshaping data, so that I can handle routine problems like described above. Thanks so much in advance for help or advice. Bill
[R] Easy to use R interface into Macroeconomics data at the Fed, the IMF and Eurostat
Dear R Users, Does anyone know if there is an easy to use interface for the macroeconomics databases of the Fed, Eurostat and the IMF? Thanks in advance, Tolga Uzuner
Re: [R] simple loop iteration
Perhaps you're just looking for the diff() function? See ?diff. -Peter Ehlers

--
Peter Ehlers
University of Calgary
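To make the two suggestions concrete, here is a minimal sketch on the data from the question. The loop in the original post iterates over a single number, because length() returns one value rather than a sequence; seq_len() supplies the sequence, and delta is allocated before the loop.

```r
A <- data.frame(Q = c(1, 2, 3), P = c(4, 5, 6))

# Corrected loop: iterate over 1..(n - 1), not over the single value n
n <- nrow(A)
delta <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  delta[i] <- A[i, "P"] - A[i + 1, "P"]
}
delta   # -1 -1

# Vectorised equivalent: diff() computes P[i + 1] - P[i], hence the minus sign
delta_vec <- -diff(A$P)
delta_vec   # -1 -1
```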
Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures
Try this:

cattle <- read.csv("http://n4.nabble.com/attachment/1745223/0/CattleGrowth.csv")
long <- reshape(cattle, dir = "long", idvar = "Steer.ID", varying = list(grep("Date", names(cattle)), grep("Days", names(cattle)), grep("Wt", names(cattle))))
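Because the attachment linked in the thread may no longer be reachable, here is a minimal self-contained sketch of the same reshape() call on a tiny made-up table. The two steers, the two visits and all values are invented for illustration; only the column naming pattern follows the description in the question.

```r
# Tiny wide table: one row per steer, repeated Days/Wt columns (invented data)
wide <- data.frame(
  Steer.ID    = c("s1", "s2"),
  stocker.trt = c("A", "B"),
  Days.1 = c(0, 0),   Wt.1 = c(40, 42),
  Days.2 = c(30, 30), Wt.2 = c(70, 75)
)

# One list element per repeated measure; v.names gives the long column names
long <- reshape(wide, direction = "long", idvar = "Steer.ID",
                varying = list(c("Days.1", "Days.2"), c("Wt.1", "Wt.2")),
                v.names = c("Days", "Wt"),
                timevar = "visit", times = 1:2)

# Sort so that each steer's measurements are together
long <- long[order(long$Steer.ID, long$visit), ]
long
```

The long table has one row per steer and visit, which is the layout that repeated-measures models and plots of Wt against Days generally expect.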
[R] Multivariate hypergeometric distribution version of phyper()
Dear R Users,

I used the phyper() function to estimate the likelihood that the number of genes overlapping between 2 different lists of genes is due to chance. This appears to work appropriately. Now I want to try this with 3 lists of genes, which phyper() does not appear to support. Some googling suggests I can use the multivariate hypergeometric distribution to achieve this, e.g.: http://en.wikipedia.org/wiki/Hypergeometric_distribution

But when I try to do this manually using the choose() function (see the attempt below, an example with just two gene lists) I'm unable to perform the calculations: the numbers hit infinity before I get an answer. Searching the CRAN archives for "multivariate hypergeometric" shows this term in the vignettes of the packages 'combinat' and 'forward', but I'm unable to make sense of these package functions in the context of my application.

Can someone suggest a function, script or method to achieve my goal of estimating the likelihood of overlap between 3 lists of genes, ideally using the multivariate hypergeometric, or anything else for that matter?

Cheers in advance, Karl

#example attempt with two gene lists m n
N <- 45101  # total number of balls in urn
m <- 720    # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801    # number of balls drawn, or number of samples
k <- 40     # number of 'white' or 'special' balls DRAWN
a <- choose(m, k)
b <- choose((N - m), (n - k))
z <- choose(N, n)
prK <- (a * b) / z  # 'the answer'
print(prK)
[1] NaN
a
[1] 7.985852e+65
b
[1] Inf
z
[1] Inf

-- Karl Brand Department of Genetics Erasmus MC Dr Molewaterplein 50 3015 GE Rotterdam T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268
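The overflow in the attempt above comes from evaluating choose() on the natural scale. A minimal sketch of one way around it is to work with log binomial coefficients via lchoose() and exponentiate only at the end. The helper name log_dmvhyper and the three-category counts below are hypothetical, made up for illustration; whether the multivariate hypergeometric is the right model for a three-list overlap is a separate question this sketch does not settle.

```r
# Values from the post: urn size, 'special' balls, draws, 'special' balls drawn
N <- 45101; m <- 720; n <- 801; k <- 40

# Two-list case on the log scale: no intermediate Inf
log_p <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
p_manual <- exp(log_p)

# Base R gives the same point probability, and the upper tail P(X >= k)
p_dhyper <- dhyper(k, m, N - m, n)
p_tail   <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# Hypothetical helper: log pmf of the multivariate hypergeometric.
# K = category sizes in the urn, x = counts drawn from each category.
log_dmvhyper <- function(x, K) {
  stopifnot(length(x) == length(K), all(x >= 0), all(x <= K))
  sum(lchoose(K, x)) - lchoose(sum(K), sum(x))
}

# With two categories it reduces to dhyper()
p_two <- exp(log_dmvhyper(c(k, n - k), c(m, N - m)))

# Three categories; these counts are illustrative only
log_p_three <- log_dmvhyper(c(40, 35, 726), c(720, 650, 43731))
```

For a significance statement the tail probability (p_tail) is usually more relevant than the single point probability.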
[R] Calling R from Perl
Hi all,

I would like to know how to call R from Perl. I want to read a file in Perl, store it in a data structure, and pass that data structure to R so that I can do the mathematical operations easily. Thanks.

-- Regards, Ayush Raman
[R] Substitute of a For Loop
Hi,

I am trying to permute a vector 1000 times, for which I am using a for loop. Within the for loop, I am doing some matrix operations which take a lot of time. I am looking for a way to permute the vector 1000 times and do the matrix operations without using a for loop. This is a snippet of my code:

for (i in 2:1000) {
  y.permute = permute(y.permute)                   ### permute the vector
  F.stats = calPseudoStat(y.permute, table.Gij)    ## call the function which does some matrix calculation and calculates a pseudo statistic
  F.stats.vec = append(F.stats.vec, F.stats)       ## add the pseudo statistic to a vector
}

Thanks.

-- Regards, Ayush Raman
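A minimal sketch of the loop-free form, assuming each permutation can be drawn independently with sample(). The functions permute() and calPseudoStat() from the post are not shown in the thread, so pseudo_stat below is a hypothetical stand-in (a difference in group means). Note that this mainly avoids growing the result with append(); if the matrix calculation dominates the run time, removing the loop alone may not help much.

```r
set.seed(42)
y <- rnorm(20)              # toy response vector
g <- rep(1:2, each = 10)    # toy grouping, stands in for table.Gij

# Hypothetical stand-in for calPseudoStat(): difference in group means
pseudo_stat <- function(y, g) mean(y[g == 1]) - mean(y[g == 2])

n_perm <- 999

# vapply() preallocates the result and checks each value is a single number
perm_stats <- vapply(seq_len(n_perm),
                     function(i) pseudo_stat(sample(y), g),
                     numeric(1))

observed <- pseudo_stat(y, g)

# Two-sided permutation p-value, counting the observed statistic itself
p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
```

replicate(n_perm, pseudo_stat(sample(y), g)) is an equivalent one-liner when the type check is not needed.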
[R] number of clusters for k-means
I am currently working on a clustering project and would like to obtain statistics for the number of clusters to include. In SAS you get a pseudo-F statistic and a cubic clustering criterion. Has anyone developed a function to get these values? Sean __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] use logical in cor.test
On 2010-03-30 2:41, pgseye wrote: Thanks for the replies. In response to Erik: What does Both[,1] show you? Both[,1] [1] 3.36 NA NA NA NA NA NA 3.92 3.50 NA NA NA NA 3.76 3.19 3.83 NA 3.66.. What does Both[,1] 2.5 show you? Both[,1]2.5 [1] TRUENANANANANANA TRUE TRUENANA NANA TRUE TRUE I understand a logical variable is binary, but don't know how to select a subset of the data (have tried the subset function, but can't seem to get it to work) Bill, when I run what you suggested, I get: tBoth- Both is.na(tBoth[tBoth 2.5])- TRUE Error in is.na(tBoth[tBoth 2.5])- TRUE : NAs are not allowed in subscripted assignments R- cor(tBoth, use = complete.obs) R[1,2] [1] 0.7750889 Any idea with the error message? This happens because your 'Both' already has missing values. You can replace the line is.na(tBoth[tBoth 2.5]) - TRUE with tBoth[tBoth 2.5] - NA and the rest should work. Thanks again, Paul -- Peter Ehlers University of Calgary __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
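A small sketch of the replacement suggested above, on toy data. The comparison operator was lost in the archive, so the `<` used here is an assumption, as are the values in the matrix. The point is that assigning a single NA through a logical index that itself contains NA is allowed, whereas the is.na()<- form ends up assigning a longer vector through that index, which is what triggers the "NAs are not allowed in subscripted assignments" error.

```r
# Toy matrix that already contains missing values, like 'Both' in the thread
both <- cbind(a = c(3.36, NA, 1.20, 3.92),
              b = c(2.10, 3.50, NA, 4.00))

# Direct assignment: NA entries in the logical index are simply skipped
t_both <- both
t_both[t_both < 2.5] <- NA

# Equivalent form that drops NA from the index explicitly via which()
alt <- both
alt[which(alt < 2.5)] <- NA
```

After this step, cor(t_both, use = "complete.obs") can be run as in the thread, provided enough complete rows remain.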
Re: [R] BaselR
Dear Swiss R Users,

The Basel R meeting has moved to Wed, Apr 28 based on user feedback. We now have a lineup of speakers:

* Andreas Krause, Actelion Pharmaceuticals Ltd., on Graphics of Clinical Data
* Yann Abraham, Novartis Pharma AG, on Graphics with ggplot2
* Charles Roosen, Mango Solutions AG, on Web-based R Reporting

I'm pretty excited about the first two presentations myself. I'm also looking forward to seeing old friends and meeting new ones. Details are on the new Basel R web site at: http://www.baselr.org/

Warm regards, Charlie Roosen croo...@mango-solutions.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Lewis
Sent: 26 March 2010 17:20
To: r-help@r-project.org
Subject: [R] BaselR

BaselR - The new R meeting

We are pleased to announce the new R meeting to be held in Basel, Switzerland. BaselR will be held from 6:30-9:30pm on Tues, Apr 27 at TransBARent: http://transbarent.business.sv-group.ch Doors open at 6:30pm, with the presentations starting at 7:00pm.

Introduction: What is Basel R?
Andreas Krause: Graphing Pharma Data
Yann Abraham: Graphics
Charles Roosen: Web based R reporting

(This agenda is yet to be finalised. We will notify any changes)

For further information or to register, please contact: bas...@mango-solutions.com Please also visit www.mango-solutions.com and www.londonr.org

Sarah Lewis mangosolutions T: +44 (0)1249 767700 F: +44 (0)1249 767707 Unit 2 Greenways Business Park Bellinger Close Chippenham Wilts SN15 1BN UK
Re: [R] Calling R from Perl
One way to do this is with Rscript. If you want Perl because it can handle cgi, then CGIwithR is also useful. You can call Rscript from Perl, but you can also do the reverse (with system()) and have your Rscript be the main routine. Jon On 03/30/10 10:19, Ayush Raman wrote: Hi all, I am interested to know that how it is possible to call R from Perl. I would like to read the file in Perl, store it in a data structure and would like to pass the data structure to R so that I can do the mathematical operations easily. Thanks. -- Regards, Ayush Raman -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
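As a rough sketch of the Rscript route, the R side could look like the following. The script name summarise.R, the comma-separated argument format, and both helper functions are assumptions made for illustration; larger data structures would more plausibly be written to a temporary file by Perl and read with read.table() in R.

```r
# Contents of a hypothetical script summarise.R, which Perl could run as
#   system("Rscript", "summarise.R", "1.5,2.5,4.0");

# Turn one comma-separated argument into a numeric vector
parse_numbers <- function(arg) {
  as.numeric(strsplit(arg, ",", fixed = TRUE)[[1]])
}

# Example of "the mathematical operations": a few summary statistics
summarise_values <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# commandArgs(trailingOnly = TRUE) holds whatever followed the script name
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  print(summarise_values(parse_numbers(args[1])))
}
```

Perl can capture the printed result with backticks instead of system() if it needs the numbers back.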
[R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use

glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number. I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong? Here is the output:

glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

Coefficients:
(Intercept)            x
    -0.3603       0.4480

Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:     24420
Residual Deviance: 23240        AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Regards

-- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
Re: [R] Error singular gradient matrix at initial parameter
Hi Gabor, same problem even using nls2 with method="brute-force" to calculate the initial parameters. Best,

Gabor Grothendieck wrote:

You could try method="brute-force" in the nls2 package to find starting values.

On Tue, Mar 30, 2010 at 7:03 AM, Corrado ct...@york.ac.uk wrote:

I am using nls to fit a non linear function to some data. The non linear function is:

y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen algorithm "port", with a lower bound of 0 for all of the ki parameters, and I have tried many start values for the parameters ki (including generating them at random). If I fit the non linear function to the same data using an external algorithm, it fits perfectly and finds the parameters. As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10 64 bit), I keep getting the error:

Error in nlsModel(formula, mf, start, wts, upper) : singular gradient matrix at initial parameter estimates

I have read all the previous postings and the documentation, but to no avail: the error is there to stay. I am sure the problem is with nls, because the external fitting algorithm fits it perfectly in less than a second. Also, if my n is 4, then nls works perfectly (but that excludes all the k5 ... kn). Can anyone help me with suggestions? Thanks in advance. Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim? Regards

-- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
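On the closing question about optim: a minimal sketch of fitting the same model form by least squares with optim() and non-negativity bounds. The data are simulated, and the two predictors, the true parameter values and the start values are all assumptions for illustration. This does not diagnose why nls reports a singular gradient on the real data; it only shows what the optim route could look like.

```r
set.seed(1)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
k_true <- c(0.1, 0.5, 1.0)   # assumed values for k0, k1, k2

# Simulate y = 1 - exp(-(k0 + k1*p1 + k2*p2)) plus a little noise
y <- 1 - exp(-(k_true[1] + k_true[2] * p1 + k_true[3] * p2)) +
  rnorm(n, sd = 0.01)

# Sum of squared errors for a candidate parameter vector k
sse <- function(k) {
  eta <- k[1] + k[2] * p1 + k[3] * p2
  sum((y - (1 - exp(-eta)))^2)
}

# L-BFGS-B supports box constraints; lower = 0 mirrors the bounds in the post
fit <- optim(par = c(0.5, 0.5, 0.5), fn = sse,
             method = "L-BFGS-B", lower = rep(0, 3))

fit$par   # estimates of k0, k1, k2
```

Unlike nls, optim does not need the gradient matrix at the start values to be non-singular, but it also returns no standard errors unless hessian = TRUE is requested and inverted by hand.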
[R] Value-at-Risk Portfolio(both equity and option)
Hello All,

I am working on risk measures for a portfolio that contains equity futures, equity options and currency options. There are many packages for portfolios that contain only equities. I wonder whether there is any available package that can also handle options. Thank you.

-- View this message in context: http://n4.nabble.com/Value-at-Risk-Portfolio-both-equity-and-option-tp1745179p1745179.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Calling R functions into C# or C++
The zip file actually works fine for me. Anyhow, here is the code snippet that you need:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Interop.STATCONNECTORSRVLib;

namespace RFromCsharp
{
    class RConnector
    {
        private StatConnectorClass rdcom = null;
        private string rcmd;

        public StatConnectorClass RConnection
        {
            get { return rdcom; }
            set { rdcom = value; }
        }

        // Start the R (D)COM server
        private bool initR()
        {
            try
            {
                rdcom = new StatConnectorClass();
                rdcom.Init("R");
                return true;
            }
            catch (Exception e)
            {
                string errmsg = "R Init failed: " + rdcom.GetErrorText() + " Other: " + e.Message.ToString();
                Console.WriteLine(errmsg);
                return false;
            }
        }

        // Build and run a read.delim() call that loads a file into an R variable
        private bool loadR(string table, string filename, bool stripwhite, bool header, string separator)
        {
            try
            {
                rcmd = table.ToString() + "<-read.delim('" + filename.ToString()
                     + "',strip.white=" + stripwhite.ToString().ToUpper()
                     + ",header=" + header.ToString().ToUpper()
                     + ",sep='" + separator.ToString() + "')";
                rdcom.EvaluateNoReturn(rcmd);
                return true;
            }
            catch (Exception e)
            {
                string errmsg = rcmd.ToString() + " " + rdcom.GetErrorText() + " Other: " + e.Message.ToString();
                Console.WriteLine(errmsg);
                return false;
            }
        }

        // Close graphics devices, clear the workspace and shut the connection
        private bool closeR()
        {
            try
            {
                rcmd = "graphics.off()";
                rdcom.EvaluateNoReturn(rcmd);
                rcmd = "rm(list=ls(all=TRUE))";
                rdcom.EvaluateNoReturn(rcmd);
                rdcom.Close();
                return true;
            }
            catch (Exception e)
            {
                string errmsg = "R Close failed: " + rdcom.GetErrorText() + " Other: " + e.Message.ToString();
                Console.WriteLine(errmsg);
                return false;
            }
        }

        static void Main(string[] args)
        {
            RConnector conn = new RConnector();
            // Initialize the instance to be used with R
            conn.initR();
            // Create an R variable named abc and assign it the value of 5
            conn.RConnection.SetSymbol("abc", 5);
            // Retrieve the value of the R variable named abc and assign it to the C# variable valueForabc
            var valueForabc = conn.RConnection.GetSymbol("abc");
            // Evaluate an expression in R and assign the result to the C# variable aTestEvaluation
            var aTestEvaluation = conn.RConnection.Evaluate("8 * sin(4)");
            // Close the R connection
            conn.closeR();
            Console.BackgroundColor = ConsoleColor.Gray;
            Console.ForegroundColor = ConsoleColor.Blue;
            Console.WriteLine("Value of abc: " + valueForabc);
            Console.WriteLine("Value of 8 * sin(4): " + aTestEvaluation);
            Console.WriteLine("Press any key to continue ...");
            Console.ReadKey();
        }
    }
}

You would also need to reference the Interop.STATCONNECTORSRVLib.dll assembly. Here is a snapshot of my references list: http://n4.nabble.com/file/n1744914/RFromCsharpReferences.png

Best regards, Fayssal El Moufatich

-- View this message in context: http://n4.nabble.com/Calling-R-functions-into-C-or-C-tp904267p1744914.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Error singular gradient matrix at initial parameter
Sorry, it's algorithm="brute-force"

On Tue, Mar 30, 2010 at 10:29 AM, Corrado ct...@york.ac.uk wrote:

Hi Gabor, same problem even using nls2 with method="brute-force" to calculate the initial parameters. Best,
[R] update.packages() and install.packages() does not work more because of Error in read.dcf
Hi,

on all my systems update.packages() and install.packages() now fail. I get the following message:

r...@orca:/root(28)# R

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> update.packages(checkBuilt=T)
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!
> update.packages()
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!
> install.packages("e1071")
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!

All systems are Gentoo systems with R-2.10.1. Reinstalling R from sources did not solve the problem either. Any hint is appreciated.

Regards Juergen

-- Juergen Rose r...@rz.uni-potsdam.de Uni-Potsdam
Re: [R] Error singular gradient matrix at initial parameter
Yes, of course. The problem still stays.

Gabor Grothendieck wrote:

Sorry, it's algorithm="brute-force"

-- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures
Hi Bill,

Here is a reshape package version. The key thing to notice is that you have multiple pieces of information in your original column names. These can be split out using the colsplit() function:

library(reshape)

# Read in the data
cattle <- read.csv("http://n4.nabble.com/attachment/1745223/0/CattleGrowth.csv", colClasses = "character")

# make the naming scheme consistent
names(cattle)[names(cattle) %in% c("Date", "Days", "Wt")] <- c("Date.0", "Days.0", "Wt.0")

# melt the data
m.cattle <- melt(cattle, id = c("Steer.ID", "stocker.trt", "Finish.trt"))

# split out variable and time info
m.cattle <- as.data.frame(cbind(colsplit(m.cattle$variable, split = "\\.", names = c("Var", "time")), m.cattle))

# get rid of the now-redundant variable column
m.cattle$variable <- NULL

# cast the data to put variables back in the columns
long.cattle <- cast(m.cattle, ... ~ Var)

Best, Ista

On Tue, Mar 30, 2010 at 9:34 AM, wclapham william.clap...@ars.usda.gov wrote:

Ista, I have looked at the reshape package and have used 'melt' successfully on simpler tables. I tried it here, but have not been successful. I think I just need to gain experience. I am loving R and am having a difficult time with data structure issues. I am attaching the data set that I am trying to manipulate. Ultimately, I would like to be able to analyze these data with ANOVA and repeated measures, and also be able to plot growth, wt * days. I appreciate any help or guidance on references to read that will help me solve my problems. The data set represents wt over time (starting with days=0, birth day) for steers. Factors include Stockering and Finishing treatments. Regards, Bill

-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
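For comparison, the same wide-to-long step can be sketched with base R's reshape() on a toy data frame. The two steers, their values and the column name visit are invented for illustration; the real file has 23 repeated Date/Days/Wt blocks, which would each contribute one entry to the vectors inside varying.

```r
# Toy wide data: two steers, two measurement occasions (suffix .0 and .1)
wide <- data.frame(
  Steer.ID    = c("S1", "S2"),
  stocker.trt = c("A", "B"),
  Days.0 = c(0, 0),   Wt.0 = c(80, 85),
  Days.1 = c(30, 31), Wt.1 = c(120, 131)
)

# One list element per long-format variable, each naming its wide columns
long <- reshape(wide, direction = "long",
                idvar   = "Steer.ID",
                varying = list(c("Days.0", "Days.1"),
                               c("Wt.0", "Wt.1")),
                v.names = c("Days", "Wt"),
                timevar = "visit", times = 0:1)

# Sort so each steer's occasions are together
long <- long[order(long$Steer.ID, long$visit), ]
```

The identifying columns (Steer.ID, stocker.trt) are repeated on every row, which is the layout repeated-measures modelling functions generally expect.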
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
A) It is not an error, only a warning. Wouldn't it seem reasonable to issue such a warning if you have data that violate the distributional assumptions?

B) You did not include any of the data.

C) Wouldn't this be more appropriately addressed to the author of the book, if this is exactly what was suggested there?

-- David

On Mar 30, 2010, at 10:51 AM, Corrado wrote:

Dear friends, I am testing glm as at page 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use glm(y~x1+,family=binomial) y is a proportion in (0,1), and x is a real number. I get the error: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong?

David Winsemius, MD West Hartford, CT
Re: [R] Error singular gradient matrix at initial parameter
What do you mean the problem still stays? If you are using brute force, it's not a problem to have it fail on some of the evaluations, since each one is separate. How large a grid are you using? Are you claiming that every single point on the grid fails? Please provide reproducible code showing what you are doing.

On Tue, Mar 30, 2010 at 10:56 AM, Corrado ct...@york.ac.uk wrote:

Yes, of course. The problem still stays.
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Corrado
Sent: Tuesday, 30 March 2010 16:52
To: r-help@r-project.org
Subject: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Dear friends, I am testing glm as at page 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use glm(y~x1+,family=binomial) y is a proportion in (0,1), and x is a real number. I get the error: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong?

---

Probably you are misreading Crawley's book? A proportion would usually be modeled with the Beta distribution, not the binomial, which is for counts. If you are modeling a proportion, try the betareg function in the betareg package.

HTH Ruben

Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN
[R] Code is too slow: mean-centering variables in a data frame by subgroup
Dear R-ers,

I have a large data frame (several thousand rows and about 2.5 thousand columns). One variable ("group") is a grouping variable with over 30 levels, and I have a lot of NAs. For each variable, I need to divide each value by the variable's mean, by subgroup. I have the code but it's way too slow: it takes me about 1.5 hours. Below is a data example and my code that is too slow. Is there a different, faster way of doing the same thing? Thanks a lot for your advice!

Dimitri

# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame <- data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
frame <- frame[order(frame$group),]
names.used <- names(frame)[2:length(frame)]
set.seed(1234)
for(i in names.used){
  i.for.NA <- sample(1:100,60)
  frame[[i]][i.for.NA] <- NA
}
frame

### Code that does what's needed but is too slow:
Start <- Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
}))
Finish <- Sys.time()
print(Finish-Start) # Takes too long

-- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
In a binomial GLM, y is typically a factor with two levels (indicating success/failure) rather than a numeric vector on [0, 1]. Perhaps the description in the book is not so clear. You should interpret data on proportions as observations from a binomial distribution (rather than as "some proportion data which happened to fall in [0, 1]"). E.g.

y=rbinom(10, size = 1, prob = .3); x=rnorm(y)
# or y = factor(y)
glm(y~x, family = binomial)

Regards, Yihui

-- Yihui Xie xieyi...@gmail.com Phone: 515-294-6609 Web: http://yihui.name Department of Statistics, Iowa State University 3211 Snedecor Hall, Ames, IA
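To make the point about proportions concrete, here is a minimal sketch in base R. The simulated data and the names n.trials, successes and prop are assumptions made for illustration and do not come from the thread. With family = binomial, a numeric response in [0, 1] is read as successes/trials, and the number of trials must be supplied through weights (or the response given as a two-column matrix of successes and failures). Without weights each observation counts as one trial, so a fractional y triggers the "non-integer #successes" warning.

```r
set.seed(1)
n.obs     <- 50
n.trials  <- rep(20, n.obs)          # assumed: 20 trials behind every proportion
x         <- rnorm(n.obs)
successes <- rbinom(n.obs, size = n.trials, prob = plogis(-0.4 + 0.5 * x))
prop      <- successes / n.trials    # observed proportions in [0, 1]

# (a) proportion as response, number of trials as prior weights
fit.w <- glm(prop ~ x, family = binomial, weights = n.trials)

# (b) equivalent form: two-column response (successes, failures)
fit.c <- glm(cbind(successes, n.trials - successes) ~ x, family = binomial)

# (c) if the denominators are unknown, quasibinomial fits the same mean
#     model on the raw proportions without the non-integer warning
fit.q <- glm(prop ~ x, family = quasibinomial)

coef(fit.w)
coef(fit.c)
```

Forms (a) and (b) give the same coefficient estimates. Form (c) is only a workaround for the warning; whether a binomial-type variance or a Beta regression (as suggested elsewhere in this thread) suits the data is a separate modelling question.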
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear David,

David Winsemius wrote:
A) It is not an error, only a warning. Wouldn't it seem reasonable to issue such a warning if you have data that violates the distributional assumptions?

I am not questioning the approach. I am only trying to understand why a (rather expensive) source of documentation and the behaviour of a function are not aligned.

B) You did not include any of the data

Data attached as R object.

C) Wouldn't this be more appropriate to the author of the book if this is exactly what was suggested there?

I think it will definitely be appropriate, but only when I am certain I am not doing anything wrong.

Regards -- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear Ruben,

I am afraid not. The paragraph's title is a bit of a giveaway: "Proportion Data and Binomial Errors". The sentence reads: "are dealt with by using a generalised linear model with a binomial error structure", with the example:

glm(y~x,family=binomial)

You can check at page 514/515.

Regards -- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
Re: [R] How to recode variables using base R
Thanks, you're a lifesaver.
-J

2010/3/30 Henrique Dallazuanna www...@gmail.com:
Using lapply:
as.data.frame(lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c('x', 'y', 'z')))

On Tue, Mar 30, 2010 at 10:14 AM, johannes rara johannesr...@gmail.com wrote:
Thanks John and Henrique, my intention is to do this for A, B and C (all at once), so I'll have to wrap your solution into lapply or a for loop?
-J

2010/3/30 Henrique Dallazuanna www...@gmail.com:
You could try this also:
cut(df$A, c(-Inf, 3, 8), labels = c('x', 'y'))

On Tue, Mar 30, 2010 at 8:30 AM, johannes rara johannesr...@gmail.com wrote:
Hi, Is there an efficient way of recoding variables in a data.frame using base R? My purpose is to create new variables and attach them to the old data.frame. The basic idea is shown below, but how do I create the recoding for A, B and C and assign the results to new variables?
df <- data.frame(A = c(1:5), B = c(3,6,2,8,10), C = c(0,15,5,9,12))
df$A[df$A <= 3] <- "x"
df$A[df$A > 3 & df$A <= 8] <- "y"
df$A[df$A <= 16] <- "z"
Thanks, -J

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
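The lapply/cut suggestion above returns a recoded copy of the data frame, whereas the original question asked for new variables attached to the old one. A small sketch of that last step follows; the "_rec" suffix is a hypothetical naming choice, not something proposed in the thread.

```r
df <- data.frame(A = 1:5, B = c(3, 6, 2, 8, 10), C = c(0, 15, 5, 9, 12))

# cut() uses right-closed intervals by default: (-Inf,3], (3,8], (8,16]
recoded <- lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c("x", "y", "z"))

# give the recoded columns new names so the originals are kept
names(recoded) <- paste0(names(df), "_rec")   # hypothetical suffix
df <- cbind(df, as.data.frame(recoded))

df
```

The result keeps A, B and C and adds the factors A_rec, B_rec and C_rec. Values above 16 would become NA with these breaks.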
Re: [R] Multivariate hypergeometric distribution version of phyper()
On Tue, 30 Mar 2010, Karl Brand wrote:

Dear R Users, I employed the phyper() function to estimate the likelihood that the number of genes overlapping between 2 different lists of genes is due to chance. This appears to work appropriately. Now I want to try this with 3 lists of genes, which phyper() does not appear to support. Some googling suggests I can utilize the multivariate hypergeometric distribution to achieve this, e.g. http://en.wikipedia.org/wiki/Hypergeometric_distribution But when I try to do this manually using the choose() function (see the attempt below, an example with just two gene lists) I'm unable to perform the calculations: the numbers hit infinity before getting an answer. Searching the CRAN archives for "Multivariate hypergeometric" shows this term in the vignettes of the packages 'combinat' and 'forward', but I'm unable to make sense of these package functions in the context of my aforementioned application. Can someone suggest a function, script or method to achieve my goal of estimating the likelihood of overlap between 3 lists of genes, ideally using the multivariate hypergeometric, or anything else for that matter?

Two suggestions:

1) Don't! Likely the theory is unsuited for the application. In most applications that generate lists of genes, the genes are not iid realizations and the hypergeometric gives results that are astonishingly anticonservative. As an alternative, the block bootstrap may be suitable. See http://171.66.122.45/cgi/content/abstract/17/6/760 and Google (Scholar) 'genomic block bootstrap' for some starting points.

2) Take this thread to the Bioconductor list. You are much more likely to get pointers to useful packages and functions for genomic statistical software there.

HTH, Chuck

cheers in advance, Karl

#example attempt with two gene lists m n
N <- 45101 # total number balls in urn
m <- 720   # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801   # number balls drawn or number of samples
k <- 40    # number of 'white' or 'special' balls DRAWN
a <- choose(m,k)
b <- choose((N-m),(n-k))
z <- choose(N,n)
prK <- (a*b)/z #'the answer'
print(prK)
[1] NaN
a
[1] 7.985852e+65
b
[1] Inf
z
[1] Inf

-- Karl Brand Department of Genetics Erasmus MC Dr Molewaterplein 50 3015 GE Rotterdam T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
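On the narrower numerical issue in the example above (choose() overflowing to Inf, hence NaN), one common remedy is to work on the log scale with lchoose(), or simply to call dhyper(), which evaluates the same probability. The sketch below covers only the two-list case and reuses the numbers from the example; it does not address the statistical objection raised in the reply about genes not being independent draws.

```r
N <- 45101   # total number of balls in the urn
m <- 720     # number of 'special' balls in the urn
n <- 801     # number of balls drawn
k <- 40      # number of 'special' balls drawn

# naive version: intermediate binomial coefficients overflow to Inf
naive <- choose(m, k) * choose(N - m, n - k) / choose(N, n)

# same quantity computed on the log scale, then exponentiated once
log.p <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
p.log <- exp(log.p)

# built-in hypergeometric density, parameterised as
# dhyper(x, m = white balls, n = black balls, k = balls drawn)
p.dhyper <- dhyper(k, m, N - m, n)

# probability of an overlap of k or more: upper tail P(X >= k)
p.tail <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

c(naive = naive, log.scale = p.log, dhyper = p.dhyper, upper.tail = p.tail)
```

The log-scale value and dhyper() agree, while the naive product is NaN, matching the output reported in the question.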
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
On Mar 30, 2010, at 11:19 AM, Corrado wrote:

Dear David,

David Winsemius wrote:
A) It is not an error, only a warning. Wouldn't it seem reasonable to issue such a warning if you have data that violates the distributional assumptions?

I am not questioning the approach. I am only trying to understand why a (rather expensive) source of documentation and the behaviour of a function are not aligned.

B) You did not include any of the data

Data attached as R object.

C) Wouldn't this be more appropriate to the author of the book if this is exactly what was suggested there?

I think it will be definitively appropriate, but only when I am certain I am not doing anything wrong.

I don't understand this perspective. You bought Crawley's book, so he is in some minor sense in debt to you. Why should you think it is more appropriate to send your message out to thousands of readers of r-help around the world (some of whom have written books that you did not buy) before sending Crawley a question about his text?

David Winsemius, MD West Hartford, CT
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
I wrote different code, but it takes twice as long as my original code. :( However, I thought I should share it as well, because the second part of the code is fast; it's the first part that's slow. Maybe there is a way to fix the first part... Thank you!

group.var <- "group"
subgroups <- levels(frame[[group.var]])
system.time({
  means.no.zeros <- list()
  for(i in 1:length(subgroups)){ # SLOW part of the code
    row.of.means <- as.data.frame(t(colMeans(frame[frame[[group.var]] %in% subgroups[i],names.used],na.rm=T)))
    nr.of.rows <- (dim(frame[frame[[group.var]] %in% subgroups[i],])[1])
    means.no.zeros[[i]] <- as.data.frame(matrix(nrow=nr.of.rows,ncol=length(names.used)))
    means.no.zeros[[i]] <- row.of.means
    for(z in 1:nr.of.rows){ #z<-1
      means.no.zeros[[i]][z,] = row.of.means
    }
  }
  means.no.zeros <- do.call(rbind,means.no.zeros)
})
system.time({ #FAST part of the code
  frame[names.used] <- frame[names.used]/means.no.zeros
})

-- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com
Re: [R] Finding positions in array
which(x > 7)
z <- which(x > 7)

On 3/30/2010 2:54 PM, Romildo Martins wrote:
Hello, I need a function to check which positions of the array are greater than y and return those positions in another array z.
x <- array(E(gaux)$weight)
x
[1] 3 8 10 6
If y = 7
z
[1] 2 3
Thanks a lot! Romild

-- Erich Neuwirth, University of Vienna Faculty of Computer Science Computer Supported Didactics Working Group Visit our SunSITE at http://sunsite.univie.ac.at Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
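As a self-contained check of the which() answer, using the values shown in the question (the igraph call E(gaux)$weight is replaced here by a plain vector, which is an assumption made so the snippet runs on its own):

```r
x <- c(3, 8, 10, 6)   # stand-in for array(E(gaux)$weight)
y <- 7

z <- which(x > y)     # positions where x exceeds y
z
x[z]                  # the values at those positions
```

which() returns integer positions, so z can be used directly to index x or any vector of the same length.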
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
On Tue, 30 Mar 2010, Dimitri Liakhovitski wrote: Dear R-ers, I have a large data frame (several thousands of rows and about 2.5 thousand columns). [...] For each variable, I need to divide each value by variable mean - by subgroup.

Use model.matrix and crossprod to do this in a vectorized fashion:

mat <- as.matrix(frame[,-1])
mm <- model.matrix(~0+group,frame)
col.grp.N <- crossprod( !is.na(mat), mm )
mat[is.na(mat)] <- 0.0
col.grp.sum <- crossprod( mat, mm )
mat <- mat / ( t(col.grp.sum/col.grp.N)[ frame$group,] )
is.na(mat) <- is.na(frame[,-1])

mat is now a matrix whose columns each correspond to the columns in 'frame' as you have it after do.call(...)

Are you sure you want to divide the values by their (possibly negative) means??

HTH, Chuck

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
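The idea in the reply above (compute per-group sums and per-group counts of non-missing values in one pass, then divide) can also be sketched with base R's rowsum(). This is a swapped-in variant for illustration, not the code posted in the thread; the small example frame and the names vars, ok, grp.sum, grp.n, grp.mean and scaled are assumptions.

```r
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
vars <- setdiff(names(frame), "group")
for (v in vars) frame[[v]][sample(100, 30)] <- NA   # sprinkle in some NAs

mat <- as.matrix(frame[vars])
g   <- as.character(frame$group)
ok  <- !is.na(mat)

grp.sum  <- rowsum(replace(mat, !ok, 0), g)  # groups x variables: sums, NAs counted as 0
grp.n    <- rowsum(ok * 1, g)                # groups x variables: counts of non-missing values
grp.mean <- grp.sum / grp.n                  # per-group means, ignoring NAs

# expand the group means back to one row per observation and divide;
# cells that were NA in 'mat' stay NA
scaled <- mat / grp.mean[g, , drop = FALSE]

head(scaled)
```

Indexing grp.mean by the character vector g looks rows up by group name, so the result does not depend on the row order of the data frame. For comparison, each column should match x / ave(x, group, FUN = function(z) mean(z, na.rm = TRUE)), which is simpler to read but calls an R function once per group and column.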
[R] about the possible errors in Rgraphviz Package
Hi All,

I installed the Rgraphviz package, apparently successfully, in the following two ways:

source("http://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")

install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)

But when I loaded the package through library(Rgraphviz) or library("Rgraphviz"), I got the same error message both times:

Error in inDL(x, as.logical(local), as.logical(now), ...) : unable to load shared library 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll': LoadLibrary failure: The specified module could not be found.

I think the error is in the package, because it should go to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' instead of 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'.

Could anyone help me to solve the problem? Thank you very much for the help.

Howard
[R] dot plot
Hi All, I need to make a dot plot where the points of the plot are connected with lines. Is this possible to do in R? Also, I do not know how to combine two plots into one plot. Thanks, I appreciate your help.
[R] list index rules evaluation behavior
I have what may be a simple/foolish question, but I've done the due diligence and looked through pages of posts here as well as several of the PDFs on the CRAN site, but haven't been able to find what I'm after.

I am working with a list of, say, 3 histogram objects A, B, C, and each histogram is a list of 7 elements. I would like to access $name, the 6th element, of histograms A, B and C. Trial and error yielded some results that told me I clearly don't understand how R interprets index commands. For the histogram list above:

a[1:2] gives histograms A and B, as expected.
a[[1:2]] gives the second element of histogram 1, but
a[[1:1]] gives all elements of histogram 1, while
a[[1:3]] gives NULL?!

If anyone could help with an explanation of indexing rules, or a source that does so, I would very much appreciate it. Oh, and an answer to the first question!

Thanks All
Jason
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
I posted a similar problem last week (but with an uninformative subject header). See if this helps: http://n4.nabble.com/a-vectorized-solution-to-some-simple-dataframe-math-td1692810.html#a1710410
Re: [R] dot plot
?lines
?plot (with the type= argument)

Ivan

On 3/30/2010 17:55, kayj wrote: Hi All, I need to make a dot plot where the points of the plot are connected with lines. [...]

-- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
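The two help pages named above can be put together roughly as follows. This is a sketch with made-up data (x, y1, y2 are assumptions), and it shows two possible readings of "combine two plots": overlaying two series on one set of axes, or placing two panels side by side.

```r
x  <- 1:10
y1 <- x^2
y2 <- 10 * x

# Overlay: type = "b" draws both points and connecting lines;
# lines() adds the second series to the existing axes
plot(x, y1, type = "b", pch = 16, ylim = range(y1, y2),
     xlab = "x", ylab = "value")
lines(x, y2, type = "b", pch = 17, col = "red")
legend("topleft", legend = c("y1", "y2"),
       pch = c(16, 17), col = c("black", "red"), lty = 1)

# Side by side: split the device into 1 row x 2 columns
old.par <- par(mfrow = c(1, 2))
plot(x, y1, type = "b", main = "y1")
plot(x, y2, type = "b", main = "y2")
par(old.par)   # restore the previous layout
```

Setting ylim from both series before the first plot() call keeps the second series from being clipped.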
[R] weighted.median function from package R.basic
Dear all,

I want to apply a weighted median to a huge dataset, and I remember a function from the package R.basic that could do this using an internal sorting algorithm, qsort. This sped things up quite a bit. Alas, I can't find that package anywhere anymore. There is a weighted.median function in the package limma too, but I haven't used that before. Does anybody know what happened to R.basic?

Cheers
Joris

-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
Thanks a lot, Charles - I'll try your approach. Yes - don't worry about dividing by negative means - in the real data all values are positive.

Dimitri

-- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com
[R] Create a new variable
Dear R-list,

Sorry for spamming the list lately, I am just learning the more advanced aspects of R! I have some data that looks like this:

Out  Country1  Country2  Country3  ...  CountryN
1    1         1         1              1
0    1         1         0              1
1    1         0         1              0

I want to create a new variable that counts the number of zeros in every row whenever Out is equal to 1, and is zero otherwise, so it would look like this:

new_var
0
0
2

I have tried the following:

for (i in length(Out)){
  if (Out == 1) {new_var <- sum(dat[i,] != 1)} else {new_var <- 0}
}

but this gives me an error message.

Best, Thomas
Re: [R] list index rules evaluation behavior
Hi Jason,

try using commas instead of colons, e.g. a[[c(1,6)]], a[[c(3,6)]] etc. If you use a[[1:3]], this is equivalent to a[[c(1,2,3)]]. As the list only contains 2 levels, this will give an error or NULL, depending on your R version. You can find more information with ?"[["

Cheers
Joris

-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be
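To illustrate the indexing rules discussed in this thread, here is a small sketch with three histogram objects built from made-up data (the vectors A, B and C are assumptions). It also answers the first question, pulling one named component out of every histogram, by name rather than by position, since the position of a component inside a histogram object can differ between R versions.

```r
set.seed(1)
A <- rnorm(20); B <- runif(20); C <- rexp(20)
a <- list(hist(A, plot = FALSE), hist(B, plot = FALSE), hist(C, plot = FALSE))

a[1:2]          # single bracket: a sub-list holding histograms 1 and 2
a[[1]]          # double bracket: the first histogram itself
a[[c(1, 2)]]    # vector inside [[ ]] indexes recursively: same as a[[1]][[2]]

# one component from every histogram, selected by name
xnames <- sapply(a, function(h) h$xname)
xnames
```

So a[[1:2]] is read as "element 2 of element 1", which is why it returned a component of the first histogram rather than two histograms, and a[[1:1]] is just a[[1]].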
Re: [R] Create a new variable
Hello,

Thomas Jensen wrote:
Dear R-list, Sorry for spamming the list lately, I am just learning the more advanced aspects of R! I have some data that looks like this: Out Country1 Country 2 Country 3 ... CountryN 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0

Don't paste data like this to the list. Use ?dput to create an easy-to-use data.frame that users of the list can input with one R command. You will most likely get help very quickly at that point, since our data will match yours exactly.

I want to create a new variable that counts the number of zeros in every row whenever Out is equal to 1, and else it is a zero, so it would look like this: new_var 0 0 2 I have tried the following: for (i in length(Out)){ if (Out == 1) {new_var <- sum(dat[i,] != 1)} else {new_var <- 0} } but this gives me an error message.

I have not tested any of this, but I'm guessing something like the following would work. Assume your data.frame is called df.

#NOT TESTED
tmp <- apply(df, 1, function(x) sum(x == 0))
df$new_var <- ifelse(df$Out == 1, tmp, 0)

See ?apply and ?ifelse.

Best, Thomas
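A runnable version of the idea above, using the three rows shown in the question (reading the flattened data as five columns per row is an assumption, as is the name dat). The loop in the question fails for several reasons: for (i in length(Out)) visits only the last index, if (Out == 1) tests the whole vector instead of one element, and new_var is overwritten rather than filled element by element. A vectorised form avoids the loop altogether.

```r
dat <- data.frame(Out      = c(1, 0, 1),
                  Country1 = c(1, 1, 1),
                  Country2 = c(1, 1, 0),
                  Country3 = c(1, 0, 1),
                  CountryN = c(1, 1, 0))

# count zeros per row over the country columns only (column 1 is Out)
n.zeros <- rowSums(dat[, -1] == 0)

# keep the count where Out is 1, otherwise use 0
dat$new_var <- ifelse(dat$Out == 1, n.zeros, 0)

dat
```

For these rows new_var comes out as 0, 0, 2, matching the result requested in the question.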
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
Dear Charles, thank you so much! On my example data frame your code takes 0 sec and mine - 0.05 sec - a huge difference even if 0 = 0.04 sec.

Dimitri

On Tue, Mar 30, 2010 at 12:30 PM, Dimitri Liakhovitski ld7...@gmail.com wrote:
> Thanks a lot, Charles - I'll try your approach. Yes - don't worry about dividing by negative means - in real data all values are positive.
> Dimitri
>
> On Tue, Mar 30, 2010 at 12:24 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:
>> On Tue, 30 Mar 2010, Dimitri Liakhovitski wrote:
>>> Dear R-ers,
>>>
>>> I have a large data frame (several thousands of rows and about 2.5 thousand columns). One variable (group) is a grouping variable with over 30 levels. And I have a lot of NAs. For each variable, I need to divide each value by variable mean - by subgroup. I have the code but it's way too slow - takes me about 1.5 hours. Below is a data example and my code that is too slow. Is there a different, faster way of doing the same thing? Thanks a lot for your advice!
>>>
>>> Dimitri
>>>
>>> # Building an example frame - with groups and a lot of NAs:
>>> set.seed(1234)
>>> frame <- data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
>>
>> Use model.matrix and crossprod to do this in a vectorized fashion:
>>
>> mat <- as.matrix(frame[,-1])
>> mm <- model.matrix(~0+group,frame)
>> col.grp.N <- crossprod( !is.na(mat), mm )
>> mat[is.na(mat)] <- 0.0
>> col.grp.sum <- crossprod( mat, mm )
>> mat <- mat / ( t(col.grp.sum/col.grp.N)[ frame$group,] )
>> is.na(mat) <- is.na(frame[,-1])
>>
>> mat is now a matrix whose columns each correspond to the columns in 'frame' as you have it after do.call(...)
>>
>> Are you sure you want to divide the values by their (possibly negative) means??
>>
>> HTH,
>> Chuck
>>
>>> frame <- frame[order(frame$group),]
>>> names.used <- names(frame)[2:length(frame)]
>>> set.seed(1234)
>>> for(i in names.used){
>>>   i.for.NA <- sample(1:100,60)
>>>   frame[[i]][i.for.NA] <- NA
>>> }
>>> frame
>>>
>>> ### Code that does what's needed but is too slow:
>>> Start <- Sys.time()
>>> frame <- do.call(cbind, lapply(names.used, function(x){
>>>   unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
>>> }))
>>> Finish <- Sys.time()
>>> print(Finish-Start) # Takes too long
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah.com
>>> dimitri.liakhovit...@ninah.com
>>
>> Charles C. Berry (858) 534-2098
>> Dept of Family/Preventive Medicine
>> E mailto:cbe...@tajo.ucsd.edu UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
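As a cross-check of the crossprod idea in this thread, the sketch below rebuilds a smaller version of the example frame (three value columns instead of seven) and compares the vectorised result with a column-by-column ave() computation. The names grp, observed, n_obs, sums, scaled_cp and scaled_ave are my own, not from the thread. One assumption worth flagging: recent versions of R keep group as a character column by default, so the sketch indexes the group means through an explicit factor rather than through frame$group directly.

```r
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
for (nm in c("a", "b", "c")) {
  frame[[nm]][sample(100, 60)] <- NA      # knock out 60 values per column
}

grp <- factor(frame$group)                # explicit factor for indexing
mat <- as.matrix(frame[, -1])

# Indicator matrix: one column per group level.
mm <- model.matrix(~ 0 + grp)

# Non-missing counts and sums per (variable, group).
observed <- (!is.na(mat)) + 0             # numeric 0/1 matrix
n_obs <- crossprod(observed, mm)
mat0 <- mat
mat0[is.na(mat0)] <- 0
sums <- crossprod(mat0, mm)

# Group means (groups x variables), expanded back to one row per observation.
means <- t(sums / n_obs)
scaled_cp <- mat / means[as.integer(grp), ]

# Reference computation with ave(), one column at a time.
scaled_ave <- sapply(frame[, -1], function(v) {
  v / ave(v, grp, FUN = function(z) mean(z, na.rm = TRUE))
})

same_result <- isTRUE(all.equal(unname(scaled_cp), unname(scaled_ave)))
```

The two matrix products replace thousands of small by() calls, which is presumably where the reported speed-up comes from; the ave() version is slower but easier to audit.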
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
I meant - even if 0 = 0.004

D.

On Tue, Mar 30, 2010 at 12:47 PM, Dimitri Liakhovitski ld7...@gmail.com wrote:
> Dear Charles, thank you so much! On my example data frame your code takes 0 sec and mine - 0.05 sec - a huge difference even if 0 = 0.04 sec.
> Dimitri
Re: [R] Create a new variable
Easy using rowSums:

x <- data.frame(X1=c(0,0,1),X2=c(1,1,1),X3=c(0,1,0))
x$Nulls <- rowSums(x==0)
x
  X1 X2 X3 Nulls
1  0  1  0     2
2  0  1  1     1
3  1  1  0     1

Cheers

On Tue, Mar 30, 2010 at 6:31 PM, Thomas Jensen thomas.jen...@eup.gess.ethz.ch wrote:
> Dear R-list,
>
> Sorry for spamming the list lately, I am just learning the more advanced aspects of R! I have some data that looks like this:
>
> Out  Country1  Country 2  Country 3  ...  CountryN
> 1    1         1          1               1
> 0    1         1          0               1
> 1    1         0          1               0
>
> I want to create a new variable that counts the number of zeros in every row whenever Out is equal to 1, and else it is a zero, so it would look like this:
>
> new_var
> 0
> 0
> 2
>
> I have tried the following:
>
> for (i in length(Out)){
>   if (Out == 1) {new_var <- sum(dat[i,] != 1)} else {new_var <- 0}
> }
>
> but this gives me an error message.
>
> Best,
> Thomas

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
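The rowSums() reply counts zeros in every row but does not apply the "only when Out is 1" condition from the question. A minimal sketch of one way to combine the two follows; the Out column added to the example data is hypothetical, as is the name value_cols.

```r
# Example data from the reply, plus a hypothetical Out column (not in the reply).
x <- data.frame(Out = c(1, 0, 1),
                X1 = c(0, 0, 1), X2 = c(1, 1, 1), X3 = c(0, 1, 0))
value_cols <- c("X1", "X2", "X3")

# rowSums() counts zeros per row; multiplying by the logical mask sets the
# count to zero wherever Out is not 1.
x$new_var <- (x$Out == 1) * rowSums(x[value_cols] == 0)
```

Restricting the count to value_cols keeps a zero in Out itself from being counted.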
Re: [R] weighted.median function from package R.basic
Hi,

good memory. weightedMedian() is now available in the aroma.light package (it was moved there from R.basic in Feb 2006).

/Henrik (author of both packages)

On Tue, Mar 30, 2010 at 6:30 PM, Joris Meys jorism...@gmail.com wrote:
> Dear all,
>
> I want to apply a weighted median on a huge dataset, and I remember a function from the package R.basic that could do this using an internal sorting algorithm qsort. This speeded things up quite a bit. Alas, I can't find that package anywhere anymore. There is a weighted.median function in the package limma too, but I didn't use that before.
>
> Anybody who knows what happened to R.basic?
>
> Cheers
> Joris
>
> --
> Joris Meys
> Statistical Consultant
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> Coupure Links 653
> B-9000 Gent
> tel : +32 9 264 59 87
> joris.m...@ugent.be
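For readers who only want to see what a weighted median computes, here is a minimal base-R sketch. It is not the aroma.light implementation and makes no claim about its tie handling or speed; the function name weighted_median_sketch is made up, and the rule used (smallest value whose cumulative weight reaches half the total, no interpolation) is one of several conventions.

```r
# Lower weighted median: the smallest x whose cumulative weight reaches half
# of the total weight. No interpolation between neighbouring values.
weighted_median_sketch <- function(x, w) {
  stopifnot(length(x) == length(w))
  keep <- !is.na(x) & !is.na(w)           # drop incomplete pairs first
  x <- x[keep]
  w <- w[keep]
  stopifnot(all(w >= 0), sum(w) > 0)
  ord <- order(x)
  cum_w <- cumsum(w[ord])
  x[ord][which(cum_w >= sum(w) / 2)[1]]
}

wm_equal  <- weighted_median_sketch(1:5, rep(1, 5))             # equal weights
wm_skewed <- weighted_median_sketch(c(1, 2, 3, 4), c(1, 1, 1, 5))
```

With equal weights and an odd number of values this reduces to the ordinary median; with the skewed weights the heavy last value dominates.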
Re: [R] about the possible errors in Rgraphviz Package
On 30/03/2010 10:44 AM, HU,ZHENGJUN wrote:
> Hi All,
>
> I tried to install the package of Rgraphviz in the following two ways successfully:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("Rgraphviz")
>
> install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)
>
> but when I loaded the package through library(Rgraphviz) or library("Rgraphviz"), I got the same error message below:
>
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>   unable to load shared library 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll':
>   LoadLibrary failure: The specified module could not be found.

Most likely the problem is that you haven't followed the installation instructions. (They are pretty hard to find, but I think you can find them on the Bioconductor site.) It is not enough to install the Rgraphviz package, you also need to install Graphviz.

Duncan Murdoch

> I think that it is the error in the package because it should go to
> 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll'
> instead of
> 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'
>
> Could anyone help me to solve the problem? Thank you very much for the help.
>
> Howard
Re: [R] list index rules evaluation behavior
On Mar 30, 2010, at 11:47 AM, Dgnn wrote:
> I have what may be a simple/foolish question, but I've done the due diligence and looked through pages of posts here as well as several of the PDFs on the CRAN site, but haven't been able to find what I'm after.
>
> I am working with a list of say 3 histogram objects A, B, C, and each histogram is a list of 7 elements. I would like to access $name, the 6th element, of histograms A, B and C.

If you want better answers, you should provide better examples ... with _CODE_.

> Trial and error yielded some results that told me I clearly don't understand how R interprets index commands. For the histogram list above:
> a[1:2] gives histograms A and B as expected.
> a[[1:2]] gives the second element of histogram 1, but a[[1:1]] gives all elements of histogram 1, while a[[1:3]] gives null?!
>
> If anyone could help with an explanation of indexing rules, or a source that does so, I would very much appreciate it. Oh and an answer to the first question!

?"[["

"[[" always returns a single vector or list and so its arguments will be coerced to a single value. When passed an argument that has multiple values it is interpreted as serial application of "[[" with the serial values. The construction [[1:1]] gets turned into [[1]] (since 1:1 is just 1) while the construction [[1:2]] got turned into [[1]][[2]]

list(a=list(aa=5, bb=6),b=2,c=3)[[1:2]]
[1] 6

"[" may return a more complex object and so may accept multiple arguments

list(a=1,b=2,c=3)[c(1,3)]
$a
[1] 1

$c
[1] 3

--
David Winsemius, MD
West Hartford, CT
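The first question in this thread (pulling the same component out of each histogram) was left open, so here is a hedged sketch. The histogram data are made up, and I am assuming that the component the poster calls $name is what hist() stores as xname; the technique of passing `[[` to sapply() or lapply() works for any component name.

```r
nested <- list(a = list(aa = 5, bb = 6), b = 2, c = 3)

# Recursive indexing: [[1:2]] walks into element 1, then takes its element 2.
deep <- nested[[1:2]]
same_as_chained <- identical(deep, nested[[1]][[2]])

# Single brackets keep the list structure and accept several indices.
first_and_third <- nested[c(1, 3)]

# Three histogram objects built without plotting; the data are invented.
set.seed(1)
a <- list(A = hist(rnorm(50), plot = FALSE),
          B = hist(runif(50), plot = FALSE),
          C = hist(rexp(50),  plot = FALSE))

# Pull the same named component out of every histogram.
xnames <- sapply(a, `[[`, "xname")    # one string per histogram
counts <- lapply(a, `[[`, "counts")   # list of count vectors
```

Indexing by name rather than by position (6) avoids depending on how many components a histogram object has in a given R version.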
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
G'day all,

On Tue, 30 Mar 2010 16:19:46 +0100 Corrado ct...@york.ac.uk wrote:
> David Winsemius wrote:
>> A) It is not an error, only a warning. Wouldn't it seem reasonable to issue such a warning if you have data that violates the distributional assumptions?
>
> I am not questioning the approach. I am only trying to understand why a (rather expensive) source of documentation and the behaviour of a function are not aligned.

1) Also expensive books have typos in them.
2) glm() is from a package that is part of R and the author of this book is AFAIK not a member of R core, hence has no control on whether his documentation and the behaviour of a function are aligned.
   a) If he were documenting a function that was part of a package he wrote as support for his book, as some authors do, there might be a reason to complain. But then 1) would still apply.
   b) Even books written by members of R core have occasionally misalignments between the behaviour of a function and the documentation contained in such books. This can be due to them documenting a function over whose implementation they do not have control (e.g. a function in a contributed package) or the fact that R is improving/changing from version to version while books are rather static.

For these reasons it is always worthwhile to check the errata page for a book, if such exists.

The source of the warning is due to the fact that you do not provide all necessary information about your response. If your response is binomial (with a mean dependent on some explanatory variables), then each response consists of two numbers, the number of trials and the number of successes. If you calculate the observed proportion of successes from these two numbers and feed this into glm as the response, you are omitting necessary information. In this case, you should provide the number of trials on which each proportion is based as prior weights.

For example:

R> x <- seq(from=-1,to=1,length=41)
R> px <- exp(x)/(1+exp(x))
R> nn <- sample(8:12, 41, replace=TRUE)
R> yy <- rbinom(41, size=nn, prob=px)
R> y <- yy/nn
R> glm(y~x, family=binomial, weights=nn)

Call:  glm(formula = y ~ x, family = binomial, weights = nn)

Coefficients:
(Intercept)            x
      0.246        1.124

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:      91.49
Residual Deviance: 50.83        AIC: 157.6

R> glm(y~x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.2143       1.1152

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:      9.256
Residual Deviance: 5.229        AIC: 49.87
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

HTH,

Cheers,

Berwin

== Full address
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                       e-mail: ber...@maths.uwa.edu.au
Australia                             http://www.maths.uwa.edu.au/~berwin
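A related way to give glm() the full binomial information is a two-column response of successes and failures. The sketch below reuses the simulation set-up from this message (the seed is my addition, so the numbers will differ from the output quoted above) and checks that the weighted-proportion fit and the two-column fit give the same coefficients.

```r
set.seed(1)                                # seed added for reproducibility
x  <- seq(from = -1, to = 1, length.out = 41)
px <- exp(x) / (1 + exp(x))
nn <- sample(8:12, 41, replace = TRUE)     # trials per observation
yy <- rbinom(41, size = nn, prob = px)     # successes per observation
y  <- yy / nn                              # observed proportions

# Proportions as the response, number of trials as prior weights.
fit_weights <- glm(y ~ x, family = binomial, weights = nn)

# Two-column response: successes and failures.
fit_counts <- glm(cbind(yy, nn - yy) ~ x, family = binomial)

coef_match <- isTRUE(all.equal(coef(fit_weights), coef(fit_counts),
                               tolerance = 1e-6))
```

Either form avoids the non-integer successes warning, because glm() can recover whole-number success counts from what it is given.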
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Corrado

> I am afraid not the paragraph's title is a bit of a give away:
>
> Proportion Data and Binomial Errors
>
> The sentence reads:
>
> are dealt with by using a generalised linear model with a binomial error structure.
>
> with the example:
>
> glm(y~x,family=binomial)
>
> You can check at page 514/515.

It would be better to check Chapter 16 (from page 569) on Proportions. The pages you cite don't come across to me as an example of how this procedure should be carried out, but rather a trivial example on the changes in syntax between a linear model and a GLM.

Graham
Re: [R] large dataset
KeithC,

If you're arguing that there should be more documentation and examples explaining how to use very large data sets with R, then I agree. Feel free to write some.

I've been giving tutorials on this for years now. I wrote the first netCDF interface package for R because I needed to use data that wouldn't fit on a 64Mb system. I wrote the biglm package to handle out-of-core regression. My presentation at the last useR meeting was on how to automatically load variables on demand from a SQL connection.

It's still true that you can't treat large data sets and small data sets the same way, and I still think that it's even more important to point out that nearly everyone doesn't have large data and doesn't need to worry about these issues.

-thomas

On Mon, 29 Mar 2010, kMan wrote:

Dear Thomas,

While it may be true that R (and S) are *accused* of being slow, memory-hungry, and able to handle only small data sets (emphasis added), the accusation is false, rendering the *accusers* misinformed. Transparency is another, perhaps more interesting matter. R-users can *experience* R as limited in the ways described above (a functional limitation) while making a false technical assertion, without generating a dichotomy. It is a bit like a cell phone example from human-computer interaction circles in the 90s. The phone could technically work, provided one is an engineer so as to make sense out of its interface, while for most people, it may *functionally* be nothing more than a paperweight.

R is not technically limited in the way the accusation reads (the point I was making), though many users are functionally limited so (the point you seem to have made or at least passed along). An R user can get far more data into memory as single objects with R than with other stats packages; including matlab, JMP, and, obviously, excel. This is just a simple comparison of the programs' documented environment size and object limits.
The difference in the same read/scan operation between R and JMP on 600 Mb of data could easily be 25+ minutes (R perhaps taking 5-7 minutes, with JMP taking 30+ minutes, assuming 1.8GHz 3GB RAM I used back when I made the comparison that sold me on R). R can do formal operations with all that data in memory, assuming the environment is given enough space to work with, while JMP will do the same operation in several smaller chunks, reference the disk several times, AND on windows machines, cause the OS to page. In that case, the differences can be upwards of a day. With the ability to handle larger chunks at once, and direct control over preventing one's OS from paging, R users should be able to crank out analyses on very large datasets faster than other programs. I am perfectly willing to accept that consumers of statistical software may *experience* R as more limiting, in keeping with the accusations, that the effect may be larger for newcomers, and even larger for newcomers after controlling for transparency. I'd expect the effect to reverse at around 3 years of experience, controlling for transparency or not. Large scale data may present technical problems many users choose simply to avoid using R for, so the effect may not reverse for these issues. Even when R is more than capable of outperforming other programs, its usability (or access to suitable documentation/training material) apparently isn't currently up to the challenge. This is something the R community should be gnawing at the bit to address. I'd think a consortium of sorts showcasing large-scale data support in R would be a stellar contribution, and perhaps an issue of R-journal devoted to the topic, say, of near worst-case scenario - 10Gb of data containing different data types (categorical, numeric, embedded matrices), in a .csv file, header information somewhere else. 
Now how do the authors explain to the beginner (say, 1 year experience with I/O) how to tackle getting the data into a more suitable format, and then how did they analyze it 300Mb at a time, all using R, in a non-cluster/single user environment, 32 bit, while controlling for the environment size, missing data, and preventing paging? How was their solution different when moving to 64 bit? Moving to a cluster? One of the demos would certainly have to use scan() exclusively for I/O, perhaps also demonstrating why the 'bad practice' part of working with raw text files is something more than mere prescription.

Sincerely,
KeithC.

-Original Message-
From: Thomas Lumley [mailto:tlum...@u.washington.edu]
Sent: Monday, March 29, 2010 2:56 PM
To: Gabor Grothendieck
Cc: kMan; r-help; n.via...@libero.it
Subject: Re: [R] large dataset

On Mon, 29 Mar 2010, Gabor Grothendieck wrote:

On Mon, Mar 29, 2010 at 4:12 PM, Thomas Lumley tlum...@u.washington.edu wrote:

On Sun, 28 Mar 2010, kMan wrote:

This was *very* useful for me when I dealt with a 1.5Gb text file http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_la
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
At 12:08 PM 3/30/2010, David Winsemius wrote:
> <snip>
> I don't understand this perspective. You bought Crawley's book so he is in some minor sense in debt to you. Why should you think it is more appropriate to send your message out to thousands of readers of r-help around the world (some of whom have written books that you did not buy) before sending Crawley a question about his text?

In fairness to Michael Crawley, whose books are useful and very clear (although not well-liked on this list for some reason):

1. The example quoted by Corrado Topi is not an actual example. Instead it is an isolated line of code given to illustrate the simplicity of glm() syntax and its relation to lm() syntax. This is in a short general topic overview chapter on GLMs meant to introduce concepts and terminology, not runnable code.

2. The example chapter is followed in the book by individual chapters on each type of GLM covered (count data, count data in tables, proportion data, binary response variables). If Corrado Topi had looked in the relevant chapter, he would find numerous worked out examples with runnable code.

Corrado Topi made an error in trying to run an isolated line of code without antecedent definitions, which almost never works in any programming system.

Michael Crawley made a mistake in judgment in assuming that detail later will suffice for generality now.

My advice to Corrado Topi is to engage in some forward referencing, and read chapters 16 and 17 before deciding which example code to run.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire
Re: [R] weighted.median function from package R.basic
While perhaps not the solution you were looking for, you might consider estimating weighted medians with linear quantile regression (just specify an intercept for single sample analysis, tau=0.50, and weights = your weights) in the quantreg package. Quantile regression does not require sorting to estimate medians (it minimizes an objective function) and thus might require less computing time on a large data set.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_c...@usgs.gov
tel: 970 226-9326

From: Joris Meys jorism...@gmail.com
To: R mailing list r-help@r-project.org
Date: 03/30/2010 10:39 AM
Subject: [R] weighted.median function from package R.basic
Sent by: r-help-boun...@r-project.org

> Dear all,
>
> I want to apply a weighted median on a huge dataset, and I remember a function from the package R.basic that could do this using an internal sorting algorithm qsort. This speeded things up quite a bit. Alas, I can't find that package anywhere anymore. There is a weighted.median function in the package limma too, but I didn't use that before.
>
> Anybody who knows what happened to R.basic?
>
> Cheers
> Joris
>
> --
> Joris Meys
> Statistical Consultant
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> Coupure Links 653
> B-9000 Gent
> tel : +32 9 264 59 87
> joris.m...@ugent.be
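To make the "minimizes an objective function" remark concrete, the sketch below evaluates the weighted absolute loss at each data point and picks the smallest. This brute-force search only illustrates the objective; it is not how quantreg solves the problem (rq() uses linear-programming methods), and the toy data and names loss, candidate_loss and wmed are invented. The commented-out rq() call shows the intercept-only fit described above and needs the quantreg package installed.

```r
x <- c(1, 2, 3, 4)
w <- c(1, 1, 1, 5)

# Weighted absolute loss for a candidate location m.
loss <- function(m) sum(w * abs(x - m))

# A minimiser can always be found among the data points themselves,
# so it is enough to evaluate the loss at each observed value.
candidate_loss <- vapply(x, loss, numeric(1))
wmed <- x[which.min(candidate_loss)]

# With quantreg installed, the intercept-only median regression would be:
# library(quantreg)
# coef(rq(x ~ 1, tau = 0.5, weights = w))
```

Here the heavy weight on the last observation pulls the weighted median to that value.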
[R] for loop; lm() regressions; list of vectors
Hello everyone,

I am trying to execute 150 times a lm regression using the 'for' loop, with 150 vectors for y, and always the same vector for x.

I have an object with 150 elements named a, and a vector of 60 values named b. Each element in a has 60 values plus a header.

When I type:

r <- lm(i ~ b)
for(i in a) print(r)

I get 150 times the lm results of the first element of a regressed with b, whereas I would like to have 150 different regression results from each element in a...

Can someone please help me with the syntax of my loop please?

Many Thanks,

Driss Agramelal
Switzerland
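In the code as posted, lm() is called once, before the loop starts, so the loop prints the same fitted object on every pass. A minimal sketch of one way to fit a separate regression per element follows. The data are simulated, and I am assuming a is a data frame (or list) of 150 numeric vectors of length 60; the names fits and slopes are my own.

```r
set.seed(42)
b <- rnorm(60)                                         # common predictor
a <- as.data.frame(replicate(150, 2 * b + rnorm(60)))  # 150 simulated responses

# Fit the model inside the iteration so each response gets its own lm object.
fits <- lapply(a, function(y) lm(y ~ b))

# Collect one number per fit, here the slope on b.
slopes <- sapply(fits, function(f) coef(f)[["b"]])

# The for-loop form of the same idea, storing results instead of printing:
fits_loop <- vector("list", length(a))
for (j in seq_along(a)) {
  fits_loop[[j]] <- lm(a[[j]] ~ b)
}
```

Keeping the fitted objects in a list makes it easy to pull out coefficients, residuals or summaries afterwards with sapply() or lapply().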
Re: [R] Multivariate hypergeometric distribution version of phyper()
Karl,

I strongly support Chuck's recommendations. If you do still want to compute such probabilities 'by hand', you could consider the lchoose() function which does work for your example.

-Peter Ehlers

On 2010-03-30 9:55, Charles C. Berry wrote:
> On Tue, 30 Mar 2010, Karl Brand wrote:
>> Dear R Users,
>>
>> I employed the phyper() function to estimate the likelihood that the number of genes overlapping between 2 different lists of genes is due to chance. This appears to work appropriately.
>>
>> Now I want to try this with 3 lists of genes, which phyper() does not appear to support.
>>
>> Some googling suggests I can utilize the multivariate hypergeometric distribution to achieve this, e.g.: http://en.wikipedia.org/wiki/Hypergeometric_distribution
>>
>> But when I try to do this manually using the choose() function (see attempt below, example with just two gene lists) I'm unable to perform the calculations - the numbers hit infinity before getting an answer.
>>
>> Searching CRAN archives for "multivariate hypergeometric" shows this term in the vignettes of the packages ‘combinat’ and ‘forward’. But I'm unable to make sense of these package functions in the context of my aforementioned application.
>>
>> Can someone suggest a function, script or method to achieve my goal of estimating the likelihood of overlap between 3 lists of genes, ideally using the multivariate hypergeometric, or anything else for that matter?
>
> Two suggestions:
>
> 1) Don't!
>
> Likely the theory is unsuited for the application. In most applications that generate lists of genes, the genes are not iid realizations and the hypergeometric gives results that are astonishingly anticonservative.
>
> As an alternative, the block bootstrap may be suitable. See http://171.66.122.45/cgi/content/abstract/17/6/760 and Google (scholar) 'genomic block bootstrap' for some starting points.
>
> 2) Take this thread to the bioconductor list. You are much more likely to get pointers to useful packages and functions for genomic statistical software there.
>
> HTH,
>
> Chuck
>
>> cheers in advance,
>>
>> Karl
>>
>> #example attempt with two gene lists m n
>> N <- 45101 # total number balls in urn
>> m <- 720 # number of 'white' or 'special' balls in urn, aka 'success'
>> n <- 801 # number balls drawn or number of samples
>> k <- 40 # number of 'white' or 'special' balls DRAWN
>>
>> a <- choose(m,k)
>> b <- choose((N-m),(n-k))
>> z <- choose(N,n)
>>
>> prK <- (a*b)/z #'the answer'
>>
>> print(prK)
>> [1] NaN
>> a
>> [1] 7.985852e+65
>> b
>> [1] Inf
>> z
>> [1] Inf
>>
>> --
>> Karl Brand
>> Department of Genetics
>> Erasmus MC
>> Dr Molewaterplein 50
>> 3015 GE Rotterdam
>> T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cbe...@tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901

--
Peter Ehlers
University of Calgary
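The lchoose() suggestion can be sketched with the numbers from the original post: work on the log scale so the binomial coefficients never overflow, and exponentiate only at the end. The helper log_dmvhyper() is a hypothetical name of mine for the textbook multivariate hypergeometric log-probability; whether that model is appropriate for gene lists is exactly what the first suggestion above calls into question.

```r
N <- 45101  # total number of balls in the urn
m <- 720    # 'special' balls in the urn
n <- 801    # balls drawn
k <- 40     # 'special' balls drawn

# Two-list case on the log scale: no intermediate Inf values.
log_p <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
p <- exp(log_p)

# Same probability from the built-in hypergeometric density.
p_builtin <- dhyper(k, m, N - m, n)

# Hypothetical helper: log-probability of drawing k[i] items from each of
# several categories of sizes K[i], sampling without replacement.
log_dmvhyper <- function(k, K) {
  stopifnot(length(k) == length(K), all(k >= 0), all(k <= K))
  sum(lchoose(K, k)) - lchoose(sum(K), sum(k))
}

# With two categories it reduces to the ordinary hypergeometric case.
p_two_cat <- exp(log_dmvhyper(c(k, n - k), c(m, N - m)))
```

For more than two categories, pass one entry per category in both k and K; the result stays on the log scale until you call exp().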
Re: [R] about the possible errors in Rgraphviz Package
On 30/03/2010 1:24 PM, HU,ZHENGJUN wrote:
> Hi Duncan,
>
>> (They are pretty hard to find, but I think you can find them on the Bioconductor site.) It is not enough to install the Rgraphviz package, you also need to install Graphviz.
>
> Yes, I did. Before installing the Rgraphviz package successfully, (1) I downloaded graphviz-2.26.3.msi for MS Windows (XP) and installed it successfully, and (2) I also installed the packages from Bioconductor by the following (note: I use MS Windows XP and R version 2.10.1):

From the instructions: The right version of Graphviz for Bioconductor 2.5 is version 2.20.3.1.

Duncan Murdoch

> source("http://www.bioconductor.org/biocLite.R")
> biocLite()
>
> I got these error messages:
>
> Error in inDL(x, as.logical(local), as.logical(now), ...) : unable to load shared library 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll': LoadLibrary failure: The specified module could not be found.
>
> Obviously, it seems to be a package problem, because it should go to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' instead of 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'.
>
> Thank you for the reply.
>
> Howard
>
> On Tue Mar 30 12:50:44 EDT 2010, Duncan Murdoch murd...@stats.uwo.ca wrote:
>> On 30/03/2010 10:44 AM, HU,ZHENGJUN wrote:
>>> Hi All,
>>>
>>> I tried to install the Rgraphviz package in the following two ways, both successfully:
>>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite("Rgraphviz")
>>>
>>> install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)
>>>
>>> but when I loaded the package through library(Rgraphviz) or library("Rgraphviz"), I got the same error message below:
>>>
>>> Error in inDL(x, as.logical(local), as.logical(now), ...) : unable to load shared library 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll': LoadLibrary failure: The specified module could not be found.
>>
>> Most likely the problem is that you haven't followed the installation instructions. (They are pretty hard to find, but I think you can find them on the Bioconductor site.) It is not enough to install the Rgraphviz package, you also need to install Graphviz.
>>
>> Duncan Murdoch
>>
>>> I think that it is an error in the package, because it should go to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' instead of 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'.
>>>
>>> Could anyone help me to solve the problem? Thank you very much for the help.
>>>
>>> Howard
>
> --
> HU,ZHENGJUN
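A minimal sketch for checking the two things this thread turns on: whether the DLL named in the error exists, and whether a Graphviz executable is visible to R. The helper name `check_rgraphviz_setup` and the choice of `dot` as the probe executable are assumptions for illustration; neither is part of Rgraphviz. Note also that 'R-210~1.1' looks like the Windows 8.3 short form of 'R-2.10.1', so the two paths quoted in the message may well name the same directory.

```r
# Hypothetical helper (not part of Rgraphviz): report whether the DLL from
# the error message exists and whether Graphviz's 'dot' is on the PATH.
check_rgraphviz_setup <- function(dll_path) {
  dot_path <- unname(Sys.which("dot"))          # usually "" when not found
  list(
    dll_exists  = file.exists(dll_path),
    dot_on_path = !is.na(dot_path) && nzchar(dot_path),
    dot_path    = dot_path
  )
}

res <- check_rgraphviz_setup(
  "C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll"
)
str(res)
```

If `dll_exists` is TRUE but loading still fails, a Graphviz library that Windows cannot locate is a plausible cause, which would be consistent with Duncan's advice about installing the matching Graphviz version.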
Re: [R] MySQL and RODBC - limitations
I found the solution. The problem was indeed R. There is a simple way to solve the problem; it just needs a bit more time. If you download large integers from a database, convert them on the fly with

SELECT CONVERT(yourcolumn, char)

That is it. This is no problem as long as you do NO comparisons within these columns. If you compare entry10 with entry11 (for example '13' with '2'), the result will be wrong whenever the two values do not have the same number of characters. Hence, if you have numbers, you must fill up the empty slots with zeros, so that the comparison becomes '13' with '02'.

--
View this message in context: http://n4.nabble.com/MySQL-and-RODBC-limitations-tp1692743p1745570.html
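The padding issue described above can be reproduced in R itself. This is only a sketch of the effect on the R side, assuming a common width of 2 characters; on the MySQL side the padding would have to be done with whatever string function the database offers.

```r
# Character comparison is lexical, so numbers stored as strings can
# compare differently from their numeric values.
ids <- c("13", "2")
lexical <- ids[1] > ids[2]      # compares "13" with "2" character by character
numeric <- 13 > 2               # the comparison that was actually intended

# Pad to a common width with leading zeros; the width of 2 is an assumption.
padded <- formatC(as.integer(ids), width = 2, flag = "0")
padded_cmp <- padded[1] > padded[2]

print(c(lexical = lexical, numeric = numeric, padded = padded_cmp))
```

After padding, the lexical comparison agrees with the numeric one, which is the point of filling the empty slots with zeros.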
[R] dataframe in loop
Hello all,

I would like to thank those who helped me out of the string problem, but now I have another problem. I used R to query from SQL and got a list of crsp_fundno values for G-style mutual funds that are still alive. I used the following code and got what I wanted:

library(RODBC)
channel <- odbcConnect("CRSPFUND")

g.crspfundno <- sqlQuery(channel, "select crsp_fundno from Fund_style where wbrger_obj_cd = 'G' order by crsp_fundno")
g.crspfundno
(got crsp_fundno of G-style funds from the Fund_style table)

y.crspfundno <- sqlQuery(channel, "select crsp_fundno from Fund_hdr where dead_flag = 'N' and end_dt=20091231 order by crsp_fundno")
y.crspfundno
(got crsp_fundno of still-alive funds from the Fund_hdr table)

g$key <- paste(g.crspfundno$crsp_fundno)
y$key <- paste(y.crspfundno$crsp_fundno)
v.fundno <- intersect(g$key, y$key)
(using intersect to get crsp_fundno of G-style mutual funds that are still alive)
v.fundno

What I need to do next is use the v.fundno I got to query another table, Monthly_returns, to get the mret corresponding to every v.fundno. I have only a basic idea of the code:

for (i in 1:length(v.fundno)){
  gmret <- sqlQuery(channel, paste("select mret from Monthly_returns where crsp_fundno =", test[i], 'and caldt 19900630 order by caldt'))
}

The loop doesn't work :( I realize the problem might be that I didn't define the data frame, but my limited knowledge can't help me find out how. Here is an example of my data:

head(v.fundno)
test <- head(v.fundno)
test
[1] 2899 2903 2960 3094 3095 3211

If I don't do the loop and query for one fund, say 2899,

gmret.2899 <- sqlQuery(channel, "select caldt, mret from Monthly_returns where crsp_fundno = 2899 and caldt 19900630 order by caldt")
gmret.2899

it gives me what I want:

sample2899 <- head(gmret.2899)
sample2899
     caldt         mret
1 19900731  0.014204546
2 19900831 -0.050420168
3 19900928 -0.039823009
4 19901031  0.006144393
5 19901130  0.054961832
6 19901231  0.019632639

Can anybody help me with the loop?

Thanks a lot
Muting
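The loop in the message assigns to `gmret` on every pass, so only the last result can survive. One common pattern is to collect each result in a named list. The sketch below is an illustration under assumptions: `fetch_returns`, `query_fun` and `fake_query` are hypothetical names, `query_fun` stands in for a call to `sqlQuery(channel, sql)`, and the `>` in the date filter is a guess because the operator is missing from the message as archived.

```r
# Sketch: run one query per fund and keep every result in a named list.
# 'query_fun' is a stand-in for function(sql) RODBC::sqlQuery(channel, sql).
# The '>' in the date filter is an assumption; the original operator was lost.
fetch_returns <- function(fund_ids, query_fun) {
  results <- lapply(fund_ids, function(id) {
    sql <- paste("select caldt, mret from Monthly_returns where crsp_fundno =",
                 id, "and caldt > 19900630 order by caldt")
    query_fun(sql)
  })
  names(results) <- fund_ids
  results
}

# Stub so the example runs without a database connection (made-up values).
fake_query <- function(sql) {
  data.frame(caldt = 19900731L, mret = 0.01, sql = sql)
}

gmret <- fetch_returns(c("2899", "2903"), fake_query)
names(gmret)

# Optionally stack the per-fund results into one data frame.
all_returns <- do.call(rbind, gmret)
```

With a live connection, the second argument would be something like `function(sql) sqlQuery(channel, sql)`, and `gmret[["2899"]]` would then hold the returns for fund 2899.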
Re: [R] MySQL and RODBC - limitations
On 30/03/2010 1:35 PM, jorgusch wrote:
> I found the solution. The problem was indeed R. There is a simple way to solve the problem; it just needs a bit more time. If you download large integers from a database, convert them on the fly with
>
> SELECT CONVERT(yourcolumn, char)
>
> That is it. This is no problem as long as you do NO comparisons within these columns. If you compare entry10 with entry11 (for example '13' with '2'), the result will be wrong whenever the two values do not have the same number of characters. Hence, if you have numbers, you must fill up the empty slots with zeros, so that the comparison becomes '13' with '02'.

If your longest integer is 10 digits (as mentioned earlier), you might do better to convert them to doubles rather than char. I don't know how to say "double" in MySQL, but if you can figure that out, you should be good to about 15 digits.

Duncan Murdoch
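Duncan's point about doubles can be checked on the R side. The sketch below uses a made-up 10-digit value; it shows only how R stores such numbers, not how the MySQL conversion would be written.

```r
# R integers are 32-bit, so a 10-digit value can exceed the integer range.
int_max <- .Machine$integer.max

big <- 9999999999                  # a 10-digit number, stored as a double
stored_as_double <- is.double(big)
exact_step <- (big + 1) == 10000000000   # still exact at this magnitude

# Doubles represent whole numbers exactly only up to 2^53 (about 9e15);
# beyond that, neighbouring integers can no longer be told apart.
collapsed <- 2^53 == 2^53 + 1

print(c(int_max = int_max, limit = 2^53))
print(c(stored_as_double = stored_as_double,
        exact_step = exact_step,
        collapsed = collapsed))
```

The 2^53 limit is where the "about 15 digits" in the reply comes from.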
[R] open help files in browser
Hi, Is there a way to open help files in the default web browser instead of a new R-window when I use the help-functions (like ?, help.search() etc.)? thanks!
Re: [R] Error singular gradient matrix at initial parameter estimates in nls
Your model is almost certainly over-parameterized (given the data that you have to fit it), and the asymptotic correlation matrix of the parameters that you should get from the solutions that converged will probably have some large off-diagonal elements. In other words, your model is essentially non-identifiable.

If you don't know what the above means, you shouldn't be using nls.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
Sent: Tuesday, March 30, 2010 4:25 AM
To: Corrado
Cc: r-help@r-project.org
Subject: Re: [R] Error singular gradient matrix at initial parameter estimates in nls

You could try method="brute-force" in the nls2 package to find starting values.

On Tue, Mar 30, 2010 at 7:03 AM, Corrado ct...@york.ac.uk wrote:
> I am using nls to fit a nonlinear function to some data. The nonlinear function is:
>
> y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))
>
> I have chosen the "port" algorithm, with a lower bound of 0 for all of the ki parameters, and I have tried many start values for the parameters ki (including generating them at random).
>
> If I fit the nonlinear function to the same data using an external algorithm, it fits perfectly and finds the parameters. As soon as I come to my R installation (2.10.1 on Kubuntu Linux 910 64 bit), I keep getting the error:
>
> Error in nlsModel(formula, mf, start, wts, upper) : singular gradient matrix at initial parameter estimates
>
> I have read all the previous postings and the documentation, but to no avail: the error is there to stay. I am sure the problem is with nls, because the external fitting algorithm fits it perfectly in less than a second. Also, if my n is 4, then nls works perfectly (but that excludes all the k5 ... kn).
>
> Can anyone help me with suggestions? Thanks in advance.
>
> Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim?
>
> Regards
> --
> Corrado Topi
> PhD Researcher
> Global Climate Change and Biodiversity
> Area 18, Department of Biology
> University of York, York, YO10 5YW, UK
> Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
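One way a singular gradient matrix can arise in this kind of model is linear dependence among the predictors. The sketch below uses simulated data, not Corrado's, and the names `p1`, `p2`, `p3` and `k_start` are made up for the illustration. It builds the gradient of y = 1 - exp(-(k0 + k1*p1 + ...)) with respect to the parameters at the starting values and inspects its rank.

```r
set.seed(1)
n  <- 50
p1 <- runif(n)
p2 <- runif(n)
p3 <- p1 + p2                      # deliberately collinear (assumption for the demo)
X  <- cbind(1, p1, p2, p3)         # columns multiply k0, k1, k2, k3

k_start <- c(0.1, 0.1, 0.1, 0.1)   # arbitrary starting values
eta <- drop(X %*% k_start)

# For y = 1 - exp(-eta), dy/dk_j = exp(-eta) * x_j, so row i of X is
# scaled by exp(-eta[i]).
J <- exp(-eta) * X

gradient_rank <- qr(J)$rank
is_singular <- gradient_rank < ncol(J)
print(c(rank = gradient_rank, parameters = ncol(J)))
```

A rank below the number of parameters means no choice of starting values can help, which is one concrete form of the non-identifiability Bert describes. For fits that do converge, `summary(fit, correlation = TRUE)` prints the parameter correlation matrix he mentions.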
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
?scale

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Liakhovitski
Sent: Tuesday, March 30, 2010 8:05 AM
To: r-help
Subject: [R] Code is too slow: mean-centering variables in a data frame by subgroup

Dear R-ers,

I have a large data frame (several thousand rows and about 2.5 thousand columns). One variable (group) is a grouping variable with over 30 levels, and I have a lot of NAs. For each variable, I need to divide each value by the variable mean, by subgroup. I have the code, but it's way too slow: it takes about 1.5 hours. Below is a data example and my code that is too slow. Is there a different, faster way of doing the same thing? Thanks a lot for your advice!

Dimitri

# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame <- data.frame(group=rep(paste("group",1:10),10), a=rnorm(1:100), b=rnorm(1:100), c=rnorm(1:100), d=rnorm(1:100), e=rnorm(1:100), f=rnorm(1:100), g=rnorm(1:100))
frame <- frame[order(frame$group),]
names.used <- names(frame)[2:length(frame)]
set.seed(1234)
for(i in names.used){
  i.for.NA <- sample(1:100,60)
  frame[[i]][i.for.NA] <- NA
}
frame

### Code that does what's needed but is too slow:
Start <- Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=TRUE)))
}))
Finish <- Sys.time()
print(Finish-Start) # Takes too long

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
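`scale()` works on whole columns; for a per-group version, base R's `ave()` is one option. The sketch below divides each value by its subgroup mean, as the question asks, on a small frame with only two value columns. The helper name `divide_by_group_mean` is made up, and no timing claim is implied for the full-size data.

```r
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100))
frame$a[sample(100, 60)] <- NA      # inject NAs into one column

vars <- setdiff(names(frame), "group")

# ave() applies FUN within each level of the grouping variable and returns
# a vector in the original row order.
divide_by_group_mean <- function(x, g) {
  ave(x, g, FUN = function(v) v / mean(v, na.rm = TRUE))
}

scaled <- frame
scaled[vars] <- lapply(frame[vars], divide_by_group_mean, g = frame$group)
head(scaled)
```

Because `ave()` keeps the row order, the result can be assigned straight back into the data frame without the `by()` / `unlist()` / `cbind()` round trip, and the group column is preserved.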
Re: [R] open help files in browser
If you're not already using R v2.10.0 or newer, try that first.

My $.02

/Henrik

On Tue, Mar 30, 2010 at 7:46 PM, Martin Batholdy batho...@googlemail.com wrote:
> Hi,
>
> Is there a way to open help files in the default web browser instead of a new R-window when I use the help-functions (like ?, help.search() etc.)?
>
> thanks!
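Assuming R 2.10.0 or later, as Henrik suggests, the help format can be chosen with the `help_type` option or the `help_type` argument of `help()`. A minimal sketch; the help calls are guarded so that nothing tries to open a browser when the code is run in batch mode.

```r
# Ask for HTML help (shown in the browser) for the rest of the session.
old <- options(help_type = "html")
current <- getOption("help_type")
print(current)

if (interactive()) {
  print(help("mean"))                        # uses the session default set above
  print(help("median", help_type = "html"))  # or request HTML for a single call
}

options(old)   # restore the previous setting
```

Putting `options(help_type = "html")` in a startup file such as `.Rprofile` would make the choice persistent, if that is what is wanted.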