Re: [R] Graphical output format
Stats Wolf stats.wolf at gmail.com writes:

> Postscript, however, does not have to be what I need for two reasons. First, it does not accept some special characters from foreign languages (exactly like PDF).

You should give an example for that in pdf. I always had the impression that pdf is the most comprehensive in foreign character support.

Dieter

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT?]R Reference Manual review/recommend
John Kane wrote:

> I don't know the book but I doubt that it is a good way to learn R.
>
> I'd suggest having a look at some of the documentation available on the R site. Click on Other (in the left column of the page), have a look there, and then select the contributed documentation link to get more documentation. Have a look at some of these offerings before buying any books.
>
> The on-line Introduction to R (click on Manuals) is also very useful, although I found that it was more useful after I had a basic understanding of R than as an intro for a complete novice who is not a statistician. Oh yes, it's also much easier to use in PDF form than in the HTML format.
>
> --- On Wed, 5/13/09, 4:55 PM, AG computing.acco...@googlemail.com wrote:
>
>> Hello all
>>
>> I am looking to learn R and was thumbing through volume 1 of R reference manual - Base Package. I'm sorry if this is ludicrously silly to ask, but is this book worth the investment as a good way to learn how to use R?
>>
>> AG

Dear all

Thanks for all of the suggestions. I'm glad I asked before I bought the book. Sounds like there are loads of alternatives, so I will pursue those leads.

Many thanks
AG
[R] transposing/rotating XY in a 3D array
Dear list,

We have a number of files containing similarly structured data:

file1:
A B C
1 2 3
4 5 6

file2:
A B C
7 8 9
10 11 12

... etc.

My part of R receives all these data as a flat array: 1,2,3...12, together with info about the dimensions (row, col, fileN).

Converting the data into 3D cannot simply be done by array(x, c(2,3,2)), because that breaks the structure (e.g. 1,3,5 is a type mismatch):

    array(1:12, c(2,3,2))
    , , 1
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    ...

Of course, following R's indexing order (the row index varies fastest) retains the structures, but then the X and Y dimensions are transposed (note: c(2,3,2) becomes c(3,2,2)):

    array(1:12, c(3,2,2))
    , , 1
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

It's like converting into Japanese vertical reading. It is not that I cannot get used to it, but I guess it is less error prone if I have the same order as in the data files.

Now I am using an ad-hoc function (see below) to transpose the rotated YX back into an XYZ array, but I'd rather go with built-ins, e.g. byrow=T or t(), and also without duplicating my data.

Thanks for the advice in advance.
Gabor

<code>
transposeXYinXYZ <- function(x){
  y <- array(NA, c(dim(x)[2], dim(x)[1], dim(x)[3]))
  for(i in 1:dim(x)[3]){
    y[,,i] <- t(x[,,i])
  }
  return (y)
}
xyz <- array(1:24, c(4,3,2))
yxz <- transposeXYinXYZ(xyz)
xyz
yxz
</code>
Re: [R] Graphical output format
On Fri, 15 May 2009, Dieter Menne wrote:

> Stats Wolf stats.wolf at gmail.com writes:
>
>> Postscript, however, does not have to be what I need for two reasons. First, it does not accept some special characters from foreign languages (exactly like PDF).

'foreign' is a relative term which is imprecise (and somewhat impolite) when writing to an international community: I don't suppose Dieter Menne regards German characters as 'foreign' but Ei-ji Nakama does, unlike Japanese ones.

> You should give an example for that in pdf. I always had the impression that pdf is the most comprehensive in foreign character support.

Not really true, but since you can embed bitmaps in both PostScript and PDF, there are workarounds.

PostScript and PDF use 8-bit encodings for character strings except for some predefined encodings for CJK languages, so in principle this is far less comprehensive than windows() and X11() which use Unicode. However, in practice the limitations are the glyphs available in the specified fonts, and in all the cases I am aware of an available font can be encoded in one or two 8-bit encodings (and hence in one or two R font families). You can't mix (say) Russian and Polish characters in a single text() call for pdf() (you can for windows()), but you can have them in separate calls for the same plot.

There are (on suitable R platforms) cairo_pdf() and cairo_ps() devices. They are (on suitably rich OSes) able to cover a very wide range of characters, which they do by embedding the font glyphs into the output (often as bitmaps): the quality of the effect often depends on the output device used, which is why the traditional approach in PS/PDF is to render fonts in the output device.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
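As a rough illustration of the cairo-based devices mentioned above, the following sketch puts Cyrillic and Polish characters into a single plot title. It assumes an R build with cairo support and a system font that actually contains these glyphs; the output file and the sample strings are made up for the example, and the result may look different (or show missing glyphs) on other platforms.

```r
# Sketch only: assumes R was built with cairo support and that the
# default system font covers Cyrillic and Polish glyphs.
out <- tempfile(fileext = ".pdf")   # hypothetical output file

if (capabilities("cairo")) {
  cairo_pdf(out, width = 5, height = 4)
  # Russian and Polish text mixed in one string, written as Unicode escapes
  plot(1:10, main = "\u0416\u0443\u043a / \u017c\u00f3\u0142w")
  dev.off()
  wrote_file <- file.exists(out)
} else {
  wrote_file <- NA                  # device not available on this platform
}
```

With plain pdf() the same title would need the two languages in separate text() calls with different font families, as described in the message above.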
Re: [R] Graphical output format
On 15 May 2009, at 10:01, Prof Brian Ripley wrote:

> On Fri, 15 May 2009, Dieter Menne wrote:
>
>> Stats Wolf stats.wolf at gmail.com writes:
>>
>>> Postscript, however, does not have to be what I need for two reasons. First, it does not accept some special characters from foreign languages (exactly like PDF).
>
> 'foreign' is a relative term which is imprecise (and somewhat impolite) when writing to an international community: I don't suppose Dieter Menne regards German characters as 'foreign' but Ei-ji Nakama does, unlike Japanese ones.
>
>> You should give an example for that in pdf. I always had the impression that pdf is the most comprehensive in foreign character support.

Just a thought: there was recently a discussion here on the pgfSweave [1] driver. It should be possible to use it in conjunction with XeTeX [2] to process the pgf output. Presumably there will be issues of alignment and spacing, but at least arbitrary characters of most languages could be employed in a fairly straightforward manner.

[1]: http://r-forge.r-project.org/R/?group_id=331
[2]: http://www.tug.org/xetex/

Regards,
baptiste

_
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road, Exeter, Devon, EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag
Re: [R] Simulation
hey guys, i've been following this discussion about the simulation, and being a beginner myself, I'm really unsure of the best method. I have the same problem as the initial one, except i need 1000 samples of size 15, and my distribution is Exp(1). I've adjusted some of the loop formulas for my n=15, but I'm unsure how to proceed in the quickest way.

Can someone please help? Much appreciated :)

> From: r.tur...@auckland.ac.nz
> Date: Thu, 14 May 2009 10:26:38 +1200
> To: c...@witthoft.com
> CC: r-help@r-project.org
> Subject: Re: [R] Simulation
>
> On 14/05/2009, at 10:04 AM, Carl Witthoft wrote:
>
>> So far nobody seems to have warned the OP about seeding. Presumably Debbie wants 1000 different sets of samples, but as we all know there are ways to get the same sequence (initial seed) every time. If there's a starting seed for one of the "generate a single giant matrix" methods proposed, the whole matrix will be the same for a given seed. If rnorm is called 1000 times (hopefully w/ different random (oops) seeds), the entire set of samples will be different. and so on.
>
> I really don't get this. The OP wanted 1000 independent samples, each of size 100. Whether she does
>
>     set.seed(42)
>     M <- matrix(rnorm(100*1000), nrow=1000) # Each row is a sample.
>
> or
>
>     L <- list()
>     set.seed(42)
>     for(i in 1:1000) L[[i]] <- rnorm(100) # Each list entry is a sample.
>
> she gets this, i.e. the desired result.
>
> Setting a seed serves to make the results reproducible. This works via either approach. Making results reproducible in this manner is advisable, but seed-setting is nothing that the OP needs to be *warned* about.
>
> cheers,
> Rolf Turner
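For the Exp(1) case asked about here, the single-matrix approach quoted above carries over directly by swapping rnorm() for rexp(). A minimal sketch, assuming that "1000 samples of size 15" means 1000 independent draws of 15 values each; the variable names and the seed are arbitrary:

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
n_samples   <- 1000
sample_size <- 15

# One call to rexp(); each row of the matrix is one sample of size 15.
M <- matrix(rexp(n_samples * sample_size, rate = 1),
            nrow = n_samples, ncol = sample_size)

# Per-sample summaries without an explicit loop, e.g. the sample means.
sample_means <- rowMeans(M)
```

Any other per-sample statistic can be computed with apply(M, 1, f) in the same way.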
Re: [R] help on Nan error
> When i want to do ANOSIM i get an NaN error message. What is wrong?
> (lots of other code)
>
>     iwithin=rep(0,(N*(N-1)/2) )
>     r.w=sum(r*iwithin)/sum(iwithin)

iwithin is a vector of zeroes and so is its sum. r*iwithin is also a vector of zeroes, and so is its sum. Thus r.w=sum(r*iwithin)/sum(iwithin) is zero divided by zero, which is not defined.

Regards,
Richie.

Mathematical Sciences Unit
HSL
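The zero-divided-by-zero explanation above can be reproduced in a few lines. This is only a sketch: N and r are made-up stand-ins for the poster's values, and the guard at the end is one possible way to avoid the NaN, not necessarily what the ANOSIM code should do.

```r
N <- 5                                   # hypothetical number of samples
r <- seq_len(N * (N - 1) / 2)            # hypothetical ranks of the distances
iwithin <- rep(0, N * (N - 1) / 2)       # all zeroes, as in the posted code

# 0 / 0 gives NaN in R rather than an error
r.w.bad <- sum(r * iwithin) / sum(iwithin)
is.nan(r.w.bad)

# One possible guard: only divide when some pairs are flagged as "within"
r.w.safe <- if (sum(iwithin) > 0) sum(r * iwithin) / sum(iwithin) else NA_real_
```

In practice the real fix is presumably to set the within-group entries of iwithin to 1 before computing r.w.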
[R] Simulation
hey guys, i've been following this discussion about the simulation, and being a beginner myself, I'm really unsure of the best method. I have the same problem as the initial one, except i need 1000 samples of size 15, and my distribution is Exp(1). I've adjusted some of the loop formulas for my n=15, but I'm unsure how to proceed in the quickest way.

Can someone please help?
Re: [R] transposing/rotating XY in a 3D array
Try this:

    > x <- array(1:12, c(3,2,2))
    > x
    , , 1
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

    , , 2
         [,1] [,2]
    [1,]    7   10
    [2,]    8   11
    [3,]    9   12

    > xt <- aperm(x, c(2,1,3))
    > xt
    , , 1
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6

    , , 2
         [,1] [,2] [,3]
    [1,]    7    8    9
    [2,]   10   11   12

Good day!

Kushantha Perera | Amba Research
Ph +94 11 235 6281 | Mob +94 77 222 4373
Bangalore * Colombo * London * New York * San José * Singapore * www.ambaresearch.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of andzsin
Sent: Friday, May 15, 2009 12:38 PM
To: r-help@r-project.org
Subject: [R] transposing/rotating XY in a 3D array

> Now I am using an ad-hoc function (see below) to transpose back the rotated YX into a XYZ array, but I'd rather go with built-ins, e.g. byrow=T, or t() -- and also without duplicating my data.
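Building on the aperm() call above, the flat vector from the files can be turned into a (row, col, file) array in one step. A small sketch, assuming the data arrive file by file and row by row as in the original question; the variable names are invented, and as far as I know aperm() still returns a new array, so it does not avoid the copy the poster hoped to skip.

```r
x <- 1:12                       # flat data: file by file, row by row
n_row <- 2; n_col <- 3; n_file <- 2

# Fill in R's native order with columns as the fastest index,
# then swap the first two dimensions.
xyz <- aperm(array(x, c(n_col, n_row, n_file)), c(2, 1, 3))

xyz[, , 1]                      # first file, laid out as in the data file
```

The first slice then reads 1 2 3 / 4 5 6, matching file1.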
Re: [R] Simulation
On Fri, 15 May 2009 19:17:37 +1000 Kon Knafelman konk2...@hotmail.com wrote:

KK> I hve the same problem as the initial one, except i need 1000
KK> samples of size 15, and my distribution is Exp(1). I've adjusted
KK> some of the loop formulas for my n=15, but im unsure how to proceed
KK> in the quickest way.
KK> Can someone please help?

What exactly do you want? Please be more specific about what you did and what does not work.

Stefan
Re: [R] How to do a pretty panel plot?
Ajay Shah wrote:

Here's my best version of your code:

    ## Data
    M <- structure(list(
      date = structure(c(13634, 13665, 13695, 13726, 13757, 13787, 13818, 13848,
                         13879, 13910, 13939, 13970, 14000, 14031, 14061, 14092,
                         14123, 14153, 14184, 14214, 14245, 14276, 14304, 14335),
                       class = "Date"),
      cospi = c(1987.31, 2033.37, 2140.13, 2120.66, 2427.09, 2917.7, 2915.28,
                3262.06, 2616.26, 2617.75, 2277.69, 2538.13, 2374.09, 1911.22,
                2063.73, 2081.28, 1813.58, 1304.96, 1219.73, 1361.74, 1299.2,
                1242.74, 1339.18, 1557.29),
      cospi.PE = c(19.2, 19.69, 20.13, 24.08, 27.61, 30.9, 30.69, 34.92, 26.95,
                   27.63, 23.86, 26.14, 23.72, 19.5, 23.43, 23.73, 20.69, 16.4,
                   16.12, 18.04, 18.46, 18.86, 20.24, 23.53)),
      .Names = c("date", "cospi", "cospi.PE"),
      row.names = 209:232, class = "data.frame")

    ## Set up par's to make 2 panel chart
    par(bty="l"); par(ps=10)
    par(mfrow=c(2,1))     # try to get two plots, one above the other
    par(mar=c(0,4,0,1))   ## Set par(mar) to eliminate X axis gap
    par(oma=c(2,2,2,2))

    ## Make Plot 1
    plot(M$date, M$cospi, type="l", log="y", xaxs="i", yaxs="i",
         axes=F, lwd=2, ylab="Cospi level")
    axis(1, col="grey", at=NULL, labels=FALSE)
    axis(2, col="black", labels=TRUE)
    axis(3, col="grey", labels=TRUE)
    grid(col = "lightgrey", lty=1)
    box(col = "grey")

    ## Adjust par(mar) for 2nd plot
    par(mar=c(2,4,0,1))

    ## Second plot
    plot(M$date, M$cospi.PE, type="l", col="black", log="y", xaxs="i", yaxs="i",
         axes=F, lwd=2, ylab="Cospi P/E")
    axis(2, col="black", at=NULL, labels=T)
    axis(1, col="lightgrey", at=NULL, labels=T)
    grid(col = "lightgrey", lty=1)
    box(col = "grey")

I think it's better if the lines are above the grid:

    ## Data: M as defined above

    ## Set up par's to make 2 panel chart
    par(bty="l")
    par(ps=10)
    par(mfrow=c(2,1))     # try to get two plots, one above the other
    par(mar=c(0,4,0,1))   ## Set par(mar) to eliminate X axis gap
    par(oma=c(2,2,2,2))

    ## Make Plot 1
    plot(M$date, M$cospi, type="l", log="y", xaxs="i", yaxs="i",
         axes=F, lwd=0, ylab="Cospi level")
    grid(col = "lightgrey", lty=1)
    lines(M$date, M$cospi, type="l", lwd=2)
    axis(1, col="grey", at=NULL, labels=FALSE)
    axis(2, col="black", labels=TRUE)
    axis(3, col="grey", labels=TRUE)
    box(col = "grey")

    ## Adjust par(mar) for 2nd plot
    par(mar=c(2,4,0,1))

    ## Second plot
    plot(M$date, M$cospi.PE, type="l", col="black", log="y", xaxs="i", yaxs="i",
         axes=F, lwd=0, ylab="Cospi P/E")
    grid(col = "lightgrey", lty=1)
    lines(M$date, M$cospi.PE, col="black", lwd=2)
    axis(2, col="black", at=NULL, labels=T)
Re: [R] SQL Queries from Multiple Servers
Hi.

Depending on your requirements, one option would be to do the join in R using merge().

If you wish to run SQL joins across multiple databases, then it is not an R problem but a database problem. For a quick solution, I would write scripts that bring all your data together into one database (they could be written in any scripting language, including of course R) and then process from there.

Bw
Mark

2009/5/13 Tom Schenk Jr tomschen...@gmail.com:

> I use RODBC as my conduit from R to SQL. It works well when the tables are stored on one channel, e.g.,
>
>     channel <- odbcConnect(data_base_01, uid=, dsn=)
>
> However, I often need to match tables across multiple databases, e.g., data_base_01 and data_base_02. However, odbcConnect() appears limited insofar as you may only query from tables within a single channel, e.g., database. I do not have access to write and create new tables on the SQL servers, which is a possible solution (e.g., copy all tables into a single database).
>
> Is there any way, in RODBC or another R-friendly SQL package, to perform SQL operations across multiple databases?
>
> Warm regards.
> --
> Tom Schenk Jr.
> tomschen...@gmail.com

--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK
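To make the merge() suggestion concrete, here is a minimal sketch. The two data frames stand in for the results of one query per channel; the table and column names are invented, and the commented-out RODBC lines only indicate where odbcConnect() and sqlQuery() would come in, since they need real data sources to run.

```r
# Stand-ins for the results of two queries on two separate channels, e.g.
#   ch1 <- odbcConnect("data_base_01"); students <- sqlQuery(ch1, "SELECT ...")
#   ch2 <- odbcConnect("data_base_02"); scores   <- sqlQuery(ch2, "SELECT ...")
# The table and column names below are hypothetical.
students <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores   <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Inner join on the shared key, done in R rather than in SQL.
joined <- merge(students, scores, by = "id")

# all.x = TRUE keeps unmatched rows from the first table (a left join).
left_joined <- merge(students, scores, by = "id", all.x = TRUE)
```

This pulls both tables into memory, so it suits moderate table sizes; for very large tables, restricting each query with a WHERE clause first is probably wise.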
[R] web interface for R script??
Dear All,

I would like your feedback about web interfaces for R scripts.

I have already tested RGG (but it's not web-based) and two packages from the CRAN list: Rserve and Rpad. However, Rpad requires some knowledge of JavaScript, PHP, etc., and with Rserve I would have to create the web interface entirely myself.

Rwui from the CRAN list seems attractive. Has anyone tested it?

Other suggestions are welcome too ^^

Thanks,
- Martial
[R] Ego net and merge networkss
Dear R-Help Members,

I am working on an analysis of a social network of web pages. For this I use the STATNET package, which is such a good sna package. Thank you for developing it! But now I have come to a point where my R skills are not good enough for what I want, so I am asking whether you might help me.

The problem: I have a network object and calculated the degree centrality (Freeman) for all vertices. Now I select the first 5 vertices with the highest degree centrality to take a closer look at their ego networks. For the ego network analysis I tried for 3 days with ego.extract, sapply and gapply, but I couldn't do it.

My question: I want to look at how the number of alters reached directly develops as the number of egos grows. Let's say with one ego (the vertex with the highest degree centrality) I reach 15 alters in my network. Now I want to combine the ego networks of the vertices with the highest and the second highest degree centrality and look at how many alters these two egos combined can reach. Then I want to look at the three vertices with the highest degree centralities ... and so on.

At the end I want to build a data.frame which lists in the first column the number of egos (increasing from 1...N), in the second column the number of alters which are reached by the egos, in the third column the number of edges, and in the fourth column the density of the subnetwork. Then I want to decide which ego-combination network seems to be the best and gplot it.

Unfortunately my R skills are limited and so I could not program this. I would really appreciate it if you could help me! Thank you in advance.

Sincerely yours
Martin Klaus (University of Kassel)

Example data and code:

    m <- matrix(c(0, 1, 1, 0, 0, 0, 1, 0, 0,
                  1, 0, 1, 0, 0, 0, 0, 1, 0,
                  1, 1, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 1, 0, 1, 1, 1, 0, 0,
                  0, 0, 0, 1, 0, 1, 1, 0, 0,
                  0, 0, 0, 1, 1, 0, 1, 0, 1,
                  0, 0, 0, 1, 1, 1, 0, 1, 0,
                  0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 0, 0, 0, 0, 0, 0, 0), ncol=9)
    diag(m) <- 0
    g <- network(m, vertex.attrnames=c("a","a","a","a","a"))
    summary(g)
    g
    degree(g)
    sort(degree(g))
    #- vertex 7 has the highest degree
    eg <- ego.extract(g)
    #- extract and visualize the ego network of vertex 7
    eg$`7`
    gplot(eg$`7`)
    str(eg$`7`)
    #- Now I need to count the number of alters and edges and save the results in the first row of a data.frame
    #- Then I need to combine the ego networks of vertex 7 and vertex 6 and look how many alters these two egos reach together
    #- The results need to be saved in the second row of the data.frame ...

EXAMPLE data.frame:

    #Egos / #Alters / #edges / ego.net.density
    1 / 8 / ...
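One way to build the table described above, sketched in base R on a plain adjacency matrix rather than with the sna functions (no claim is made here about how ego.extract or gapply would do it). The 6-vertex matrix is made up for illustration; "alters" is taken to mean vertices tied to at least one ego in either direction and not themselves egos, and density is computed for a directed graph. Both choices are assumptions.

```r
# Hypothetical 6-vertex directed adjacency matrix (m[i, j] = 1: tie i -> j).
m <- matrix(0, 6, 6)
m[cbind(c(1, 1, 1, 2, 3, 4, 5), c(2, 3, 4, 1, 1, 5, 6))] <- 1

und <- (m + t(m)) > 0                   # neighbourhood, ignoring direction
deg <- rowSums(m) + colSums(m)          # Freeman degree (in + out)
ord <- order(deg, decreasing = TRUE)    # vertices from most to least central

rows <- lapply(seq_along(ord), function(k) {
  egos   <- ord[1:k]                                   # top-k vertices
  nbrs   <- which(colSums(und[egos, , drop = FALSE]) > 0)
  alters <- setdiff(nbrs, egos)                        # reached, not egos
  keep   <- union(egos, alters)                        # combined ego network
  sub    <- m[keep, keep, drop = FALSE]
  n      <- length(keep)
  data.frame(egos = k, alters = length(alters), edges = sum(sub),
             density = if (n > 1) sum(sub) / (n * (n - 1)) else NA_real_)
})
ego_table <- do.call(rbind, rows)
ego_table
```

With a network object g, as.matrix(g) should give a matrix that can replace m here, and the row of ego_table that looks best can then be plotted by passing m[keep, keep] for that k to gplot().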
[R] Using sample to create Training and Test sets
Forgive the newbie question. I want to select random rows from my data.frame to create a test set (which I can do), but then I want to create a training set using what's left over.

Example code:

    acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
    #select 400 random rows in data
    training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
    #try to get whats left of acc not in training
    testset <- acc[-training, ]

This fails with the following error:

    Error: invalid subscript type
    In addition: Warning message:
    - not meaningful for factors in: Ops.factor(left)

I then try

    testset <- acc[!training, ]

which gives me the warning message

    ! not meaningful for factors in: Ops.factor(left)

and if I look at testset it is 400 rows of NA's ... which clearly isn't right.

Can anyone tell me what I'm doing wrong?

Thanks in advance
Chris
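A likely cause of the errors above is that training is itself a data frame, so -training and !training try to negate its columns instead of dropping rows. A minimal sketch of the usual pattern, keeping the sampled row indices in their own variable; the small acc data frame is a made-up stand-in for the file, and replace=TRUE is dropped on the assumption that no row should appear twice.

```r
set.seed(1)                                   # arbitrary, for reproducibility
acc <- data.frame(x = rnorm(1000),            # hypothetical stand-in for acc
                  y = factor(sample(c("a", "b"), 1000, replace = TRUE)))

# Sample row *indices* without replacement, then index with them.
train_idx <- sample(nrow(acc), 400)
training  <- acc[train_idx, ]
testset   <- acc[-train_idx, ]                # every row not in training
```

Negative integer indices drop rows, so the two subsets are disjoint and together cover all of acc.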
Re: [R] Help with kalman-filterd betas using the dlm package
I have studied both the vignette and other material I've been able to get my hands on, and I'm starting to get a better understanding. And I'm definitely going to buy Petris, Petrone, and Campagnoli (2009) Dynamic Linear Models with R. But that's not published yet, so I'm not getting much help there.

This is the set-up I am using:

    y[t] = a[t] + b[t]*x[t] + V[t]
    a[t] = a[t-1] + W[t,a]
    b[t] = b[t-1] + W[t,b]
    V[t] ~ N(0,V)
    W[t] ~ N(0,W)
    W = blockdiag(W[a],W[b])

V could be estimated from the data with a non-diagonal variance matrix of the returns; W would be estimated in the same way, but with the effect of past betas in the transition taken into account. But how do I estimate that matrix? Is that done with MLE, SUR or some other statistical technique?

I'm also assuming in this example that a[t] is time invariant, which gives W[a] = 0.

Appreciate any guidance.

Regards
Tom

spencerg wrote:

> Have you worked through vignette('dlm')? Vignettes are nice because they provide an Adobe Acrobat Portable Document Format (pdf) file with a companion R script file, which you can get as follows:
>
>     (dlm. <- vignette('dlm'))
>     Stangle(dlm.$file)
>
> The first of these two lines opens the pdf file. The second creates a file dlm.R in the working directory (getwd()) containing the R commands discussed in the pdf file.
>
> If I remember correctly, your question is answered in this vignette. You may also be interested in a book that is soon to appear about this package: Petris, Petrone, and Campagnoli (2009) Dynamic Linear Models with R (Springer; http://www.amazon.com/Dynamic-Linear-Models-R-Use/dp/0387772375/ref=sr_1_4?ie=UTF8&s=books&qid=1242162708&sr=1-4), scheduled to ship in late June. If you have long-term interest in this subject, as I suspect you may, you might find this book interesting and useful.
>
> Hope this helps.
> Spencer Graves
>
> tom81 wrote:
>
>> Hi all R gurus out there,
>>
>> I'm a kind of newbie to Kalman filters; after some research I have found that the dlm package is the easiest to start with. So be patient if some of my questions are too basic.
>>
>> I would like to set up a beta estimation between an asset and a market index using a Kalman filter. Much literature says it gives superior estimates compared to OLS estimates. So I would like to learn and to use the filter.
>>
>> I would like to run two types of Kalman filters, one using a random-walk model (RW) and one with a stationary model; in other words, the transition equation either follows a RW or an AR(1) model.
>>
>> This is how I think it would be set up. I will have my time series Y, X, where Y is the response variable. This setup should give me a RW process if I have understood the example correctly:
>>
>>     mydlmModel = dlmModReg(X) + dlmModPoly(order=1)
>>
>> and then run on the dlm model
>>
>>     dlmFilter(Y,mydlmModel )
>>
>> But setting up an AR(1) process is unclear: should I use dlmModPoly or dlmModARMA to set up the model?
>>
>> And last but not least, how do I set up a proper build function to use with dlmMLE to optimize the starting values?
>>
>> Regards
>> Tom
Re: [R] Problems intalling on Suse 10.3 x86_64 OS
On Thu, 14 May 2009 12:32:18 -0700 (PDT) PDXRugger j_r...@hotmail.com wrote:

P> Alright, i am unsure of the posting rules for these types of
P> questions but i will be as help ful as possible. My windows based
P> system cant handle a model i am running so i am trying to install R

Why? Too much data?

P> on our Linux based machine but i have encountered the following and
P> i dont know linux much but my intuition is that i need to install
P> some other files first. Any thoughts?
P>
P> There are no installable providers of libtcl8.5.so()(64bit) for

It seems that you need a more recent Tcl/Tk installation. Suse 10.3 is a little bit old; maybe 8.5 was not included. So you could see if you can find a newer version. Sometimes you can take the repository of a more recent openSUSE version and install newer versions from there. Unfortunately YaST is very sensitive about dependencies. In my experience the smart package manager is faster and less touchy than YaST/zypper on the older openSUSE systems.

Alternatively you could upgrade your distro, install another one, or run a live system. You could run a live USB Ubuntu to run your programs. It depends on how much time you have and how flexible your admins are...

hth
Stefan
Re: [R] Help with kalman-filterd betas using the dlm package
1. Might you look again at section 2, "Maximum likelihood estimation", of the dlm vignette? It describes how to estimate parameters.

2. Have you started with the code on those 2 pages, confirming that you can make it work and understand what it does? If yes, then try to build code for your problem as a series of small modifications to that example. With luck, this will bring enlightenment. If not, try to express your question in terms of commented, minimal, self-contained code that others can copy into R to replicate what you see and then modify to get it to work, as suggested in the posting guide, www.R-project.org/posting-guide.html. If someone reading this list can do this in a few seconds, it will increase the chances that you will get a useful reply.

Hope this helps.
Spencer Graves

tom81 wrote:
> I have studied both the vignette and other material I've been able to get my hands on, and I'm starting to get a better understanding. And I'm definitely going to buy Petris, Petrone, and Campagnoli (2009) Dynamic Linear Models with R. But that's not published yet, so I'm not getting much help there.
>
> This is the set-up I am using:
>
>   y[t] = a[t] + b[t]*x[t] + V[t]
>   a[t] = a[t-1] + W[t,a]
>   b[t] = b[t-1] + W[t,b]
>   V[t] ~ N(0,V)
>   W[t] ~ N(0,W)
>   W = blockdiag(W[a],W[b])
>
> V could be estimated from the data with a non-diagonal variance matrix of the returns; W would be estimated in the same way, but with the effect of past betas in the transition taken into account. But how do I estimate that matrix? Is that done with MLE, SUR or some other statistical technique? I'm also assuming in this example that a[t] is time invariant, which gives W[a] = 0.
>
> Appreciate any guidance.
> Regards
> Tom
>
> spencerg wrote:
> > Have you worked through vignette('dlm')? Vignettes are nice because they provide an Adobe Acrobat Portable Document Format (pdf) file with a companion R script file, which you can get as follows:
> >
> >   (dlm. <- vignette('dlm'))
> >   Stangle(dlm.$file)
> >
> > The first of these two lines opens the pdf file. The second creates a file dlm.R in the working directory (getwd()) containing the R commands discussed in the pdf file. If I remember correctly, your question is answered in this vignette.
> >
> > You may also be interested in a book that is soon to appear about this package: Petris, Petrone, and Campagnoli (2009) Dynamic Linear Models with R (Springer; http://www.amazon.com/Dynamic-Linear-Models-R-Use/dp/0387772375/ref=sr_1_4?ie=UTF8&s=books&qid=1242162708&sr=1-4), scheduled to ship in late June. If you have long-term interest in this subject, as I suspect you may, you might find this book interesting and useful.
> >
> > Hope this helps.
> > Spencer Graves
> >
> > tom81 wrote:
> > > Hi all R gurus out there,
> > >
> > > I'm a kind of newbie to Kalman filters; after some research I have found that the dlm package is the easiest to start with. So be patient if some of my questions are too basic.
> > >
> > > I would like to set up a beta estimation between an asset and a market index using a Kalman filter. Much literature says it gives superior estimates compared to OLS estimates, so I would like to learn and to use the filter. I would like to run two types of Kalman filter, one using a random-walk model (RW) and one with a stationary model; in other words, the transition equation follows either a RW or an AR(1) model.
> > >
> > > This is how I think it would be set up. I will have my time series Y, X, where Y is the response variable. This setup should give me a RW process if I have understood the example correctly:
> > >
> > >   mydlmModel = dlmModReg(X) + dlmModPoly(order=1)
> > >
> > > and then run on the dlm model:
> > >
> > >   dlmFilter(Y, mydlmModel)
> > >
> > > But setting up an AR(1) process is unclear: should I use dlmModPoly or dlmModARMA to set up the model? And last but not least, how do I set up a proper build function to use with dlmMLE to optimize the starting values?
> > >
> > > Regards
> > > Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with loops
Hi

I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns, zz[,2:41], plus row headings in zz[,1]. I am trying to average each pair of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). Below is my script, but it seems to be stuck in an endless loop. Any suggestions?

  for (i in 1:length(average_value[,1])) {
    average_value[i] <- i^100; print(average_value[i])
    # calculates means
    # Sample A
    average_value[i,2] <- rowMeans(zz[i,2:3])
    average_value[i,3] <- rowMeans(zz[i,4:5])
    average_value[i,4] <- rowMeans(zz[i,6:7])
    average_value[i,5] <- rowMeans(zz[i,8:9])
    average_value[i,6] <- rowMeans(zz[i,10:11])
    # Sample B
    average_value[i,7] <- rowMeans(zz[i,12:13])
    average_value[i,8] <- rowMeans(zz[i,14:15])
    average_value[i,9] <- rowMeans(zz[i,16:17])
    average_value[i,10] <- rowMeans(zz[i,18:19])
    average_value[i,11] <- rowMeans(zz[i,20:21])
    # Sample C
    average_value[i,12] <- rowMeans(zz[i,22:23])
    average_value[i,13] <- rowMeans(zz[i,24:25])
    average_value[i,14] <- rowMeans(zz[i,26:27])
    average_value[i,15] <- rowMeans(zz[i,28:29])
    average_value[i,16] <- rowMeans(zz[i,30:31])
    # Sample D
    average_value[i,17] <- rowMeans(zz[i,32:33])
    average_value[i,18] <- rowMeans(zz[i,34:35])
    average_value[i,19] <- rowMeans(zz[i,36:37])
    average_value[i,20] <- rowMeans(zz[i,38:39])
    average_value[i,21] <- rowMeans(zz[i,40:41])
  }

thanks
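A loop like the one above can usually be replaced by a single vectorized step. The sketch below is only an illustration: the data frame zz is simulated here (5 rows, a label column plus 40 numeric columns), and the names first, second and pair_means are made up for the example. With real data the loop is probably slow rather than truly endless, since each pass assigns single cells of a data frame.

```r
# Toy stand-in for the poster's data: one label column plus 40 value columns.
set.seed(1)
zz <- data.frame(id = paste0("row", 1:5), matrix(rnorm(5 * 40), nrow = 5))

# Columns 2, 4, ..., 40 hold the first replicate of each pair,
# columns 3, 5, ..., 41 hold the second replicate.
first  <- seq(2, 40, by = 2)
second <- first + 1

# Element-wise mean of each replicate pair, for all rows at once.
pair_means <- (as.matrix(zz[, first]) + as.matrix(zz[, second])) / 2

# Row labels in column 1, the 20 pair means in columns 2 to 21.
average_value <- data.frame(id = zz[, 1], pair_means)
dim(average_value)
```

Column k + 1 of average_value then holds the mean of the k-th replicate pair, which matches the layout the loop was filling in one cell at a time.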
[R] help with as.numeric
Hi everyone, wondering if you could help me with a novice problem. I have a data frame called subjects with a height and weight variable and want to calculate a bmi variable from the two. I have tried:

  attach(subjects)
  bmi <- (weight)/((height/100)^2)

but it comes up with the error:

  Warning messages:
  1: In Ops.factor(height, 100) : '/' not meaningful for factors
  2: In Ops.factor((weight), ((height/100)^2)) : '/' not meaningful for factors

I presume that this means the vectors height and weight are not in numeric form (confirmed by is.numeric), so I changed the code to:

  bmi <- (as.numeric(weight))/((as.numeric(height)/100)^2)

but this just comes up with a result which doesn't make sense, i.e. numbers such as 4 within the bmi vector. I've looked at as.numeric(height)/as.numeric(weight) and these numbers just aren't the same as height/weight, which is the reason for the incorrect bmi. Can anyone tell me where I am going wrong? It's quite frustrating because I can't understand why a function claiming to convert to numeric would come up with such a bizarre result.
[R] need help
Dear all,

Please, I need to write a function in R to estimate the parameters of a negative binomial distribution and then calculate the log-likelihood for given data. Is there anyone who can help me?

Thank you very much for any help.
Best regards
Re: [R] fitdistr for t distribution
Thanks Jorge, but I still don't understand where they come from. When I use:

  fitdistr(mydata, "t", df = 9)

and get values for m and s, should the variance of my data be df/s? I just want to be able to confirm how m and s are calculated.

  mydt <- function(x, m, s, df) dt((x-m)/s, df)/s
  fitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))

Thanks anyway for the help!

Jorge Ivan Velez wrote:
> Dear lagreene,
>
> See the second example in
>
>   require(MASS)
>   ?fitdistr
>
> HTH,
> Jorge
>
> On Thu, May 14, 2009 at 7:15 PM, lagreene lagreene...@gmail.com wrote:
> > Hi,
> >
> > I was wondering if anyone could tell me how m and s are calculated for a t distribution? I thought m was the sample mean and s the standard deviation, but obviously I'm wrong as this doesn't give the same answer.
> >
> > Thank you
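For what it is worth, in the location-scale density used in the second ?fitdistr example, dt((x - m)/s, df)/s, m is a location and s a scale parameter; both are maximum likelihood estimates rather than the sample mean and standard deviation. For df > 2 the standard deviation of such a variable is s * sqrt(df / (df - 2)), so s is smaller than the standard deviation. The simulation below is only a sketch of that relationship; the values of m, s and df are arbitrary choices for illustration.

```r
# Simulate from a location-scale t distribution: x = m + s * T, T ~ t(df).
set.seed(42)
df <- 9
m  <- 5
s  <- 2
x  <- m + s * rt(1e5, df)

# Theoretical standard deviation for df > 2: s * sqrt(df / (df - 2)).
sd_theory <- s * sqrt(df / (df - 2))

# The sample sd tracks sd_theory, not the scale parameter s itself.
c(sample_sd = sd(x), theory_sd = sd_theory, scale_s = s)
```

So an estimate of s from fitdistr() would be expected to sit below sd(x) by roughly the factor sqrt((df - 2) / df), here sqrt(7/9).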
Re: [R] transposing/rotating XY in a 3D array
Dear Kushantha,

Thank you very much. Very nice, indeed.

Gabor
Re: [R] Using sample to create Training and Test sets
Note that the single split sample technique is not competitive with other approaches unless the sample size exceeds around 20,000.

Frank

Chris Arthur wrote:
> Forgive the newbie question. I want to select random rows from my data.frame to create a test set (which I can do), but then I want to create a training set using what's left over.
>
> Example code:
>
>   acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
>   # select 400 random rows in data
>   training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
>   # try to get what's left of acc not in training
>   testset <- acc[-training, ]
>
> Fails with the following error:
>
>   Error: invalid subscript type
>   In addition: Warning message:
>   '-' not meaningful for factors in: Ops.factor(left)
>
> I then try:
>
>   testset <- acc[!training, ]
>
> which gives me the warning message
>
>   '!' not meaningful for factors in: Ops.factor(left)
>
> And if I look at testset, it is 400 rows of NAs ... which clearly isn't right. Can anyone tell me what I'm doing wrong?
>
> Thanks in advance
> Chris

--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
Re: [R] help with as.numeric
as.numeric() doesn't convert factors to the explicit value, nor should it. Under what you're expecting, if you have a factor where the levels are Female and Male, using as.numeric() wouldn't produce anything meaningful. However, as.numeric() does something much smarter. It converts Female to 1, and Male to 2. More generally, if you have n levels, it will produce a vector of values between 1 and n. This is referred to as the 'internal coding.'

If you want to convert your height and weight variables to their numeric values, you need to do

  as.numeric(as.character(height))

This will get you around the internal coding.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of deanj2k
Sent: Friday, May 15, 2009 7:58 AM
To: r-help@r-project.org
Subject: [R] help with as.numeric

> Hi everyone, wondering if you could help me with a novice problem. I have a data frame called subjects with a height and weight variable and want to calculate a bmi variable from the two. I have tried:
>
>   attach(subjects)
>   bmi <- (weight)/((height/100)^2)
>
> but it comes up with the error:
>
>   Warning messages:
>   1: In Ops.factor(height, 100) : '/' not meaningful for factors
>   2: In Ops.factor((weight), ((height/100)^2)) : '/' not meaningful for factors
>
> I presume that this means the vectors height and weight are not in numeric form (confirmed by is.numeric), so I changed the code to:
>
>   bmi <- (as.numeric(weight))/((as.numeric(height)/100)^2)
>
> but this just comes up with a result which doesn't make sense, i.e. numbers such as 4 within the bmi vector. I've looked at as.numeric(height)/as.numeric(weight) and these numbers just aren't the same as height/weight, which is the reason for the incorrect bmi. Can anyone tell me where I am going wrong? It's quite frustrating because I can't understand why a function claiming to convert to numeric would come up with such a bizarre result.
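The difference between the internal coding and the printed values is easy to see on a small factor. The heights below are invented for illustration and are not the poster's data.

```r
# A factor whose labels look like numbers (as read.table can produce
# when a column contains stray non-numeric entries).
height <- factor(c("150", "160.5", "172", "160.5"))

# as.numeric() on a factor returns the internal level codes.
codes <- as.numeric(height)

# Going through character first recovers the values behind the labels.
values <- as.numeric(as.character(height))

# Equivalent idiom from the R FAQ: convert the levels once, then index.
values_faq <- as.numeric(levels(height))[height]

codes
values
```

Here codes comes out as 1 2 3 2, while values and values_faq both give 150, 160.5, 172, 160.5.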
[R] displaying results
Hi everyone, can anyone tell me how I can change how I display mean(age)? I want it to say: The mean age of patients within the sample is mean(age)
Re: [R] Simulation
On Fri, 15 May 2009 19:17:37 +1000 Kon Knafelman konk2...@hotmail.com wrote:

KK> I have the same problem as the initial one, except I need 1000
KK> samples of size 15, and my distribution is Exp(1). I've adjusted
KK> some of the loop formulas for my n=15, but I'm unsure how to proceed
KK> in the quickest way.
KK> Can someone please help?

Taking a guess:

  matrix(rexp(15000,1),ncol=15)

?
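To spell the guess out a little: filling a 1000 x 15 matrix with Exp(1) draws gives one sample of size 15 per row, and per-sample statistics then come from row-wise functions instead of a loop. The seed and the choice of the sample mean as the statistic are assumptions made only for this sketch.

```r
set.seed(123)

# 1000 samples of size 15 from Exp(1): one sample per row.
samples <- matrix(rexp(1000 * 15, rate = 1), ncol = 15)

# One statistic per sample, without an explicit loop.
sample_means <- rowMeans(samples)

dim(samples)
summary(sample_means)
```

Any other per-sample statistic can be obtained the same way with apply(samples, 1, f) for a function f of one sample.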
Re: [R] How to do a pretty panel plot?
M <- (structure(list(date = structure(c(13634, 13665, 13695, 13726, 13757, 13787, 13818, 13848, 13879, 13910, 13939, 13970, 14000, 14031, 14061, 14092, 14123, 14153, 14184, 14214, 14245, 14276, 14304, 14335), class = "Date"),
  cospi = c(1987.31, 2033.37, 2140.13, 2120.66, 2427.09, 2917.7, 2915.28, 3262.06, 2616.26, 2617.75, 2277.69, 2538.13, 2374.09, 1911.22, 2063.73, 2081.28, 1813.58, 1304.96, 1219.73, 1361.74, 1299.2, 1242.74, 1339.18, 1557.29),
  cospi.PE = c(19.2, 19.69, 20.13, 24.08, 27.61, 30.9, 30.69, 34.92, 26.95, 27.63, 23.86, 26.14, 23.72, 19.5, 23.43, 23.73, 20.69, 16.4, 16.12, 18.04, 18.46, 18.86, 20.24, 23.53)),
  .Names = c("date", "cospi", "cospi.PE"), row.names = 209:232, class = "data.frame"))

library(ggplot2)
a <- melt.data.frame(M, id.var="date")
qplot(date, value, data=a, geom="line") + facet_wrap(~variable, ncol=1, scales="free")

How about this? Much simpler code. If you add the theme_bw argument it looks more similar to your plot, but there are some bugs.

Stephen Sefick

On Fri, May 15, 2009 at 6:12 AM, Jakson Alves de Aquino jaksonaqu...@gmail.com wrote:
> Ajay Shah wrote:
> > Here's my best version of your code:
> >
> >   ## Data: M exactly as defined above
> >
> >   ## Set up par's to make 2 panel chart
> >   par(bty="l"); par(ps=10)
> >   par(mfrow=c(2,1))      # try to get two plots, one above the other
> >   par(mar=c(0,4,0,1))    ## Set par(mar) to eliminate X axis gap
> >   par(oma=c(2,2,2,2))
> >
> >   ## Make Plot 1
> >   plot(M$date, M$cospi, type="l", log="y", xaxs="i", yaxs="i", axes=F, lwd=2, ylab="Cospi level")
> >   axis(1, col="grey", at=NULL, labels=FALSE)
> >   axis(2, col="black", labels=TRUE)
> >   axis(3, col="grey", labels=TRUE)
> >   grid(col = "lightgrey", lty=1)
> >   box(col = "grey")
> >
> >   ## Adjust par(mar) for 2nd plot
> >   par(mar=c(2,4,0,1))
> >
> >   ## Second plot
> >   plot(M$date, M$cospi.PE, type="l", col="black", log="y", xaxs="i", yaxs="i", axes=F, lwd=2, ylab="Cospi P/E")
> >   axis(2, col="black", at=NULL, labels=T)
> >   axis(1, col="lightgrey", at=NULL, labels=T)
> >   grid(col = "lightgrey", lty=1)
> >   box(col = "grey")
>
> I think it's better if the lines are above the grid:
>
>   ## Data: M exactly as defined above
>
>   ## Set up par's to make 2 panel chart
>   par(bty="l")
>   par(ps=10)
>   par(mfrow=c(2,1))      # try to get two plots, one above the other
>   par(mar=c(0,4,0,1))    ## Set par(mar) to eliminate X axis gap
>   par(oma=c(2,2,2,2))
>
>   ## Make Plot 1
>   plot(M$date, M$cospi, type="l", log="y", xaxs="i", yaxs="i", axes=F, lwd=0, ylab="Cospi level")
>   grid(col = "lightgrey", lty=1)
>   lines(M$date, M$cospi, type="l", lwd=2)
>   axis(1, col="grey", at=NULL, labels=FALSE)
>   axis(2, col="black", labels=TRUE)
>   axis(3, col="grey", labels=TRUE)
>   box(col = "grey")
>
>   ## Adjust par(mar) for 2nd plot
>   par(mar=c(2,4,0,1))
>
>   ## Second plot
>   plot(M$date, M$cospi.PE, type="l", col="black", log="y", xaxs="i", yaxs="i", axes=F, lwd=0, ylab="Cospi P/E")
>   grid(col = "lightgrey", lty=1)
>   lines(M$date, M$cospi.PE, col="black", lwd=2)
>   axis(2, col="black", at=NULL, labels=T)

--
Stephen Sefick
Re: [R] displaying results
Read the posting guide please. Self-contained, minimal, reproducible code.

On Fri, May 15, 2009 at 8:33 AM, deanj2k dl...@le.ac.uk wrote:
> Hi everyone, can anyone tell me how I can change how I display mean(age)? I want it to say: The mean age of patients within the sample is mean(age)

--
Stephen Sefick

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
Re: [R] need help
Read about glm by typing

  ?glm

There are tons of books and pdfs out there to show you the basics: http://cran.r-project.org/other-docs.html

HTH,
Si.

----- Original Message -----
From: H Z zamani_...@yahoo.com
To: r-help@r-project.org
Sent: Friday, May 15, 2009 12:26 PM
Subject: [R] need help

> Dear all,
>
> Please, I need to write a function in R to estimate the parameters of a negative binomial distribution and then calculate the log-likelihood for given data. Is there anyone who can help me?
>
> Thank you very much for any help.
> Best regards
Re: [R] Using sample to create Training and Test sets
Here's one possibility:

  idx <- sample(nrow(acc))
  training <- acc[idx[1:400], ]
  testset <- acc[-idx[1:400], ]

Andy

From: Chris Arthur
> Forgive the newbie question. I want to select random rows from my data.frame to create a test set (which I can do), but then I want to create a training set using what's left over.
>
> Example code:
>
>   acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
>   # select 400 random rows in data
>   training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
>   # try to get what's left of acc not in training
>   testset <- acc[-training, ]
>
> Fails with the following error:
>
>   Error: invalid subscript type
>   In addition: Warning message:
>   '-' not meaningful for factors in: Ops.factor(left)
>
> I then try:
>
>   testset <- acc[!training, ]
>
> which gives me the warning message
>
>   '!' not meaningful for factors in: Ops.factor(left)
>
> And if I look at testset, it is 400 rows of NAs ... which clearly isn't right. Can anyone tell me what I'm doing wrong?
>
> Thanks in advance
> Chris
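The original attempt fails because training is itself a data frame, so it cannot serve as a row subscript; the row indices need to be kept separately. Sampling with replace=TRUE would also put some rows into the training set more than once. A minimal sketch along the lines of Andy's suggestion follows, with a simulated acc standing in for the poster's file and train_idx as a made-up name.

```r
set.seed(1)

# Simulated stand-in for the poster's acc data frame (1000 rows).
acc <- data.frame(x = rnorm(1000),
                  y = factor(sample(c("a", "b"), 1000, replace = TRUE)))

# Draw 400 distinct row indices (sampling without replacement).
train_idx <- sample(nrow(acc), 400)

# Positive indices select the training rows, negative indices drop them.
training <- acc[train_idx, ]
testset  <- acc[-train_idx, ]

c(training = nrow(training), testset = nrow(testset))
```

Because the two sets are built from the same index vector, they are disjoint and together cover every row of acc.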
Re: [R] selecting points on 3D scatterplots
Dear list members,

I was out of town when this message arrived and so didn't respond at the time. I did respond to a private email from the poster.

Yes, the scatter3d() function in the Rcmdr package can identify points in 3D scatterplots drawn with rgl via the identify3d() function in the same package. Points are identified by right-clicking and dragging.

The nice() function is in the car package, one of the suggested packages for the Rcmdr package.

John

-- original message --

> It looks like Rcmdr may be able to select points on 3D scatterplots; however, when I try to use its 3D scatterplot function I get the error message: could not find function "nice"
>
> If I copy the code:
>
>   scatter3d(data$X, data$Z, data$Y, surface=FALSE, residuals=TRUE, bg="white",
>     axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE, xlab="X", ylab="Z", zlab="Y")
>
> into the R console I get the same error message. Sorry, I'm new. Does anyone know where this missing nice function can be found?
>
> I tried using scatterplot3d but it doesn't rotate or zoom, which I need to be able to do to select the data... but thanks for the suggestion!

. . .

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
Re: [R] displaying results
Absolutely no idea what you mean. Try reconstructing your question in concise English with reproducible code.

Simon.

----- Original Message -----
From: deanj2k dl...@le.ac.uk
To: r-help@r-project.org
Sent: Friday, May 15, 2009 1:33 PM
Subject: [R] displaying results

> Hi everyone, can anyone tell me how I can change how I display mean(age)? I want it to say: The mean age of patients within the sample is mean(age)
Re: [R] need help
Dear H Z,

Take a look at the examples in

  require(MASS)
  ?glm.nb

This might be useful as well:

  summary(glm.nb(yourvariable ~ 1, data = yourdata))

HTH,
Jorge

On Fri, May 15, 2009 at 7:26 AM, H Z zamani_...@yahoo.com wrote:
> Dear all,
>
> Please, I need to write a function in R to estimate the parameters of a negative binomial distribution and then calculate the log-likelihood for given data. Is there anyone who can help me?
>
> Thank you very much for any help.
> Best regards
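If the point of the exercise is to write the estimation function by hand rather than call glm.nb(), one possible sketch uses only base R: dnbinom() for the likelihood and optim() to maximize it. The simulated data, the starting values and the names negloglik, est and loglik are assumptions made for this illustration, not part of any reply above.

```r
set.seed(2009)

# Simulated counts standing in for the poster's data.
y <- rnbinom(500, size = 3, mu = 4)

# Negative log-likelihood in the (size, mu) parameterization.
# Parameters are optimized on the log scale so they stay positive.
negloglik <- function(par, y) {
  -sum(dnbinom(y, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Start from size = 1 and mu = sample mean; Nelder-Mead is optim's default.
fit <- optim(c(0, log(mean(y))), negloglik, y = y)

# Back-transform the estimates and recover the maximized log-likelihood.
est    <- c(size = exp(fit$par[1]), mu = exp(fit$par[2]))
loglik <- -fit$value

est
loglik
```

The estimate of mu should land very close to mean(y), which gives a quick sanity check on the optimizer; glm.nb(y ~ 1) from MASS should report a comparable dispersion estimate (its theta corresponds to size).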
[R] Fw: Help with loops(corrected question)
--- On Fri, 15/5/09, Amit Patel amitrh...@yahoo.co.uk wrote:

From: Amit Patel amitrh...@yahoo.co.uk
Subject: Help with loops
To: r-help@r-project.org
Date: Friday, 15 May, 2009, 12:17 PM

Hi

I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns, zz[,2:41], plus row headings in zz[,1]. I am trying to average each pair of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). Below is my script, but it seems to be stuck in an endless loop. Any suggestions?

  for (i in 1:length(zz[,1])) {
    # calculates means
    # Sample A
    average_value[i,2] <- rowMeans(zz[i,2:3])
    average_value[i,3] <- rowMeans(zz[i,4:5])
    average_value[i,4] <- rowMeans(zz[i,6:7])
    average_value[i,5] <- rowMeans(zz[i,8:9])
    average_value[i,6] <- rowMeans(zz[i,10:11])
    # Sample B
    average_value[i,7] <- rowMeans(zz[i,12:13])
    average_value[i,8] <- rowMeans(zz[i,14:15])
    average_value[i,9] <- rowMeans(zz[i,16:17])
    average_value[i,10] <- rowMeans(zz[i,18:19])
    average_value[i,11] <- rowMeans(zz[i,20:21])
    # Sample C
    average_value[i,12] <- rowMeans(zz[i,22:23])
    average_value[i,13] <- rowMeans(zz[i,24:25])
    average_value[i,14] <- rowMeans(zz[i,26:27])
    average_value[i,15] <- rowMeans(zz[i,28:29])
    average_value[i,16] <- rowMeans(zz[i,30:31])
    # Sample D
    average_value[i,17] <- rowMeans(zz[i,32:33])
    average_value[i,18] <- rowMeans(zz[i,34:35])
    average_value[i,19] <- rowMeans(zz[i,36:37])
    average_value[i,20] <- rowMeans(zz[i,38:39])
    average_value[i,21] <- rowMeans(zz[i,40:41])
  }

thanks
Re: [R] help with as.numeric
On May 15, 2009, at 6:57 AM, deanj2k wrote:

> Hi everyone, wondering if you could help me with a novice problem. I have a data frame called subjects with a height and weight variable and want to calculate a bmi variable from the two. I have tried:
>
>   attach(subjects)
>   bmi <- (weight)/((height/100)^2)
>
> but it comes up with the error:
>
>   Warning messages:
>   1: In Ops.factor(height, 100) : '/' not meaningful for factors
>   2: In Ops.factor((weight), ((height/100)^2)) : '/' not meaningful for factors
>
> I presume that this means the vectors height and weight are not in numeric form (confirmed by is.numeric), so I changed the code to:
>
>   bmi <- (as.numeric(weight))/((as.numeric(height)/100)^2)
>
> but this just comes up with a result which doesn't make sense, i.e. numbers such as 4 within the bmi vector. I've looked at as.numeric(height)/as.numeric(weight) and these numbers just aren't the same as height/weight, which is the reason for the incorrect bmi. Can anyone tell me where I am going wrong? It's quite frustrating because I can't understand why a function claiming to convert to numeric would come up with such a bizarre result.

That 'height' is a factor suggests that you imported the data using one of the read.table() family of functions and that there are non-numeric characters in at least one of the entries in that column.

Since 'height' is a factor, if you use as.numeric(), you will get numeric values returned that are the factor level numeric codes and not the expected numeric values. That is why you are getting bad values for BMI. See:

  http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

If you use something like:

  grep("[^0-9\\.]", height, value = TRUE)

that should show you where you have non-numeric values in the 'height' column. That is, entries for 'height' that contain characters other than digits or a decimal point.

For example:

  height <- factor(c(seq(0, 1, 0.1), "1,10", letters[1:5]))

  > height
   [1] 0    0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1    1,10 a    b    c    d    e
  Levels: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1,10 a b c d e

  > as.numeric(height)
   [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

  > grep("[^0-9\\.]", height, value = TRUE)
  [1] "1,10" "a"    "b"    "c"    "d"    "e"

I would also check the 'weight' column for the same reasons, to be sure that you don't have bad data there.

Another approach would be to use:

  str(subjects)

which will give you a sense of the data types for each column in your data frame. Review each column and take note of any columns that should be numeric, but are factors.

See ?str, ?grep and ?regex for more information. You might also want to look at ?type.convert, which is the function used by the read.table() family of functions to determine the data types for each column during import.

HTH,
Marc Schwartz
Re: [R] can you tell what .Random.seed *was*?
Duncan Murdoch wrote: 1) can you tell me what my original set.seed() value was? (I wouldn't be able to figure it out, but maybe someone can) The only way I know is to test all 2^32 possible values of the seed. I think cryptographers would know faster ways. Well, I'm not a cryptographer, but I know a faster way: rainbow tables. http://en.wikipedia.org/wiki/Rainbow_table Given that the algorithm to generate these images is known and that each seed always gives the same image as output, you can simply precompute all possible images, hash them using your favorite algorithm -- say, SHA-256 -- and record the seed-to-hash correspondence on disk. Then given an output image, you can hash it and use that to look up the seed. It can take a long time to generate all the images, but then you have database like lookup speeds for image-to-seed correspondence. This is not just a theoretical idea. There are underground sites where you can put in, say, an MD5 password hash and get out the likely password that was actually used. This allows a black hat to break into one site, grab their password hash database, reverse engineer the passwords, and then go use them to bang on the front door of other sites users of that site he first compromised also use. There are defenses against this: salting the passwords and using passwords too big to appear in rainbow tables are easiest. Now, if the seed was removed just before the values were generated, the seed would be generated from the system clock. If you knew the time that this occurred approximately, the search could be a lot faster. This also helps with the rainbow table approach. Given that the seed for the generation algorithm is always the current wall time, you can restrict the needed rainbow table size greatly. You simply have to know when the algorithm was first put into use, then start your rainbow table with that time's value as the first seed, and only compute up to now plus whatever you need for future operations. 
For instance, you can cover about 3 years' worth of image production in about 1/45 of the time it takes to cover all 2^32 possible images (2^32 seconds is roughly 136 years, and 136/3 is about 45). Say it takes a month to generate a rainbow table covering those 3 years. Full coverage would then take nearly 4 years. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
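The restricted search described in this message can be sketched in a few lines of R. This is only an illustration under stated assumptions: the helper `find_seed()`, the candidate window, and the use of `runif()` output as a stand-in for the "image" are all made up for the example and do not come from the thread.

```r
# Hypothetical helper: try each candidate seed and return the first one
# that reproduces the observed output exactly.
find_seed <- function(observed, candidates) {
  for (s in candidates) {
    set.seed(s)
    if (identical(runif(length(observed)), observed)) return(s)
  }
  NA_integer_  # no candidate matched
}

# Pretend the unknown seed was a clock value inside a known time window.
set.seed(1242400123)
observed <- runif(5)

# Search a window of 1001 candidates instead of all 2^32 values.
window <- 1242400000:1242401000
found <- find_seed(observed, window)
found
```

The narrower the window that can be justified from what is known about when the values were generated, the fewer calls to `set.seed()` are needed.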
[R] creating and then executing command strings
Hi:

I very recently started experimenting with R and am occasionally running into very basic problems that I can't seem to solve. If there is an R-newbies forum that is more appropriate for these kinds of questions, please direct me to it.

I'd like to automatically add vectors to a dataframe. I am able to build command strings that would do what I want, but R is not executing them. A simplified example:

# Add three vectors called avg_col1, avg_col2, avg_col3 to dataframe df
for(colname in c("col1", "col2", "col3")){
  print(paste("df$avg_", colname, " <- 0;", sep='')) # Just using this to make sure the command is correct
  paste("avg_", colname, " <- 0;", sep='') # Does nothing
}

Output:
[1] "df$avg_col1 <- 0;"
[1] "df$avg_col2 <- 0;"
[1] "df$avg_col3 <- 0;"

Thanks for your help!

Best - P
Re: [R] displaying results
On 5/15/2009 8:33 AM, deanj2k wrote:
> Hi everyone, can anyone tell me how I can change how I display mean(age)? I want it to say "The mean age of patients within the sample is mean(age)".

I think you want something like this:

cat(sprintf("The mean age of patients within the sample is %.1f.\n", mean(age)))

Play with the %.1f format for more decimal places, etc.

Duncan Murdoch
Re: [R] creating and then executing command strings
Hi,

You can either parse and eval the string you are making, as in:

eval( parse( text = paste("avg_", colname, " <- 0;", sep='') ) )

Or you can do something like this:

df[[ paste( "avg_", colname, sep = "" ) ]] <- 0

Romain

Philipp Schmidt wrote:
> Hi: I very recently started experimenting with R and am occasionally running into very basic problems that I can't seem to solve. If there is an R-newbies forum that is more appropriate for these kinds of questions, please direct me to it.
>
> I'd like to automatically add vectors to a dataframe. I am able to build command strings that would do what I want, but R is not executing them. A simplified example:
>
> # Add three vectors called avg_col1, avg_col2, avg_col3 to dataframe df
> for(colname in c("col1", "col2", "col3")){
>   print(paste("df$avg_", colname, " <- 0;", sep='')) # Just using this to make sure the command is correct
>   paste("avg_", colname, " <- 0;", sep='') # Does nothing
> }
>
> Output:
> [1] "df$avg_col1 <- 0;"
> [1] "df$avg_col2 <- 0;"
> [1] "df$avg_col3 <- 0;"
>
> Thanks for your help! Best - P

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
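Both suggestions in this reply can be tried on a toy data frame. The sketch below is illustrative only: the data frame contents and the `flag_` column prefix are invented here so that the two approaches produce distinguishable columns.

```r
df <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# Approach 1: build the column name as a string and index with [[ ]].
for (colname in c("col1", "col2", "col3")) {
  df[[paste("avg_", colname, sep = "")]] <- 0
}

# Approach 2: build the whole command as text, then parse and evaluate it.
for (colname in c("col1", "col2", "col3")) {
  cmd <- paste("df$flag_", colname, " <- 1", sep = "")
  eval(parse(text = cmd))
}

names(df)
```

The `[[ ]]` form is usually easier to read and debug, because no R code is hidden inside a string.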
Re: [R] displaying results
Duncan Murdoch wrote:
> On 5/15/2009 8:33 AM, deanj2k wrote:
>> Hi everyone, can anyone tell me how I can change how I display mean(age)? I want it to say "The mean age of patients within the sample is mean(age)".
>
> I think you want something like this:
>
> cat(sprintf("The mean age of patients within the sample is %.1f.\n", mean(age)))

or maybe

cat(sprintf("The mean age of patients within the sample is %.1f.\n", round(mean(age), 1)))

> Play with the %.1f format for more decimal places, etc.

... and be aware of excel bugs.

vQ
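A self-contained version of the sprintf() call may help when experimenting with the format string. The `age` values below are made up for illustration.

```r
age <- c(34, 51, 47, 62, 29)  # made-up ages, mean is 44.6

# %.1f prints one digit after the decimal point and rounds for display,
# so an explicit round() is not required for the message itself.
msg <- sprintf("The mean age of patients within the sample is %.1f.", mean(age))
cat(msg, "\n")
```

Changing `%.1f` to `%.2f` would print two decimal places, and `%d` would require an integer argument.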
[R] Issue displaying legend for large data
Hi,

We are working on an R project with the latest version, 2.9.0. We are using the matplot and barplot functions to draw different graphs. The end user may generate graphs for a large amount of data. Also, each point to be plotted may have a long name (around 170 characters). These names (Y axis points) need to be displayed in the legend for the graph. However, it is not possible to fit these long names in the legend in an R window when a large number of points are selected for trending.

We tried setting the font and window size for the graphs using the graphical parameters. However, it did not help for a large number of points with long names.

Further, we tried using the R packages tcltk and tkrplot to display graphs and legend in a Tk widget instead of an R window. We are able to display the full description of a plotted point on clicking the corresponding point style. However, we are not able to save/export this graph (widget) in any format.

Currently, we are displaying the legend for the points in a separate R window, but it does not seem to be associated with the graphs generated. We need to have the actual graph and the legend associated with it in the same window, with all the plotted points and point styles.

Is there any other way to solve this display issue when a large amount of data is used for plotting?

Thanks in advance.

Regards, Anisha Sinnarkar
[R] Additional points to scatter plot show up at wrong place
Hi everyone,

I have a problem with adding points to scatter plots. The plot is drawn with the scatterplot() function from the car library:

scatterplot(data[,2] ~ data[,1], data=data,smooth=F,reg.line=F,xlim=c(0.5,1),ylim=c(0.5,1),ylab="ML",xlab="Freq",cex.lab=1.9,cex.axis=1.8)

After that, I draw one line with abline(0,1,col="gray20"), which works perfectly fine. Now I want to add, say, the point (0.6,0.6) to the plot with points(c(0.6),c(0.6)). The point is plotted, not exactly at the proper coordinates but at something like (0.55,0.55).

When I use the plot() function to make the scatter plots, this problem does not occur, but I want to have those nice box plots next to the X and Y axes that are drawn by scatterplot(). Does anybody have an idea how to get the points at the right place in the plot?

cheers, Peter
[R] warning message while installing a package
Dear all I was trying to install the package ISwR and got the following message. I was connected to the internet. Warning: unable to access index for repository http://cms.unipune.ernet.in/computing/cran/bin/windows/contrib/2.8 Please help. regards M.
Re: [R] warning message while installing a package
On 5/15/2009 11:04 AM, meenus...@gmail.com wrote:
> Dear all I was trying to install the package ISwR and got the following message. I was connected to the internet.
> Warning: unable to access index for repository http://cms.unipune.ernet.in/computing/cran/bin/windows/contrib/2.8

That's a problem connecting to the mirror. Try a different one. (You can do this from the menus in the GUI, or from the console using chooseCRANmirror().)

Duncan Murdoch
Re: [R] warning message while installing a package
On May 15, 2009, at 9:04 AM, Duncan Murdoch wrote:
> On 5/15/2009 11:04 AM, meenus...@gmail.com wrote:
>> Dear all I was trying to install the package ISwR and got the following message. I was connected to the internet.
>> Warning: unable to access index for repository http://cms.unipune.ernet.in/computing/cran/bin/windows/contrib/2.8
>
> That's a problem connecting to the mirror. Try a different one. (You can do this from the menus in the GUI, or from the console using chooseCRANmirror().)

It's not a problem connecting, but either a permissions issue or there is just nothing there: http://cms.unipune.ernet.in/computing/cran

That URL is 'Not Found'. I don't see that mirror (or any mirrors in India) on the 'official' mirror list, but if it is legitimate, it might be worthwhile contacting the mirror admin to see what's up.

That being said, definitely use a different mirror in the meantime.

HTH,

Marc Schwartz
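Besides the interactive chooseCRANmirror() mentioned in the thread, the mirror can be set non-interactively for the current session. The sketch below only sets the option; the mirror URL is an assumption chosen for illustration and does not appear in the thread, and the install call is left commented out because it needs network access.

```r
# Point R at a specific CRAN mirror for this session.
# The URL is an illustrative assumption, not taken from the thread.
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")[["CRAN"]]

# With a working mirror selected, the install can be retried:
# install.packages("ISwR")
```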
Re: [R] Help with loops
I'm not quite sure what you want to do, but this might help:

d = data.frame(replicate(40, rnorm(20)))
d$sample = rep(c('a','b','c','d'), each=5)
library(doBy)
summaryBy(. ~ sample, data=d)

David Freedman

Amit Patel-7 wrote:
> Hi
> I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns, zz[,2:41], plus row headings in zz[,1]. I am trying to average each set of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). Below is my script, but it seems to be stuck in an endless loop. Any suggestions??
>
> for (i in 1:length(average_value[,1])) {
> average_value[i] <- i^100; print(average_value[i])
> #calculates Means
> #Sample A
> average_value[i,2] <- rowMeans(zz[i,2:3])
> average_value[i,3] <- rowMeans(zz[i,4:5])
> average_value[i,4] <- rowMeans(zz[i,6:7])
> average_value[i,5] <- rowMeans(zz[i,8:9])
> average_value[i,6] <- rowMeans(zz[i,10:11])
> #Sample B
> average_value[i,7] <- rowMeans(zz[i,12:13])
> average_value[i,8] <- rowMeans(zz[i,14:15])
> average_value[i,9] <- rowMeans(zz[i,16:17])
> average_value[i,10] <- rowMeans(zz[i,18:19])
> average_value[i,11] <- rowMeans(zz[i,20:21])
> #Sample C
> average_value[i,12] <- rowMeans(zz[i,22:23])
> average_value[i,13] <- rowMeans(zz[i,24:25])
> average_value[i,14] <- rowMeans(zz[i,26:27])
> average_value[i,15] <- rowMeans(zz[i,28:29])
> average_value[i,16] <- rowMeans(zz[i,30:31])
> #Sample D
> average_value[i,17] <- rowMeans(zz[i,32:33])
> average_value[i,18] <- rowMeans(zz[i,34:35])
> average_value[i,19] <- rowMeans(zz[i,36:37])
> average_value[i,20] <- rowMeans(zz[i,38:39])
> average_value[i,21] <- rowMeans(zz[i,40:41])
> }
>
> thanks
Re: [R] Additional points to scatter plot show up at wrong place
On Fri, 15 May 2009 15:43:33 +0200 Peter Menzel pmen...@googlemail.com wrote:

PM> scatterplot(data[,2] ~ data[,1],
PM> data=data,smooth=F,reg.line=F,xlim=c(0.5,1),ylim=c(0.5,1),ylab="ML",xlab="Freq",cex.lab=1.9,cex.axis=1.8)

Side remark: you don't need to write data[,2] if you have specified data=data as you did, so var1~var2 would be enough.

PM> after that, I draw one line with abline(0,1,col="gray20") which
PM> works perfectly fine.

abline for me also does not work in the expected way, see below.

PM> now I want to add, say the point (0.6,0.6) to the plot with
PM> points(c(0.6),c(0.6)).

The c() is not necessary; points(0.6,0.6) is enough.

PM> The point is plotted, but not exactly at the proper coordinates, but
PM> at something like (0.55,0.55).

That seems to be a bug. The axis seems not to be drawn exactly. To replicate, see:

library(car)
data <- data.frame(x1=rnorm(100),x2=rnorm(100,.25))
scatterplot(x1~x2,data=data,ylab="ML",xlab="Freq")
points(0.5,0.5,col="blue")
abline(h=0.5,lty=2) # check whether point is at the correct location.
abline(v=0.5,lty=2)
abline(h=1) # line is not at 1 at the y-axis!

So maybe one can contact the package owner? Btw., creating such a plot yourself is easy; have a look at ?layout and ?axis.

Stefan
Re: [R] Using sample to create Training and Test sets
> Forgive the newbie question, I want to select random rows from my data.frame to create a test set (which I can do) but then I want to create a training set using what's left over.

The caret package has a function, createDataPartition, that does the split taking into account the distribution of the outcome. This might be good in classification cases where one or more classes have low percentages in the data set.

There is more detail in the pdf: http://cran.r-project.org/web/packages/caret/vignettes/caretMisc.pdf and examples in this pdf: http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

Max
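For the plain sample() approach that the question asks about, a base-R sketch is shown here. It does not use caret and does no stratification; the data frame `dat` and the test-set size are invented for the example.

```r
set.seed(42)
dat <- data.frame(id = 1:20, y = rnorm(20))  # made-up data

# Draw row indices for the test set, then use negative indexing
# to keep everything else as the training set.
test_idx <- sample(nrow(dat), size = 5)
test  <- dat[test_idx, ]
train <- dat[-test_idx, ]

nrow(test)
nrow(train)
```

Negative indexing with the same index vector guarantees that the two sets are disjoint and together cover every row.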
Re: [R] Problem with viewports, print.trellis and more/newpage
Hi Deepayan,

Thank you very much for the tip. After removing the 'more' argument and another couple of hours, I finally found something that works for my multi-page multi-graph plots. For documentation, the script is:

library(lattice)
library(grid)
foo <- data.frame(x=1:10,y=1:10)
# Defines some viewports
fulldevice <- viewport(x=0, y=0, width=1, height=1, just=c(0,0), name="fulldevice")
plotvw <- viewport(x=0, y=0, width=1, height=0.95, just=c(0,0), name="plotvw")
titlevw <- viewport(x=0, y=0.95, width=1, height=0.05, just=c(0,0), name="titlevw")
tree <- vpTree(fulldevice,vpList(plotvw,titlevw))
for (i in 1:4) {
  plots <- xyplot((i*y)~x,data=foo)
  grid.newpage()
  pushViewport(tree)
  seekViewport("plotvw")
  print(plots, split=c(1,1,2,4), newpage=FALSE)
  print(plots, split=c(2,1,2,4), newpage=FALSE)
  print(plots, split=c(1,2,2,4), newpage=FALSE)
  print(plots, split=c(2,2,2,4), newpage=FALSE)
  print(plots, split=c(1,3,2,4), newpage=FALSE)
  print(plots, split=c(2,3,2,4), newpage=FALSE)
  print(plots, split=c(1,4,2,4), newpage=FALSE)
  print(plots, split=c(2,4,2,4), newpage=FALSE)
  seekViewport("titlevw")
  grid.text(label = "test", just = c("centre","centre"), gp = gpar(fontsize = 10, font = 2))
}

> On Thu, May 14, 2009 at 1:58 PM, Sebastien Bihorel sebastien.biho...@cognigencorp.com wrote:
>> Dear R-users,
>> I have got the following problem. I need to create 4x2 arrays of xyplot's on several pages. The plots are created within a loop and plotted using the print function. It seems that I cannot find the proper grid syntax with my viewports, and the more/newpage arguments. The following script is a simplification but hopefully will suffice to illustrate my problem. Any suggestion from the list would be greatly appreciated.
>
> Without looking at it in detail, here's one bit of advice that might help: if you are using pushViewport(), don't use 'more', use only 'newpage', and preferably don't use 'split' either.
> In particular, if you are using 'more', the first print.trellis() call will always start a new page, and your viewport will be lost.
>
> -Deepayan
>
>> Sebastien
>>
>> library(lattice)
>> foo <- data.frame(x=1:10,y=1:10)
>> for (i in 1:4) {
>>   #isnewpage <- FALSE
>>   plots <- xyplot(y~x,data=foo)
>>   pushViewport(viewport(x=0, y=0, width=1, height=0.95, just=c(0,0)))
>>   print(plots, split=c(1,1,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(2,1,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(1,2,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(2,2,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(1,3,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(2,3,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(1,4,2,4), more=T)#, newpage=isnewpage)
>>   print(plots, split=c(2,4,2,4), more=F)#, newpage=isnewpage)
>>   popViewport()
>>   pushViewport(viewport(x=0, y=0.95, width=1, height=0.05, just=c(0,0)))
>>   grid.text(label = i, just = c("centre","centre"), gp = gpar(fontsize = 10, font = 2))
>>   popViewport()
>>   # Updates isnewpage
>>   # isnewpage <- TRUE
>> }

--
Sebastien Bihorel, PharmD, PhD
PKPD Scientist
Cognigen Corp
Email: sebastien.biho...@cognigencorp.com
Phone: (716) 633-3463 ext. 323
[R] replace % with \%
Dear all,

I'm trying to gsub() % with \% with no obvious success.

temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
temp1
[1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
gsub("%", "\%", temp1, fixed=TRUE)
[1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
Warning messages:
1: '\%' is an unrecognized escape in a character string
2: unrecognized escape removed from "\%"

I am not quite sure how to deal with this error message. I tried the following:

gsub("%", "\\%", temp1, fixed=TRUE)
[1] "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%"

Could anyone suggest how to obtain output similar to:

[1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"

Thank you,
Liviu

--
Do you know how to read? http://www.alienetworks.com/srtest.cfm
Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
[R] analysis of circular data with mixed models???
Hi. I am trying to model data on movements (direction) of birds and the response variables are compass directions (0 to 360). I have found two packages CircStats and Circular that can implement linear models for a circular response, which will do what I need for the data set I am currently working on (modeling movements for only 1 species). However, in the near future, I would like to extend my modeling by including multiple species, and treating each species as a random effect. It appears that analysis of circular data using a mixed model approach is possible (see the text Statistical Analysis of Circular Data, Fisher 1996); however, does anyone know of a package in R that implements mixed models for circular data? Cheers -Steve
Re: [R] replace % with \%
On May 15, 2009, at 9:46 AM, Liviu Andronic wrote:
> Dear all,
> I'm trying to gsub() % with \% with no obvious success.
>
> temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
> temp1
> [1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
> gsub("%", "\%", temp1, fixed=TRUE)
> [1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
> Warning messages:
> 1: '\%' is an unrecognized escape in a character string
> 2: unrecognized escape removed from "\%"
>
> I am not quite sure how to deal with this error message. I tried the following:
>
> gsub("%", "\\%", temp1, fixed=TRUE)
> [1] "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%"
>
> Could anyone suggest how to obtain output similar to:
>
> [1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"
>
> Thank you, Liviu

Presuming that you might want to output the results to a TeX file for subsequent processing, where the '%' would otherwise be a comment character, the key is not to get a single '\', but a double '\\', so that you then get a single '\' on output:

temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
temp2 <- gsub("%", "\\\\%", temp1)
temp2
[1] "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%"
cat(temp2)
mean sd 0\% 25\% 50\% 75\% 100\%

Remember that the single '\' is an escape character, which needs to be doubled.

HTH,

Marc Schwartz
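The difference between what print() shows and what the string actually contains can be checked directly. The sketch below reuses the `temp1` vector from the thread; the `nchar()` check and the comparison of the fixed and regex forms are additions for illustration.

```r
temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")

# fixed = TRUE: the replacement is used literally. "\\%" in source code
# is the two characters backslash and percent.
temp2 <- gsub("%", "\\%", temp1, fixed = TRUE)

# Regex form: a literal backslash in the replacement must itself be
# escaped, hence four backslashes in the source code.
temp3 <- gsub("%", "\\\\%", temp1)

print(temp2)      # print() displays the backslash in escaped form
cat(temp2, "\n")  # cat() writes the actual characters
nchar(temp2[3])   # "0", backslash, "%": three characters
```

Both calls produce the same strings; the doubled backslash seen in printed output is only how R displays a single backslash.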
Re: [R] replacing default axis labels on a plot - SOLUTION
The original problem posed was:

On 14/05/2009 7:31 AM, Graves, Gregory wrote:
> I have 3 columns: flow, month, and monthname, where month is 1-12, and monthname is name of month. I can't get the plot to replace the 1-12 with monthname using ticks.lab. What am I doing wrong?
>
> plot(flow~factor(month), xlab="Month", ylab="Total Flow per Month", ylim=c(0,55000), ticks.lab=monthname)

Here is the solution to this:

# make a boxplot but suppress default labels on x axis with xaxt="n"
plot(flow~factor(month), xlab="Month", ylab="Total Flow per Month", ylim=c(0,55000), xaxt="n") #NOTE xaxt
# create a vector containing month abbrevs with [[1]] suffix as follows
# (the built-in constant month.abb holds the same abbreviations)
month.name <- list(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))[[1]]
# place the 12 months on axis 1 (the x axis) as follows:
axis(1, at=1:12, labels=month.name)

Gregory A. Graves
Lead Scientist
REstoration COordination and VERification (RECOVER)
Watershed Division
South Florida Water Management District
Phones: DESK: 561 / 682 - 2429 CELL: 561 / 719 - 8157
Re: [R] creating and then executing command strings
On Fri, May 15, 2009 at 3:38 PM, Romain Francois romain.franc...@dbmail.com wrote:
> Hi,
> You can either parse and eval the string you are making, as in:
>
> eval( parse( text = paste("avg_", colname, " <- 0;", sep='') ) )
>
> Or you can do something like this:
>
> df[[ paste( "avg_", colname, sep = "" ) ]] <- 0

Thank you so much! I used the first version and it worked. What puzzles me is that I am not able to use <- instead of = (my R book says the two can be exchanged) or break the command into different parts and execute them one after another. I get various error messages when I try:

eval( parse( text <- paste("avg_", colname, " <- 0;", sep='') ) )

or

text = paste("avg_", colname, " <- 0;", sep='')
parse(text)
eval(parse(text))

Anyway, thanks a lot - you greatly improved the likelihood of me not working on the weekend!

Best - P
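A likely explanation for the errors described here is that, inside a function call, `=` names an argument while `<-` performs an assignment and then passes the value by position. The first positional argument of parse() is `file`, so the string ends up being treated as a file name. The sketch below illustrates this under those assumptions; the data frame and the `failed` flag are invented for the example.

```r
colname <- "col1"
df <- data.frame(col1 = 1:3)
cmd <- paste("df$avg_", colname, " <- 0", sep = "")

# Works: `text =` names the `text` argument of parse().
eval(parse(text = cmd))

# Fails: parse(cmd) passes the string as `file`, and no file with that
# name exists. parse(text <- cmd) fails the same way, because `<-`
# creates a variable called `text` and passes its value positionally.
failed <- inherits(try(suppressWarnings(parse(cmd)), silent = TRUE),
                   "try-error")
failed
```

Splitting the work into steps is fine as long as the argument is named, for example `expr <- parse(text = cmd)` followed by `eval(expr)`.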
Re: [R] replace % with \%
Thanks all for the prompt responses. Now Hmisc::latex() no longer generates errors on Rcmdr::numSummary() objects (with `tempa' below being such an object).

colnames(tempa$table) <- gsub("%", "\\%", colnames(tempa$table), fixed=TRUE)
latex(tempa$table, cdec=3)

Best regards,
Liviu

On Fri, May 15, 2009 at 5:13 PM, Patrick Burns pbu...@pburns.seanet.com wrote:
> See 'The R Inferno' page 46.
>
> Patrick Burns patr...@burns-stat.com +44 (0)20 8525 0696 http://www.burns-stat.com (home of The R Inferno and A Guide for the Unwilling S User)
>
> Liviu Andronic wrote:
>> Dear all,
>> I'm trying to gsub() % with \% with no obvious success.
>>
>> temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
>> temp1
>> [1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
>> gsub("%", "\%", temp1, fixed=TRUE)
>> [1] "mean" "sd" "0%" "25%" "50%" "75%" "100%"
>> Warning messages:
>> 1: '\%' is an unrecognized escape in a character string
>> 2: unrecognized escape removed from "\%"
>>
>> I am not quite sure how to deal with this error message. I tried the following:
>>
>> gsub("%", "\\%", temp1, fixed=TRUE)
>> [1] "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%"
>>
>> Could anyone suggest how to obtain output similar to:
>>
>> [1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"
>>
>> Thank you, Liviu

--
Do you know how to read? http://www.alienetworks.com/srtest.cfm
Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
Re: [R] Additional points to scatter plot show up at wrong place
Peter Menzel wrote:
> Hi everyone,
> I have a problem with adding points to scatter plots. The plot is drawn with the scatterplot() function from the car library:
>
> scatterplot(data[,2] ~ data[,1], data=data,smooth=F,reg.line=F,xlim=c(0.5,1),ylim=c(0.5,1),ylab="ML",xlab="Freq",cex.lab=1.9,cex.axis=1.8)
>
> after that, I draw one line with abline(0,1,col="gray20") which works perfectly fine. now I want to add, say the point (0.6,0.6) to the plot with points(c(0.6),c(0.6)). The point is plotted, but not exactly at the proper coordinates, but at something like (0.55,0.55).

scatterplot() is using layout() internally, so you can't expect this to work. I don't think there's a nice way of going back to a previous subregion.

> When I use the plot() function to make the scatter plots, this problem does not occur, but I want to have those nice box plots next to the X and Y-axes that are drawn by scatterplot().. Anybody has an idea how to get the points at the right place in the plot?
>
> cheers, Peter

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918 FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
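Since layout() switches the active plotting region, one workaround is to build the figure by hand and add the extra points while the scatter panel is still the current one. The following is a rough sketch with base graphics only, not the implementation used by car::scatterplot(); the data, panel sizes, and margins are arbitrary choices for illustration.

```r
set.seed(1)
x <- runif(50, 0.5, 1)  # made-up data
y <- runif(50, 0.5, 1)

pdf(NULL)  # null device so the sketch also runs non-interactively

# 2 x 2 grid: y boxplot (1), scatter (2), empty (0), x boxplot (3).
layout(matrix(c(1, 2, 0, 3), nrow = 2, byrow = TRUE),
       widths = c(1, 4), heights = c(4, 1))

par(mar = c(4, 1, 1, 0))
boxplot(y, ylim = c(0.5, 1), axes = FALSE)             # panel 1

par(mar = c(4, 4, 1, 1))
plot(x, y, xlim = c(0.5, 1), ylim = c(0.5, 1),
     xlab = "Freq", ylab = "ML")                       # panel 2
abline(0, 1, col = "gray20")
points(0.6, 0.6, col = "blue", pch = 19)  # added while panel 2 is current
usr_main <- par("usr")                    # user coordinates of panel 2

par(mar = c(1, 4, 0, 1))
boxplot(x, horizontal = TRUE, ylim = c(0.5, 1), axes = FALSE)  # panel 3
dev.off()
```

Anything drawn after the last boxplot() call would use the coordinate system of panel 3, which is the kind of offset reported in the thread.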
Re: [R] Help with loops
Dear Amit,

The following should get you started. What I'm doing is creating an identifier (g) that marks which columns belong to the same group, and then using a combination of apply() and tapply() to get the mean for each row within the levels of g. In your case, you have more columns than I have in my example, but with slight modifications you can adapt the code below to your needs.

See ?apply and ?rep for more information.

HTH,
Jorge

# Some data
set.seed(123)
X <- matrix(rnorm(100), ncol=10)
colnames(X) <- paste('x',1:10,sep="")
rownames(X) <- paste('sample_',1:10,sep="")
# Defining the groups using rep()
g <- rep(1:(ncol(X)/2), each = 2 )
# Calculating the means
res <- t( apply(X, 1, tapply, g, mean) )
res
# res[1,1] is the mean for X[1, 1:2]
mean(X[1,1:2])
# [1] 0.2408457

On Fri, May 15, 2009 at 8:17 AM, Amit Patel amitrh...@yahoo.co.uk wrote:
> Hi
> I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns, zz[,2:41], plus row headings in zz[,1]. I am trying to average each set of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). Below is my script, but it seems to be stuck in an endless loop. Any suggestions??
>
> for (i in 1:length(average_value[,1])) {
> average_value[i] <- i^100; print(average_value[i])
> #calculates Means
> #Sample A
> average_value[i,2] <- rowMeans(zz[i,2:3])
> average_value[i,3] <- rowMeans(zz[i,4:5])
> average_value[i,4] <- rowMeans(zz[i,6:7])
> average_value[i,5] <- rowMeans(zz[i,8:9])
> average_value[i,6] <- rowMeans(zz[i,10:11])
> #Sample B
> average_value[i,7] <- rowMeans(zz[i,12:13])
> average_value[i,8] <- rowMeans(zz[i,14:15])
> average_value[i,9] <- rowMeans(zz[i,16:17])
> average_value[i,10] <- rowMeans(zz[i,18:19])
> average_value[i,11] <- rowMeans(zz[i,20:21])
> #Sample C
> average_value[i,12] <- rowMeans(zz[i,22:23])
> average_value[i,13] <- rowMeans(zz[i,24:25])
> average_value[i,14] <- rowMeans(zz[i,26:27])
> average_value[i,15] <- rowMeans(zz[i,28:29])
> average_value[i,16] <- rowMeans(zz[i,30:31])
> #Sample D
> average_value[i,17] <- rowMeans(zz[i,32:33])
> average_value[i,18] <- rowMeans(zz[i,34:35])
> average_value[i,19] <- rowMeans(zz[i,36:37])
> average_value[i,20] <- rowMeans(zz[i,38:39])
> average_value[i,21] <- rowMeans(zz[i,40:41])
> }
>
> thanks
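Applied to the layout described in the question (an identifier column followed by 40 value columns averaged in adjacent pairs), the same idea can be written without an explicit loop over rows. This is a sketch on made-up data; the four-row `zz` and the variable names `vals` and `starts` are assumptions for the example.

```r
set.seed(123)
# Made-up stand-in for zz: an id column plus 40 numeric columns.
zz <- data.frame(id = paste("row", 1:4, sep = ""),
                 matrix(rnorm(4 * 40), nrow = 4))

vals <- as.matrix(zz[, 2:41])          # numeric part only
starts <- seq(1, ncol(vals), by = 2)   # first column of each pair

# One rowMeans() call per pair of columns, for all rows at once.
average_value <- sapply(starts, function(j) rowMeans(vals[, j:(j + 1)]))
dim(average_value)
```

Each column of `average_value` holds the row-wise mean of one replicate pair, so 40 value columns give 20 averaged columns.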
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
On Wed, May 13, 2009 at 5:21 PM, avraham.ad...@guycarp.com wrote:
> Hello. I am trying to optimize a set of parameters using /optim/ in which the actual function to be minimized contains matrix multiplication and is of the form:
>
> SUM ((A%*%X - B)^2)
>
> where A is a matrix and X and B are vectors, with X as parameter vector.

As Spencer Graves pointed out, what you are describing here is a linear least squares problem, which has a direct (i.e. non-iterative) solution. A comparison of the speed of various ways of solving such a system is given in one of the vignettes in the Matrix package.

> This has worked well so far. Recently, I was given a data set A of size 360440 x 1173, which could not be handled as a normal matrix. I brought it into 'R' as a sparse matrix (dgCMatrix - using sparseMatrix from the Matrix package), and the formulæ and gradient work, but /optim/ returns an error of the form "no method for coercing this S4 class to a vector".

If you just want the least squares solution X then

X <- solve(crossprod(A), crossprod(A, B))

will likely be the fastest method where A is the sparse matrix.

I do feel obligated to point out that the least squares solution for such large systems is rarely a sensible solution to the underlying problem. If you have over 1000 columns in A and it is very sparse then likely at least parts of A are based on indicator columns for a categorical variable. In such situations a model with random effects for the category is often preferable to the fixed-effects model you are fitting.

> After briefly looking into methods and classes, I realize I am in way over my head. Is there any way I could use /optim/ or another optimization algorithm, on sparse matrices?
>
> Thank you very much,
> --Avraham Adler
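The direct solution suggested in this reply can be tried on a small sparse system. The sketch below uses a random 200 x 5 matrix as a stand-in for the large design matrix; the dimensions, density, and `x_true` values are made up so that the recovered coefficients can be checked.

```r
library(Matrix)

set.seed(1)
# Small random sparse matrix standing in for the 360440 x 1173 one.
A <- rsparsematrix(200, 5, density = 0.2)
x_true <- c(1, -2, 0.5, 3, 0)
B <- as.vector(as.matrix(A %*% x_true))

# Normal equations (A'A) X = A'B, solved without forming a dense A.
X <- solve(crossprod(A), crossprod(A, B))
as.vector(as.matrix(X))
```

Because B was constructed without noise, the solution should match `x_true` up to numerical error. With real data the result is the least squares estimate, subject to the caveats about model choice raised in the reply.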
[R] data summary and some automated t.tests.
I would like to perform a t.test on each of the measured variables (sand.silt etc.), with a mean and sd for each of the treatments (up or down), and output this as a table. I am having a hard time starting; maybe it is too close to lunch. Any suggestions would be greatly appreciated.

Stephen Sefick

x <- (structure(list(sample. = structure(c(1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 27L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 38L), .Label = c(0805-r1, 0805-r10, 0805-r11, 0805-r12, 0805-r13, 0805-r14, 0805-r2, 0805-r3, 0805-r4, 0805-r5, 0805-r6, 0805-r7, 0805-r8, 0805-r9, 0805-u1, 0805-u10, 0805-u2, 0805-u3, 0805-u4, 0805-u5, 0805-u6, 0805-u7, 0805-u8, 0805-u9, 1005-r1, 1005-r10, 1005-r11, 1005-r2, 1005-r3, 1005-r4, 1005-r5, 1005-r6, 1005-r7, 1005-r8, 1005-r9, 1005-u1, 1005-u10, 1005-u11, 1005-u2, 1005-u3, 1005-u4, 1005-u5, 1005-u6, 1005-u7, 1005-u8, 1005-u9 ), class = factor), date = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(10/1/05, 8/29/05), class = factor), Replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(dn, up ), class = factor), sand.silt = c(20L, 45L, 90L, 21L, 80L, 77L, 30L, 80L, 36L, 9L, 62L, 71L, 20L, 65L, 10L, 70L, 50L, 80L, 90L, 97L, 94L, 82L, 30L, 10L, 65L, 80L, 90L, 70L, 10L, 50L, 60L, 40L, 10L, 45L, 10L, 10L, 15L, 10L, 8L, 35L, 10L, 40L, 10L, 10L, 28L, 5L, 45L, 35L, 2L, 10L, 40L, 2L, 70L, 40L, 20L, 30L, 50L, 60L, 10L, 100L, 98L, 98L, 90L, 87L, 87L, 40L, 97L, 92L, 70L, 50L, 81L, 35L, 70L, 89L, 28L, 28L, 82L, 81L, 33L, 80L, 40L, 40L, 60L, 30L, 5L, 50L, 70L, 75L, 85L, 95L, 93L, 80L, 80L, 60L, 82L, 60L, 5L, 70L, 80L, 40L), gravel = c(8L, 45L, 7L, 5L, 10L, 5L, 35L, 7L, 45L, 60L, 0L, 0L, 5L, 8L, 25L, 0L, 45L, 15L, 0L, 1L, 2L, 5L, 6L, 15L, 10L, 5L, 3L, 10L, 20L, 0L, 20L, 31L, 20L, 35L, 70L, 30L, 60L, 60L, 70L, 50L, 70L, 40L, 50L, 30L, 48L, 85L, 20L, 30L, 20L, 60L, 30L, 8L, 10L, 30L, 30L, 10L, 0L, 0L, 10L, 0L, 0L, 0L, 2L, 8L, 8L, 30L, 0L, 3L, 15L, 29L, 11L, 60L, 15L, 8L, 60L, 25L, 8L, 9L, 42L, 1L, 50L, 40L, 10L, 60L, 60L, 30L, 10L, 10L, 0L, 0L, 0L, 2L, 2L, 0L, 1L, 25L, 10L, 10L, 10L, 50L), cobble = c(5L, 2L, 1L, 5L, 0L, 3L, 10L, 2L, 4L, 3L, 1L, 0L, 3L, 14L, 50L, 0L, 1L, 1L, 0L, 0L, 0L, 2L, 0L, 5L, 0L, 0L, 2L, 5L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 30L, 5L, 2L, 1L, 0L, 0L, 0L, 5L, 35L, 3L, 0L, 0L, 0L, 40L, 0L, 0L, 5L, 0L, 0L, 10L, 5L, 0L, 0L, 10L, 0L, 0L, 0L, 0L, 1L, 1L, 30L, 0L, 0L, 0L, 10L, 4L, 3L, 2L, 0L, 2L, 0L, 0L, 0L, 20L, 0L, 0L, 0L, 0L, 0L, 20L, 0L, 10L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 10L, 0L, 0L, 0L), boulder.bedrock = c(60L, 0L, 0L, 45L, 0L, 0L, 0L, 0L, 0L, 8L, 10L, 0L, 35L, 5L, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 10L, 60L, 70L, 0L, 0L, 0L, 5L, 55L, 0L, 0L, 0L, 40L, 0L, 0L, 0L, 0L, 15L, 0L, 0L, 10L, 0L, 20L, 10L, 0L, 0L, 0L, 0L, 20L, 0L, 0L, 60L, 0L, 0L, 20L, 0L, 10L, 0L, 50L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 10L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 5L, 0L, 0L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 75L, 10L, 0L, 0L), fine.root = c(5L, 7L, 0L, 10L, 2L, 6L, 5L, 4L, 3L, 7L, 0L, 0L, 7L, 4L, 6L, 1L, 4L, 2L, 2L, 2L, 3L, 1L, 0L, 1L, 20L, 5L, 3L, 5L, 10L, 2L, 0L, 6L, 10L, 10L, 15L, 0L, 0L, 5L, 15L, 0L, 10L, 10L, 0L, 5L, 8L, 5L, 0L, 20L, 0L, 8L, 0L, 0L, 7L, 0L, 0L, 15L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 2L, 0L, 3L, 3L, 4L, 5L, 0L, 0L, 8L, 2L, 2L, 3L, 0L, 1L, 0L, 10L, 0L, 0L, 0L, 0L, 0L, 12L, 0L, 0L, 10L, 0L, 0L, 5L, 12L, 0L, 0L, 0L, 0L, 10L, 5L, 5L), course.root = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
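One possible way to build the requested table is sketched below. Since the posted data dump is cut off, the sketch uses a small simulated data frame: the values, the choice of two variables and the helper name summarise_var are all hypothetical, and only the column names site, sand.silt and gravel are taken from the post.

```r
set.seed(42)

# Simulated stand-in for the posted data (values are made up).
x <- data.frame(
  site      = factor(rep(c("dn", "up"), each = 20)),
  sand.silt = c(rnorm(20, 40, 20), rnorm(20, 70, 20)),
  gravel    = c(rnorm(20, 30, 15), rnorm(20, 15, 15))
)

# Hypothetical helper: mean and sd per site plus a Welch t-test for one column.
summarise_var <- function(v, data) {
  g  <- split(data[[v]], data$site)
  tt <- t.test(g$dn, g$up)
  data.frame(
    variable = v,
    mean.dn  = mean(g$dn), sd.dn = sd(g$dn),
    mean.up  = mean(g$up), sd.up = sd(g$up),
    t        = unname(tt$statistic),
    p.value  = tt$p.value
  )
}

vars <- c("sand.silt", "gravel")
res  <- do.call(rbind, lapply(vars, summarise_var, data = x))
print(res)
```

With the real data, vars would list all measured columns. Running many t-tests on the same samples raises the usual multiple-comparison concerns, so the p-values are best read with that in mind.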
[R] Function Surv and interpretation
Dear everyone,

My question involves the use of the survival object. We can have

    Surv(time, time2, event, type=, origin = 0)    (1)

as detailed on p.65 of: http://cran.r-project.org/web/packages/survival/survival.pdf

My data (used in my study) are 'right censored', i.e. my variable corresponding to 'event' indicates whether a person is alive (0) or dead (1) at date last seen, and my 'time' indicates time from transplant to date of last contact (where this is time from transplant to death if the person has died, or time from transplant to date last seen if the person is still alive).

Now I am using the function rcorr.cens:
http://lib.stat.cmu.edu/S/Harrell/help/Hmisc/html/rcorr.cens.html

This function involves use of Surv. Here is a section of my syntax:

    time <- data$ovsrecod
    x1 <- data$RMY.GROUPS
    death <- data$death
    rcorr.cens(x1, Surv(time, death), outx=FALSE)    (2)

As you can see, I have entered Surv(time,death). This works (and complies with the example given in R for rcorr.cens) and all seems to be well. However, bearing in mind that in (1) we have

    Surv(time, time2, event, type=, origin = 0)

how does R know that 'death' in *my* syntax (2) is the 'event', i.e. how does it know that time2 is skipped in my analysis? I am a bit perplexed! The R documentation for Surv says that Surv(time,event) is a 'typical usage', as is Surv(time,time2,event, type=, origin = 0), but how does it know when we are using the former and not the latter?

I have tried entering:

    rcorr.cens(x1, Surv(time, event=death), outx=FALSE)

but it does not like it, saying:

    Error in Surv(time, event = death) : argument "time2" is missing, with no default

I hope that this makes sense! Thank you so much for your advice on this; it's much appreciated,

Kim
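On how the two call forms are told apart: unnamed arguments are matched to a function's formals by position, so in Surv(time, death) the second vector is bound to the formal named time2. It appears that Surv then notices that only two of its three data arguments were supplied and treats the second one as the status indicator, which is the documented Surv(time, event) usage. A small check with made-up numbers (the two vectors are hypothetical, not the poster's data; the survival package is assumed to be installed):

```r
library(survival)

# Made-up follow-up times and death indicators (1 = died, 0 = censored).
time  <- c(5, 8, 12, 3)
death <- c(1, 0, 1, 0)

# Two unnamed arguments: the second is bound to 'time2' by position,
# but the resulting object stores it as the status column.
s <- Surv(time, death)

print(s)              # censored observations are printed with a trailing "+"
attr(s, "type")       # censoring type recorded in the object
colnames(unclass(s))  # names of the stored columns
```

Whether the named form Surv(time, event = death) is accepted depends on the version of the survival package, so the sketch sticks to the positional form that the post reports as working.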
Re: [R] replace % with \%
On 15-May-09 14:46:27, Liviu Andronic wrote:

> Dear all,
> I'm trying to gsub() % with \% with no obvious success.
>
>     temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
>     temp1
>     [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>     gsub("%", "\%", temp1, fixed=TRUE)
>     [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>     Warning messages:
>     1: '\%' is an unrecognized escape in a character string
>     2: unrecognized escape removed from "\%"
>
> I am not quite sure how to deal with this error message. I tried the following:
>
>     gsub("%", "\\%", temp1, fixed=TRUE)
>     [1] "mean"   "sd"     "0\\%"   "25\\%"  "50\\%"  "75\\%"  "100\\%"
>
> Could anyone suggest how to obtain output similar to:
>
>     [1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"
>
> Thank you,
> Liviu

1: The double escape "\\" is the correct way to do it. If you write "\%" in a string, R will try to interpret % as a special character (like "\n" for newline), and there is none such (as it tells you). On the other hand, "\\" tells R to interpret "\" (normally used as the escape character) in a special way (namely as a literal "\").

2: The output

    "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%"

from

    gsub("%", "\\%", temp1, fixed=TRUE)

is one of those cases where R displays something different from what is really there! In other words, "0\\%" for example is the character string you would have to enter in order for R to store \%. You can see what is really there using cat:

    cat(gsub("%", "\\%", temp1, fixed=TRUE))
    # mean sd 0\% 25\% 50\% 75\% 100\%

which, of course, is what you wanted.

You can see in other ways that what is stored is what you wanted -- for instance:

    temp2 <- gsub("%", "\\%", temp1, fixed=TRUE)
    write.csv(temp2, "gsub.csv")

and then, if you look into gsub.csv outside of R, you will see:

    "","x"
    "1","mean"
    "2","sd"
    "3","0\%"
    "4","25\%"
    "5","50\%"
    "6","75\%"
    "7","100\%"

which, again, is what you wanted.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-May-09 Time: 16:32:13
-- XFMail --
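The difference between what print() shows and what is actually stored can also be checked by counting characters. A minimal sketch using the vector from the thread (the names temp1 and temp2 follow the posts; the choice of element 3 is only for illustration):

```r
temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
temp2 <- gsub("%", "\\%", temp1, fixed = TRUE)

print(temp2[3])          # printed in escaped form, with a doubled backslash
cat(temp2[3], "\n")      # the raw contents of the string
nchar(temp2[3])          # 3 characters: '0', one backslash, '%'
substr(temp2[3], 2, 2)   # the single stored backslash
writeLines(temp2)        # one raw string per line, as a TeX file would see it
```

print() shows strings the way they would have to be typed into R, whereas cat() and writeLines() emit the stored characters unchanged.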
[R] Printing to screen a matrix or data.frame in one chunk (not splitting columns)
Hello,

I saw this nice trick I want to replicate, but I lost the source and I hope one of you can point me to the solution. My problem is that I don't know the correct words to query this.

When I print a matrix or data.frame to the screen, the columns are split and printed below the previous ones, even though I have plenty of screen left. E.g.,

    > my_matrix = matrix(runif(30),nrow=3,ncol=10)
    > my_matrix
              [,1]      [,2]      [,3]       [,4]      [,5]      [,6]        [,7]
    [1,] 0.4979305 0.1155717 0.4484069 0.29986049 0.5427566 0.4324351 0.269171456
    [2,] 0.8405987 0.3605237 0.6615507 0.75305248 0.8569482 0.3401004 0.192526423
    [3,] 0.5608779 0.3953941 0.9995035 0.03141064 0.7985053 0.4903582 0.000490054
              [,8]      [,9]      [,10]
    [1,] 0.1402751 0.2852381 0.98816751
    [2,] 0.8337806 0.7322920 0.17505541
    [3,] 0.5414113 0.4668012 0.04420137

So there is a way to resize the space for printing so that everything is printed in one chunk.

Thanks in advance,
Adrian
[R] error writing to connection
Hello,

I am using:

    save(data, file="D:/mayData.RData")

and I get the following error:

    Error in save(data, file = "D:/mayData.RData") : error writing to connection

Thank you very much in advance,
Stefo
[R] Plotting question re. cuminc
Hello everyone,

(This is my second question posted today on the R list.)

I am carrying out a competing risks analysis using the cuminc function. This takes the form:

    cuminc(ftime, fstatus, group)

In my study, fstatus has 3 different causes of failure (1, 2, 3); there are also censored cases (0). group has two levels (0 and 1). I therefore have 6 different cumulative incidence curves:

    cause 1, group=0; cause 1, group=1
    cause 2, group=0; cause 2, group=1
    cause 3, group=0; cause 3, group=1

If I type the following commands:

    xx <- cuminc(ftime, fstatus, group)
    plot(xx, lty=1, color=1:6)

I end up with the 6 curves plotted on the same graph. Is there a way that I can plot a selection of these curves (say only the curves for cause 1, group=0 and cause 1, group=1)?

Thank you so much,
Kind Regards,
Kim

Dr Kim Pearce CStat
Industrial Statistics Research Unit (ISRU)
School of Mathematics and Statistics
Herschel Building
University of Newcastle
Newcastle upon Tyne
United Kingdom NE1 7RU
Tel. 0044 (0)191 222 6244 (direct)
Fax. 0044 (0)191 222 8020
Email: k.f.pea...@ncl.ac.uk
Re: [R] can you tell what .Random.seed *was*?
On Thu, May 14, 2009 at 3:36 PM, G. Jay Kerns gke...@ysu.edu wrote:

>     set.seed(something)
>     x <- rnorm(100)
>     y <- runif(500)
>     # bunch of other stuff
>     ...
>
> Now, I give you a copy of my script.R (with the set.seed statement removed, of course) together with the .RData file that was generated by the save.image() command.
> ...
> 1) can you tell me what my original set.seed() value was? ...
> 2) is it possible *in principle* to figure out what set.seed was, given the above?

set.seed takes an integer argument, that is, 2^32-1 distinct values (cf. NA_integer_), so the very simplest approach, brute-force search, has a hope of working:

    whatseed <- function (v) {
      i <- as.integer(-2^31+1); max <- as.integer(2^31-1)
      while (i < max) { set.seed(i); if (runif(1)==v) return(i); i <- i+1 }
    }

> (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.)

    set.seed(-2^31+10^5)
    system.time(whatseed(runif(1)))
       user  system elapsed
       1.53    0.00    1.53

    2^32*(1.53/10^5)/3600 = 18.25

i.e. about 18 hours.

> 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command?

Depending on which RNG algorithm (RNGkind) you use, there may be cryptographic techniques that are more efficient than brute-force search, especially if the full internal state (.Random.seed) is preserved.

This all assumes that the seed is set *only* with set.seed. If .Random.seed is modified directly, there are many more possibilities for most of the RNGs.

-s
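The brute-force idea can be tried quickly on a deliberately tiny search range. The sketch below is a variant of the function in the reply, not the original: the arguments lo and hi, the default range of 1 to 10000 and the example seed 1234 are assumptions made so that the search finishes in well under a second.

```r
# Search seeds lo..hi for one whose first runif() draw equals v.
# Returns NA if no seed in the range matches. Note that the search
# itself overwrites the current RNG state.
whatseed <- function(v, lo = 1L, hi = 10000L) {
  for (i in seq.int(lo, hi)) {
    set.seed(i)
    if (runif(1) == v) return(i)
  }
  NA_integer_
}

# Pretend this is the first uniform draw recovered from the saved workspace.
set.seed(1234)
v <- runif(1)

found <- whatseed(v)
print(found)

# Check: re-seeding with the recovered value reproduces the draw.
set.seed(found)
runif(1) == v
```

In the thread's scenario the first draw was rnorm(100), so the comparison would use the first saved normal deviate instead; the search is otherwise the same, and it assumes the default RNG kind was in use.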
[R] Using column length in plot gives error
Hi,

I'm trying to write a generic script for processing some data which finishes off with some plots. Given that I'm never sure how many columns will be in my dataframe, I wanted to use the following:

    plot(spectra.wavelength, cormat, type = "l", ylim=c(-1,1), xlab="Wavelength (nm)", ylab="Correlation")

However, even if I specify type="l" it appears to plot as points (right-hand plot). If I specify a range such as

    plot(650:700, cormat, type = "l", ylim=c(-1,1), xlab="Wavelength (nm)", ylab="Correlation")

it looks good (left-hand plot). If I try something like:

    plot(spectra.wavelength[1]:spectra.wavelength[length(spectra.wavelength)], cormat, type = "l", ylim=c(-1,1), xlab="Wavelength (nm)", ylab="Correlation")

it fails with "variable lengths differ", and when I look at spectra.wavelength[1] it gives me the value but then states there are 53 levels. What does this mean and how can I get the result I want?

many thanks
mike

http://www.nabble.com/file/p23562717/1.pdf 1.pdf

--
View this message in context: http://www.nabble.com/Using-column-length-in-plot-gives-error-tp23562717p23562717.html
Sent from the R help mailing list archive at Nabble.com.
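The "53 levels" message indicates that spectra.wavelength is stored as a factor rather than as numbers, which would also explain why plot() does not draw the expected line. A minimal sketch of the usual conversion, using a made-up three-value factor (the name wl and its values are hypothetical):

```r
# A factor holding wavelengths as labels, e.g. after reading a text column.
wl <- factor(c("650", "651.5", "700"))

print(wl)                          # printing a factor also lists its levels
as.integer(wl)                     # internal level codes, not the wavelengths
wl_num <- as.numeric(as.character(wl))
print(wl_num)                      # the actual numeric wavelengths
```

With the numeric vector in hand, plot(wl_num, cormat, type = "l", ...) should behave like the 650:700 version, provided cormat has the same length.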
Re: [R] Simulation
Another possibility (maybe more readable, gives the option of a list, probably not faster):

    Replicate(1000, rexp(15,1) )

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker
> Sent: Friday, May 15, 2009 6:37 AM
> To: r-help@r-project.org
> Subject: Re: [R] Simulation
>
> On Fri, 15 May 2009 19:17:37 +1000 Kon Knafelman konk2...@hotmail.com wrote:
>
> KK> I have the same problem as the initial one, except I need 1000
> KK> samples of size 15, and my distribution is Exp(1). I've adjusted
> KK> some of the loop formulas for my n=15, but I'm unsure how to proceed
> KK> in the quickest way.
> KK> Can someone please help?
>
> Taking a guess:
>
>     matrix(rexp(15000,1),ncol=15)
>
> ?
> --
> View this message in context: http://www.nabble.com/Simulation-tp23556274p23558953.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] replace % with \%
Marc Schwartz wrote:

> On May 15, 2009, at 9:46 AM, Liviu Andronic wrote:
>
>> Dear all,
>> I'm trying to gsub() % with \% with no obvious success.
>>
>>     temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
>>     temp1
>>     [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>>     gsub("%", "\%", temp1, fixed=TRUE)
>>     [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>>     Warning messages:
>>     1: '\%' is an unrecognized escape in a character string
>>     2: unrecognized escape removed from "\%"
>>
>> I am not quite sure how to deal with this error message. I tried the following:
>>
>>     gsub("%", "\\%", temp1, fixed=TRUE)
>>     [1] "mean"   "sd"     "0\\%"   "25\\%"  "50\\%"  "75\\%"  "100\\%"
>>
>> Could anyone suggest how to obtain output similar to:
>>
>>     [1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"
>>
>> Thank you,
>> Liviu
>
> Presuming that you might want to output the results to a TeX file for subsequent processing, where the '%' would otherwise be a comment character, the key is not to get a single '\', but a double '\\', so that you then get a single '\' on output:
>
>     temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
>     temp2 <- gsub("%", "\\\\%", temp1)
>     temp2
>     [1] "mean"   "sd"     "0\\%"   "25\\%"  "50\\%"  "75\\%"  "100\\%"
>     cat(temp2)
>     mean sd 0\% 25\% 50\% 75\% 100\%
>
> Remember that the single '\' is an escape character, which needs to be doubled.

this confusing backslash each backslashing backslash scheme is idiosyncratic to r; in many cases where one'd otherwise use a single backslash in a regex or a replacement string in another programming language, in r you have to double it.

and actually, in this case you don't need four backslashes. the original poster actually had a valid solution, but he wasn't aware that the string "\\%" returned (not printed) by gsub includes two, not three, characters -- thus only one backslash, not two:

    cat( gsub( pattern='%', replacement='\\%', x='foo % bar', fixed=TRUE))
    # foo \% bar

of course, if the pattern cannot be fixed, i.e., fixed=TRUE is less than helpful, you'd need four backslashes in the replacement -- a cute, though somewhat disturbing, weirdo.

vQ
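The two variants discussed above can be compared side by side. A small sketch using the same example string as the reply (the variable names a and b are only for illustration):

```r
x <- "foo % bar"

# Fixed (literal) matching: the replacement is used as-is, so the two
# typed backslashes (one stored backslash) are enough.
a <- gsub("%", "\\%", x, fixed = TRUE)

# Regular-expression matching: the backslash is also special inside the
# replacement, so it has to be escaped once more, giving four typed
# backslashes for one stored backslash.
b <- gsub("%", "\\\\%", x)

cat(a, "\n")
cat(b, "\n")
identical(a, b)
```

Both calls store the same string; only the number of backslashes that has to be typed differs.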
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
Dear Avraham:

For problems with many parameters to estimate, I highly recommend Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer). This book includes numerous examples showing how to use the nlme package. The value of this book is greatly enhanced by the availability of script files named ch01.R, ch02.R, ... ch08.R showing how to work virtually all the examples in the book. These script files are available in your local installation of R. To find them, enter the following at a command prompt in R:

    system.file('scripts', package='nlme')

Hope this helps.
Spencer Graves
Re: [R] Simulation
Greg Snow wrote:

> Another possibility (maybe more readable, gives the option of a list, probably not faster):
>
>     Replicate(1000, rexp(15,1) )

I think that should be replicate.

The matrix form is quite a bit faster, but I don't know if that will matter -- the times below are for doing this task (1000 x 15 replicates) 1000 times ...

    system.time(replicate(1000,replicate(1000,rexp(15,1))))
       user  system elapsed
     12.689   0.220  12.985

    system.time(replicate(1000,matrix(rexp(15000,1),ncol=15)))
       user  system elapsed
      2.512   0.452   2.976

--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
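The two approaches also differ in the shape of what they return, which matters for whatever is computed next. A short sketch (the object names m1, m2, samples and sample_means are made up for illustration):

```r
set.seed(1)

# One call to rexp(), reshaped: 1000 rows, each row is one sample of size 15.
m1 <- matrix(rexp(15000, 1), ncol = 15)

# replicate() simplifies to a matrix with one sample per COLUMN (15 x 1000).
m2 <- replicate(1000, rexp(15, 1))

# simplify = FALSE gives the "option of a list": 1000 vectors of length 15.
samples <- replicate(1000, rexp(15, 1), simplify = FALSE)

dim(m1)
dim(m2)
length(samples)

# Example follow-up: the mean of each of the 1000 samples.
sample_means <- rowMeans(m1)
summary(sample_means)
```

So per-sample statistics use rowMeans() with the matrix() form, but colMeans() (or sapply over the list) with replicate().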
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
Dear Doug, et al.:

What would you recommend for analyzing a longitudinal abundance survey of 22 species, when the species were not selected at random?

A prominent scientist tried to tell me that mixed-effects modeling is inappropriate in that case, because the species were selected purposefully, not at random. My response is that even in that case one should still use mixed-effects modeling, because it will tend to produce more appropriate estimates for the deviations of individual species from the average of all species -- potentially much lower variance with slight bias -- than naive ordinary least squares. The estimated variance components will not represent the between-species variance for the actual population of all hypothetical species of the particular type, but will represent the between-species variability in a hypothetical population from which the selected species might be considered a random sample.

Best Wishes,
Spencer Graves

p.s. I appreciate very much Doug's comment on this. I thought about adding something like that to my reply but didn't feel I could afford the time then.
Re: [R] Printing to screen a matrix or data.frame in one chunk (not splitting columns)
>>>>> "AC" == Adrián Cortés adrc...@gmail.com
>>>>>     on Fri, 15 May 2009 08:58:04 -0700 writes:

    AC> Hello,
    AC> I saw this nice trick I want to replicate but I lost the source and I hope
    AC> one of you can point me to the solution. My problem is that I don't know
    AC> the correct words to query this.

    AC> When I print to screen a matrix or data.frame the columns are split and
    AC> printed below the previous ones; even though I have plenty of screen left.

    AC> E.g.,
    AC> > my_matrix = matrix(runif(30),nrow=3,ncol=10)
    AC> > my_matrix
    AC>           [,1]      [,2]      [,3]       [,4]      [,5]      [,6]        [,7]
    AC> [1,] 0.4979305 0.1155717 0.4484069 0.29986049 0.5427566 0.4324351 0.269171456
    AC> [2,] 0.8405987 0.3605237 0.6615507 0.75305248 0.8569482 0.3401004 0.192526423
    AC> [3,] 0.5608779 0.3953941 0.9995035 0.03141064 0.7985053 0.4903582 0.000490054
    AC>           [,8]      [,9]      [,10]
    AC> [1,] 0.1402751 0.2852381 0.98816751
    AC> [2,] 0.8337806 0.7322920 0.17505541
    AC> [3,] 0.5414113 0.4668012 0.04420137

    AC> So there is a way to resize the space for printing so that everything in
    AC> printed in one chunk.

    options(width = 100) # or whatever.

---

For ESS users, this option is set to the correct value when R is started. If the emacs window is resized later, you can automatically set the width to the current buffer (window) size by

    M-x ess-execute-screen-options

or, for everyone here who has

    (add-hook 'ess-mode-hook 'ess-add-MM-keys)
    (add-hook 'inferior-ess-mode-hook 'ess-add-MM-keys)

in their ~/.emacs equivalent, it's a simple

    C-c w

('w' for 'width') to adapt the R option to the emacs window size.

Martin Maechler, ETH Zurich
Re: [R] Printing to screen a matrix or data.frame in one chunk (not splitting columns)
On May 15, 2009, at 10:58 AM, Adrián Cortés wrote:

> Hello,
>
> I saw this nice trick I want to replicate but I lost the source and I hope one of you can point me to the solution. My problem is that I don't know the correct words to query this.
>
> When I print to screen a matrix or data.frame the columns are split and printed below the previous ones; even though I have plenty of screen left. E.g.,
>
>     my_matrix = matrix(runif(30),nrow=3,ncol=10)
>     my_matrix
>               [,1]      [,2]      [,3]       [,4]      [,5]      [,6]        [,7]
>     [1,] 0.4979305 0.1155717 0.4484069 0.29986049 0.5427566 0.4324351 0.269171456
>     [2,] 0.8405987 0.3605237 0.6615507 0.75305248 0.8569482 0.3401004 0.192526423
>     [3,] 0.5608779 0.3953941 0.9995035 0.03141064 0.7985053 0.4903582 0.000490054
>               [,8]      [,9]      [,10]
>     [1,] 0.1402751 0.2852381 0.98816751
>     [2,] 0.8337806 0.7322920 0.17505541
>     [3,] 0.5414113 0.4668012 0.04420137
>
> So there is a way to resize the space for printing so that everything in printed in one chunk.
>
> Thanks in advance,
> Adrian

See ?options and take note of 'width', which defaults to 80. Increase that value to a number that suits your requirements.

HTH,

Marc Schwartz
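The effect of the 'width' option can be seen by printing the same matrix under two settings. A minimal sketch; the values 60 and 200 are arbitrary choices for illustration, and capture.output() is used only to count the printed lines:

```r
set.seed(1)
my_matrix <- matrix(runif(30), nrow = 3, ncol = 10)

# Narrow setting: the columns no longer fit and are wrapped into blocks.
old <- options(width = 60)
narrow <- capture.output(print(my_matrix))

# Wide setting: all ten columns fit on one line per row.
options(width = 200)
wide <- capture.output(print(my_matrix))
print(my_matrix)

# Restore whatever width was in effect before.
options(old)

length(narrow)  # more than 4 lines: several blocks of header + 3 rows
length(wide)    # 4 lines: one header plus 3 rows
```

The setting lasts for the session only; putting options(width = ...) in a startup file such as .Rprofile would make it persistent.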
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
Thank you both very much for your replies. What makes this a little less straightforward, at least to me, is that there need to be constraints on the solved parameters. They most certainly need to be positive, and there may be an upper limit as well. The true best linear fit would have negative entries for some of the parameters.

Originally, I was using the L-BFGS-B method of optim, which both allows for box constraints and has the limited-memory advantage useful when dealing with large matrices. Having the analytic gradient, I thought of using BFGS and having a statement in the function returning Inf for any parameters outside the allowable constraints.

I do /not/ know how to apply parameter constraints when using linear models. I looked around at the various manuals and help features, and outside of package glmc I did not find anything I could use. Perhaps I overlooked something. If there is something I missed, please let me know.

If there truly is no standard optimization routine that works on sparse matrices, my next step may be to use the normal equations to shrink the size of the matrix, recast it as a dense matrix (it would only be 1173x1173 then) and then hand it off to optim.

Any further suggestions or corrections would be very much appreciated.

Thank you,

--Avraham Adler

From: Douglas Bates ba...@stat.wisc.edu
Sent by: dmba...@gmail.com
Date: 05/15/2009 11:57 AM
To: avraham.ad...@guycarp.com
cc: r-help@r-project.org
Subject: Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices

On Wed, May 13, 2009 at 5:21 PM, avraham.ad...@guycarp.com wrote:

> Hello. I am trying to optimize a set of parameters using /optim/ in which the
> actual function to be minimized contains matrix multiplication and is of the form:
>
>     SUM ((A%*%X - B)^2)
>
> where A is a matrix and X and B are vectors, with X as parameter vector.

As Spencer Graves pointed out, what you are describing here is a linear least squares problem, which has a direct (i.e. non-iterative) solution.
A comparison of the speed of various ways of solving such a system is given in one of the vignettes in the Matrix package.

> This has worked well so far. Recently, I was given a data set A of size
> 360440 x 1173, which could not be handled as a normal matrix. I brought it
> into 'R' as a sparse matrix (dgCMatrix - using sparseMatrix from the Matrix
> package), and the formulæ and gradient work, but /optim/ returns an error of
> the form "no method for coercing this S4 class to a vector".

If you just want the least squares solution X then

    X <- solve(crossprod(A), crossprod(A, B))

will likely be the fastest method where A is the sparse matrix.

I do feel obligated to point out that the least squares solution for such large systems is rarely a sensible solution to the underlying problem. If you have over 1000 columns in A and it is very sparse then likely at least parts of A are based on indicator columns for a categorical variable. In such situations a model with random effects for the category is often preferable to the fixed-effects model you are fitting.

> After briefly looking into methods and classes, I realize I am in way over my
> head. Is there any way I could use /optim/ or another optimization algorithm,
> on sparse matrices?
>
> Thank you very much,
>
> --Avraham Adler
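The two routes discussed in this message can be sketched on toy data. Everything below other than the solve(crossprod(A), crossprod(A, B)) call is an assumption made for illustration: the sizes, the random sparse matrix from rsparsematrix(), the noiseless right-hand side, and the bounds 0 and 10. The second half follows the poster's own plan of reducing the problem to the dense normal equations before handing it to optim() with L-BFGS-B.

```r
library(Matrix)  # recommended package shipped with R

set.seed(42)
n <- 2000; p <- 20                       # toy sizes, far smaller than 360440 x 1173
A <- rsparsematrix(n, p, density = 0.1)  # random sparse dgCMatrix (assumed test data)
x_true <- runif(p, 0.5, 2)               # positive "true" parameters
B <- as.vector(A %*% x_true)             # noiseless right-hand side

# Direct least squares solution via the normal equations, as suggested above.
X <- solve(crossprod(A), crossprod(A, B))

# Box-constrained variant: reduce to the dense p x p normal equations first,
# so that optim() only ever sees ordinary matrices and vectors.
AtA <- as.matrix(crossprod(A))
AtB <- as.vector(crossprod(A, B))
BtB <- sum(B^2)
fn <- function(x) sum(x * (AtA %*% x)) - 2 * sum(AtB * x) + BtB  # = sum((A x - B)^2)
gr <- function(x) as.numeric(2 * (AtA %*% x - AtB))              # analytic gradient
fit <- optim(rep(1, p), fn, gr, method = "L-BFGS-B", lower = 0, upper = 10)

max(abs(as.vector(X) - x_true))   # error of the direct solution
max(abs(fit$par - x_true))        # error of the box-constrained fit
```

Because the objective only needs crossprod(A), crossprod(A, B) and sum(B^2), the sparse matrix is touched once and the optimizer works on p x p quantities.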
Re: [R] error writing to connection
On May 15, 2009, at 8:22 AM, Stefo Ratino wrote:

> Hello,
>
> I am using:
>
>     save(data,file="D:/mayData.RData"),
>
> and I have the following error:
>
>     Error in save(data, file = "D:/mayData.RData") : error writing to connection
>
> Thank you very much in advance,
>
> Stefo

Presuming that drive 'D' exists and that you have permission to write to it, it is possible that there is insufficient room on that drive to save 'data'. Check on the above.

HTH,

Marc Schwartz
Re: [R] error writing to connection
On 5/15/2009 9:22 AM, Stefo Ratino wrote:

> Hello,
>
> I am using:
>
>     save(data,file="D:/mayData.RData"),
>
> and I have the following error:
>
>     Error in save(data, file = "D:/mayData.RData") : error writing to connection

Do you have permission to create a file there? Try it from outside R.

Duncan Murdoch

> Thank you very much in advance,
>
> Stefo
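The checks suggested in both replies can also be run from inside R before calling save(). This is only a sketch: the path under tempdir() is a hypothetical stand-in for a location such as "D:/", and the data frame is made up. It does not test for free disk space, which base R has no portable function for.

```r
# Hypothetical target path; tempdir() stands in for a location such as "D:/".
path <- file.path(tempdir(), "mayData.RData")
data <- data.frame(x = 1:3)                   # stand-in for the poster's object

target_dir   <- dirname(path)
dir_exists   <- dir.exists(target_dir)                           # does the folder exist?
dir_writable <- unname(file.access(target_dir, mode = 2) == 0)   # 0 means "write allowed"

if (dir_exists && dir_writable) {
  save(data, file = path)   # note the quoted file name
}
file.exists(path)
```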
Re: [R] memory usage grows too fast
Thanks for Peter, William, and Hadley's help. Your code is much more concise than mine. :P

William's and Hadley's suggestions are the same. Here is their code:

    f <- function(dataMatrix) rowMeans(dataMatrix=="02")

And Peter's code is the following:

    apply(yourMatrix, 1, function(x) length(x[x==yourPattern]))/ncol(yourMatrix)

In terms of running time, the first one ran faster than the second on my dataset (2.5 mins vs. 6.4 mins). The memory consumption of the first one, however, is much higher than that of the second (8G vs. ~3G).

Any thoughts? My guess is that rowMeans creates extra copies to perform its calculation, but I am not sure. I am also interested in understanding ways to handle memory issues. I hope someone can shed light on this for me. :)

Best,
Mike

-----Original Message-----
From: Peter Alspach [mailto:palsp...@hortresearch.co.nz]
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

Tena koe Mike

If I understand you correctly, you should be able to use something like:

    apply(yourMatrix, 1, function(x) length(x[x==yourPattern]))/ncol(yourMatrix)

I see you've divided by nrow(yourMatrix) so perhaps I am missing something.

HTH ...

Peter Alspach

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Ping-Hsun Hsieh
> Sent: Friday, 15 May 2009 11:22 a.m.
> To: r-help@r-project.org
> Subject: [R] memory usage grows too fast
>
> Hi All,
>
> I have a 1000x100 matrix. The calculation I would like to do is actually
> very simple: for each row, calculate the frequency of a given pattern.
> For example, a toy dataset is as follows.
>
> Col1 Col2 Col3 Col4
> 01   02   02   00   => Freq of 02 is 0.5
> 02   02   02   01   => Freq of 02 is 0.75
> 00   02   01   01   ...
>
> My code to find the pattern "02" is quite simple:
>
>     OccurrenceRate_Fun<-function(dataMatrix)
>     {
>       tmp<-NULL
>       tmpMatrix<-apply(dataMatrix,1,match,"02")
>       for ( i in 1: ncol(tmpMatrix))
>       {
>         tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
>         tmp<-c(tmp,tmpHET)
>       }
>       rm(tmpMatrix)
>       rm(tmpRate)
>       return(tmp)
>       gc()
>     }
>
> The problem is that the memory usage grows very fast and is hard to handle
> on machines with less RAM. Could anyone please give me some comments on how
> to reduce the space complexity in this calculation?
>
> Thanks,
> Mike
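The two suggestions quoted above can be compared on the toy dataset from the original post. This is a small sketch; the object names are made up, and the data are just the three example rows, stored as character strings so that "02" keeps its leading zero.

```r
# Toy data from the original post, one row per line.
m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)

# Whole-matrix version: one logical matrix, then row means.
freq_rowmeans <- rowMeans(m == "02")

# Row-at-a-time version: count matches per row, divide once by the column count.
freq_apply <- apply(m, 1, function(x) sum(x == "02")) / ncol(m)

freq_rowmeans   # 0.50 0.75 0.25
freq_apply      # same values
```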
Re: [R] Simulation
I wrote replicate but the darn e-mail program "fixed" it for me. I expected replicate to be a bit slower, but not by that amount. I just wanted to include replicate as a more readable version of lapply while still improving over the loop approach.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Ben Bolker [mailto:bol...@ufl.edu]
> Sent: Friday, May 15, 2009 10:19 AM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Simulation
>
> Greg Snow wrote:
> > Another possibility (maybe more readable, gives the option of a list,
> > probably not faster):
> >
> > Replicate(1000, rexp(15,1) )
>
> I think that should be replicate
>
> The matrix form is quite a bit faster, but I don't know if that will
> matter -- times below are for doing this task (1000 x 15 replicates)
> 1000 times ...
>
> > system.time(replicate(1000,replicate(1000,rexp(15,1))))
>    user  system elapsed
>  12.689   0.220  12.985
> > system.time(replicate(1000,matrix(rexp(15000,1),ncol=15)))
>    user  system elapsed
>   2.512   0.452   2.976
>
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bol...@ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
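For readers comparing the two forms timed above, a short sketch of what each one returns may help. The seed and object names are assumptions; note that the two layouts are transposes of each other in shape, so the per-sample summary uses a different margin.

```r
set.seed(1)
by_replicate <- replicate(1000, rexp(15, 1))        # simplified to a 15 x 1000 matrix
by_matrix    <- matrix(rexp(15000, 1), ncol = 15)   # 1000 x 15 matrix from one rexp() call

dim(by_replicate)
dim(by_matrix)

# Either layout gives 1000 sample means; only the margin differs.
means_a <- colMeans(by_replicate)
means_b <- rowMeans(by_matrix)
```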
Re: [R] drawing arrows
On 5/15/2009 12:43 PM, christophe dutang wrote:

> Hi,
>
> I would like to draw arrows in a classic 2D plot. Which package should I
> use? Are there base R functions that do the job?
>
> On Google, I could not find any useful discussion about this topic, except a
> link to the function 'grid.arrows' of the grid package.
>
> My problem is I would like to draw arrows at the edge of circles drawn by
> the 'symbols' function. Maybe there is already a dedicated function for this?
>
> Any help is appreciated.

See ?arrows.

Duncan Murdoch
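One way to combine symbols() and arrows() for the case described in the question is to compute the arrow end points on the circle edges by hand. This is only a sketch under stated assumptions: the centres and radii are made-up values, asp = 1 is used so that the circles are round in user coordinates, and the null PDF device is there only so the code runs without a screen.

```r
pdf(file = NULL)   # null device so the sketch runs anywhere; omit to draw on screen
cx <- c(2, 6); cy <- c(2, 5)   # circle centres (made-up values)
r  <- c(1, 1.5)                # circle radii in x-axis units

plot(NA, xlim = c(0, 9), ylim = c(0, 8), asp = 1, xlab = "x", ylab = "y")
symbols(cx, cy, circles = r, inches = FALSE, add = TRUE)

# Angle of the line from the first centre to the second.
theta <- atan2(cy[2] - cy[1], cx[2] - cx[1])
# Start and end points moved from the centres out to the circle edges.
x0 <- cx[1] + r[1] * cos(theta); y0 <- cy[1] + r[1] * sin(theta)
x1 <- cx[2] - r[2] * cos(theta); y1 <- cy[2] - r[2] * sin(theta)

arrows(x0, y0, x1, y1, length = 0.1)   # base graphics; see ?arrows
invisible(dev.off())
```

With these values the arrow runs from (2.8, 2.6) to (4.8, 4.1), touching both circles.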
[R] drawing arrows
Hi,

I would like to draw arrows in a classic 2D plot. Which package should I use? Are there base R functions that do the job?

On Google, I could not find any useful discussion about this topic, except a link to the function 'grid.arrows' of the grid package.

My problem is I would like to draw arrows at the edge of circles drawn by the 'symbols' function. Maybe there is already a dedicated function for this?

Any help is appreciated.

Christophe

--
Christophe DUTANG
Ph. D. student at ISFA
Re: [R] can you tell what .Random.seed *was*?
> Set.seed takes an integer argument, that is, 2^32-1 distinct values (cf
> NA_integer_), so the very simplest approach, brute-force search, has a hope
> of working:
>
>     whatseed <- function (v) {
>       i <- as.integer(-2^31+1); max <- as.integer(2^31-1)
>       while (i<max) { set.seed(i); if (runif(1)==v) return(i); i<-i+1 } }
>
> (OK, being able to figure it out in 2*10^68 years doesn't count, but within
> a couple months is acceptable.)
>
>     > set.seed(-2^31+10^5)
>     > system.time(whatseed(runif(1)))
>        user  system elapsed
>        1.53    0.00    1.53
>     > 2^32*(1.53/10^5)/3600
>     [1] 18.25     # 18 hours
>
> > 3) does the answer change if there is a remove(.Random.seed) command
> > right before the save.image() command?
>
> Depending on which RNG algorithm (RNGkind) you use, there may be
> cryptographic techniques that are more efficient than brute-force search,
> especially if the full internal state (.Random.seed) is preserved.
>
> This all assumes that the seed is set *only* with set.seed. If .Random.seed
> is modified directly, there are many more possibilities for most of the RNGs.
>
> -s

Thanks very much to Warren and Stavros for their additional insight. Putting all of this together, I think I am now ready to formulate my question intelligently:

Using Sweave, I want to distribute randomly generated problems AND answers to both teacher AND student. More precisely, I want to distribute:

1) the .Rnw file
2) the .RData file saved near the end of the Sweave process.

I want it to be *easy* for the Instructor to change my seed and generate new problems. I want it to be *difficult* for students to figure out the seed and automatically generate solutions on their own.

Of course, difficult is a relative term, since what is difficult for them may well be easy for me, and what is difficult for me will be trivial to cryptographers and some people on this list. The audience would be, say, upper division undergraduate students at a public university.

What is clear so far: a brute force search of set.seed() is really pretty easy and fast... even for students at this level.
However, relating to Duncan's second remark: what if the Instructor inserted an *unknown* very large number of calls to the RNG near the beginning of the .Rnw (but after the set.seed)... and did not distribute this information to the students... that would make it much harder, yes?

Any ideas that are even better than this? Conceivably, some of my students will be searching these archives in the future; please feel free to respond off-list if appropriate.

Jay

--
***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
             -3302 Department
             -3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/
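The brute-force idea discussed in this message can be tried on a deliberately small search range. This is a sketch, not the code from the thread: the function takes hypothetical lo and hi bounds, the "secret" seed 1234 is made up, and the search overwrites the session's RNG state as it goes.

```r
# Try every seed in [lo, hi] until the first runif() draw matches the target.
# Side effect: each set.seed() call replaces the current RNG state.
whatseed <- function(v, lo, hi) {
  for (i in seq.int(lo, hi)) {
    set.seed(i)
    if (runif(1) == v) return(i)
  }
  NA_integer_   # nothing in the range reproduces the target
}

set.seed(1234)        # the "unknown" seed, chosen here for illustration
target <- runif(1)    # the value a student would see in the saved output

found <- whatseed(target, 1L, 5000L)

# Check the recovered seed by regenerating the first draw.
set.seed(found)
runif(1) == target
```

Extra calls to the RNG between set.seed() and the first visible draw would defeat this exact comparison, which is the point of the burn-in idea raised above; the search would then also have to guess the number of discarded draws.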
[R] Any R workshops on BUGS or resampling or other...?
I would like to know about any workshops/meetings on the topics of (1) using some version of BUGS with R (2) resampling methods (3) other advanced courses.

Thanks for any ideas.

Kevin Wright
Re: [R] memory usage grows too fast
Hi William,

Thanks for the comments and explanation. It is really good to know the details of rowMeans. I did modify Peter's code from length(x[x=="02"]) to sum(x=="02"), though it improved the run time by only a few seconds. :)

Best,
Mike

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, May 15, 2009 10:09 AM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

rowMeans(dataMatrix=="02") must

  (a) make a logical matrix the dimensions of dataMatrix in which to put the result of dataMatrix=="02" (4 bytes/logical element)
  (b) make a double precision matrix (8 bytes/element) the size of that logical matrix, because rowMeans uses some C code that only works on doubles

    apply(dataMatrix,1,function(x)length(x[x=="02"])/ncol(dataMatrix))

never has to make any copies of the entire matrix. It extracts a row at a time and when it is done with the row, the memory used for working on the row is available for other uses. Note that it would probably be a tad faster if it were changed to

    apply(dataMatrix,1,function(x)sum(x=="02")) / ncol(dataMatrix)

as sum(logicalVector) is the same as length(x[logicalVector]) and there is no need to compute ncol(dataMatrix) more than once.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
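A possible middle ground between the two approaches compared in this thread is to apply rowMeans() to blocks of rows, so that the temporary logical and double matrices are only as large as one block. This block-wise variant is not from the thread; the function name, the chunk size, and the simulated data are all assumptions made for illustration.

```r
# Row frequencies of `pattern`, computed on blocks of at most `chunk_rows` rows.
# Assumes m has at least one row.
row_freq_chunked <- function(m, pattern, chunk_rows = 1000L) {
  n <- nrow(m)
  out <- numeric(n)
  for (s in seq(1L, n, by = chunk_rows)) {
    e <- min(s + chunk_rows - 1L, n)
    # Temporaries here cover only rows s..e, not the whole matrix.
    out[s:e] <- rowMeans(m[s:e, , drop = FALSE] == pattern)
  }
  out
}

set.seed(1)
m <- matrix(sample(c("00", "01", "02"), 5000 * 20, replace = TRUE), nrow = 5000)

chunked <- row_freq_chunked(m, "02", chunk_rows = 512L)
whole   <- rowMeans(m == "02")   # reference result from the whole-matrix version
```

Whether this actually helps on a given machine depends on the matrix size and the chunk size, so it is worth timing both on real data.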
[R] Rotating x-axis categorical labels
Hello,

I am using barplot to generate a histogram of population by county. I need to plot the bars for about 35 counties, and would like to rotate the county name labels on the x-axis to a vertical orientation so that I can fit them all. An example of my syntax is below:

    r.barplot(x,main=main,xlab=xlab,ylab=ylab,names_arg=counties,axis_lty=1,col="lavender",ylim=r.c(0,100),cex_axis=0.7,cex_names=0.7,offset=0,las=1)

Using las rotates the y-axis labels --- how do I rotate the x-axis labels?

Thanks,
William
Re: [R] Rotating x-axis categorical labels
On May 15, 2009, at 12:18 PM, Bill Hudspeth wrote:

> Hello,
>
> I am using barplot to generate a histogram of population by county. I need
> to plot the bars for about 35 counties, and would like to rotate the county
> name labels on the x-axis to a vertical orientation so that I can fit them
> all. An example of my syntax is below:
>
>     r.barplot(x,main=main,xlab=xlab,ylab=ylab,names_arg=counties,axis_lty=1,col="lavender",ylim=r.c(0,100),cex_axis=0.7,cex_names=0.7,offset=0,las=1)
>
> Using las rotates the y-axis labels --- how do I rotate the x-axis labels?
>
> Thanks,
> William

par(las) takes 4 values 0:3. See ?par

Try:

    # Rotate both x and y
    barplot(1:5, names.arg = paste("Bar", 1:5), las = 2)

    # Rotate just x
    barplot(1:5, names.arg = paste("Bar", 1:5), las = 3)

If you want something other than a 90 degree rotation, see:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-create-rotated-axis-labels_003f

HTH,

Marc Schwartz
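Applied to something closer to the 35-county case in the question, the las = 3 suggestion might look like the sketch below. The county names and population values are invented, the enlarged bottom margin is an assumption to make room for vertical labels, and the null PDF device is used only so the code runs without a screen.

```r
pdf(file = NULL)   # null device; omit to draw on screen

counties <- paste("County", 1:35)         # hypothetical labels
pop      <- seq(5, 95, length.out = 35)   # hypothetical populations

op <- par(mar = c(7, 4, 2, 1))            # extra bottom margin for vertical labels
mids <- barplot(pop, names.arg = counties,
                las = 3,                  # all axis labels vertical
                cex.names = 0.7, cex.axis = 0.7,
                ylim = c(0, 100), col = "lavender")
par(op)                                   # restore the previous margins
invisible(dev.off())
```

barplot() returns the bar midpoints, which is what the FAQ entry linked above uses to place labels at other angles with text().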
Re: [R] creating and then executing command strings
The arrow "<-" is used to assign a value to a variable; the equals sign "=" is used to specify the value for a function argument. Recent versions of R allow "=" to be used for "<-" at the top level and in certain circumstances, which some people find more convenient but which can also lead to confusion (purists always keep them separate).

The code:

    parse( text <- paste( ...

will take the results of paste, save them in a variable named text, then pass a copy to the first argument of parse, which is file, not text, so parse will just get confused (looking for a file named what your code is).

The code:

    parse( text = paste( ...

will take the results of paste and pass them to the parse function as the text argument.

But having said that, you should refer to fortune(106) (type that after loading the fortunes package) and possibly fortune(181). There are probably better ways to do what you want; Romain's second example is one way.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Philipp Schmidt
> Sent: Friday, May 15, 2009 8:35 AM
> To: Romain Francois
> Cc: r-help@r-project.org
> Subject: Re: [R] creating and then executing command strings
>
> On Fri, May 15, 2009 at 3:38 PM, Romain Francois
> romain.franc...@dbmail.com wrote:
> > Hi,
> >
> > You can either parse and eval the string you are making, as in:
> >
> >     eval( parse( text = paste("avg_",colname, " <- 0;", sep='') ) )
> >
> > Or you can do something like this:
> >
> >     df[[ paste( "avg_", colname, sep = "" ) ]] <- 0
>
> Thank you so much! I used the first version and it worked.
>
> What puzzles me is that I am not able to use <- instead of = (my R book
> says the two can be exchanged) or break the command into different parts
> and execute them one after another. I get various error messages when I try:
>
>     eval( parse( text <- paste("avg_",colname, " <- 0;", sep='') ) )
>
> or
>
>     text = paste("avg_",colname, " <- 0;", sep='')
>     parse(text)
>     eval(parse(text))
>
> Anyway, thanks a lot - you greatly improved the likelihood of me not
> working on the weekend!
>
> Best - P
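The distinction drawn in this message can be seen in a short sketch. The data frame and column name are hypothetical; the first form is the `[[` approach from Romain's second example, and the later forms show that parse() works once the string is passed through the named text argument.

```r
df <- data.frame(height = c(1.5, 1.7, 1.9))   # hypothetical data
colname <- "height"

# Preferred: build the column name as a string and index with [[ ]].
df[[paste("avg_", colname, sep = "")]] <- mean(df[[colname]])

# parse()/eval() version: 'text =' names the argument, so the string is parsed as code.
cmd <- paste("avg_", colname, " <- 0", sep = "")
eval(parse(text = cmd))   # creates a variable called avg_height

# Splitting the steps also works, as long as the argument is still named.
expr <- parse(text = cmd)
eval(expr)
```

The `[[` form keeps everything inside the data frame and avoids building code as text, which is the direction the fortunes cited above point in.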
Re: [R] Simulation
Greg Snow wrote:

> Another possibility (maybe more readable, gives the option of a list,
> probably not faster):
>
> Replicate(1000, rexp(15,1) )

provided that simplify=FALSE:

    is(replicate(10, rexp(15, 1)))
    # "matrix" ...
    is(replicate(10, rexp(15, 1), simplify=FALSE))
    # "list" ...

vQ
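A slightly fuller sketch of the same point, using base predicates instead of is(); the seed and object names are assumptions.

```r
set.seed(1)
as_matrix <- replicate(10, rexp(15, 1))                    # default: 15 x 10 matrix
as_list   <- replicate(10, rexp(15, 1), simplify = FALSE)  # list of 10 vectors

is.matrix(as_matrix)
is.list(as_list)
lengths(as_list)   # each element holds 15 draws
```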
Re: [R] drawing arrows
Duncan mentioned the arrows function, which may do everything you want. But also look at the my.symbols function in the TeachingDemos package for another way to draw arrows, or to draw your circles and arrows in 1 step.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of christophe dutang
> Sent: Friday, May 15, 2009 10:44 AM
> To: r-help@r-project.org
> Subject: [R] drawing arrows
>
> Hi,
>
> I would like to draw arrows in a classic 2D plot. Which package should I
> use? Are there base R functions that do the job?
>
> On Google, I could not find any useful discussion about this topic, except
> a link to the function 'grid.arrows' of the grid package.
>
> My problem is I would like to draw arrows at the edge of circles drawn by
> the 'symbols' function. Maybe there is already a dedicated function for this?
>
> Any help is appreciated.
>
> Christophe
>
> --
> Christophe DUTANG
> Ph. D. student at ISFA
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
I suggest you try to translate your constrained problem into an unconstrained one using logarithms, then do nonlinear mixed-effects modeling as described in chapters 6-8 of Pinheiro and Bates (2000). To do this, I would first start with the simpler linear estimation problem to get starting values for the nonlinear estimation. You should be able to do this using the nlme function in the nlme package. If you have trouble with this, you might consider the nlmer function in the lme4 package. The latter is newer and better in many ways but not as well documented.

Hope this helps.

Spencer Graves

avraham.ad...@guycarp.com wrote:

> Thank you both very much for your replies. What makes this a little less
> straightforward, at least to me, is that there need to be constraints on
> the solved parameters. They most certainly need to be positive, and there
> may be an upper limit as well. The true best linear fit would have negative
> entries for some of the parameters.
>
> Originally, I was using the L-BFGS-B method of optim, which both allows for
> box constraints and has the limited-memory advantage useful when dealing
> with large matrices. Having the analytic gradient, I thought of using BFGS
> and having a statement in the function returning Inf for any parameters
> outside the allowable constraints.
>
> I do /not/ know how to apply parameter constraints when using linear
> models. I looked around at the various manuals and help features, and
> outside of package glmc I did not find anything I could use. Perhaps I
> overlooked something. If there is something I missed, please let me know.
>
> If there truly is no standard optimization routine that works on sparse
> matrices, my next step may be to use the normal equations to shrink the
> size of the matrix, recast it as a dense matrix (it would only be 1173x1173
> then) and then hand it off to optim.
>
> Any further suggestions or corrections would be very much appreciated.
>
> Thank you,
>
> --Avraham Adler
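The first step of this suggestion, turning a positivity constraint into an unconstrained problem with logarithms, can be sketched with optim() alone. This shows only the reparameterisation x = exp(theta); it does not attempt the mixed-effects modelling recommended above. The small dense matrix, the true parameter values, the starting point, and the maxit setting are all assumptions made for illustration.

```r
set.seed(7)
A <- matrix(rnorm(200 * 5), nrow = 200)   # small dense stand-in for the real A
x_true <- c(0.5, 1, 1.5, 2, 2.5)          # strictly positive parameters
B <- as.numeric(A %*% x_true)

# Optimise over theta = log(x), so x = exp(theta) > 0 without explicit constraints.
fn <- function(theta) sum((A %*% exp(theta) - B)^2)
gr <- function(theta) {
  x <- exp(theta)
  as.numeric(2 * crossprod(A, A %*% x - B)) * x   # chain rule: dx/dtheta = x
}

fit <- optim(rep(0, 5), fn, gr, method = "BFGS", control = list(maxit = 500))
x_hat <- exp(fit$par)          # back-transform to the original scale
max(abs(x_hat - x_true))
```

One consequence of this transform is that a parameter whose constrained optimum is exactly zero can only be approached, never reached, since exp(theta) is always positive.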
Re: [R] drawing arrows
Thanks, I'll take a look.

Christophe

On 15 May 09, at 20:11, Greg Snow wrote:

> Duncan mentioned the arrows function, which may do everything you want.
> But also look at the my.symbols function in the TeachingDemos package for
> another way to draw arrows, or to draw your circles and arrows in 1 step.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111

Christophe Dutang
Ph. D. student at ISFA, Lyon, France
website: http://dutangc.free.fr
Re: [R] can you tell what .Random.seed *was*?
G. Jay Kerns wrote:

> I want it to be *difficult* for students to figure out the seed and
> automatically generate solutions on their own.

Hmmm... Would it really be a bad thing if someone reverse engineered this to generate answers given the problem set? If it's hard enough to do that, it'd be more worth solving than the given problem set. I call that extra credit.

> a brute force search of set.seed() is really pretty easy and fast... even
> for students at this level.

Either you're misunderstanding Stavros' benchmark results, or I am. Could easily be the latter...I'm an R newbie.

As far as I can tell, the inner part of the loop does very little. If that's right, Stavros is saying it will take 18 hours to try every possible seed when the algorithm based on that seed takes almost no time to run. But if generating each problem set takes, say, a minute, it will take roughly 8,000 years (2^32 minutes) to generate a complete rainbow table when there are 2^32 possible seeds.

> what if the Instructor inserted an *unknown* very large number of calls to
> the RNG near the beginning of the .Rnw (but after the set.seed)... and did
> not distribute this information to the students... that would make it much
> harder, yes?

There are better ways.

As above, one key to making rainbow tables impractical is making the per-iteration time long enough. Even if it only takes a second to generate each possible problem set, that's enough when multiplied by high enough powers of 2.

The other key is using big enough powers of 2. I hadn't looked into R's random number generation before, but it appears quite robust. Seeding it with the current wall clock time (a 32-bit integer on most systems) is an insult to its capability.

The default pseudo-random number generator (PRNG) in my copy of R is the Mersenne Twister, a truly awesome algorithm. It's capable of very high quality results, as long as you give it a good seed. It will take a vector of *many* integers as a seed, not just one.
It's not clear to me from the R docs if you can pass an arbitrary array of integers with any value, or if it needs something special. Assuming you can give it any old passel of randomness as a seed, you just have to find a good source of randomness to create that seed. On a Linux box, you could concatenate several dozen bytes read from /dev/random, the current wall clock time in microseconds, the inode of the R script being run, the process ID of the R interpreter, and the current mouse cursor position into a single string. Feed all that into a hash algorithm, break the result into pieces 4 bytes long, cast them to integers, and send that array of ints to set.seed(). If you use SHA-256 as the hash algorithm, that scheme should give you enough input randomness to get any of the possible 2^256 hash outputs, making that the number of possible problem sets. That's more than a rainbow table buster...there aren't enough atoms in the visible universe to construct a computer big enough to cope with 2^256 possible outputs. That said, the quality of the PRNG just *allows* you to avoid screwing up. It doesn't make it impossible to make a weak algorithm.
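The rainbow-table argument above comes down to simple arithmetic: number of seeds times the time needed per seed. A minimal sketch in base R, where the function name years_to_search is made up for illustration and a year is assumed to be 365.25 days:

```r
# Years needed to try every seed, given the cost of one trial in seconds.
# years_to_search is a hypothetical helper, not part of R.
years_to_search <- function(n_seeds, seconds_per_seed) {
  seconds_per_year <- 60 * 60 * 24 * 365.25
  n_seeds * seconds_per_seed / seconds_per_year
}

years_to_search(2^32, 1)    # one second per problem set
years_to_search(2^32, 60)   # one minute per problem set
```

The same helper can be called with a larger n_seeds to see how quickly the search becomes impractical once the seed space grows beyond 32 bits.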
Re: [R] readBin: read from defined offset TO defined offset?
Thanks guys! Duncan's hints regarding character (which I was naturally using ;0) and the double readBin solved my problem - I'm extracting an index from a REALLY big XML file to get fast direct access to subsections, so that I only have to parse them rather than the whole thing (only SAX-style parsing would be possible, since there's no way the thing will fit into memory). Thanks again, Joh Johannes Graumann wrote: Hello, With the help of seek I can start readBin from any byte offset within my file that I deem appropriate. What I would like to do is to be able to define the endpoint of that read as well. Is there any solution to that already out there? Thanks for any hints, Joh
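For readers with the same question, one way to read from a start offset to an end offset is to seek to the start and ask readBin for exactly end - start + 1 raw bytes. This is only a sketch under that assumption, not necessarily what Duncan suggested; read_byte_range and the demo file are invented for illustration.

```r
# Create a small demo file so the example is self-contained.
path <- tempfile()
writeBin(charToRaw("0123456789ABCDEF"), path)

# Read bytes start..end (0-based, inclusive) from a file.
# read_byte_range is a hypothetical helper name.
read_byte_range <- function(path, start, end) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = start, origin = "start")
  # Raw mode avoids any character conversion of the bytes.
  readBin(con, what = "raw", n = end - start + 1)
}

chunk <- read_byte_range(path, 4, 9)
rawToChar(chunk)   # convert to text only after the bytes are read
```

Reading as raw and converting afterwards keeps the byte count exact, which matters when the offsets come from an index into a large file.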
Re: [R] Sweave: Howto write real TeX formula in plot
cls59 wrote: install.packages('pgfSweave',repos='http://www.rforge.net') For others who are not on Linux, I would suggest using the R-Forge site for binary installation. The binaries on rforge.net are not completely up to date in some cases. install.packages("pgfSweave", repos="http://R-Forge.R-project.org") Otherwise, if you are not on a Linux system, install from source: install.packages('pgfSweave',repos='http://www.rforge.net',type='source') -Cameron -- View this message in context: http://www.nabble.com/Sweave%3A-Howto-write-real-TeX-formula-in-plot-tp23127536p23565286.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices
Hi, I think quadratic programming is the way to go. Look at solve.QP or the limSolve package. Here is a toy example that I had worked out some time back for a linear least squares problem with simple box constraints:

# Problem: minimize ||Ax - y||, subject to low <= x <= upp
require(limSolve)
nc <- 7   # 7 unknown parameters
nr <- 20  # 20 equations
# Bounds on the parameters: 0 < x < 1, for all x
#
set.seed(123)
A <- matrix(rnorm(nr*nc), nr, nc)
x <- c(runif(nc-1), 1.5)  # Note: the last component is out of bounds!
y <- A %*% x + rnorm(nr, sd=0.1)
qr.solve(A, y)  # unconstrained least-squares
low <- rep(0, nc)  # lower bounds
upp <- rep(1, nc)  # upper bounds
# Implementing the bounds (there is probably a simpler way to do this)
#
c1 <- matrix(0, nc, nc)
diag(c1) <- 1
c2 <- matrix(0, nc, nc)
diag(c2) <- -1
cmat <- rbind(c1, c2)
vec <- rep(0, 2*nc)  # interleaves the lower- and upper-bound rows
vec[seq(1, 2*nc, by=2)] <- 1:nc
vec[seq(2, 2*nc, by=2)] <- (nc+1):(2*nc)
Cmat <- rbind(c1, c2)[vec, ]  # Constraint matrix G
b0 <- c(low, -upp)[vec]
ans <- lsei(A = A, B = y, G = Cmat, H = b0)
ans

Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of spencerg Sent: Friday, May 15, 2009 2:22 PM To: avraham.ad...@guycarp.com Cc: r-help@r-project.org; Douglas Bates Subject: Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices I suggest you try to translate your constrained problem into an unconstrained one using logarithms, then do nonlinear mixed effects modeling as described in chapters 6-8 of Pinheiro and Bates (2000). To do this, I would first start with the simpler linear estimation problem to get starting values for the nonlinear estimation. 
You should be able to do this using the nlme function in the nlme package. If you have trouble with this, you might consider the nlmer function in the lme4 package. The latter is newer and better in many ways but not as well documented. Hope this helps. Spencer Graves avraham.ad...@guycarp.com wrote: Thank you both very much for your replies. What makes this a little less straightforward, at least to me, is that there need to be constraints on the solved parameters. They most certainly need to be positive and there may be an upper limit as well. The true best linear fit would have negative entries for some of the parameters. Originally, I was using the L-BFGS-B method of optim, which both allows for box constraints and has the limited memory advantage useful when dealing with large matrices. Having the analytic gradient, I thought of using BFGS and having a statement in the function returning Inf for any parameters outside the allowable constraints. I do /not/ know how to apply parameter constraints when using linear models. I looked around at the various manuals and help features, and outside of package glmc I did not find anything I could use. Perhaps I overlooked something. If there is something I missed, please let me know. If there truly is no standard optimization routine that works on sparse matrices, my next step may be to use the normal equations to shrink the size of the matrix, recast it as a dense matrix (it would only be 1173x1173 then) and then hand it off to optim. Any further suggestions or corrections would be very much appreciated. Thank you, --Avraham Adler From: Douglas Bates ba...@stat.wisc.edu Sent by: dmba...@gmail.com Date: 05/15/2009 11:57 AM To: avraham.ad...@guycarp.com cc: r-help@r-project.org Subject: Re: [R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices On Wed, May 13, 2009 at 5:21 PM, avraham.ad...@guycarp.com wrote: Hello. 
I am trying to optimize a set of parameters using /optim/ in which the actual function to be minimized contains matrix multiplication and is of the form: SUM ((A%*%X - B)^2) where A is a matrix and X and B are vectors, with X as parameter vector. As Spencer Graves pointed out, what you are describing here is a linear least squares problem, which
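For reference, the box-constrained problem described in this thread, minimizing SUM((A%*%X - B)^2) with an analytic gradient and the L-BFGS-B method of optim, can be sketched in base R as follows. The data are made up, the matrix is small and dense, and the names objective and gradient are chosen for illustration; this says nothing about how the approach behaves on large sparse matrices.

```r
set.seed(1)
A <- matrix(rnorm(20 * 4), 20, 4)      # made-up dense design matrix
x_true <- c(0.2, 0.5, 0.8, 1.5)        # last value lies outside the box [0, 1]
B <- as.vector(A %*% x_true)

# Sum of squared residuals and its analytic gradient, 2 * t(A) %*% (A X - B).
objective <- function(X) sum((A %*% X - B)^2)
gradient  <- function(X) as.vector(2 * crossprod(A, A %*% X - B))

start <- rep(0.5, 4)
fit <- optim(par = start, fn = objective, gr = gradient,
             method = "L-BFGS-B", lower = 0, upper = 1)
fit$par     # estimates, each kept inside [0, 1]
fit$value   # objective at the constrained solution
```

Because the unconstrained solution has a component above 1, the fitted value for that parameter is expected to sit at or near the upper bound.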
Re: [R] can you tell what .Random.seed *was*?
On Fri, May 15, 2009 at 12:07 PM, Stavros Macrakis macra...@alum.mit.edu wrote: system.time(whatseed(runif(1))) Sorry, though I got lucky and my overall result is roughly correct, this is an incorrect time measure. It should be r <- runif(1); system.time(whatseed(r)) because R's call-by-need semantics don't evaluate the runif before it starts running whatseed. The correct time (on my machine) is then 28 hours, not 18. Better to avoid side-effect functions as arguments. -s
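The call-by-need effect described above can be seen in a few lines of base R. The function first_draw_after_seed below is invented for this illustration and is not the whatseed function from the thread; it only shows that an argument such as runif(1) is evaluated when it is first used inside the function, which here is after the function has already called set.seed.

```r
# Hypothetical stand-in for a function that resets the seed before
# it first touches its argument.
first_draw_after_seed <- function(x) {
  set.seed(1)   # runs before x is evaluated: x is still an unevaluated promise
  x             # the promise is forced here, so runif(1) sees seed 1
}

set.seed(999)
lazy <- first_draw_after_seed(runif(1))   # argument evaluated inside the call

set.seed(1)
expected <- runif(1)                      # first draw after set.seed(1)

set.seed(999)
r <- runif(1)                             # evaluated eagerly, before the call
eager <- first_draw_after_seed(r)

identical(lazy, expected)   # the lazy argument was drawn under seed 1
identical(eager, r)         # the eager value is passed through unchanged
```

Assigning the random draw to a variable first, as in the corrected timing call, removes the dependence on when the argument happens to be forced.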
Re: [R] can you tell what .Random.seed *was*?
On 15 May 2009 at 13:08, G. Jay Kerns wrote: | Thanks very much to Warren and Stavros for their additional insight. | Putting all of this together, I think I am now ready to formulate my | question intelligently: | | Using Sweave, I want to distribute randomly generated problems AND | answers to both teacher AND student. | | More precisely, I want to distribute: | 1) the .Rnw file | 2) the .RData file saved near the end of the Sweave process. | | I want it to be *easy* for the Instructor to change my seed and | generate new problems. | | I want it to be *difficult* for students to figure out the seed and | automatically generate solutions on their own. | | Of course, difficult is a relative term, since what is difficult | for them may well be easy for me, and what is difficult for me will | be trivial to cryptographers and some people on this list. The | audience would be, say, upper division undergraduate students at a | public university. | | | What is clear so far: a brute force search of set.seed() is really | pretty easy and fast... even for students at this level. | | However, relating to Duncan's second remark: what if the Instructor | inserted an *unknown* very large number of calls to the RNG near the | beginning of the .Rnw (but after the set.seed)... and did not | distribute this information to the students... that would make it | much harder, yes? | | Any ideas that are even better than this? You could use (one or more) seeds from a hardware RNGs. The website http://random.org by Mads Haahr distributes such numbers (and my CRAN package 'random' gets them for you in a convenient fashion). Have a look at the docs at random.org, and the two vignettes in the random package: RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs. 
People use RANDOM.ORG for holding drawings, lotteries and sweepstakes, to drive games and gambling sites, for scientific applications and for art and music. Hth, Dirk -- Three out of two people have difficulties with fractions.
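The idea quoted in this message, inserting an undisclosed number of RNG calls after set.seed, can be sketched in base R as below. The function draw_problem_data, the seed 42 and the burn-in count are all made up for illustration. Someone searching by brute force would then have to try seed and burn-in combinations rather than seeds alone, which raises the cost but is not a cryptographic guarantee.

```r
# Generate "problem data" from a seed, optionally discarding burn_in
# uniform draws first. draw_problem_data is a hypothetical helper.
draw_problem_data <- function(seed, burn_in = 0L) {
  set.seed(seed)
  if (burn_in > 0L) invisible(runif(burn_in))  # advance the generator state
  round(rnorm(5, mean = 100, sd = 15))
}

plain  <- draw_problem_data(42L)                       # seed only
hidden <- draw_problem_data(42L, burn_in = 1234567L)   # seed plus secret burn-in

identical(plain, hidden)   # the burn-in changes the generated data
```

The instructor can still regenerate the same problems at any time, because the seed together with the burn-in count fully determines the output.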