[R] set up a blank csv file and write time series to it row by row
Dear Friends,

Greetings! I have asked several times how to set up a blank file and write lists to it row by row when the number of lists is unknown. I have received many beautiful solutions. Thanks go to Professor Murdoch, Professor Menne, Professor Grothendieck and Dr. Olshansky. I have organized the solutions below.

## Set up a blank table on the hard drive and write to it row by row

# Method 1
blank = data.frame(name=character(0), wife=character(0), no.children=numeric(0))
write.csv(blank, 'file1.csv', row.names=FALSE)
a1 = list(name='Tom', wife='Joy', no.children=9)
a2 = list(name='Paul', wife='Alic', no.children=5)
write.table(a1, file='file1.csv', sep=',', append=TRUE, row.names=FALSE, col.names=FALSE)
write.table(a2, file='file1.csv', sep=',', append=TRUE, row.names=FALSE, col.names=FALSE)

# Method 2
blank = data.frame(name=character(0), wife=character(0), no.children=numeric(0))
write.csv(blank, 'file2.csv')
a1 = list(name='Tom', wife='Joy', no.children=9)
a2 = list(name='Paul', wife='Alic', no.children=5)
write.table(a1, file='file2.csv', sep=',', append=TRUE, row.names=2, col.names=FALSE)
write.table(a2, file='file2.csv', sep=',', append=TRUE, row.names=3, col.names=FALSE)

My problem now is how to write a time series (instead of a list) to a csv file, and how to set up such a csv file to accept the time series. I know the length of each time series, but I do not know how many of them are going to come up. Examples are:

bb1=c(1:10)
bb2=c(101:110)

How do I write bb1 and bb2 to a csv file, and how do I set up a blank csv file to accept such time series in the first place?

Your help will be highly appreciated!

Best Wishes!
Yuchen Luo

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
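One possible way to handle the time-series case, offered as a sketch rather than a definitive answer: since the length of each series is known, a header-only file can be created first and each series appended as a single row. The file name (a temporary file), the column names t1..t10 and the helper append_series() are assumptions made for illustration.

```r
# Hypothetical file; tempfile() keeps the example self-contained
f <- tempfile(fileext = ".csv")

n <- 10  # known length of every series
# Header-only file: one column name per time point (names are made up)
writeLines(paste0("t", seq_len(n), collapse = ","), f)

# Hypothetical helper: append one series as one csv row
append_series <- function(x, file) {
  # t() turns the vector into a 1-row matrix, so it is written as a single row
  write.table(t(x), file = file, sep = ",", append = TRUE,
              row.names = FALSE, col.names = FALSE)
}

bb1 <- 1:10
bb2 <- 101:110
append_series(bb1, f)
append_series(bb2, f)

# Read the file back: one row per series
result <- read.csv(f)
result
```

Because each call only appends, the number of series does not need to be known in advance.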
[R] Unexpected behavior in PBSmapping package
Using R 2.5.1 on Windows XP Professional and PBSmapping package version 2.51, I have encountered some behavior which puzzles me. I am including the package's listed maintainer on this email but also seek the thoughts of the R-help community.

I have a set of EventData which I want to plot as points, coloring the points according to some criterion. It turns out that some of my points fall outside my desired plotting region. It looks like this causes the PBSmapping functions plotPoints and addPoints to deal incorrectly with the color assignments. Consider the following toy example:

### Begin Example ###
library( PBSmapping )

# Define some EventData
events <- as.EventData(
    read.table( textConnection(
'EID X Y Color
1 494 1494 red
2 497 1497 blue
3 500 1500 green
4 503 1503 yellow' ), header=TRUE, strings=FALSE ),
    proj='UTM', zone=10 )

par( mfrow=c(3,1) )

# Plot the events with plot limits large enough to show
# the full extent of all the symbols
plotPoints( events, pch=16, cex=5, col=events$Color,
            xlim=c(490,508), ylim=c(1490,1508), proj=TRUE )
with( events, text( X, Y, toupper( substr( Color, 1, 1 ) ), font=2, cex=2 ) )

# Normal plot extents; partial symbols cut off by edges
# of plotting region (as expected)
plotPoints( events, pch=16, cex=5, col=events$Color, proj=TRUE )
with( events, text( X, Y, toupper( substr( Color, 1, 1 ) ), font=2, cex=2 ) )

## Now use more-restrictive plot limits
plotPoints( events, pch=16, cex=5, col=events$Color,
            xlim=c(499,505), ylim=c(1499,1505), proj=TRUE )
with( events, text( X, Y, toupper( substr( Color, 1, 1 ) ), font=2, cex=2 ) )
# Note that symbols are plotted in the right places (note text labels)
# but colors are not as expected
### End example ###

For the moment, I have worked around this issue by using a with( events, points( ... ) ) construction, but this seems suboptimal; I would prefer to use addPoints (which exhibits the same problem as plotPoints does in the toy example above).

I would appreciate any insights those on the list might have. Please include me directly on any reply to the list, as I am at least a couple of weeks behind on reading the digested version of the list. I see that there have been no mentions of the PBSmapping package even in the digests I have not yet read.

Session info:

sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
PBSmapping 2.51

--David Dailey
Shoreline, Washington, USA
Re: [R] Help using gPath
Hi

Emilio Gagliardi wrote:
> Hi Paul,
> I'm sorry for not posting code, I wasn't sure if it would be helpful without the data...should I post the code and a sample of the data? I will remember to do that next time!

It's important not only to post code, but also to make sure that other people can run it (i.e., include real data or have the code generate data or use one of R's predefined data sets). Also, isn't this next time ? :)

>> grid.gedit(gPath("ylabel.text.382"), gp=gpar(fontsize=16))
>
> OK, I think my confusion comes from the notation that current.grobTree() produces and what strings are required in order to make changes to the underlying grobs. But, from what you've provided, it looks like I can access each grob with its unique name, regardless of which parent it is nested in...that helps

Yes. By default, grid will search the tree of all grobs to find the name you provide. You can even just provide part of the name and it will find partial matches (depending on argument settings). On the other hand, by specifying a path that specifies parent and child grobs, you can make sure you get exactly the grob you want.

>>> like to remove the left border on the first panel. I'd like to adjust the
>>
>> I'd guess you'd have to remove the grob background.rect.345 and then draw in just the sides you want, which would require getting to the right viewport, for which you'll need to study the viewport tree (see current.vpTree())
>
> I did some digging into this and it seems pretty complicated, is there an example anywhere that makes sense to the beginner? The whole viewport grob relationship is not clear to me. So, accessing viewports and removing objects and drawing new ones is beyond me at this point. I can get my mind around your example below because I can see the object I want to modify in the viewer, and the code changes a property of that object, click enter, and bang the object changes. When you start talking external pointers and finding viewports and pushing and popping grobs I just get lost. I found the viewports for the grobTree, it looks like this:

There's a book that provides a full explanation and the (basic) grid chapter is online (see http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)

> viewport[ROOT]->(viewport[layout]->(
>   viewport[axis_h_1_1]->(viewport[bottom_axis]->(viewport[labels], viewport[ticks])),
>   viewport[axis_h_1_2]->(viewport[bottom_axis]->(viewport[labels], viewport[ticks])),
>   viewport[axis_v_1_1]->(viewport[left_axis]->(viewport[labels], viewport[ticks])),
>   viewport[panel_1_1], viewport[panel_1_2],
>   viewport[strip_h_1_1], viewport[strip_h_1_2], viewport[strip_v_1_1]))
>
> at that point I was like, ok, I'm done. :S

Yep, the facilities for investigating the viewport and grob tree are basically inadequate. Based on some work Hadley did for ggplot, the development version of R has a slightly better tool called grid.ls() that can show how the grob tree and the viewport tree intertwine. That would allow you to see which viewport each grob was drawn in, which would help you, for example, to know which viewport you had to go to to replace a rectangle you want to remove.

>> Something like ...
>> grid.gedit("geom_bar.rect", gp=gpar(col="green"))
>>
>> Again, it would really help to have some code to run.
>
> My apologies, I thought the grobTree was sufficient in this case. Thanks very much for your help.

Sorry to harp on about it, but if I had your code I could show you an example of how grid.ls() might help.

Paul
--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
[EMAIL PROTECTED]
http://www.stat.auckland.ac.nz/~paul/
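Since no runnable code was posted in the thread, here is a minimal self-contained sketch of the name-based editing Paul describes. The grob names are hypothetical, modelled on the ones quoted above, and the scene is drawn directly with grid rather than by ggplot.

```r
library(grid)

grid.newpage()
# Hypothetical grob names, modelled on those quoted in the thread
grid.rect(width = 0.8, height = 0.8, name = "background.rect.345")
grid.text("y label", x = 0.05, rot = 90, name = "ylabel.text.382")

# Partial name: grid.gedit() matches by regular expression,
# so "ylabel.text" finds "ylabel.text.382"
grid.gedit("ylabel.text", gp = gpar(fontsize = 16))

# Full name: grid.edit() looks for an exact match
grid.edit("background.rect.345", gp = gpar(col = "green"))

# List the grobs currently on the display list
grid.ls()

# Retrieve the edited grob to inspect its settings
edited <- grid.get("ylabel.text.382")
edited$gp$fontsize
```

In a real ggplot scene the names would come from the output of grid.ls() (or current.grobTree()), not from this toy example.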
Re: [R] need help with pdf-plot
I still have this problem. Does anybody know a solution?

Antje

Antje wrote:
> Hello,
>
> I'm trying to plot a set of barplots arranged like a matrix (2 rows, 10 columns, from reduced_mat) to a pdf. It works with the following parameters:
>
> pdf("test.pdf", width=ncol(reduced_mat)*2, height=nrow(reduced_mat)*2, pointsize = 12)
> par(mfcol = c(nrow(reduced_mat),ncol(reduced_mat)), oma = c(0,0,0,0), lwd=48/96, cex.axis = 0.5, las = 2, cex.main = 1.0)
>
> Then I get a long, narrow page format with square barplots. But I would like to end up with an A4 page, with the plots not filling the whole page (they should stay roughly square and not be stretched...).
>
> What should I look for to achieve this?
>
> Antje
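One approach that may do what is asked, sketched under assumptions: open the pdf device with A4 landscape dimensions (about 11.69 x 8.27 inches) and set par(pty = "s"), which requests a square plotting region in every panel instead of stretching it to fill the panel. The output path, the stand-in data and the margin settings below are made up for illustration.

```r
out <- tempfile(fileext = ".pdf")                       # hypothetical output path
reduced_mat <- matrix(runif(20), nrow = 2, ncol = 10)   # stand-in data

# A4 landscape is about 11.69 x 8.27 inches
pdf(out, width = 11.69, height = 8.27, pointsize = 12)
par(mfcol = c(nrow(reduced_mat), ncol(reduced_mat)),
    pty = "s",                       # square plot region in every panel
    mar = c(2, 2, 1, 0.5),           # assumed margins, small enough for 20 panels
    cex.axis = 0.5, las = 2)
# mfcol fills column by column, so the column index is the outer loop
for (j in seq_len(ncol(reduced_mat)))
  for (i in seq_len(nrow(reduced_mat)))
    barplot(reduced_mat[i, j], ylim = c(0, 1))
dev.off()
```

With pty = "s" the panels keep their aspect ratio and the unused part of the page simply stays blank.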
Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
Thanks for all the comments,

The artificial dataset is as representative of my 440MB file as I could design. I did my best to reduce the complexity of my problem to minimal reproducible code as suggested in the posting guidelines. Having searched the archives, I was happy to find that the topic had been covered, where Prof Ripley suggested that the I/O manuals gave some advice. However, I was unable to get anywhere with the I/O manuals' advice.

I spent 6 hours preparing my post to R-help. Sorry not to have read the 'R-Internals' manual. I just wanted to know if I could use scan() more efficiently.

My hurdle seems to have nothing to do with calling scan() efficiently. I suspect the same is true for the originator of this memory experiment thread. It is the overhead of storing short strings, as Charles identified and Brian explained. I appreciate the investigation and clarification you both have made.

56B overhead for a 2 character string seems extreme to me, but I'm not complaining. I really like R and, since it is free, accept that it-is-what-it-is.

In my case pre-processing is not an option; it is not a one-off problem with a particular file. In my application, R is run in batch mode as part of a tool chain for arbitrary csv files. Having found cases where memory usage was as high as 20x file size, and allowing for a copy of the loaded dataset, I'll just need to document that it is possible that files as small as 1/40th of system memory may consume it all. That rules out some important datasets (US Census, UK Office of National Statistics files, etc.) for 2GB servers.

Regards, Mike

On 8/9/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:
> On Thu, 9 Aug 2007, Charles C. Berry wrote:
>> On Thu, 9 Aug 2007, Michael Cassin wrote:
>>> I really appreciate the advice and this database solution will be useful to me for other problems, but in this case I need to address the specific problem of scan and read.* using so much memory.
>>>
>>> Is this expected behaviour?
>
> Yes, and documented in the 'R Internals' manual. That is basic reading for people wishing to comment on efficiency issues in R.
>
>>> Can the memory usage be explained, and can it be made more efficient? For what it's worth, I'd be glad to try to help if the code for scan is considered to be worth reviewing.
>>
>> Mike,
>>
>> This does not seem to be an issue with scan() per se.
>>
>> Notice the difference in size of big2, big3, and bigThree here:
>>
>> big2 <- rep(letters,length=1e6)
>> object.size(big2)/1e6
>> [1] 4.000856
>> big3 <- paste(big2,big2,sep='')
>> object.size(big3)/1e6
>> [1] 36.2
>
> On a 32-bit computer every R object has an overhead of 24 or 28 bytes. Character strings are R objects, but in some functions such as rep (and scan for up to 10,000 distinct strings) the objects can be shared. More string objects will be shared in 2.6.0 (but factors are designed to be efficient at storing character vectors with few values).
>
> On a 64-bit computer the overhead is usually double. So I would expect just over 56 bytes/string for distinct short strings (and that is what big3 gives).
>
> But 56Mb is really not very much (tiny on a 64-bit computer), and 1 million items is a lot.
>
> [...]
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
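The per-string overhead discussed above can be probed with object.size(). The sketch below uses made-up vectors, not the poster's data; it compares a character vector that reuses 26 strings, one with all-distinct short strings, and a factor. The absolute numbers depend on platform and R version, so none are quoted here.

```r
n <- 1e5
shared    <- rep(letters, length.out = n)   # only 26 distinct strings, reused
distinct  <- as.character(seq_len(n))       # n distinct short strings
as_factor <- factor(shared)                 # integer codes plus 26 levels

sizes <- c(shared   = as.numeric(object.size(shared)),
           distinct = as.numeric(object.size(distinct)),
           factor   = as.numeric(object.size(as_factor)))

# Approximate bytes per element; values vary by platform and R version
round(sizes / n, 1)
```

The gap between the shared and distinct rows is the per-string object overhead that Charles and Brian describe.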
Re: [R] Positioning text in top left corner of plot
Hi

Daniel Brewer wrote:
> Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text to do this, it does not seem to plot it outside the plot area. I have also tried to use mtext, but that does not really cut it, as I cannot get the label in the correct position. Ideally, it would be best if I could use legend but have it outside the plot area.
>
> Any ideas?

plot(1:10)
library(grid)
grid.text("What do we want? Text in the corner!\nWhere do we want it? Here!",
          x=unit(2, "mm"), y=unit(1, "npc") - unit(2, "mm"),
          just=c("left", "top"))

Paul

> Thanks
>
> Benilton Carvalho wrote:
>> maybe this is what you want?
>>
>> plot(rnorm(10))
>> legend("topleft", "A)", bty="n")
>>
>> ?
>>
>> b
>>
>> On Aug 7, 2007, at 11:08 AM, Daniel Brewer wrote:
>>> Simple question: how can you position text in the top left hand corner of a plot? I am plotting multiple plots using par(mfrow=c(2,3)) and all I want to do is label these plots a), b), c) etc. I have been fiddling around with both text and mtext but without much luck. text is fine but each plot has a different scale on the axis and so this makes it problematic. What is the best way to do this?
>>>
>>> Many thanks
>>>
>>> Dan

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
[EMAIL PROTECTED]
http://www.stat.auckland.ac.nz/~paul/
Re: [R] Positioning text in top left corner of plot
Daniel Brewer wrote:
> Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text to do this, it does not seem to plot it outside the plot area. I have also tried to use mtext, but that does not really cut it, as I cannot get the label in the correct position. Ideally, it would be best if I could use legend but have it outside the plot area.
>
> Any ideas?

Hi Dan,
Try this:

plot(1:5)
par(xpd=TRUE)
text(0.5,5.5,"Outside")
par(xpd=FALSE)

Jim
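For the multi-panel case from the original question, where every panel has a different axis scale, one option in current versions of R is to convert figure-region fractions to user coordinates with grconvertX() and grconvertY(), so the label position does not depend on the data. This is only a sketch; the helper name corner_label() and the inset value are assumptions.

```r
# Hypothetical helper: put a label in the top-left corner of the current figure region
corner_label <- function(label, inset = 0.02) {
  # Convert figure-region fractions ("nfc") to user coordinates of the current panel
  x <- grconvertX(inset, from = "nfc", to = "user")
  y <- grconvertY(1 - inset, from = "nfc", to = "user")
  # xpd = NA allows drawing outside the plot region, i.e. in the margins
  text(x, y, label, adj = c(0, 1), xpd = NA, font = 2)
  invisible(c(x = x, y = y))
}

pdf(NULL)   # null device, so the sketch runs without opening a window
op <- par(mfrow = c(2, 3))
pos <- NULL
for (i in 1:6) {
  plot(rnorm(10, sd = 10^i))                    # deliberately different scales
  pos <- corner_label(paste0(letters[i], ")"))  # a), b), c), ...
}
par(op)
dev.off()
```

Replacing pdf(NULL) with a real device (or removing it in an interactive session) shows the six labelled panels.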
Re: [R] GLMM: MEEM error due to dichotomous variables
> I am trying to run a GLMM on some binomial data. My fixed factors include 2 dichotomous variables, day, and distance. When I run the model:
>
> modelA<-glmmPQL(Leaving~Trial*Day*Dist,random=~1|Indiv,family="binomial")
>
> I get the error:
>
> iteration 1
> Error in MEEM(object, conLin, control$niterEM) :
>   Singularity in backsolve at level 0, block 1
>
> From looking at previous help topics (http://tolstoy.newcastle.edu.au/R/help/02a/4473.html) I gather this is because of the dichotomous predictor variables - what approach should I take to avoid this problem?

Are you sure? I have never had problems including factors in a glmmPQL so far. More likely, the combination of your explanatory variables leads to a fragmentation of your response such that each combination of your factor levels only contains 0s or 1s. Thus, your model is 'too good' (it has too many predictors given the amount of data).

Try e.g. to fit a model without the interactions.

Cheers, Lorenz
-
Lorenz Gygax
Centre for proper housing of ruminants and pigs
Agroscope Reckenholz-Tänikon Research Station ART
Tänikon, CH-8356 Ettenhausen / Switzerland
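Lorenz's suggestion can be checked before fitting anything, by tabulating the response within each combination of the factors. The sketch below uses simulated stand-in data (the poster's data were not posted) with one cell deliberately forced to contain only 0s; the variable names follow the thread.

```r
set.seed(1)
# Simulated stand-in for the poster's data
d <- expand.grid(Trial = factor(c("a", "b")), Day = factor(1:2), rep = 1:5)
d$Leaving <- rbinom(nrow(d), 1, 0.5)
d$Leaving[d$Trial == "a" & d$Day == "1"] <- 0   # force one all-zero cell

# Proportion of 1s in each Trial x Day cell
cell_means <- aggregate(Leaving ~ Trial + Day, data = d, FUN = mean)
# Flag cells that contain only 0s or only 1s
cell_means$degenerate <- cell_means$Leaving %in% c(0, 1)
cell_means
```

If any cell is flagged, the full interaction model is likely to be too rich for the data, and a main-effects formula such as Leaving ~ Trial + Day + Dist would be a natural first thing to try.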
[R] GLM with tweedie: NA for AIC
Dear R users;

I am modelling densities of some species of birds, so I have a problem with a great amount of zeros. I have decided to try GLMs with the tweedie family, but in all the models I have tried I got an NA for the AIC value. Just to check the problem, I compared a glm using the Gaussian family with the identity link and a glm using the tweedie family with var.power=0 and link.power=1. These are equal, as expected, except for the fact that the tweedie output gives me an NA for the AIC. Can anyone help me with this problem? Below you can find the two outputs I refer to.

Best Wishes;
Catarina

summary(glm(formula=ACIN~DIST_REF+DIST_H2O+DIST_OST+COTA+H2O_SUP+vasa,
    family=gaussian(link="identity")))

Call:
glm(formula = ACIN ~ DIST_REF + DIST_H2O + DIST_OST + COTA + H2O_SUP + vasa,
    family = gaussian(link = "identity"))

Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.112792  -0.042860  -0.021113  -0.006311   1.551824

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.625e-02  5.454e-02  -1.215   0.2256
DIST_REF     3.581e-06  1.336e-05   0.268   0.7889
DIST_H2O    -3.168e-05  1.527e-05  -2.074   0.0391 *
DIST_OST    -1.799e-05  1.953e-05  -0.921   0.3579
COTA         5.648e-04  2.470e-04   2.287   0.0230 *
H2O_SUP     -2.172e-04  3.994e-04  -0.544   0.5870
vasa         3.695e-02  4.573e-02   0.808   0.4199
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.02151985)

    Null deviance: 5.6028  on 257  degrees of freedom
Residual deviance: 5.4015  on 251  degrees of freedom
AIC: -249.33

Number of Fisher Scoring iterations: 2

summary(glm(formula=ACIN~DIST_REF+DIST_H2O+DIST_OST+COTA+H2O_SUP+vasa,
    control=glm.control(maxit=750),
    family=tweedie(var.power=0, link.power=1)))

Call:
glm(formula = ACIN ~ DIST_REF + DIST_H2O + DIST_OST + COTA + H2O_SUP + vasa,
    family = tweedie(var.power = 0, link.power = 1),
    control = glm.control(maxit = 750))

Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.112792  -0.042860  -0.021113  -0.006311   1.551824

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.625e-02  5.454e-02  -1.215   0.2256
DIST_REF     3.581e-06  1.336e-05   0.268   0.7889
DIST_H2O    -3.168e-05  1.527e-05  -2.074   0.0391 *
DIST_OST    -1.799e-05  1.953e-05  -0.921   0.3579
COTA         5.648e-04  2.470e-04   2.287   0.0230 *
H2O_SUP     -2.172e-04  3.994e-04  -0.544   0.5870
vasa         3.695e-02  4.573e-02   0.808   0.4199
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Tweedie family taken to be 0.02151985)

    Null deviance: 5.6028  on 257  degrees of freedom
Residual deviance: 5.4015  on 251  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 2
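A possible reading, stated as an assumption rather than a confirmed diagnosis: glm() takes the AIC from the family object, and the tweedie family may simply not supply a likelihood-based one, so NA is reported even when var.power=0 reproduces the Gaussian fit. For that special case the Gaussian AIC can be rebuilt by hand from the residual deviance, as the sketch below shows on simulated data (not the poster's).

```r
set.seed(42)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.3)

fit <- glm(y ~ x, family = gaussian(link = "identity"))

dev <- deviance(fit)        # residual sum of squares for the Gaussian family
p   <- length(coef(fit))    # number of regression coefficients
# Gaussian AIC as reported by glm(): -2 * logLik at the ML variance estimate,
# plus 2 * (p + 1) because the variance counts as one extra parameter
aic_manual <- n * (log(2 * pi * dev / n) + 1) + 2 * (p + 1)

c(glm = AIC(fit), manual = aic_manual)
```

Plugging in the figures from the output above (n = 258, residual deviance 5.4015, p = 7) gives about -249.33, matching the Gaussian fit. This shortcut only applies to var.power=0; for other powers the Tweedie likelihood has to be evaluated numerically.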
[R] Cleaning up the memory
Hi,

I have 4 huge tables on which I want to do a PCA analysis and a k-means clustering. If I run each table individually I have no problems, but if I run them in a for loop I exceed the memory allocation after the second table, even if I save the results as a csv table and clean up all the big objects with the rm command. To me it seems that even though I no longer have the objects, the memory these objects used to occupy is not cleared. Is there any way to clear up the memory as well? I don't want to close R and start it up again. Also, I am running R under Windows.

thanks,
Monica
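One pattern that sometimes helps, sketched here with small stand-in data: do the work for each table inside a function, so that every large object is local and becomes unreachable when the function returns, then call gc() to request a garbage collection. Whether the freed memory is actually handed back to the operating system is platform dependent, so this is not a guaranteed fix. The function name and output files are hypothetical.

```r
# Hypothetical per-table worker; all big objects are local to the function
process_table <- function(i) {
  big <- matrix(rnorm(2000 * 10), ncol = 10)   # stand-in for reading table i
  pca <- prcomp(big, scale. = TRUE)
  km  <- kmeans(pca$x[, 1:2], centers = 3)
  out <- tempfile(fileext = ".csv")            # hypothetical output file
  write.csv(data.frame(pca$x[, 1:2], cluster = km$cluster),
            out, row.names = FALSE)
  out                                          # return only the small result
}

files <- character(0)
for (i in 1:4) {
  files[i] <- process_table(i)
  gc()   # request a collection once the big objects are unreachable
}
files
```

Inside a plain for loop the same effect needs an explicit rm() of every large object, including intermediate ones, before calling gc().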
[R] need help to manipulate function and time interval
Hi R-users,

I have to define a noise level function L and its energy W at various moments of the day by:

if time is between 18:00:00 and 23:59:59 then L[j] <- L[j]+5 and W <- 10^((L+5)/10)
if time is between 22:00:00 and 05:59:59 then L <- L+10 and W <- 10^((L+10)/10)
else L = L and W = W

Could someone please help me to write this function? My proposed code follows, but my main problem is handling the time intervals.

Best regards

###
myfunc <- function(mytab, Time, Level) {
  vect <- rep(0, length(mytab))
  for(i in 1:length(vect)) {
    for(j in 1:length(Time))
      if(time[j] is between 18:00:00 and 23:59:59)
        L[i] <- L[j]+5
        vect[i] <- 10^((L[i])/10)
      if (time[j] is between 22:00:00 and 05:59:59)
        L[i] <- L[j]+10
        vect[i] <- 10^((L[i])/10)
      else
        L[i] = L[j]
        vect[i] <- 10^((L[i])/10)
  }
}
###

Lassana KOITA
Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique / Project Engineer Airport Safety Studies Statistical analysis
Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical Department
Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation Headquarters
Tel: 01 49 56 80 60
Fax: 01 49 56 82 14
E-mail: [EMAIL PROTECTED]
http://www.stac.aviation-civile.gouv.fr/
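A vectorised sketch of one way to handle the intervals, under stated assumptions: times arrive as "HH:MM:SS" strings, the two intervals in the question overlap between 22:00 and 24:00, and the night penalty (+10) is assumed to take precedence there, so the evening penalty (+5) effectively covers 18:00 to 21:59:59. The function name noise_energy() is made up.

```r
# Hypothetical function: adjusted level L and energy W for each observation
noise_energy <- function(time, level) {
  # time: character vector "HH:MM:SS"; level: numeric noise level in dB
  hour <- as.integer(substr(time, 1, 2))
  night   <- hour >= 22 | hour < 6       # 22:00:00 to 05:59:59, wraps past midnight
  evening <- hour >= 18 & !night         # assumption: night wins where the intervals overlap
  adj <- level + ifelse(night, 10, ifelse(evening, 5, 0))
  data.frame(time = time, L = adj, W = 10^(adj / 10))
}

res <- noise_energy(c("12:00:00", "19:30:00", "23:15:00", "03:00:00"),
                    c(60, 60, 60, 60))
res
```

Working on the hour alone avoids date arithmetic; if the boundaries were not on whole hours, the strings could be converted to seconds since midnight and compared in the same way.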
Re: [R] small sample techniques
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nair, Murlidharan T
Sent: Thursday, August 09, 2007 12:02 PM
To: Nordlund, Dan (DSHS/RDA); r-help@stat.math.ethz.ch
Subject: Re: [R] small sample techniques

> n=300
> 30% taking A relief from pain
> 23% taking B relief from pain
>
> Question: If there is no difference, are we likely to get a 7% difference?
>
> Hypothesis
> H0: p1-p2=0
> H1: p1-p2!=0 (not equal to)
>
> 1. Weighted average of the two sample proportions
>
>    (300(0.30)+300(0.23)) / (300+300) = 0.265
>
> 2. Std error estimate of the difference between two independent proportions
>
>    sqrt((0.265*0.735)*((1/300)+(1/300))) = 0.03603
>
> 3. Evaluation of the difference between sample proportions as a deviation from the hypothesized difference of zero
>
>    ((0.30-0.23)-(0))/0.03603 = 1.94
>
> z did not approach 1.96, hence H0 is not rejected.
>
> This is what I was trying to do using prop.test.
>
> prop.test(c(30,23),c(300,300))
>
> What function should I use?

The proportion test above indicates that p1=0.1 and p2=0.0767. But in your t-test you specify p1=0.3 and p2=0.23. Which is correct? If p1=0.3 and p2=0.23, then use

prop.test(c(.30*300,.23*300),c(300,300))

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
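To connect the hand calculation with prop.test(), the sketch below recomputes the pooled z statistic and compares it with the chi-squared statistic from prop.test() on the counts 90 and 69 (30% and 23% of 300). Note that prop.test() applies a continuity correction by default; correct = FALSE is needed for the statistic to equal z squared.

```r
n1 <- 300; n2 <- 300
x1 <- 90;  x2 <- 69     # 30% and 23% of 300, entered as counts

p_pool <- (x1 + x2) / (n1 + n2)                         # weighted average proportion
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # pooled standard error
z  <- (x1 / n1 - x2 / n2) / se

# Same test without the continuity correction
pt <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)

c(z = z, z_squared = z^2, X_squared = unname(pt$statistic))
```

With the default correct = TRUE the statistic is somewhat smaller, so the two approaches will not agree exactly unless the correction is switched off.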
Re: [R] write.table
write.table(mydata.frame, "mydata", col.names=NA, quote=F, sep="\t")

will solve the problem.

Deng

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Weiwei Shi
Sent: August 10, 2007 12:41 PM
To: r-help@stat.math.ethz.ch
Subject: [R] write.table

> Hi,
>
> I always run into this question when I try to write a data.frame with row.names and col.names. I have to re-make the data frame so that its first column holds the rownames and set row.names=F so that I can align the colnames correctly.
>
> Is there a way or option in write.table to do that automatically?
>
> thanks.
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> Did you always know? No, I did not. But I believed...
> ---Matrix III
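A small round trip illustrating what col.names=NA does: the header line gains an empty first field, so the column names line up with the data once the row names are written. The data frame and the temporary file are made up for the example.

```r
df <- data.frame(a = 1:2, b = c(3.5, 4.5), row.names = c("r1", "r2"))
f  <- tempfile(fileext = ".txt")    # hypothetical output file

write.table(df, f, col.names = NA, quote = FALSE, sep = "\t")

# The header starts with an empty field for the row-name column
header <- readLines(f)[1]

# Reading back with the first column as row names recovers the original layout
back <- read.delim(f, row.names = 1)
back
```

Without col.names=NA the header would hold only the two column names, one field fewer than the data rows, which is the misalignment described in the question.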
Re: [R] Subject: Re: how to include bar values in a barplot?
Jim Lemon wrote:
> I also greatly enjoyed Ted's rebuttal of the "Bar charts are evil and must be banned" argument. If bar charts are appropriate for the audience, give 'em bar charts. One great way to turn off your customers is to tell them what they can and can't do with your product.

I don't remember anyone saying that barcharts are evil or that they should be banned (3-D bar charts and pie charts on the other hand ...).

I think that a variation on fortune(108) applies here. While barcharts may be appropriate for some audiences, it is also appropriate to educate our audiences about better alternatives.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
[R] Test (ignore)
Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
Re: [R] how to include bar values in a barplot?
Welcome to the world of R.

I'm glad that you found the discussion enlightening. Now that you have thought about things a bit, here is some code to try out that shows some of the alternatives to the original plots you provided (which is best depends on your audience and on your main question of interest, i.e. which comparisons are most important):

tmp <- c(34,22,77)
tmp2 <- barplot(tmp, names=LETTERS[1:3])

# put numbers at bottom of bars
axis(1, at=tmp2, labels=as.character(tmp), tick=FALSE, line = -1)

# put numbers at top of plot
axis(3, at=tmp2, labels=as.character(tmp), tick=FALSE)

# horizontal barplot
op <- par(mar=c(5,6,4,6)+0.1)
tmp2 <- barplot(tmp, names=LETTERS[1:3], horiz=TRUE)

# put numbers on the right
axis(4, at=tmp2, labels=as.character(tmp), tick=FALSE, las=1)
par(op)

# the dotplot
library(Hmisc)
dotchart2(tmp, labels=LETTERS[1:3], auxdata=tmp, xlim=range(0,tmp))

# alternatives to stacked bars
tmp1 <- c(8, 22, 60, 10, 10, 21, 59, 10)
tmp2 <- factor(rep(c('A','B'), each=4))
tmp3 <- factor(rep(1:4, 2))

dotchart2(tmp1, groups=tmp2, labels=tmp3, xlim=range(0,tmp1))
dotchart2(tmp1, groups=tmp3, labels=tmp2, xlim=range(0,tmp1))

library(lattice)
tmp <- data.frame( tmp1=tmp1, tmp2=tmp2, tmp3=tmp3 )

dotplot( tmp2~tmp1, data=tmp, groups=tmp3, pch=levels(tmp3),
         scales=list(x=list(limits=range(0,tmp1))) )
dotplot( tmp3~tmp1, data=tmp, groups=tmp2, pch=levels(tmp2),
         xlim=range(0,tmp1) )

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Donatas G.
Sent: Friday, August 10, 2007 2:15 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] how to include bar values in a barplot?

> Quoting Greg Snow [EMAIL PROTECTED]:
>> My original intent was to get the original posters out of the mode of thinking they want to match what the spreadsheet does and into thinking about what message they are trying to get across. To get them (and possibly others) thinking I made the statements a bit more bold than my actual position (I did include a couple of qualifiers).
>
> As an original poster (and a brand new user of R), I would like to comment on the educational experience I have just received. ;)
>
> The discussion was interesting and enlightening, and gives some good ideas about the ways (tables, graphs, graphs with numbers etc.) to get the data across to the ones one is presenting to.
>
> I see some of you guys do feel quite strongly about it, which is fine by me. I do not. I usually care about barplot aesthetics and informativeness more than about visual simplicity. That may change in time :)
>
> I see R's graphical capabilities are huge but hard to access at times - that is when a spreadsheet seems preferable. For example, as a user of Linux I still cannot figure out why the fonts (and graphics in general) look much uglier in R on Linux than they do in R on Windows - no smoothing, sub-pixel hinting, anything like that. That is what my next free-time homework on R will be about :)
>
> Sincerely
>
> Donatas Glodenis
> PhD candidate
> Department of Sociology of the Faculty of Philosophy
> Vilnius University
> Lithuania
[R] rfImpute
I am having trouble with the rfImpute function in the randomForest package. Here is a sample...

clunk.roughfix<-na.roughfix(clunk)
clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree      OOB      1      2
  300:  26.80%  3.83% 85.37%
ntree      OOB      1      2
  300:  18.56%  5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, :
  NA not permitted in predictors

So na.roughfix works, but rfImpute doesn't.

Thanks,
Eric
ent3c *at* virginia.edu

--
Eric Turkheimer, PhD
Department of Psychology
University of Virginia
PO Box 400400
Charlottesville, VA 22904-4400
434-982-4732
434-982-4766 (FAX)
Re: [R] help with counting how many times each value occur in each column
[Gabor Grothendieck]
> table(col(mat), mat)

Clever, simple, and elegant! :-)

--
François Pinard http://pinard.progiciels-bpi.ca
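For readers who missed the earlier part of the thread, a tiny made-up example of what the one-liner does: col(mat) gives the column index of every cell, so cross-tabulating it against the values counts how often each value occurs in each column.

```r
# Made-up matrix; matrix() fills column by column
mat <- matrix(c(1, 2, 2,
                1, 1, 3), nrow = 3)

# Rows of the result are column indices of mat, columns are the distinct values
counts <- table(col(mat), mat)
counts
```

Here column 1 of mat holds the values 1, 2, 2 and column 2 holds 1, 1, 3, so row "1" of the table has two counts under value 2 and row "2" has two counts under value 1.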
Re: [R] Odp: having problems with factor()
You've spotted it!

    > table(df$area)

     0  1  2  3  4  5  7
    21 27 71 46 19  3  1

There are no values in area 6. Thank you very much.

Jabez

----- Original Message -----
From: Petr PIKAL
To: Jabez Wilson
Cc: R-Help r-help@stat.math.ethz.ch
Sent: Friday, 10 August, 2007 1:02:21 PM
Subject: Odp: [R] having problems with factor()

Hi

On 10.08.2007 13:41:53 the original poster wrote:

> Dear R Help,
> I have a set of data of heights of trees described by the area that they are in. The areas are numerical (0 to 7).
>
>          ht area
>     1   320    3
>     2   410    4
>     3   230    2
>     4   360    3
>     5   126    1
>     6   280    2
>     7   260    2
>     8   280    2
>     9   280    2
>     10  260    2
>     ...
>     180 450    4
>     181  90    1
>     182 120    1
>     183 440    4
>     184 210    2
>     185 330    3
>     186 210    2
>     187 100    1
>     188   0    0
>
> I want to convert the area column values to factors, to do an anova. However, if I use:
>
>     df$areaf <- factor(df$area, labels=c("0","I","II","III","IV","V","VI","VII"))
>
> it gives the following message:

Hm, maybe some of the values are missing

    > num <- sample(1:3, 10, replace=T)
    > num
     [1] 1 3 1 2 3 3 1 3 3 3
    > factor(num, labels=c("O", "I", "II"))
     [1] O  II O  I  II II O  II II II
    Levels: O I II
    > factor(num, labels=c("O", "I", "II", "III"))
    Error in factor(num, labels = c("O", "I", "II", "III")) :
      invalid labels; length 4 should be 1 or 3

Try table(df$area) to see what levels you really have.

Regards
Petr

>     Error in factor(df$area, labels = c("0", "I", "II", "III", "IV", "V", "VI", :
>       invalid labels; length 8 should be 1 or 7
>
> Can anyone help?
> Jabez
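The fix implied by this exchange can be sketched as follows. This is a minimal illustration, assuming the aim is to keep all eight area labels even though no tree falls in area 6; the vector `area` is made-up data, not the poster's data frame. Supplying `levels` explicitly makes the number of levels match the number of labels.

```r
# Hypothetical data: areas range over 0 to 7, but area 6 never occurs
area <- c(3, 4, 2, 3, 1, 2, 0, 5, 7, 1)

roman <- c("0", "I", "II", "III", "IV", "V", "VI", "VII")

# Without 'levels', factor() only sees the 7 distinct values present,
# so 8 labels do not fit and factor() stops with an "invalid labels" error.
attempt <- try(factor(area, labels = roman), silent = TRUE)

# Naming every possible level keeps the (empty) level for area 6
areaf <- factor(area, levels = 0:7, labels = roman)

table(areaf)   # area VI is listed with a count of 0
```

If the empty level gets in the way of a later model fit, `droplevels(areaf)` removes it again.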
Re: [R] Help wit matrices
An even simpler solution is:

    mat2 <- 1 * (mat1 > 0.25)

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology, Johns Hopkins University
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: Lanre Okusanya
Sent: Friday, August 10, 2007 2:20 PM
To: jim holtman
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Help wit matrices

That was ridiculously simple. Duh. Thanks

Lanre

On 8/10/07, jim holtman wrote:
> Is this what you want:
>
>     x <- matrix(runif(100), 10)
>     ifelse(x > .5, 1, 0)
>
> On 8/10/07, Lanre Okusanya wrote:
>> Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried
>>
>>     mat2 <- as.matrix(as.numeric(mat1 > 0.25))
>>
>> but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance.
>> Lanre
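To make the difference between the failed attempt and the suggested one-liners concrete, here is a small sketch on a 3x4 stand-in matrix (the name `mat1` and the 0.25 threshold follow the thread; the data are random). The point it illustrates is that `as.numeric()` discards the `dim` attribute, whereas arithmetic on the logical matrix keeps it.

```r
set.seed(1)
mat1 <- matrix(runif(12), nrow = 3)   # small stand-in for the 1000x1000 matrix

# as.numeric() drops the dim attribute, so as.matrix() then builds
# a single-column matrix instead of one with the original shape
flat <- as.matrix(as.numeric(mat1 > 0.25))
dim(flat)   # 12 1

# Arithmetic on the logical comparison keeps the dimensions
mat2 <- 1 * (mat1 > 0.25)
dim(mat2)   # 3 4

# Another option: keep the logical matrix and only change its storage mode
mat3 <- mat1 > 0.25
storage.mode(mat3) <- "integer"
```

All three forms are vectorised, so none of them needs an explicit loop over rows and columns.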
Re: [R] Cleaning up the memory
On Fri, 10 Aug 2007, Monica Pisica wrote:

> Thanks! I will look into ... I have 4 GB RAM, and I was monitoring the memory with Windows Task Manager, so I was watching how R gets more and more memory allocated, from less than 100Mb to 1500Mb.

Then you are almost certainly fragmenting the address space. We still don't know your OS and whether you have enabled the /3GB switch (if relevant to that version of Windows). Most versions of Windows have a 2Gb address space, but some can be as high as 4Gb (Vista 64 which I use is one: the details are in the rw-FAQ for the latest versions of R, e.g. R-patched and R-devel). That factor of 2 can make a big difference.

> My initial tables are between 30 and 80 Mb, and the resulting tables that incorporate the initial tables plus PCA and kmeans results are between 50 and 200MB or thereabouts! And yes, I don't really care about memory allocation in detail: what I want is to free that memory after every cycle ;-) Although, after I didn't do anything in R and it was idle for more than 30 min., the memory allocation according to Task Manager dropped to 15 Mb, which is good, but I cannot wait half an hour between cycles.

Calling gc() will reduce the memory allocation, but that is not the point. You can have 15Mb allocated and still not have a 50Mb hole in the address space (although that would be extremely unlucky; not having several 200Mb holes is quite likely).

> Again thanks,
> Monica
>
>> Date: Fri, 10 Aug 2007 18:28:07 +0100
>> CC: r-help@stat.math.ethz.ch
>> Subject: Re: [R] Cleaning up the memory
>>
>> On Fri, 10 Aug 2007, Monica Pisica wrote:
>>
>>> Hi, I have 4 huge tables on which I want to do a PCA analysis and a kmeans clustering. If I run each table individually I have no problems, but if I run them in a for loop I exceed the memory allocation after the second table, even if I save the results as a csv table and clean up all the big objects with the rm command. To me it seems that even if I don't have the objects anymore, the memory these objects used to occupy is not cleared. Is there any way to clear up the memory as well? I don't want to close R and start it up again. Also, I am running R under Windows.
>>
>> See ?gc, which does the clearing. However, unless you study the memory allocation in detail (which you cannot do from R code), you don't actually know that this is the problem. More likely is that you have fragmentation of your 32-bit address space: see ?Memory-limits. Without any idea what memory you have and what 'huge' means, we can only make wild guesses. It might be worth raising the memory limit (the --max-mem-size flag).
>>
>>> thanks, Monica

--
Brian D. Ripley, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
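The loop pattern discussed in this thread can be sketched as below. It is only an illustration under stated assumptions: `process_table()` is a hypothetical helper, and the random matrix stands in for one of the large tables. Doing the work inside a function means the big intermediates go out of scope when the function returns, and `gc()` then asks R to collect them. As the reply above points out, this reduces the reported allocation but does not by itself cure address-space fragmentation.

```r
# Hypothetical helper: analyse one table, write the result, return the file name
process_table <- function(i) {
  big <- matrix(rnorm(2e5), ncol = 10)            # stand-in for a large table
  pca <- prcomp(big)                              # principal components
  km  <- kmeans(pca$x[, 1:2], centers = 3)        # cluster on the first two PCs
  out <- data.frame(table = i, cluster = km$cluster)
  out_file <- file.path(tempdir(), sprintf("result_%d.csv", i))
  write.csv(out, out_file, row.names = FALSE)
  out_file   # big, pca and km become unreachable once the function returns
}

files <- character(0)
for (i in 1:4) {
  files[i] <- process_table(i)
  gc()       # collect the unreferenced objects before the next cycle
}
```

Keeping only small results (here, a file name) in the calling environment is the design choice that matters; `rm()` on large global objects followed by `gc()` has the same effect when a function is not convenient.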
Re: [R] Help wit matrices
Will something like this help?

    mm <- matrix(rnorm(100), nrow=10)
    mm
    nn <- mm > .5
    nn

--- Lanre Okusanya wrote:

> Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried
>
>     mat2 <- as.matrix(as.numeric(mat1 > 0.25))
>
> but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance.
> Lanre
Re: [R] Help wit matrices
I hope you don't mind that I also offer two solutions. No. 1 is really bad. No. 2 should be on par with the other ones.

Best,
Roland

    mydata <- matrix(rnorm(10*10), ncol=10)
    threshold.value <- 1.5
    mydata2 <- matrix(0, nrow=nrow(mydata), ncol=ncol(mydata))
    mydata3 <- matrix(0, nrow=nrow(mydata), ncol=ncol(mydata))

    ### not really the way to go:
    for (i in 1:nrow(mydata)) {
      for (j in 1:ncol(mydata)) {
        if (mydata[i,j] > threshold.value) {
          mydata2[i,j] <- 1
        }
      }
    }

    ### a better way...
    mydata3[mydata > threshold.value] <- 1

    mydata2
    mydata3

Lanre Okusanya wrote:

> Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried
>
>     mat2 <- as.matrix(as.numeric(mat1 > 0.25))
>
> but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance.
> Lanre
[R] half-logit and glm (again)
I know this has been dealt with before on this list, but the previous messages lacked detail, and I haven't figured it out yet. The model is:

    x_{ij} = \mu + \alpha_i + \beta_j

\alpha is a random effect (subjects), and \beta is a fixed effect (condition). I have a link function:

    p_{ij} = .5 + .5 ( 1 / (1 + \exp\{ -x_{ij} \}) )

which is simply a logistic transformed to lie between .5 and 1. The data:

    y_{ij} ~ Binomial( p_{ij}, N_{ij} )

I've generated data using this model, and I'd like to fit it. My data is a data frame with 3 columns: response (0/1), subject (a factor), and condition (another factor). Here is my link definition:

    #
    halflogit=function(){
      half.logit=function(mu) qlogis(2*mu-1)
      half.logit.inv=function(eta) .5*plogis(eta)+.5
      half.logit.deriv=function(eta) .5*(exp(eta/2)+exp(-eta/2))^-2
      half.logit.inv.indicator=function(eta) TRUE
      half.logit.indicator=function(mu) mu>.5 & mu<1
      link <- "half.logit"
      structure(list(linkfun = half.logit, linkinv = half.logit.inv,
                     mu.eta = half.logit.deriv, validmu = half.logit.indicator,
                     valideta = half.logit.inv.indicator, name = link),
                class = "link-glm")
    }

    > binomial(halflogit())

    Family: binomial
    Link function: half.logit
    #

I based this on the help for the family() function. So I try to call glmmPQL (based on other R-help posts, this is the easiest to use?)

    #
    > glmmPQL(response ~ condition, random = ~ 1|subject, family = binomial(halflogit()), data = dat)
    Error in if (!(validmu(mu) && valideta(eta))) stop("cannot find valid starting values: please specify some") :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    NaNs produced in: qlogis(p, location, scale, lower.tail, log.p)
    #

It looks like I've misdefined something and it is going outside the specified domains for the functions. I can't find any reference to starting values in the help for glmmPQL() or lme(). If anyone has any working code where they've used a user-defined link function, it would be greatly appreciated.

Thanks,
Richard
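One plausible reading of the error is that the default binomial starting values lie below 0.5 for responses of 0, so `qlogis(2*mu - 1)` returns NaN. The sketch below shows that idea for a plain fixed-effects `glm()` only; it is not a solution for the mixed model, and how glmmPQL handles starting values is not covered here. The simulated data, the object names and the choice `mustart = 0.75` are all assumptions made for illustration.

```r
# A half-logit link: mu = 0.5 + 0.5 * plogis(eta), so mu lies in (0.5, 1)
halflogit <- function() {
  linkfun  <- function(mu)  qlogis(2 * mu - 1)        # eta = logit(2*mu - 1)
  linkinv  <- function(eta) 0.5 + 0.5 * plogis(eta)   # inverse link
  mu.eta   <- function(eta) 0.5 * dlogis(eta)         # d mu / d eta
  valideta <- function(eta) TRUE
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = "halflogit"),
            class = "link-glm")
}

# Simulated 0/1 responses for two conditions (no subject effect here)
set.seed(42)
n <- 1000
dat <- data.frame(condition = factor(rep(c("a", "b"), each = n)))
p_true <- ifelse(dat$condition == "a", 0.70, 0.90)
dat$response <- rbinom(2 * n, size = 1, prob = p_true)

# Start every fitted mean at 0.75, which is inside the link's domain (0.5, 1)
fit <- glm(response ~ condition, family = binomial(halflogit()),
           data = dat, mustart = rep(0.75, nrow(dat)))
coef(fit)
```

`0.5 * dlogis(eta)` is algebraically the same derivative as the `.5*(exp(eta/2)+exp(-eta/2))^-2` used in the post, so the original link functions themselves look consistent; the sketch changes only the starting values.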
Re: [R] write.table
I had not read the CSV section of ?write.table in detail. Thanks.

On 8/10/07, Yinghai Deng wrote:

>     write.table(mydata.frame, "mydata", col.names=NA, quote=F, sep="\t")
>
> will solve the problem.
>
> Deng
>
> -----Original Message-----
> From: Weiwei Shi
> Sent: August 10, 2007 12:41 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] write.table
>
>> Hi, I always run into this question when I try to write a data.frame with row.names and col.names. I have to re-make the data frame so that its first column holds the rownames, and set row.names=F, so that the colnames line up correctly. Is there a way or an option in write.table to do that automatically? Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
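A short sketch of what `col.names = NA` does, using a made-up three-row data frame and a temporary file (both are illustration only). With `row.names = TRUE`, the default, it writes an empty first header field so that the column names line up with the data columns.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5),
                 row.names = c("r1", "r2", "r3"))

out <- tempfile(fileext = ".txt")
write.table(df, out, col.names = NA, quote = FALSE, sep = "\t")

readLines(out)[1]   # the header begins with an empty field, then a and b

# Reading the file back with the first column as row names recovers the frame
back <- read.delim(out, row.names = 1)
```

`write.csv()` uses the same layout by default when row names are written, which is why files it produces open correctly aligned in a spreadsheet.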
Re: [R] kde2d error message
If X or Y contains missing values, _you_ supplied missing values as the 'lims' argument and it will be those missing values that are reported. I do not see how you expect to be able to do density estimation with missing values: they are unknown and so no part of the answer is known. If you are prepared to omit them, you can do so, but my software (if this is indeed kde2d from package MASS, uncredited) does not make such arbitrary choices for you.

On Fri, 10 Aug 2007, Jennifer Dillon wrote:

> Hello! I am trying to do a smooth with the kde2d function,

That is not what the only kde2d function I know of does.

> and I'm getting an error message about NAs. Does anyone have any suggestions? Does this function not do well with NAs in general?
>
>     fit <- kde2d(X, Y, n=100, lims=c(range(X), range(Y)))
>     Error in if (from == to || length.out < 2) by <- 1 :
>       missing value where TRUE/FALSE needed
>
> Thanks in advance!! Jen
>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

PLEASE do as we ask.

--
Brian D. Ripley, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
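If omitting incomplete pairs is acceptable for the analysis, which is a decision the reply above deliberately leaves to the user, it can be done explicitly before calling `kde2d`. The sketch below uses simulated `X` and `Y` with a few NAs inserted by hand; the data and the choice `n = 50` are assumptions for illustration.

```r
library(MASS)

set.seed(1)
X <- rnorm(200)
Y <- X + rnorm(200)
X[c(5, 50)] <- NA    # hypothetical missing values
Y[17] <- NA

# Keep only the pairs where both coordinates are observed
ok <- complete.cases(X, Y)

fit <- kde2d(X[ok], Y[ok], n = 50,
             lims = c(range(X[ok]), range(Y[ok])))
dim(fit$z)           # a 50 x 50 grid of density estimates
```

Computing `lims` from the filtered vectors matters as much as filtering the data: `range(X)` on the unfiltered vector returns NA and reproduces the error from the original post.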
[R] write.table
Hi,

I always run into this question when I try to write a data.frame with row.names and col.names. I have to re-make the data frame so that its first column holds the rownames, and set row.names=F, so that the colnames line up correctly. Is there a way or an option in write.table to do that automatically?

Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
Re: [R] Help using gPath
haha Paul,

> It's important not only to post code, but also to make sure that other people can run it (i.e., include real data or have the code generate data or use one of R's predefined data sets).

Oh, I hadn't thought of using the predefined datasets, that's a good idea!

> Also, isn't this "next time"? :)

By next time I meant, when I ask a question in the future. I didn't think you'd respond! So here is some code!

    library(reshape)
    library(ggplot2)

    theme_t <- list(grid.fill="white", grid.colour="lightgrey",
                    background.colour="black", axis.colour="dimgrey")
    ggtheme(theme_t)

    grp <- c(2,2,2,2,2,2,2,2,2,2,2,2,
             2,2,2,2,2,2,2,2,2,2,2,2,
             2,2,2,2,2,2,2,2,2,2,2,2,
             3,3,3,3,3,3,3,3,3,3,3,3,
             3,3,3,3,3,3,3,3,3,3,3,3,
             3,3,3,3,3,3,3,3,3,3,3,3,
             3,3,3,3)
    time <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
              2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
              1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
              2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
    cc <- c(0.7271,0.7563,0.6979,0.8208,0.7521,0.7875,0.7563,0.7771,0.8208,
            0.7938,0.8083,0.7188,0.7521,0.7854,0.7979,0.7583,0.7646,0.6938,
            0.6813,0.7708,0.7375,0.8104,0.8104,0.7792,0.7833,0.8083,0.8021,
            0.7313,0.7958,0.7021,0.8167,0.8167,0.7583,0.7167,0.6563,0.6896,
            0.7333,0.8208,0.7396,0.8063,0.7083,0.6708,0.7292,0.7646,0.7667,
            0.775,0.8021,0.8125,0.7646,0.6917,0.7458,0.7833,0.7396,0.7229,
            0.7708,0.7729,0.8083,0.7771,0.6854,0.8417,0.7667,0.7063,0.75,
            0.7813,0.8271,0.7896,0.7979,0.625,0.7938,0.7583,0.7396,0.7583,
            0.7938,0.7333,0.7875,0.8146)

    data <- as.data.frame(cbind(time, grp, cc))
    data$grp <- factor(data$grp, labels=c("Group A","Group B"))
    data$time <- factor(data$time, labels=c("Pre-test","Post-test"))

    boxplot <- qplot(grp, cc, data=data, geom="boxplot", orientation="horizontal",
                     ylim=c(0.5,1), main="Hello World!", xlab="Label X", ylab="Label Y",
                     facets=.~time, colour="red", size=2)
    boxplot + geom_jitter(aes(colour="steelblue")) + scale_colour_identity() + scale_size_identity()
    grid.gedit("ylabel", gp=gpar(fontsize=16))

> There's a book that provides a full explanation and the (basic) grid chapter is online (see http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)

Awesome, I'll check that out.

> Yep, the facilities for investigating the viewport and grob tree are basically inadequate. Based on some work Hadley did for ggplot, the development version of R has a slightly better tool called grid.ls() that can show how the grob tree and the viewport tree intertwine. That would allow you to see which viewport each grob was drawn in, which would help you, for example, to know which viewport you had to go to to replace a rectangle you want to remove.

okie dokie, I'm ready to be amazed! hehe.

emilio
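For readers without the 2007 ggplot setup, the grob-inspection workflow mentioned above can be tried with grid alone. This is a minimal sketch: the grob names "frame" and "ylabel" are chosen here for illustration and are not the names ggplot assigns. It draws two named grobs, lists them with `grid.ls()`, and then edits one by name.

```r
library(grid)

pdf(NULL)                      # draw to a null device so no file is written
grid.newpage()

# Two named grobs: a frame and a rotated axis-style label
grid.rect(width = 0.8, height = 0.8, name = "frame")
grid.text("Label Y", x = 0.05, rot = 90, name = "ylabel")

grid.ls()                      # lists the grobs currently on the display list

# Edit the label in place by referring to its name
grid.edit("ylabel", gp = gpar(fontsize = 16))
new_size <- grid.get("ylabel")$gp$fontsize

invisible(dev.off())
```

`grid.gedit()`, used in the post, differs from `grid.edit()` in that it treats the name as a pattern and edits every matching grob, which is convenient when a plotting package generates the names.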
Re: [R] smoothing function for proportions
It is not entirely clear what you are using for y values in smooth.spline, but it would appear that it is just the point estimates. I would suggest using instead, at each x value, a few equally spaced quantiles of the estimated proportions. Implicitly, smooth.spline expects to be fitting a mean curve to data that has constant variance, so you might also consider reweighting to approximate this, as well.

Roger Koenker
Department of Economics, University of Illinois, Champaign, IL 61820
url: www.econ.uiuc.edu/~roger
vox: 217-333-4558, fax: 217-244-6678

On Aug 10, 2007, at 10:23 AM, Rose Hoberman wrote:

> Sorry, forgot to attach the graph.
>
> On 8/10/07, Rose Hoberman wrote:
>> I am looking for a function that can fit a smooth function to a vector of estimated proportions, such that the smoothed value is within specified confidence bounds of each proportion. [...] Is there a smoothing function for which I can specify upper and lower limits for the y value for specific values of x?
>>
>> Thanks for any suggestions,
>> Rose
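The reweighting idea can be sketched with the `w` argument of `smooth.spline`. Everything below is simulated, and using the number of trials as the weight is an assumption made for simplicity: the variance of an estimated proportion is p(1-p)/n, so weights proportional to n only approximate inverse-variance weighting. Note that this does not impose hard upper and lower limits on the curve; it only pulls the fit towards the well-estimated points.

```r
set.seed(3)
x <- 1:40
n_trials  <- round(seq(400, 5, length.out = 40))   # many trials at small x, few at large x
p_true    <- plogis(-1 + 0.08 * x)
successes <- rbinom(length(x), size = n_trials, prob = p_true)
p_hat     <- successes / n_trials                  # noisy where n_trials is small

# Unweighted fit treats every proportion as equally reliable
fit_unweighted <- smooth.spline(x, p_hat)

# Weighted fit: points backed by more trials count for more
fit_weighted <- smooth.spline(x, p_hat, w = n_trials)

smoothed <- predict(fit_weighted, x)$y
```

Comparing `smoothed` against binomial confidence limits at each x afterwards is one way to check how often the curve leaves the intervals for a given amount of smoothing.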
Re: [R] a question on lda{MASS}
hi,

maybe I should re-phrase my question a bit: is there a way to get an explicit formula like Y ~ sum of Ci*Xi from the model built by lda{MASS} to calculate $x (value)? I assume scaling is the coefficient vector, Xi comes from the test data, and Y is the column of $x called LD1. But I want to confirm this. Thanks.

Weiwei

On 8/9/07, Weiwei Shi wrote:

> hi,
> assume val is the test data while m is the lda model fitted with CV=F
>
>     x = predict(m, val)
>     val2 = val[, 1:(ncol(val)-1)]  # the last column is class label
>     # col is sample, row is variable
>
> then I am wondering if
>
>     x$x == apply(val2*m$scaling, 2, sum)
>
> i.e., is the scaling (is it the coefficient vector?) times the val data, summed, the discriminant result $x?
>
> Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
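The question can be checked numerically. The sketch below uses the built-in `iris` data rather than the poster's `val`, and it reflects my understanding of what `predict()` does for an `lda` fit: the predictors are first centred at the prior-weighted average of the group means and then multiplied by `scaling`, so the scores are a linear combination of the centred variables rather than of the raw ones.

```r
library(MASS)

m    <- lda(Species ~ ., data = iris)
pred <- predict(m, iris)

# Predictor matrix, columns in the same order as in the model formula
val <- as.matrix(iris[, 1:4])

# Centre at the prior-weighted mean of the group means, then project
centre <- colSums(m$prior * m$means)
manual <- scale(val, center = centre, scale = FALSE) %*% m$scaling

all.equal(manual, pred$x, check.attributes = FALSE)
```

So LD1 for one observation is the sum over variables of the scaling coefficient times (x minus centre), which matches the "sum of Ci*Xi" form up to a constant shift.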
Re: [R] smoothing function for proportions
Sorry, forgot to attach the graph.

On 8/10/07, Rose Hoberman wrote:

> I am looking for a function that can fit a smooth function to a vector of estimated proportions, such that the smoothed value is within specified confidence bounds of each proportion. In other words, given a small number of trials and large confidence intervals, I would prefer the function to vary smoothly, but given a large number of trials and small confidence intervals, I would prefer the function to lie within the confidence intervals, even if it is not smooth.
>
> I have attached a postscript file illustrating a data set I would like to smooth. As the figure shows, for large values of x, I have few data points, and so the ML estimate of the proportion varies widely, and the confidence intervals are very large. When I use the smooth.spline function with a large value of spar (the red line), the function is not as smooth as desired for large values of x. When I use a smaller value of spar (the green line), the function fails to stay within the confidence bounds of the proportions.
>
> Is there a smoothing function for which I can specify upper and lower limits for the y value for specific values of x?
>
> Thanks for any suggestions,
> Rose

Attachment: smoothProportions.ps (PostScript document)
Re: [R] Help wit matrices
Is this what you want:

    > x <- matrix(runif(100), 10)
    > round(x, 3)
           [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
     [1,] 0.268 0.961 0.262 0.347 0.306 0.762 0.524 0.062 0.028 0.226
     [2,] 0.219 0.100 0.165 0.131 0.578 0.933 0.317 0.109 0.527 0.131
     [3,] 0.517 0.763 0.322 0.374 0.910 0.471 0.278 0.382 0.880 0.982
     [4,] 0.269 0.948 0.510 0.631 0.143 0.604 0.788 0.169 0.373 0.327
     [5,] 0.181 0.819 0.924 0.390 0.415 0.485 0.702 0.299 0.048 0.507
     [6,] 0.519 0.308 0.511 0.690 0.211 0.109 0.165 0.192 0.139 0.681
     [7,] 0.563 0.650 0.258 0.689 0.429 0.248 0.064 0.257 0.321 0.099
     [8,] 0.129 0.953 0.046 0.555 0.133 0.499 0.755 0.181 0.155 0.119
     [9,] 0.256 0.954 0.418 0.430 0.460 0.373 0.620 0.477 0.132 0.050
    [10,] 0.718 0.340 0.854 0.453 0.943 0.935 0.170 0.771 0.221 0.929
    > ifelse(x > .5, 1, 0)
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    0    1    0    0    0    1    1    0    0     0
     [2,]    0    0    0    0    1    1    0    0    1     0
     [3,]    1    1    0    0    1    0    0    0    1     1
     [4,]    0    1    1    1    0    1    1    0    0     0
     [5,]    0    1    1    0    0    0    1    0    0     1
     [6,]    1    0    1    1    0    0    0    0    0     1
     [7,]    1    1    0    1    0    0    0    0    0     0
     [8,]    0    1    0    1    0    0    1    0    0     0
     [9,]    0    1    0    0    0    0    1    0    0     0
    [10,]    1    0    1    0    1    1    0    1    0     1

On 8/10/07, Lanre Okusanya wrote:

> Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried
>
>     mat2 <- as.matrix(as.numeric(mat1 > 0.25))
>
> but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance.
> Lanre

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] help with counting how many times each value occur in each column
Try this, where we have constructed the example to illustrate that it does handle the case where not all values are in each column:

    mat <- matrix(rep(1:6, each = 4), 6)
    table(col(mat), mat)

On 8/10/07, Tom Cohen wrote:

> Dear list,
> I have the following dataset and want to know how many times each value occurs in each column.
>
>     > data
>           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>      [1,] -100 -100 -100    0    0    0    0    0    0  -100
>      [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
>      [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>      [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
>     [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>     [19,] -100 -100 -100    0    0    0    0    0    0  -100
>     [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>
> The result matrix should look like
>
>           -100    0  -50
>      [1]    20
>      [2]    20
>      [3]    20
>      [4]    17
>      [5]    18
>      [6]    18
>      [7]    18
>      [8]   and so on
>      [9]
>     [10]
>
> How can I do this in R?
>
> Thanks a lot for your help,
> Tom
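Applied to a matrix shaped like the one in the question, the `table(col(mat), mat)` idiom can be sketched as follows. The matrix is rebuilt here from the printed values (zeros in rows 1 and 19 of columns 4 to 9, and two entries of -50), so treat it as a reconstruction for illustration.

```r
# Rebuild a 20 x 10 matrix like the one shown in the question
data <- matrix(-100, nrow = 20, ncol = 10)
data[c(1, 19), 4:9] <- 0
data[10, 4] <- -50
data[5, 10] <- -50

# col(data) gives the column index of every cell, so cross-tabulating it
# against the cell values counts each value within each column
counts <- table(column = col(data), value = data)

counts["4", ]    # counts of -100, -50 and 0 in column 4
```

Values that never occur in a given column simply get a count of 0 in that row, which is the case the reply's constructed example was meant to demonstrate.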
Re: [R] Positioning text in top left corner of plot
Jim Lemon wrote:

> Daniel Brewer wrote:
>> Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text to do this, it does not seem to plot it outside the plot area. I have also tried to use mtext, but that does not really cut it, as I cannot get the label in the correct position. Ideally, it would be best if I could use legend but have it outside the plot area.
>>
>> Any ideas?
>
> Hi Dan,
> Try this:
>
>     plot(1:5)
>     par(xpd=TRUE)
>     text(0.5, 5.5, "Outside")
>     par(xpd=FALSE)
>
> Jim

Here is what I used in the end:

    par(xpd=T)
    text(-0.15*(par("usr")[2]-par("usr")[1]), par("usr")[4]+0.14*(par("usr")[4]-par("usr")[3]), labels[i], cex=1.5)
    par(xpd=F)

And that worked a treat.

Thanks

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
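An alternative to hand-tuned multiples of the axis range is to convert device-relative positions into user coordinates with `grconvertX()` and `grconvertY()`. This is a sketch under the assumption of a single plot per page; the 0.02 and 0.98 offsets and the label "a" are arbitrary choices for illustration.

```r
pdf(NULL)            # null device, so nothing is written to disk
plot(1:5)

# Position 2% in from the left edge and 2% down from the top of the device,
# expressed in the user coordinates that text() expects
x_left <- grconvertX(0.02, from = "ndc", to = "user")
y_top  <- grconvertY(0.98, from = "ndc", to = "user")

# xpd = NA allows drawing anywhere on the device, including the margins
text(x_left, y_top, "a", adj = c(0, 1), cex = 1.5, xpd = NA)

usr <- par("usr")    # plot-region limits, kept for comparison
invisible(dev.off())
```

In a multi-panel layout, converting from "nfc" (the current figure region) instead of "ndc" (the whole device) places the label relative to each panel.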
Re: [R] Plot legend in margin
Daniel Brewer wrote:

> Hi all,
> Another plotting question I am afraid. Is there any way of putting a legend for a plot in a margin rather than within the figure? I am trying to plot a 3x2 plot and I want to have:
>
> 1) One key along the bottom for all the plots
> 2) A label (a,b,c) for each plot (see previous emails)
>
> Are there any websites etc. that explain this sort of thing?

Please read the posting guide. After that, type:

    RSiteSearch("legend margin")

Currently, the fourth entry shows a solution:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67979.html

Uwe Ligges

> Dan
Re: [R] Plot legend in margin
Thanks. That got me onto the right track. Because it is a multiplot and I wanted the legend along the bottom, I found that I had to use par(xpd=NA) and then position it relative to the last of the multiplots. After a bit of trial and error I got there.

Thanks

Lauri Nikkinen wrote:

> Very simple example:
>
>     opar <- par(mar = c(10, 4, 4, 4))
>     plot(1:10)
>     lines(1:10)
>     par(xpd=TRUE)
>     legend(4, -1.5, lty=1, col="black", legend="straight line")
>     par(opar)
>
> -Lauri

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
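For the 3x2 case, one commonly used approach that avoids trial-and-error coordinates is to reserve an outer margin, draw the panels, and then overlay an empty plot that spans the whole device to hold the legend. The sketch below assumes two line types per panel; the panel contents, colours and legend labels are placeholders.

```r
pdf(NULL)            # null device, so nothing is written to disk

# Reserve four lines of outer margin at the bottom for the shared key
op <- par(mfrow = c(3, 2), mar = c(4, 4, 2, 1), oma = c(4, 0, 0, 0))
for (i in 1:6) {
  plot(1:10, type = "l", col = "black", main = letters[i])
  lines(10:1, col = "red")
}

# Overlay an invisible plot covering the full device and put the legend in it
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n")
leg <- legend("bottom", legend = c("rising", "falling"),
              col = c("black", "red"), lty = 1,
              horiz = TRUE, bty = "n", xpd = NA)

par(op)
invisible(dev.off())
```

Because the legend is placed with the keyword "bottom" in a full-device plot, it stays centred under the panels regardless of their axis ranges.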
[R] Plot legend in margin
Hi all, Another plotting question I am afraid. Is there any way of putting a legend for a plot in a margin rather than within the figure? I am trying to plot a 3x2 plot and I want to have: 1) One key along the bottom for all the plots 2) A label (a,b,c) for each plot (see previous emails). Are there any websites etc. that explain this sort of thing? Dan

-- Daniel Brewer, Ph.D., Institute of Cancer Research
[R] multivariate lme or lmer?
How can we get variance/covariance components in a linear model with random effects when the response is multivariate? e.g. variance components estimates are obtained through lme or lmer in the univariate case but these functions do not seem to extend to the multivariate case. I'd like to estimate covariance components within or between levels of factors in a general case. Sorry for this basic question and thank you for the help. Francois
Re: [R] Subject: Re: how to include bar values in a barplot?
Gabor Grothendieck wrote: You could put the numbers inside the bars in which case it would not add to the height of the bar:

    x <- 1:5
    names(x) <- letters[1:5]
    bp <- barplot(x)
    text(bp, x - .02 * diff(par("usr")[3:4]), x)

Indeed, the boxed.labels function (in the plotrix package) makes this pretty easy.

    boxed.labels(bp, x-0.2*diff(par("usr")[3:4]), x)

gives you the labels in a little white rectangle so that none are invisible. I also greatly enjoyed Ted's rebuttal of the "Bar charts are evil and must be banned" argument. If bar charts are appropriate for the audience, give 'em bar charts. One great way to turn off your customers is to tell them what they can and can't do with your product. Jim
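For completeness, here is a sketch of the other common placement, with the values just above the bars rather than inside them. It uses only base graphics; the data and the 10% headroom factor are made up for illustration.

```r
pdf(NULL)                                  # off-screen device for the demo
x <- c(a = 3, b = 7, c = 5)                # illustrative data
# barplot() returns the x positions of the bar midpoints;
# extend ylim a little so the top label is not clipped.
bp <- barplot(x, ylim = c(0, max(x) * 1.1))
text(bp, x, labels = x, pos = 3)           # pos = 3 puts text above the point
dev.off()
```

The key point in both variants is that the return value of `barplot()` gives the horizontal positions to pass to `text()`.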
[R] Combining two ANOVA outputs of different lengths
Dear R users, I have been trying to combine two anova outputs into one single table (for later publication). The outputs are of different length, and share only some common explanatory variables. Using merge() or melt() (from the reshape package) did not work out. Here are the model outputs and what I would like to have:

anova(model1)
            numDF denDF  F-value p-value
(Intercept)     1    74 0.063446  0.8018
days            1    74 6.613997  0.0121
logdiv          1    74 1.587983  0.2116
leg             1    74 4.425843  0.0388

anova(model2)
             numDF denDF   F-value p-value
(Intercept)      1    73 165.94569  <.0001
funcgr           1    73   7.91999  0.0063
grass            1    73  42.16909  <.0001
leg              1    73   4.72108  0.0330
funcgr:grass     1    73   8.49068  0.0047

#merge(anova(model1),anova(model2),...)
             F-value 1  p-val1  F-value 2  p-value 2
(Intercept)   0.063446  0.8018  165.94569     <.0001
days          6.613997  0.0121         NA         NA
logdiv        1.587983  0.2116         NA         NA
leg           4.425843  0.0388    4.72108      0.033
funcgr              NA      NA    7.91999     0.0063
grass               NA      NA   42.16909     <.0001
funcgr:grass        NA      NA    8.49068     0.0047

I would be glad if someone had an idea of how to do this in principle. I am using R 2.5.1 on Windows XP. Thanks very much in advance! Best wishes Christoph

-- Dr. Christoph Scherber, DNPW, Agroecology, University of Goettingen, Germany
Re: [R] Seasonality
Hello Alberto, hello Felix, aside from monthplot() and stl(), there is the possibility of using Census X-12-ARIMA. The program can be downloaded from: http://www.census.gov/srd/www/x12a/ It should be mentioned that this is *not* a pure R solution, but one can set up the relevant scripts and output files, call the program from R, and read the relevant numbers back into R. Best, Bernhard

    ?monthplot
    ?stl

On 8/10/07, Alberto Monteiro wrote: I have a time series x = f(t), where t is taken for each month. What is the best function to detect if _x_ has a seasonal variation? If there is such seasonal effect, what is the best function to estimate it? Function arima has a seasonal parameter, but I guess this is too complex to be useful. Alberto Monteiro

-- Felix Andrews / 安福立, PhD candidate, Integrated Catchment Assessment and Management Centre, The Fenner School of Environment and Society, The Australian National University http://www.neurofractal.org/felix/
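As a sketch of the pure-R route mentioned above, stl() and monthplot() can be tried on the built-in monthly AirPassengers series. The log transform, the choice of `s.window = "periodic"`, and the variance-ratio summary are illustrative assumptions, not recommendations from the thread.

```r
# Decompose a monthly series into seasonal, trend and remainder parts.
fit <- stl(log(AirPassengers), s.window = "periodic")

seasonal  <- fit$time.series[, "seasonal"]
remainder <- fit$time.series[, "remainder"]

# A rough heuristic (an assumption, one of several possible summaries):
# the share of detrended variance that is not left in the remainder.
# Values near 1 suggest a strong seasonal effect.
strength <- 1 - var(remainder) / var(seasonal + remainder)

pdf(NULL)        # off-screen device for the demo
plot(fit)        # panels: data, seasonal, trend, remainder
monthplot(fit)   # seasonal component grouped by calendar month
dev.off()
```

The estimated seasonal effect itself is the `seasonal` column; the plots are usually the quickest way to judge whether it is worth modelling.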
[R] ordering a data.frame by average rank of multiple columns
Hi, I have run into a problem and I wonder if anyone has a smart way of doing this. For example, I have this data frame for 5 different test groups:

    Res1 <- c(1,5,4,-0.5,3)
    Res2 <- c(-1,8,2,0,3)
    Mean <- c(0.5,1,1.5,-.5,2)
    MyFrame <- data.frame(Res1,Res2,Mean,row.names=c("G1","G2","G3","G4","G5"))

where the first two columns are the results of two different tests and the third column is the mean of the group. I want to order this data.frame by the combined rank of Res1 and Res2, but with weights assigned according to the importance of each column. Let's assume that Res1 is twice as important and lower values rank better.

    MyRanks <- data.frame(Rank1=rank(MyFrame[,"Res1"]),Rank2=rank(MyFrame[,"Res2"]),CombR=2*rank(MyFrame[,"Res1"])+rank(MyFrame[,"Res2"]),row.names=c("G1","G2","G3","G4","G5"))

       Rank1 Rank2 CombR
    G1     2     1     5
    G2     5     5    15
    G3     4     3    11
    G4     1     2     4
    G5     3     4    10

The rank of the combined score is 2,5,4,1,3, but to be able to sort MyFrame in that order I need to enter this vector of positions: c(4,1,5,3,2). Does anyone have a smart way of converting ranks to positions? Tom
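One way to get from a score (or its ranks) to row positions is `order()`, which returns the permutation that sorts its argument. The sketch below reuses the data from the message; the object names `pos` and `Sorted` are mine.

```r
Res1 <- c(1, 5, 4, -0.5, 3)
Res2 <- c(-1, 8, 2, 0, 3)
Mean <- c(0.5, 1, 1.5, -0.5, 2)
MyFrame <- data.frame(Res1, Res2, Mean, row.names = paste0("G", 1:5))

# Weighted combined rank: Res1 counts twice, lower values rank better.
CombR <- 2 * rank(MyFrame$Res1) + rank(MyFrame$Res2)

# order() gives the row positions that put CombR in increasing order,
# i.e. it converts "rank of each row" into "which row comes first".
pos <- order(CombR)
Sorted <- MyFrame[pos, ]
```

Note that `order()` can be applied to `CombR` directly; there is no need to rank the combined score first, since sorting by a score and sorting by its ranks give the same ordering.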
Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
I don't understand why one would run a 64-bit version of R on a 2GB server, especially if one were worried about object size. You can run 32-bit versions of R on x86_64 Linux (see the R-admin manual for a comprehensive discussion), and most other 64-bit OSes default to 32-bit executables. Since most OSes limit 32-bit executables to around 3GB of address space, there starts to be a case for 64-bit executables at 4GB RAM, but not much of a case at 2GB.

It was my intention when providing the infrastructure for it that Linux binary distributions on x86_64 would provide both 32-bit and 64-bit executables, but that has not happened. It would be possible to install ix86 builds on x86_64 if -m32 was part of the ix86 compiler specification and the dependency checks would notice they needed 32-bit libraries. (I've had trouble with the latter on FC5: an X11 update removed all my 32-bit X11 RPMs.)

On Fri, 10 Aug 2007, Michael Cassin wrote: Thanks for all the comments. The artificial dataset is as representative of my 440MB file as I could design. I did my best to reduce the complexity of my problem to minimal reproducible code as suggested in the posting guidelines. Having searched the archives, I was happy to find that the topic had been covered, where Prof Ripley suggested that the I/O manuals gave some advice. However, I was unable to get anywhere with the I/O manuals' advice. I spent 6 hours preparing my post to R-help. Sorry not to have read the 'R-Internals' manual. I just wanted to know if I could use scan() more efficiently.

My hurdle seems to have nothing to do with efficiently calling scan(). I suspect the same is true for the originator of this memory experiment thread. It is the overhead of storing short strings, as Charles identified and Brian explained. I appreciate the investigation and clarification you both have made. 56B overhead for a 2 character string seems extreme to me, but I'm not complaining. I really like R, and being free, accept that it-is-what-it-is.
Well, there are only about 5 2-char strings in an 8-bit locale, so this does seem a case for using factors (as has been pointed out several times). And BTW, it is not 56B overhead, but 56B total for up to 7 chars.

In my case pre-processing is not an option; it is not a one-off problem with a particular file. In my application, R is run in batch mode as part of a tool chain for arbitrary csv files. Having found cases where memory usage was as high as 20x file size, and allowing for a copy of the loaded dataset, I'll just need to document that it is possible that files as small as 1/40th of system memory may consume it all. That rules out some important datasets (US Census, UK Office of National Statistics files, etc) for 2GB servers. Regards, Mike

On 8/9/07, Prof Brian Ripley wrote: On Thu, 9 Aug 2007, Charles C. Berry wrote: On Thu, 9 Aug 2007, Michael Cassin wrote: I really appreciate the advice and this database solution will be useful to me for other problems, but in this case I need to address the specific problem of scan and read.* using so much memory. Is this expected behaviour?

Yes, and documented in the 'R Internals' manual. That is basic reading for people wishing to comment on efficiency issues in R.

Can the memory usage be explained, and can it be made more efficient? For what it's worth, I'd be glad to try to help if the code for scan is considered to be worth reviewing.

Mike, This does not seem to be an issue with scan() per se. Notice the difference in size of big2, big3, and bigThree here:

    big2 <- rep(letters,length=1e6)
    object.size(big2)/1e6
    [1] 4.000856
    big3 <- paste(big2,big2,sep='')
    object.size(big3)/1e6
    [1] 36.2

On a 32-bit computer every R object has an overhead of 24 or 28 bytes. Character strings are R objects, but in some functions such as rep (and scan for up to 10,000 distinct strings) the objects can be shared.
More string objects will be shared in 2.6.0 (but factors are designed to be efficient at storing character vectors with few values). On a 64-bit computer the overhead is usually double. So I would expect just over 56 bytes/string for distinct short strings (and that is what big3 gives). But 56Mb is really not very much (tiny on a 64-bit computer), and 1 million items is a lot. [...]

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford, http://www.stats.ox.ac.uk/~ripley/
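The factor suggestion can be checked directly with object.size(). The sketch below is illustrative only: the vector length and the random two-letter strings are made up, and the exact byte counts depend on the R version and platform (current versions of R share identical strings, so the gap is smaller than the one discussed in this 2007 thread).

```r
set.seed(1)
n <- 1e5
# Many short strings drawn from a small set of distinct values (at most 676).
s <- paste0(sample(letters, n, replace = TRUE),
            sample(letters, n, replace = TRUE))

f <- factor(s)   # integer codes plus one copy of each distinct level

size_chr <- as.numeric(object.size(s))
size_fac <- as.numeric(object.size(f))
ratio <- size_chr / size_fac   # how many times larger the character vector is
```

When reading files, the same effect comes from having such columns stored as factors rather than character vectors, for example via the `colClasses` argument of `read.table()`.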
Re: [R] how to include bar values in a barplot?
Quoting Greg Snow: My original intent was to get the original posters out of the mode of thinking they want to match what the spreadsheet does and into thinking about what message they are trying to get across. To get them (and possibly others) thinking I made the statements a bit more bold than my actual position (I did include a couple of qualifiers).

As an original poster (and a brand new user of R), I would like to comment on the educational experience I have just received. ;) The discussion was interesting and enlightening, and gave some good ideas about the ways (tables, graphs, graphs with numbers etc.) to get the data across to one's audience. I see some of you guys do feel quite strongly about it, which is fine with me. I do not. I usually care for barplot aesthetics and informativeness more than for visual simplicity. That may change in time :)

I see R's graphical capabilities are huge but hard to access at times; that is when a spreadsheet seems preferable. For example, as a user of Linux I still cannot figure out why the fonts (and graphics in general) look much uglier in R on Linux than they do in R on Windows: no smoothing, sub-pixel hinting, anything like that. That is what my next free-time homework on R will be about :)

Sincerely, Donatas Glodenis, PhD candidate, Department of Sociology of the Faculty of Philosophy, Vilnius University, Lithuania
Re: [R] need help to manipulate function and time interval
Hi, Try with: if(time[j] = 18:00:00 23:59:59) ... ...

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil

On 10/08/07, KOITA Lassana - STAC/ACE wrote: Hi R-users, I have to define a noise level function L and its energy at the various moments of the day by:

if time is between 18:00:00 and 23:59:59 then L[j] <- L[j]+5 and W <- 10^((L+5)/10)
if time is between 22:00:00 and 05:59:59 ==> L <- L+10 and W <- 10^((L+10)/10)
else L=L and W = W

Could someone help me to realize this function please? You will find my proposed code below, but my main problem is to handle the time interval. Best regards

###
myfunc <- function(mytab, Time, Level) {
vect <- rep(0, length(mytab))
for(i in 1:length(vect)) {
for(j in 1:length(Time))
if(time[j] is between 18:00:00 and 23:59:59) L[i] <- L[j]+5 vect[i] <- 10^((L[i])/10
if (time[j] is between 22:00:00 and 05:59:59) L[i] <- L[j]+10 vect[i] <- 10^((L[i])/10
else L[i] = L[j] vect[i] <- 10^((L[i])/10
} }
###

Lassana KOITA, Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique / Project Engineer Airport Safety Studies Statistical analysis, Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical Department, Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation Headquarters http://www.stac.aviation-civile.gouv.fr/
Re: [R] Cleaning up the memory
On Fri, 10 Aug 2007, Monica Pisica wrote: Hi, I have 4 huge tables on which I want to do a PCA analysis and a kmeans clustering. If I run each table individually I have no problems, but if I want to run it in a for loop I exceed the memory allocation after the second table, even if I save the results as a csv table and clean up all the big objects with the rm command. To me it seems that even if I don't have the objects anymore, the memory these objects used to occupy is not cleared. Is there any way to clear up the memory as well? I don't want to close R and start it up again. Also I am running R under Windows.

See ?gc, which does the clearing. However, unless you study the memory allocation in detail (which you cannot do from R code), you don't actually know that this is the problem. More likely is that you have fragmentation of your 32-bit address space: see ?Memory-limits. Without any idea what memory you have and what 'huge' means, we can only make wild guesses. It might be worth raising the memory limit (the --max-mem-size flag).

thanks, Monica

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford, http://www.stats.ox.ac.uk/~ripley/
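A common way to structure such a loop is to do all the work for one table inside a function, so the large intermediates go out of scope when it returns, and to keep only a small summary. The sketch below is an assumption-laden illustration: `analyse_one` is a hypothetical name, and small random tables stand in for the huge ones. As the reply notes, this does not help if the real limit is address-space fragmentation.

```r
# Hypothetical per-table analysis: read, PCA, k-means, return a small summary.
analyse_one <- function(path, centers = 2) {
  tab <- read.csv(path)
  pca <- prcomp(tab, scale. = TRUE)
  km  <- kmeans(pca$x[, 1:2], centers = centers)
  # Only this small data frame survives; tab, pca and km become garbage.
  data.frame(file = basename(path),
             cluster = seq_len(centers),
             size = km$size)
}

# Demo data: four small random tables written to a temporary directory.
set.seed(1)
paths <- file.path(tempdir(), sprintf("table%d.csv", 1:4))
for (p in paths) {
  write.csv(as.data.frame(matrix(rnorm(200), ncol = 4)), p, row.names = FALSE)
}

results <- vector("list", length(paths))
for (i in seq_along(paths)) {
  results[[i]] <- analyse_one(paths[i])
  gc()   # optional: R collects garbage on its own, this just does it now
}
summary_tab <- do.call(rbind, results)
```

Calling `gc()` explicitly mainly makes the memory report easier to read between iterations; it is not normally required for the memory to be reused.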
[R] [Fwd: Re: How to apply functions over rows of multiple matrices]
[Apologies to Gabor, to whom I erroneously sent a personal copy of the reply instead of posting to the list directly]

[...] Perhaps what you really intend is to take the average over those elements in each row of the first matrix which correspond to 1's in the corresponding row of the second. In that case it's just:

    rowSums(newtest * goldstandard) / rowSums(goldstandard)

Thank you for clearing my thoughts about the particular example. My question was a bit more general though, as I have different functions which are applied row-wise to multiple matrices. An example that sets all values of a row of matrix A to NA after the first occurrence of TRUE in matrix B:

    fillfrom <- function(applvec, testvec=NULL) {
      if (is.null(testvec)) testvec <- applvec
      if (length(testvec) != length(applvec)) {
        stop("applvec and testvec have to be of same length!")
      } else if(any(testvec, na.rm=TRUE)) {
        applvec[min(which(testvec)) : length(applvec)] <- NA
      }
      applvec
    }

    fillafter <- function(applvec, testvec=NULL) {
      if (is.null(testvec)) testvec <- applvec
      fillfrom(applvec, c(FALSE, testvec[-length(testvec)]))
    }

    numtest <- 6
    numsubj <- 20
    newtest <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
    goldstandard <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
    newtest.NA <- t(sapply(1:nrow(newtest), function(i) {
      fillafter(newtest[i,], goldstandard[i,]==1)}))

My general question is whether R provides some syntactic sugar for the awkward sapply(1:nrow(A)) expression. Maybe in this case there is also a way to bypass the apply mechanism and my way of thinking about the problem has to be adapted. But as *apply calls are everywhere in R, I feel this is a standard way of dealing with vectors and matrices.
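One possible bit of sugar, sketched below, is to split each matrix into a list of rows and let mapply() walk the two lists in parallel. The helper name `rowwise` is hypothetical, and it assumes the function returns a vector of the same length for every row.

```r
# Hypothetical helper: apply f(rowA, rowB) to matching rows of two matrices.
rowwise <- function(f, A, B) {
  stopifnot(identical(dim(A), dim(B)))
  rows <- mapply(f,
                 split(A, row(A)),   # list of the rows of A
                 split(B, row(B)),   # list of the rows of B
                 SIMPLIFY = FALSE)
  unname(do.call(rbind, rows))       # stack the results back into a matrix
}

A <- matrix(1:6, nrow = 2)                      # rows: 1 3 5 and 2 4 6
B <- matrix(c(0, 1, 1, 0, 0, 1), nrow = 2)      # rows: 0 1 0 and 1 0 1

# Example function: blank out entries of A where B is 1.
mask <- function(a, b) ifelse(b == 1, NA, a)

res <- rowwise(mask, A, B)

# The index-based form from the message, for comparison.
ref <- t(sapply(1:nrow(A), function(i) mask(A[i, ], B[i, ])))
```

With the functions from the message, the call would read `rowwise(function(a, b) fillafter(a, b == 1), newtest, goldstandard)`. Whether this is clearer than the explicit index loop is a matter of taste; it does the same amount of work.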
Re: [R] Positioning text in top left corner of plot
Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text() to do this, it does not seem to plot outside the plot area. I have also tried mtext(), but that does not really cut it, as I cannot get the label in the correct position. Ideally, I would use legend() but have it outside the plot area. Any ideas? Thanks

Benilton Carvalho wrote: maybe this is what you want?

    plot(rnorm(10))
    legend("topleft", "A)", bty="n")

b

On Aug 7, 2007, at 11:08 AM, Daniel Brewer wrote: Simple question: how can you position text in the top left hand corner of a plot? I am plotting multiple plots using par(mfrow=c(2,3)) and all I want to do is label these plots a), b), c) etc. I have been fiddling around with both text and mtext but without much luck. text is fine, but each plot has a different scale on the axis and so this makes it problematic. What is the best way to do this? Many thanks Dan

-- Daniel Brewer, Ph.D., Institute of Cancer Research
[R] reading xcms files
Hi, I am using the xcms library to read mass spectrum data. I generate objects from CDF files using the command line

    SME10 <- xcmsRaw("SME_10.CDF")

I have 50 CDF files with different names and I don't want to repeat the command for each one. Is there any option to read all the files and generate a corresponding object name? Thank you in advance, Roberto
Re: [R] Combining two ANOVA outputs of different lengths
Christoph Scherber wrote: Dear R users, I have been trying to combine two anova outputs into one single table (for later publication). The outputs are of different length, and share only some common explanatory variables. Using merge() or melt() (from the reshape package) did not work out. [...] I would be glad if someone had an idea of how to do this in principle.

The main problems are that the merge key is the rownames and that you want to keep entries that are missing in one of the analyses. There are ways to deal with that:

    example(anova.lm)
    ...
    merge(anova(fit2), anova(fit4), by=0, all=T)
      Row.names Df.x  Sum Sq.x Mean Sq.x F value.x    Pr(>F).x Df.y  Sum Sq.y
    1      ddpi   NA        NA        NA        NA          NA    1  63.05403
    2       dpi   NA        NA        NA        NA          NA    1  12.40095
    3     pop15    1 204.11757 204.11757 13.211166 0.000687868    1 204.11757
    4     pop75    1  53.34271  53.34271  3.452517 0.069425385    1  53.34271
    5 Residuals   47 726.16797  15.45038        NA          NA   45 650.71300
      Mean Sq.y  F value.y     Pr(>F).y
    1  63.05403  4.3604959 0.0424711387
    2  12.40095  0.8575863 0.3593550848
    3 204.11757 14.1157322 0.0004921955
    4  53.34271  3.6889104 0.0611254598
    5  14.46029         NA           NA

Presumably, you can take it from here.
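To make the last step concrete, here is a sketch that applies the same `merge(..., by = 0, all = TRUE)` idea to two small data frames holding the F and p values from the question. The data frames `a1` and `a2`, their column names, and the character p-value column (used so that "<.0001" can be kept as printed) are assumptions for illustration; with real fits one would start from `anova(model1)` and `anova(model2)`.

```r
# Stand-ins for the two anova tables, keyed by row name.
a1 <- data.frame(Fval = c(0.063446, 6.613997, 1.587983, 4.425843),
                 p = c("0.8018", "0.0121", "0.2116", "0.0388"),
                 row.names = c("(Intercept)", "days", "logdiv", "leg"),
                 stringsAsFactors = FALSE)
a2 <- data.frame(Fval = c(165.94569, 7.91999, 42.16909, 4.72108, 8.49068),
                 p = c("<.0001", "0.0063", "<.0001", "0.0330", "0.0047"),
                 row.names = c("(Intercept)", "funcgr", "grass", "leg",
                               "funcgr:grass"),
                 stringsAsFactors = FALSE)

# by = 0 merges on row names; all = TRUE keeps terms found in only one table.
tab <- merge(a1, a2, by = 0, all = TRUE, suffixes = c(".1", ".2"))

# merge() sorts by the key, so restore the term order and
# turn the key column back into row names.
rownames(tab) <- as.character(tab$Row.names)
tab <- tab[union(rownames(a1), rownames(a2)), -1]
```

Terms missing from one model show up as NA in that model's columns, which matches the layout asked for in the question.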
[R] Re : compute ROC curve?
See the ROCR or accuracy package. Justin BEM, BP 1917 Yaoundé

- Original message - From: gallon li To: r-help Sent: Friday, 10 August 2007, 4:15:36 Subject: [R] compute ROC curve?

Hello, I have continuous test results for diseased and nondiseased subjects, say X and Y. Both are vectors of numbers. Is there any R function which can generate the step function of the ROC curve automatically? Thanks!
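For readers who want to see what such a function computes, the empirical ROC step function can be sketched in a few lines of base R. This is not the ROCR implementation; `roc_points` is a hypothetical name, the data are simulated, and the sketch assumes that larger test values indicate disease.

```r
# Empirical ROC: for each cutoff k, call a subject "positive" if value >= k.
roc_points <- function(diseased, healthy) {
  cuts <- sort(unique(c(diseased, healthy)), decreasing = TRUE)
  tpr <- sapply(cuts, function(k) mean(diseased >= k))  # sensitivity
  fpr <- sapply(cuts, function(k) mean(healthy >= k))   # 1 - specificity
  # Start at (0, 0): a cutoff above every observed value flags nobody.
  data.frame(cutoff = c(Inf, cuts), fpr = c(0, fpr), tpr = c(0, tpr))
}

set.seed(42)
X <- rnorm(50, mean = 1)   # simulated results for diseased subjects
Y <- rnorm(60)             # simulated results for nondiseased subjects

roc <- roc_points(X, Y)

# Area under the curve as the proportion of (diseased, healthy) pairs
# ordered correctly, counting ties as one half.
auc <- mean(outer(X, Y, ">")) + 0.5 * mean(outer(X, Y, "=="))

pdf(NULL)                  # off-screen device for the demo
plot(roc$fpr, roc$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)      # reference line for an uninformative test
dev.off()
```

If smaller values indicate disease instead, negate both vectors before calling the function.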
Re: [R] reading xcms files
On Fri, 10 Aug 2007, Roberto Olivares Hernandez wrote: Hi, I am using the xcms library to read mass spectrum data. I generate objects from CDF files using the command line SME10 <- xcmsRaw("SME_10.CDF"). I have 50 CDF files with different names and I don't want to repeat the command for each one. Is there any option to read all the files and generate a corresponding object name?

Something like

    for(f in Sys.glob("*.CDF")) assign(sub("\\.CDF$", "", f), xcmsRaw(f))

(untested, of course).

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford, http://www.stats.ox.ac.uk/~ripley/
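An alternative to creating 50 separate variables with assign() is to collect the objects in one named list. The sketch below is hedged accordingly: `read_all` is a hypothetical helper, and because xcms may not be installed, the demo uses dummy files with `readLines` standing in for the reader.

```r
# Hypothetical helper: read every matching file in a directory into a
# list named after the files (extension stripped).
read_all <- function(dir, pattern = "\\.CDF$", reader) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  objs <- lapply(files, reader)
  names(objs) <- sub(pattern, "", basename(files))
  objs
}

# Demo with two dummy files; readLines is only a placeholder reader.
d <- file.path(tempdir(), "cdf_demo")
dir.create(d, showWarnings = FALSE)
for (n in c("SME_10", "SME_11")) {
  writeLines(n, file.path(d, paste0(n, ".CDF")))
}

raw <- read_all(d, reader = readLines)
```

With xcms loaded, the call would presumably be `raw <- read_all(".", reader = xcmsRaw)`, after which `raw$SME_10` plays the role of the `SME10` object in the question and `lapply(raw, ...)` can process all files in one go.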
Re: [R] need help to manipulate function and time interval
Henrique Dallazuanna wrote: Hi, Try whit: if(time[j] = 18:00:00 23:59:59) This code is obviously wrong and does not help for the next few lines in the questioner's message, please do not post unsensible stuff. Uwe Ligges ... ...
Re: [R] need help to manipulate function and time interval
KOITA Lassana - STAC/ACE wrote: Hi R-users, I have to define a noise level function L and its energy at the various moments of the day by: if time is between 18:00:00 and 23:59:59 then L[j] <- L[j]+5 and W <- 10^((L+5)/10)

What kind of object is time? Just a character or some Time/Date format? Do you know the day? If time is between 18:00:00 and 23:59:59, should the next point (time is between 22:00:00 and 05:59:59) be executed additionally if time is, e.g., 23:00:00, or is there any other condition I cannot see? All the information is quite essential in order to help... BTW: I don't think the rest of your code is sensible (at least, some braces are missing). Uwe Ligges

if time is between 22:00:00 and 05:59:59 ==> L <- L+10 and W <- 10^((L+10)/10)
else L=L and W = W

Could someone help me to realize this function please? You will find my proposed code below, but my main problem is to handle the time interval. Best regards

###
myfunc <- function(mytab, Time, Level) {
vect <- rep(0, length(mytab))
for(i in 1:length(vect)) {
for(j in 1:length(Time))
if(time[j] is between 18:00:00 and 23:59:59) L[i] <- L[j]+5 vect[i] <- 10^((L[i])/10
if (time[j] is between 22:00:00 and 05:59:59) L[i] <- L[j]+10 vect[i] <- 10^((L[i])/10
else L[i] = L[j] vect[i] <- 10^((L[i])/10
} }
###

Lassana KOITA, Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical Department http://www.stac.aviation-civile.gouv.fr/
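Since the questions above were not answered in the thread, any code has to guess. The sketch below shows one possible reading, under these loudly stated assumptions: times are character strings of the form "HH:MM:SS", the night penalty (22:00:00 to 05:59:59, +10) takes precedence in the overlap, the evening penalty (+5) therefore applies from 18:00:00 to 21:59:59, and W is computed from the adjusted level. The function name `adjust_level` is hypothetical.

```r
# One possible interpretation (assumptions listed above), fully vectorised.
adjust_level <- function(time, L) {
  # Convert "HH:MM:SS" to seconds since midnight.
  parts <- do.call(rbind, strsplit(time, ":", fixed = TRUE))
  secs <- as.numeric(parts[, 1]) * 3600 +
          as.numeric(parts[, 2]) * 60 +
          as.numeric(parts[, 3])

  # The night interval wraps past midnight, hence the "or".
  night   <- secs >= 22 * 3600 | secs < 6 * 3600
  evening <- secs >= 18 * 3600 & !night      # assumed: night wins the overlap

  L_adj <- L + ifelse(night, 10, ifelse(evening, 5, 0))
  data.frame(time = time, L = L_adj, W = 10^(L_adj / 10))
}

out <- adjust_level(c("12:00:00", "19:30:00", "23:00:00", "03:15:00"),
                    c(60, 60, 60, 60))
```

Working in seconds since midnight avoids string comparisons and makes the wrap-around interval explicit. If the data carry dates as well, a date-time class such as POSIXct would be the more natural starting point.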
[R] smoothing function for proportions
I am looking for a function that can fit a smooth function to a vector of estimated proportions, such that the smoothed value is within specified confidence bounds of each proportion. In other words, given a small number of trials and large confidence intervals, I would prefer the function to vary smoothly, but given a large number of trials and small confidence intervals, I would prefer the function to lie within the confidence intervals, even if it is not smooth. I have attached a postscript file illustrating a data set I would like to smooth. As the figure shows, for large values of x, I have few data points, and so the ML estimate of the proportion varies widely, and the confidence intervals are very large. When I use the smooth.spline function with a large value of spar (the red line), the function is not as smooth as desired for large values of x. When I use a smaller value of spar (the green line), the function fails to stay within the confidence bounds of the proportions. Is there a smoothing function for which I can specify upper and lower limits for the y value for specific values of x? Thanks for any suggestions, Rose
Re: [R] Help wit matrices
    mat2 <- matrix(as.numeric(mat1 > 0.25), ncol=1000)

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil

On 10/08/07, Lanre Okusanya wrote: Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried

    mat2 <- as.matrix(as.numeric(mat1 > 0.25))

but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance. Lanre
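The reshaping step can be avoided altogether, because a comparison on a matrix already returns a logical matrix of the same shape. A short sketch, with a random matrix standing in for the real data and 0.25 as the threshold:

```r
set.seed(1)
mat1 <- matrix(runif(1000 * 1000), nrow = 1000)   # stand-in data

# The comparison keeps the dimensions; multiplying by 1 turns
# TRUE/FALSE into 1/0 without dropping them.
mat2 <- (mat1 > 0.25) * 1

# Equivalent variant that stores the indicator as integers.
mat3 <- mat1 > 0.25
storage.mode(mat3) <- "integer"
```

The attempt in the question lost the shape because as.numeric() strips the dim attribute before as.matrix() is applied, leaving a single column.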
Re: [R] Plot legend in margin
Another couple of things to think about: You could use the layout function to set up your multiple plots and include an extra plotting area at the bottom to place the legend in. If you stick with the solution below, then the cnvrt.coords function from the TeachingDemos package may be useful (it will help you find the coordinates relative to the last plot). Hope this helps,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Brewer
> Sent: Friday, August 10, 2007 4:55 AM
> To: Lauri Nikkinen; r-help@stat.math.ethz.ch
> Subject: Re: [R] Plot legend in margin
>
> Thanks. That got me onto the right track. Because it is a multiplot and I wanted the legend along the bottom, I found that I had to use par(xpd=NA) and then position it relative to the last of the multiplots. After a bit of trial and error I got there. Thanks
>
> Lauri Nikkinen wrote:
>> Very simple example:
>>   opar <- par(mar = c(10, 4, 4, 4))
>>   plot(1:10)
>>   lines(1:10)
>>   par(xpd=TRUE)
>>   legend(4, -1.5, lty=1, col="black", legend="straight line")
>>   par(opar)
>> -Lauri
>
> -- Daniel Brewer, Ph.D. Institute of Cancer Research Email: [EMAIL PROTECTED]
> The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP.
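The layout() suggestion above could look roughly like this. It is only a sketch: the 2-plus-1 panel arrangement, the relative heights and the legend text are assumptions chosen for illustration, not taken from the thread.

```r
# Two plots on top, one short full-width strip underneath for the legend.
n_panels <- layout(matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE),
                   heights = c(4, 1))

plot(1:10, type = "l", col = "black")   # panel 1
plot(10:1, type = "l", col = "black")   # panel 2

op <- par(mar = c(0, 0, 0, 0))          # no margins in the legend strip
plot.new()                              # panel 3: empty canvas
legend("center", legend = "straight line", lty = 1, col = "black",
       horiz = TRUE, bty = "n")
par(op)
layout(1)                               # back to a single panel
```

Because the legend lives in its own panel, no par(xpd=...) setting or trial-and-error coordinates are needed.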
[R] Row name of empty string issue
I have a data.frame with rownames taken from a database. Unfortunately, one of the rownames (automatically obtained from the DB) is an empty string. I often do computations on the DB such that the answers (rows) are indexed by rowname, so a computation on a DB record might require indexing into the data.frame by the empty row name. Unfortunately, that doesn't seem to work.

Explicitly: I have this statement (in a loop going through sourceNames, which are rownames of the data.frames CurrentRecordBlankFieldsCountSums and BlankFieldsCount):

  BlankFieldsCount[sourceNamei,] = BlankFieldsCount[sourceNamei,] + CurrentRecordBlankFieldsCountSums[sourceNamei ,];

If sourceNamei is any name other than "" it works fine, but otherwise CurrentRecordBlankFieldsCountSums[sourceNamei ,] returns a bunch of NAs because apparently it didn't find a row named "". IMHO, if R lets you name a row "", then it should let you index it with the name "".

Anyway, as further proof of the setup:

  rownames(CurrentRecordBlankFieldsCountSums)[1]
  [1] ""
  # So the first rowname of CurrentRecordBlankFieldsCountSums is an empty string

  CurrentRecordBlankFieldsCountSums[1 ,]
  IDCaseNumber Category SSN LastName FirstName
  00 10 0
  # So, the first row has some data (not just NAs as would be returned if that row didn't exist)

But if I index that same row using the rowname, it doesn't find the row:

  CurrentRecordBlankFieldsCountSums[rownames(CurrentRecordBlankFieldsCountSums)[1] ,]
  IDCaseNumber Category SSN LastName
  NA NA NA NA
  # I get the same result if I do this: CurrentRecordBlankFieldsCountSums["" ,]

As a sanity check:

  "" == rownames(CurrentRecordBlankFieldsCountSums)[1]
  [1] TRUE

For other rows (rownames that aren't ""), there's no problem:

  rownames(CurrentRecordBlankFieldsCountSums)[2]
  [1] "FRED"

  CurrentRecordBlankFieldsCountSums[rownames(CurrentRecordBlankFieldsCountSums)[2] ,]
  IDCaseNumber Category SSN LastName FirstName
  FRED00 00

-- View this message in context: http://www.nabble.com/Row-name-of-empty-string-issue-tf4250291.html#a12096455
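One possible workaround, sketched under assumptions: the data frame `counts` and its columns below are invented stand-ins for the poster's objects. R's documentation for `[` states that an empty string never matches a name when used as a character subscript, but match() does treat "" as an ordinary value, so the row name can be translated to a row number first:

```r
# Hypothetical stand-in for CurrentRecordBlankFieldsCountSums
counts <- data.frame(SSN = c(1, 0), LastName = c(0, 2))
rownames(counts) <- c("", "FRED")

source_name <- ""                              # the problematic row name
idx <- match(source_name, rownames(counts))    # numeric position of that row
first_row <- counts[idx, ]                     # index by position, not by name
first_row
```

The same `idx` can be used on the left-hand side of an assignment, so the accumulation loop would only need to replace the name subscript with the matched position.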
Re: [R] help with counting how many times each value occur in each column
[Tom Cohen]
> I have the following dataset and want to know how many times each value occurs in each column.
> data
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,] -100 -100 -100    0    0    0    0    0    0  -100
>  [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
>  [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
> [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [19,] -100 -100 -100    0    0    0    0    0    0  -100
> [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>
> The result matrix should look like
>       -100  0  -50
>  [1]    20
>  [2]    20
>  [3]    20
>  [4]    17
>  [5]    18
>  [6]    18
>  [7]    18
>  and so on
>  [8]
>  [9]
> [10]

Presuming that data is a matrix, one could try a sequence like this:

  dataf <- factor(data)
  dim(dataf) <- dim(data)
  result <- t(apply(dataf, 2, tabulate, nlevels(dataf)))
  colnames(result) <- levels(dataf)
  result

If you want the columns sorted, you might decide the order of the levels in the factor() call, or explicitly reorder columns afterwards.

-- François Pinard http://pinard.progiciels-bpi.ca
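A self-contained variant of the same idea, as a sketch rather than the method posted above: it rebuilds the example matrix from the listing and counts each column against a fixed set of levels, so values absent from a column show up as 0 instead of being dropped. The helper names `values` and `per_col` are illustrative.

```r
# Rebuild the posted 20 x 10 example
data <- matrix(-100, nrow = 20, ncol = 10)
data[c(1, 19), 4:9] <- 0
data[10, 4] <- -50
data[5, 10] <- -50

values <- sort(unique(as.vector(data)))        # every value seen anywhere

# Count each column against the same levels so all rows have equal length
per_col <- lapply(seq_len(ncol(data)), function(j)
  as.integer(table(factor(data[, j], levels = values))))

result <- matrix(unlist(per_col), ncol = length(values), byrow = TRUE,
                 dimnames = list(NULL, as.character(values)))
result
```

Row i of `result` holds the counts for column i of `data`; reordering `values` changes the column order of the result.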
[R] kde2d error message
Hello! I am trying to do a smooth with the kde2d function, and I'm getting an error message about NAs. Does anyone have any suggestions? Does this function not do well with NAs in general?

  fit <- kde2d(X, Y, n=100, lims=c(range(X),range(Y)))
  Error in if (from == to || length.out < 2) by <- 1 :
    missing value where TRUE/FALSE needed

Thanks in advance!! Jen
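A likely cause, offered as an assumption since the data were not posted: range() returns NA when its input contains NA, and the limits then fail inside seq(). A minimal sketch that keeps only complete (X, Y) pairs before smoothing, with made-up data and a smaller grid than the n=100 in the question:

```r
library(MASS)                      # kde2d() is provided by MASS

set.seed(1)
X <- c(rnorm(50), NA)              # made-up data with one missing value each
Y <- c(NA, rnorm(50))

ok <- complete.cases(X, Y)         # TRUE where both coordinates are present
fit <- kde2d(X[ok], Y[ok], n = 25,
             lims = c(range(X[ok]), range(Y[ok])))
dim(fit$z)                         # grid of density estimates
```

Using range(X, na.rm = TRUE) would fix the limits alone, but the NA pairs would still need to be dropped before they reach the density estimate.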
Re: [R] [Fwd: Re: How to apply functions over rows of multiple matrices]
> 1. matrices are stored columnwise so R is better at column-wise operations than row-wise.

I am seeing this in my code, which contains more t() than seems healthy. However, the summaries are patient-wise over repeated measurements. By convention, I am storing patients in rows and measurements in columns.

> 2. Here is one way to do it (although I am not sure it's better than the index approach):
>   row.apply <- function(f, a, b) t(mapply(f, as.data.frame(t(a)), as.data.frame(t(b))))

Ah, thank you so much. I'll take the generalization to N arguments à la mapply() as an exercise for the reader.

> 3. The code for the example in this post could be simplified to:
>   first.1 <- apply(cbind(goldstandard, 1), 1, which.max)
>   ifelse(col(newtest) > first.1, NA, newtest)

Ouch! Consider this scholar slapped.

> 4. given that both examples did not inherently need row by row operations I wonder if that is the wrong generalization in the first place?

Given that you managed to squeeze my 20 lines of code into 2 lines AND that row.apply() does not exist in base without many people missing it, I'll have to concede this point and drop the craving for row.apply() in favour of the whole-object approach.
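The N-argument generalization left as an exercise might be sketched as follows. This is one possible reading, not code from the thread; it assumes all matrices have the same number of rows and that f returns a vector of fixed length.

```r
# Apply f to corresponding rows of any number of matrices.
row.apply <- function(f, ...) {
  # Transpose each matrix so that its rows become data-frame columns,
  # which mapply() then walks through in parallel.
  rows <- lapply(list(...), function(m) as.data.frame(t(m)))
  t(do.call(mapply, c(list(FUN = f), rows)))
}

a <- matrix(1:6, nrow = 2)
b <- matrix(6:1, nrow = 2)
res <- row.apply(function(x, y) x + y, a, b)
res
```

Each row of `res` is f applied to the matching rows of `a` and `b`; the row names V1, V2 come from as.data.frame() and can be removed with unname().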
Re: [R] having problems with factor()
I am afraid the above example will not work. In the original dataset of Jabez Wilson the numerical range is 0..7. So try this one:

  df$areaf <- as.factor(c("0","I","II","III","IV","V","VI","VII")[df$area+1])

Hope this is what you want,
Rainer

Henrique Dallazuanna wrote:
> Hi,
>   df
>       ht area
>   1  320    3
>   2  410    4
>   3  230    2
>   4  360    3
>   5  126    1
>   6  280    2
>   7  260    2
>   8  280    2
>   9  280    2
>   10 260    2
>   df$area <- as.factor(df$area)
>   levels(df$area) <- c("I", "II", "III", "IV")
>
> On 10/08/07, Jabez Wilson [EMAIL PROTECTED] wrote:
>> Dear R Help, I have a set of data of heights of trees described by the area that they are in. The areas are numerical (0 to 7).
>>       ht area
>>   1   320    3
>>   2   410    4
>>   3   230    2
>>   4   360    3
>>   5   126    1
>>   6   280    2
>>   7   260    2
>>   8   280    2
>>   9   280    2
>>   10  260    2
>>   ...
>>   180 450    4
>>   181  90    1
>>   182 120    1
>>   183 440    4
>>   184 210    2
>>   185 330    3
>>   186 210    2
>>   187 100    1
>>   188   0    0
>> I want to convert the area column values to factors, to do an anova. However, if I use:
>>   df$areaf <- factor(df$area, labels=c("0","I","II","III","IV","V","VI","VII"))
>> it gives the following message:
>>   Error in factor(df$area, labels = c("0", "I", "II", "III", "IV", "V", "VI", : invalid labels; length 8 should be 1 or 7
>> Can anyone help? Jabez
Re: [R] having problems with factor()
Hi,

  df
      ht area
  1  320    3
  2  410    4
  3  230    2
  4  360    3
  5  126    1
  6  280    2
  7  260    2
  8  280    2
  9  280    2
  10 260    2
  df$area <- as.factor(df$area)
  levels(df$area) <- c("I", "II", "III", "IV")

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O

On 10/08/07, Jabez Wilson [EMAIL PROTECTED] wrote:
> Dear R Help, I have a set of data of heights of trees described by the area that they are in. The areas are numerical (0 to 7).
>       ht area
>   1   320    3
>   2   410    4
>   3   230    2
>   4   360    3
>   5   126    1
>   6   280    2
>   7   260    2
>   8   280    2
>   9   280    2
>   10  260    2
>   ...
>   180 450    4
>   181  90    1
>   182 120    1
>   183 440    4
>   184 210    2
>   185 330    3
>   186 210    2
>   187 100    1
>   188   0    0
> I want to convert the area column values to factors, to do an anova. However, if I use:
>   df$areaf <- factor(df$area, labels=c("0","I","II","III","IV","V","VI","VII"))
> it gives the following message:
>   Error in factor(df$area, labels = c("0", "I", "II", "III", "IV", "V", "VI", : invalid labels; length 8 should be 1 or 7
> Can anyone help? Jabez
[R] having problems with factor()
Dear R Help, I have a set of data of heights of trees described by the area that they are in. The areas are numerical (0 to 7).

       ht area
  1   320    3
  2   410    4
  3   230    2
  4   360    3
  5   126    1
  6   280    2
  7   260    2
  8   280    2
  9   280    2
  10  260    2
  ...
  180 450    4
  181  90    1
  182 120    1
  183 440    4
  184 210    2
  185 330    3
  186 210    2
  187 100    1
  188   0    0

I want to convert the area column values to factors, to do an anova. However, if I use:

  df$areaf <- factor(df$area, labels=c("0","I","II","III","IV","V","VI","VII"))

it gives the following message:

  Error in factor(df$area, labels = c("0", "I", "II", "III", "IV", "V", "VI", : invalid labels; length 8 should be 1 or 7

Can anyone help? Jabez
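The error message says that only 7 distinct area values occur in the data while 8 labels were supplied. A minimal sketch of one way around this, using a short made-up `area` vector rather than the real data: stating the levels explicitly lets factor() pair each label with its value even when some values never occur.

```r
area <- c(3, 4, 2, 3, 1, 2, 2, 0)       # made-up sample; 5, 6 and 7 never occur
labs <- c("0", "I", "II", "III", "IV", "V", "VI", "VII")

# levels = 0:7 fixes the full set, so all 8 labels are accepted
areaf <- factor(area, levels = 0:7, labels = labs)

table(areaf)                            # unused levels appear with count 0
```

If the empty levels get in the way of the anova, droplevels(areaf) removes them after the labels have been attached.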
Re: [R] ordering a data.frame by average rank of multiple columns
Try this:

  positions <- order(ranks)

On 8/10/07, Tom.O [EMAIL PROTECTED] wrote:
> Hi, I have run into a problem and I wonder if anyone has a smart way of doing this. For example, I have this data frame for 5 different test groups:
>   Res1 <- c(1,5,4,-0.5,3)
>   Res2 <- c(-1,8,2,0,3)
>   Mean <- c(0.5,1,1.5,-.5,2)
>   MyFrame <- data.frame(Res1,Res2,Mean,row.names=c("G1","G2","G3","G4","G5"))
> where the first two columns are the results of two different tests and the third column is the mean of the group. I want to order this data.frame by the combined rank of Res1 and Res2, but with weights assigned according to the importance of each column. Let's assume that Res1 is twice as important and lower values rank better.
>   MyRanks <- data.frame(Rank1=rank(MyFrame[,"Res1"]), Rank2=rank(MyFrame[,"Res2"]), CombR=2*rank(MyFrame[,"Res1"])+rank(MyFrame[,"Res2"]), row.names=c("G1","G2","G3","G4","G5"))
>      Rank1 Rank2 CombR
>   G1     2     1     5
>   G2     5     5    15
>   G3     4     3    11
>   G4     1     2     4
>   G5     3     4    10
> and the rank of the combined score is 2,5,4,1,3, but to be able to sort MyFrame in that order I need to enter this vector of positions: c(4,1,5,3,2). Does anyone have a smart way of converting ranks to positions?
> Tom
> -- View this message in context: http://www.nabble.com/ordering-a-data.frame-by-average-rank-of-multiple-columns-tf4247393.html#a12087498
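Putting the question's data together with the order() suggestion gives the following sketch. The variable `CombR` mirrors the column of that name in the question; `sorted` is an illustrative name.

```r
Res1 <- c(1, 5, 4, -0.5, 3)
Res2 <- c(-1, 8, 2, 0, 3)
Mean <- c(0.5, 1, 1.5, -0.5, 2)
MyFrame <- data.frame(Res1, Res2, Mean, row.names = paste0("G", 1:5))

# Weighted combined rank: Res1 counts twice, lower is better
CombR <- 2 * rank(MyFrame$Res1) + rank(MyFrame$Res2)

positions <- order(CombR)        # row positions from best to worst
sorted <- MyFrame[positions, ]
sorted
```

order() works directly on the combined score, so there is no need to rank CombR first; ordering a score and ordering its ranks give the same positions when there are no ties.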
Re: [R] Positioning text in top left corner of plot
This works fine for one plot, but if it is a multiple plot (mfrow=c(2,2), say) then each individual label is placed in the same position, i.e. the absolute top left of the canvas. I would like it at the top left of each individual plot. Thanks anyway. Got any idea how to fix this?
Dan

Paul Murrell wrote:
> Hi
>
> Daniel Brewer wrote:
>> Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text to do this, it does not seem to plot it outside the plot area. I have also tried to use mtext, but that does not really cut it, as I cannot get the label in the correct position. Ideally, it would be best if I could use legend but have it outside the plot area. Any ideas?
>
>   plot(1:10)
>   library(grid)
>   grid.text("What do we want? Text in the corner!\nWhere do we want it? Here!", x=unit(2, "mm"), y=unit(1, "npc") - unit(2, "mm"), just=c("left", "top"))
>
> Paul
>
>> Thanks
>>
>> Benilton Carvalho wrote:
>>> maybe this is what you want?
>>>   plot(rnorm(10))
>>>   legend("topleft", "A)", bty="n")
>>> ? b
>>>
>>> On Aug 7, 2007, at 11:08 AM, Daniel Brewer wrote:
>>>> Simple question: how can you position text in the top left hand corner of a plot? I am plotting multiple plots using par(mfrow=c(2,3)) and all I want to do is label these plots a), b), c) etc. I have been fiddling around with both text and mtext but without much luck. text is fine, but each plot has a different scale on the axis and so this makes it problematic. What is the best way to do this? Many thanks
>>>> Dan
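One way to label every panel regardless of its axis scale, as a sketch only: it relies on grconvertX() and grconvertY() from the graphics package, which are available in R 2.7.0 and later (so they postdate this thread), and the helper name `label_panel` and the 2% inset are made up for illustration.

```r
# Draw a label in the top-left corner of the current figure region.
label_panel <- function(label) {
  # Convert a point just inside the figure corner to user coordinates
  x <- grconvertX(0.02, from = "nfc", to = "user")
  y <- grconvertY(0.98, from = "nfc", to = "user")
  text(x, y, label, adj = c(0, 1), xpd = NA)   # xpd = NA allows margin drawing
  invisible(c(x = x, y = y))
}

op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(rnorm(10) * 10^i)                       # deliberately different scales
  pos <- label_panel(paste0(letters[i], ")"))
}
par(op)
```

Because the position is given in normalized figure coordinates, each label lands in its own panel's corner and not on the overall page corner.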
Re: [R] Positioning text in top left corner of plot
Thanks. That works if it is only a single plot, but if there are multiple plots (e.g. par(mfrow=c(2,2))) it confusingly always puts the label in the absolute top left, i.e. the top left of plot one.
Dan

S Ellison wrote:
> Try something like
>   mtext(side=3, line=-1, text="Here again?", adj=0, outer=T)
> This puts text just inside the top left corner.
>
> >>> Jim Lemon [EMAIL PROTECTED] 10/08/2007 10:37:30 >>>
> Daniel Brewer wrote:
>> Thanks for the replies, but I still cannot get what I want. I do not want the label inside the plot area, but in the top left of the paper, I suppose in the margins. When I try to use text to do this, it does not seem to plot it outside the plot area. I have also tried to use mtext, but that does not really cut it, as I cannot get the label in the correct position. Ideally, it would be best if I could use legend but have it outside the plot area. Any ideas?
>
> Hi Dan, Try this:
>   plot(1:5)
>   par(xpd=TRUE)
>   text(0.5, 5.5, "Outside")
>   par(xpd=FALSE)
> Jim
Re: [R] need help with pdf-plot
Dear Antje

I cannot see that you have got any replies yet, so I will make an attempt. However, I am sure others have more formally correct solutions. When you call pdf(), you can set paper="a4" (or "a4r" for landscape). However, the width and the height of your plot should then not exceed the size of the paper (which is approximately 8.27 x 11.69 inches for A4). Try (I have only tested on Windows XP, R 2.5.0):

  pdf("test1.pdf", width=10, height=5, paper="a4r")
  par(mfrow=c(1,3), pty="s")  # pty="s" gives square plotting regions
  plot(rnorm(100))
  plot(rnorm(100))
  plot(rnorm(100))
  dev.off()

Hope this helps
Ivar

Antje wrote:
> I still have this problem. Does anybody know any solution?
> Antje
>
> Antje wrote:
>> Hello, I'm trying to plot a set of barplots like a matrix (2 rows, 10 columns from reduced_mat) to a pdf. It works with the following parameters:
>>   pdf("test.pdf", width=ncol(reduced_mat)*2, height=nrow(reduced_mat)*2, pointsize = 12)
>>   par(mfcol = c(nrow(reduced_mat),ncol(reduced_mat)), oma = c(0,0,0,0), lwd=48/96, cex.axis = 0.5, las = 2, cex.main = 1.0)
>> Then I get a long narrow page format with the square barplots. But I would like to have an A4 format in the end, with the plots not filling the whole page (they should stay more or less square and not be stretched...). What shall I look for to achieve this?
>> Antje
[R] help with counting how many times each value occur in each column
Dear list, I have the following dataset and want to know how many times each value occurs in each column.

data
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] -100 -100 -100    0    0    0    0    0    0  -100
 [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
 [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
 [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
[11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
[19,] -100 -100 -100    0    0    0    0    0    0  -100
[20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100

The result matrix should look like

      -100  0  -50
 [1]    20
 [2]    20
 [3]    20
 [4]    17
 [5]    18
 [6]    18
 [7]    18
 and so on
 [8]
 [9]
[10]

How can I do this in R? Thanks a lot for your help,
Tom
[R] Odp: having problems with factor()
Hi

[EMAIL PROTECTED] wrote on 10.08.2007 13:41:53:
> Dear R Help, I have a set of data of heights of trees described by the area that they are in. The areas are numerical (0 to 7).
>       ht area
>   1   320    3
>   2   410    4
>   3   230    2
>   4   360    3
>   5   126    1
>   6   280    2
>   7   260    2
>   8   280    2
>   9   280    2
>   10  260    2
>   ...
>   180 450    4
>   181  90    1
>   182 120    1
>   183 440    4
>   184 210    2
>   185 330    3
>   186 210    2
>   187 100    1
>   188   0    0
> I want to convert the area column values to factors, to do an anova. However, if I use:
>   df$areaf <- factor(df$area, labels=c("0","I","II","III","IV","V","VI","VII"))
> it gives the following message:

Hm, maybe some of the values are missing:

  num <- sample(1:3, 10, replace=T)
  num
   [1] 1 3 1 2 3 3 1 3 3 3
  factor(num, labels=c("O", "I", "II"))
   [1] O  II O  I  II II O  II II II
  Levels: O I II
  factor(num, labels=c("O", "I", "II", "III"))
  Error in factor(num, labels = c("O", "I", "II", "III")) : invalid labels; length 4 should be 1 or 3

Try table(df$area) to see which levels you really have.

Regards
Petr

>   Error in factor(df$area, labels = c("0", "I", "II", "III", "IV", "V", "VI", : invalid labels; length 8 should be 1 or 7
> Can anyone help? Jabez
[R] Remove redundant observations for cross-validation
Hi, This is a general statistics question that I believe occurs often, so there may be some R functions/packages dedicated to it. Suppose you want to check the accuracy of a classifier using a large training data set where each row represents an observation. Is there a simple approach for removing redundant rows (rows with very similar values for all columns) from the training data so as to obtain a realistic classification performance upon cross-validation? The only one I can think of is clustering the data into an arbitrary number of clusters and selecting one observation from each cluster, e.g.

  library(cluster)
  x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
             cbind(rnorm(15,5,2.5), rnorm(15,5,2.5)),
             cbind(rnorm(15,15,0.5), rnorm(15,15,0.5)),
             cbind(rnorm(5,5,0.1), rnorm(5,5,0.1)))
  pamx <- pam(x, 15)
  y = array(NA, dim=c(15,ncol(x)))
  for(i in 1:15){
    y[i,] = x[sample(which(pamx$clustering==i), 1),]
  }

This seems a bit subjective though... Any better ideas?
Eleni Rapsomaniki
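Two details of the clustering idea above can be sketched on a smaller made-up data set (the choice of k = 5 is arbitrary). First, pam() already reports one representative per cluster, the medoid, through its `id.med` component. Second, when a random pick is preferred, sample(idx, 1) misbehaves for a cluster with a single member, because sample() on a single number n draws from 1:n, so indexing by position is safer:

```r
library(cluster)                       # pam() is provided by cluster

set.seed(1)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 2.5), rnorm(15, 5, 2.5)))
k <- 5                                 # arbitrary number of clusters
pamx <- pam(x, k)

# Option 1: take the medoid of each cluster (deterministic)
reps <- x[pamx$id.med, , drop = FALSE]

# Option 2: random member per cluster, safe for singleton clusters
pick_one <- function(idx) idx[sample.int(length(idx), 1)]
y <- t(sapply(seq_len(k), function(i)
  x[pick_one(which(pamx$clustering == i)), ]))
```

Neither option removes the subjectivity of choosing k; it only makes the selection step well defined.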
Re: [R] [Fwd: Re: How to apply functions over rows of multiple matrices]
1. matrices are stored columnwise so R is better at column-wise operations than row-wise.

2. Here is one way to do it (although I am not sure it's better than the index approach):

  row.apply <- function(f, a, b) t(mapply(f, as.data.frame(t(a)), as.data.frame(t(b))))

3. The code for the example in this post could be simplified to:

  first.1 <- apply(cbind(goldstandard, 1), 1, which.max)
  ifelse(col(newtest) > first.1, NA, newtest)

4. given that both examples did not inherently need row by row operations I wonder if that is the wrong generalization in the first place?

On 8/10/07, Johannes Hüsing [EMAIL PROTECTED] wrote:
> [Apologies to Gabor, to whom I erroneously sent a personal copy of the reply instead of posting to the list directly]
>
> [...]
>> Perhaps what you really intend is to take the average over those elements in each row of the first matrix which correspond to 1's in the corresponding row of the second. In that case it's just:
>>   rowSums(newtest * goldstandard) / rowSums(goldstandard)
>
> Thank you for clearing my thoughts about the particular example. My question was a bit more general though, as I have different functions which are applied row-wise to multiple matrices. An example that sets all values of a row of matrix A to NA after the first occurrence of TRUE in matrix B:
>
>   fillfrom <- function(applvec, testvec=NULL) {
>     if (is.null(testvec)) testvec <- applvec
>     if (length(testvec) != length(applvec)) {
>       stop("applvec and testvec have to be of same length!")
>     } else if(any(testvec, na.rm=TRUE)) {
>       applvec[min(which(testvec)) : length(applvec)] <- NA
>     }
>     applvec
>   }
>   fillafter <- function(applvec, testvec=NULL) {
>     if (is.null(testvec)) testvec <- applvec
>     fillfrom(applvec, c(FALSE, testvec[-length(testvec)]))
>   }
>   numtest <- 6
>   numsubj <- 20
>   newtest <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
>   goldstandard <- array(rbinom(numtest*numsubj, 1, .5), dim=c(numsubj, numtest))
>   newtest.NA <- t(sapply(1:nrow(newtest), function(i) { fillafter(newtest[i,], goldstandard[i,]==1)}))
>
> My general question is whether R provides some syntactic sugar for the awkward sapply(1:nrow(A)) expression. Maybe in this case there is also a way to bypass the apply mechanism, and my way of thinking about the problem has to be adapted. But as *apply calls are galore in R, I feel this is a standard way of dealing with vectors and matrices.
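A small check that the two-line whole-object version in point 3 agrees with a row-by-row version. This is a sketch: `blank_after` is a simplified helper written for this comparison, not the fillafter() from the post, and the seed is arbitrary.

```r
set.seed(1)
numtest <- 6
numsubj <- 20
newtest      <- matrix(rbinom(numtest * numsubj, 1, 0.5), nrow = numsubj)
goldstandard <- matrix(rbinom(numtest * numsubj, 1, 0.5), nrow = numsubj)

# Row-wise: set everything after the first 1 in the flag row to NA
blank_after <- function(values, flags) {
  first <- which(flags == 1)[1]              # NA when the row has no 1
  if (!is.na(first) && first < length(values)) {
    values[(first + 1):length(values)] <- NA
  }
  values
}
by_row <- t(sapply(seq_len(nrow(newtest)), function(i)
  blank_after(newtest[i, ], goldstandard[i, ])))

# Whole-object: column index compared with the position of the first 1
first.1 <- apply(cbind(goldstandard, 1), 1, which.max)
whole <- ifelse(col(newtest) > first.1, NA, newtest)

isTRUE(all.equal(by_row, whole))
```

The extra column of 1s appended by cbind() guarantees that which.max() finds a 1 in every row, so rows without any 1 get a position past the last column and are left untouched.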
[R] Odp: help with counting how many times each value occur in each column
Hi

  mat <- sample(c(-50,0,-100), 100, replace=T)
  dim(mat) <- c(10,10)
  mat
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]    0    0    0    0  -50    0    0    0    0     0
   [2,] -100 -100  -50  -50    0    0 -100  -50 -100   -50
   [3,]    0  -50 -100 -100    0  -50 -100    0    0  -100
   [4,]    0 -100    0  -50 -100 -100  -50  -50    0  -100
   [5,]  -50  -50    0    0    0 -100 -100 -100    0  -100
   [6,]    0    0  -50  -50    0    0 -100 -100  -50  -100
   [7,] -100 -100 -100  -50 -100    0 -100 -100    0  -100
   [8,] -100    0    0    0    0 -100    0 -100    0  -100
   [9,] -100    0  -50 -100  -50    0    0  -50    0  -100
  [10,]  -50 -100    0    0  -50  -50  -50  -50 -100  -100
  apply(mat, 2, table)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  -100    4    4    2    2    2    3    5    4    2     8
  -50     2    2    3    4    3    2    2    4    1     1
  0       4    4    5    4    5    5    3    2    7     1

Transposing and ordering columns is up to you.

Regards
Petr

[EMAIL PROTECTED] wrote on 10.08.2007 14:01:44:
> Dear list, I have the following dataset and want to know how many times each value occurs in each column.
> data
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,] -100 -100 -100    0    0    0    0    0    0  -100
>  [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
>  [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>  [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
> [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
> [19,] -100 -100 -100    0    0    0    0    0    0  -100
> [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
>
> The result matrix should look like
>       -100  0  -50
>  [1]    20
>  [2]    20
>  [3]    20
>  [4]    17
>  [5]    18
>  [6]    18
>  [7]    18
>  and so on
>  [8]
>  [9]
> [10]
>
> How can I do this in R? Thanks a lot for your help,
> Tom
Re: [R] Cleaning up the memory
Thanks! I will look into ... I have 4 GB RAM, and I was monitoring the memory with Windows Task Manager, so I was watching how R gets more and more memory allocated, from less than 100 MB to 1500 MB. My initial tables are between 30 and 80 MB, and the resulting tables that incorporate the initial tables plus PCA and kmeans results are between 50 and 200 MB or thereabouts! And yes, I don't really care about memory allocation in detail; what I want is to free that memory after every cycle ;-) Although, after I didn't do anything in R and it was idle for more than 30 minutes, the memory allocation according to Task Manager dropped to 15 MB, which is good, but I cannot wait half an hour between cycles. Again thanks,
Monica

> Date: Fri, 10 Aug 2007 18:28:07 +0100
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: r-help@stat.math.ethz.ch
> Subject: Re: [R] Cleaning up the memory
>
> On Fri, 10 Aug 2007, Monica Pisica wrote:
>> Hi, I have 4 huge tables on which I want to do a PCA analysis and a kmeans clustering. If I run each table individually I have no problems, but if I want to run them in a for loop I exceed the memory allocation after the second table, even if I save the results as a csv table and clean up all the big objects with the rm command. To me it seems that even if I don't have the objects anymore, the memory these objects used to occupy is not cleared. Is there any way to clear up the memory as well? I don't want to close R and start it up again. Also, I am running R under Windows.
>
> See ?gc, which does the clearing. However, unless you study the memory allocation in detail (which you cannot do from R code), you don't actually know that this is the problem. More likely is that you have fragmentation of your 32-bit address space: see ?Memory-limits. Without any idea what memory you have and what 'huge' means, we can only make wild guesses. It might be worth raising the memory limit (the --max-mem-size flag).
>
>> thanks, Monica
>
> -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
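The loop described in this thread could be organized roughly as below. This is a sketch under assumptions: small random matrices stand in for the huge tables, results go to temporary files, and the number of clusters is arbitrary. As noted above, rm() plus gc() releases objects R no longer needs but cannot cure address-space fragmentation.

```r
set.seed(42)
out_files <- character(0)

for (k in 1:3) {
  tab <- matrix(rnorm(2000), ncol = 4)          # stand-in for one large table
  pc  <- prcomp(tab, scale. = TRUE)             # PCA
  km  <- kmeans(pc$x[, 1:2], centers = 3)       # clustering on the first 2 PCs

  out <- tempfile(fileext = ".csv")
  write.csv(data.frame(pc$x[, 1:2], cluster = km$cluster),
            out, row.names = FALSE)
  out_files <- c(out_files, out)

  rm(tab, pc, km)                               # drop the references ...
  invisible(gc())                               # ... then trigger a collection
}
```

Keeping only file names between iterations means nothing large survives from one cycle to the next.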
Re: [R] Help with matrices
That was ridiculously simple. Duh. Thanks,
Lanre

On 8/10/07, jim holtman wrote:

  Is this what you want:

  > x <- matrix(runif(100), 10)
  > round(x, 3)
         [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
   [1,] 0.268 0.961 0.262 0.347 0.306 0.762 0.524 0.062 0.028 0.226
   [2,] 0.219 0.100 0.165 0.131 0.578 0.933 0.317 0.109 0.527 0.131
   [3,] 0.517 0.763 0.322 0.374 0.910 0.471 0.278 0.382 0.880 0.982
   [4,] 0.269 0.948 0.510 0.631 0.143 0.604 0.788 0.169 0.373 0.327
   [5,] 0.181 0.819 0.924 0.390 0.415 0.485 0.702 0.299 0.048 0.507
   [6,] 0.519 0.308 0.511 0.690 0.211 0.109 0.165 0.192 0.139 0.681
   [7,] 0.563 0.650 0.258 0.689 0.429 0.248 0.064 0.257 0.321 0.099
   [8,] 0.129 0.953 0.046 0.555 0.133 0.499 0.755 0.181 0.155 0.119
   [9,] 0.256 0.954 0.418 0.430 0.460 0.373 0.620 0.477 0.132 0.050
  [10,] 0.718 0.340 0.854 0.453 0.943 0.935 0.170 0.771 0.221 0.929
  > ifelse(x > .5, 1, 0)
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]    0    1    0    0    0    1    1    0    0     0
   [2,]    0    0    0    0    1    1    0    0    1     0
   [3,]    1    1    0    0    1    0    0    0    1     1
   [4,]    0    1    1    1    0    1    1    0    0     0
   [5,]    0    1    1    0    0    0    1    0    0     1
   [6,]    1    0    1    1    0    0    0    0    0     1
   [7,]    1    1    0    1    0    0    0    0    0     0
   [8,]    0    1    0    1    0    0    1    0    0     0
   [9,]    0    1    0    0    0    0    1    0    0     0
  [10,]    1    0    1    0    1    1    0    1    0     1

  On 8/10/07, Lanre Okusanya wrote:

    Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried mat2 <- as.matrix(as.numeric(mat1 > 0.25)), but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance. Lanre

  -- 
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
  What is the problem you are trying to solve?
[R] Help with matrices
Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried mat2 <- as.matrix(as.numeric(mat1 > 0.25)), but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance. Lanre
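A small sketch of why the attempt in this message loses the matrix shape: as.numeric() drops the dim attribute, so as.matrix() then sees a plain vector and returns a single column. The matrix mat1 below is a tiny made-up stand-in for the 1000x1000 one, and 0.25 is the threshold from the message.

```r
set.seed(1)
mat1 <- matrix(runif(12), nrow = 3)   # small stand-in for the real matrix

# The attempt from the message: as.numeric() flattens the logical matrix
# to a vector, and as.matrix() turns that vector into one column.
flat <- as.matrix(as.numeric(mat1 > 0.25))
dim(flat)                             # 12 rows, 1 column

# Multiplying the logical matrix by 1 converts TRUE/FALSE to 1/0
# and keeps the dim attribute, so the shape is preserved.
mat2 <- (mat1 > 0.25) * 1
dim(mat2)                             # same as dim(mat1)
```

The comparison mat1 > 0.25 is already vectorised over the whole matrix, so no for loop is needed.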
Re: [R] Seasonality
Alberto Monteiro wrote:

  I have a time series x = f(t), where t is taken for each month. What is the best function to detect if _x_ has a seasonal variation? If there is such a seasonal effect, what is the best function to estimate it?

From my own experience, I have the impression that there is no single best approach to estimating the seasonal component of time series data. Maybe you could simulate the assumed nature of your data (variable trend? variable seasonal pattern? count data with overdispersion? maybe a bimodal pattern every year?), then try several of the available methods and check whether they recover your input approximately correctly.

Best, Roland
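The simulate-and-check idea above might look something like this. The series is entirely made up (a linear trend, a sinusoidal seasonal pattern and Gaussian noise are all assumptions chosen for illustration), and stl() and decompose() from the stats package are just two candidate methods, not a recommendation.

```r
set.seed(42)
n_years <- 10
month   <- rep(1:12, n_years)

# Assumed "true" components of the simulated monthly series.
true_seasonal <- 3 * sin(2 * pi * month / 12)
trend         <- seq(0, 5, length.out = 12 * n_years)
noise         <- rnorm(12 * n_years, sd = 0.5)

x <- ts(trend + true_seasonal + noise, start = c(2000, 1), frequency = 12)

# Two candidate methods for extracting the seasonal component.
fit_stl <- stl(x, s.window = "periodic")
fit_dec <- decompose(x)

est_stl <- as.numeric(fit_stl$time.series[, "seasonal"])
est_dec <- as.numeric(fit_dec$seasonal)

# How closely does each estimate track the seasonal pattern that was put in?
cor_stl <- cor(est_stl, true_seasonal)
cor_dec <- cor(est_dec, true_seasonal)
print(c(stl = cor_stl, decompose = cor_dec))
```

Changing the simulated components (for example a seasonal amplitude that varies over time, or count data) and rerunning the comparison gives a feel for which method copes with which kind of data.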
Re: [R] Help with matrices
On 10-Aug-07 18:05:50, Lanre Okusanya wrote:

  Hello all, I am working with a 1000x1000 matrix, and I would like to return a 1000x1000 matrix that tells me which values in the matrix are greater than a threshold value (1 or 0 indicator). I have tried mat2 <- as.matrix(as.numeric(mat1 > 0.25)), but that returns a 1:10 matrix. I have also tried for loops, but they are grossly inefficient. Thanks for all your help in advance. Lanre

Simple-minded, but:

  > S <- matrix(rnorm(25), nrow=5)
  > S
             [,1]        [,2]       [,3]       [,4]       [,5]
  [1,] -0.9283624 -0.44418487  1.1174555  1.9040999 -0.4675796
  [2,]  0.2658770 -0.28492642 -1.2271013 -0.5713291  1.8036235
  [3,]  0.7010885 -0.42972262  0.7576021  0.3407972 -1.0628487
  [4,] -0.2003087  0.87006841  0.6233792 -0.9974902 -0.9104270
  [5,]  0.2729014  0.09781886 -1.0004486  1.5987385 -0.4747125
  > T <- 0*S
  > T[S>0.25] <- 1+0*S[S>0.25]
  > T
       [,1] [,2] [,3] [,4] [,5]
  [1,]    0    0    1    1    0
  [2,]    1    0    0    0    1
  [3,]    1    0    1    1    0
  [4,]    0    1    1    0    0
  [5,]    1    0    0    1    0

Does this work OK for your big matrix?

HTH
Ted.

E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 10-Aug-07 Time: 19:50:37
-- XFMail --
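As a rough check that the approaches suggested in this thread agree at the size Lanre asked about, here is a sketch on a random 1000x1000 matrix. The variable names are made up for the illustration, the indexing variant is a simplified form of Ted's code, and the result is stored under a name other than T, because T is also R's shorthand for TRUE and reassigning it can confuse later code.

```r
set.seed(2)
S   <- matrix(rnorm(1000 * 1000), nrow = 1000)
thr <- 0.25

# ifelse(), as in Jim Holtman's reply.
ind_ifelse <- ifelse(S > thr, 1, 0)

# Start from zeros and overwrite the entries above the threshold
# (a simplified version of the indexing in Ted Harding's reply).
ind_index <- 0 * S
ind_index[S > thr] <- 1

# Arithmetic on the logical matrix: TRUE/FALSE become 1/0.
ind_arith <- (S > thr) * 1

# All three give a 1000x1000 matrix with the same 0/1 entries.
same_shape  <- identical(dim(ind_ifelse), dim(S)) &&
               identical(dim(ind_index), dim(S)) &&
               identical(dim(ind_arith), dim(S))
same_values <- all(ind_ifelse == ind_index) && all(ind_index == ind_arith)
print(c(same_shape = same_shape, same_values = same_values))
```

None of the three uses an explicit loop, so each should handle a matrix of this size without trouble.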
Re: [R] help with counting how many times each value occurs in each column
Tom,

If all values (-100, 0, -50) were present in every column, then a simple

  apply(data, 2, table)

would work. Even if not all values are present in every column, you can correct for that by appending extra rows that contain every value in every column:

  data <- rbind(data, matrix(ncol=10, nrow=3, rep(c(-100,0,-50), 10)))

and then do

  apply(data, 2, table) - 1

to get correct results. But someone on the list can probably come up with a much more elegant solution.

Bye,
Gasper Cankar, PhD
Researcher
National Examinations Centre
Slovenia

-----Original Message-----
From: Tom Cohen
Sent: Friday, August 10, 2007 2:02 PM
To: r-help@stat.math.ethz.ch
Subject: [R] help with counting how many times each value occur in eachcolumn

Dear list,

I have the following dataset and want to know how many times each value occurs in each column.

  > data
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,] -100 -100 -100    0    0    0    0    0    0  -100
   [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
   [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
   [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
  [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
  [19,] -100 -100 -100    0    0    0    0    0    0  -100
  [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100

The result matrix should look like

        -100    0  -50
   [1]    20
   [2]    20
   [3]    20
   [4]    17
   [5]    18
   [6]    18
   [7]    18
   and so on
   [8]
   [9]
  [10]

How can I do this in R?

Thanks a lot for your help,
Tom
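One way to get a count for every value in every column, including zero counts, is to fix the set of levels before tabulating. The sketch below is a variation on the reply above rather than the method it proposes: instead of padding the data with extra rows, it passes the values of interest as factor levels. The matrix is rebuilt from the posting, and the names vals and counts are made up for the example.

```r
# Rebuild the 20 x 10 matrix shown in the posting.
data <- matrix(-100, nrow = 20, ncol = 10)
data[c(1, 19), 4:9] <- 0
data[10, 4] <- -50
data[5, 10] <- -50

vals <- c(-100, 0, -50)   # the values to count, in the order wanted

# For each column, tabulate against a fixed set of levels so that values
# absent from a column still get a count of 0.
counts <- t(vapply(
  seq_len(ncol(data)),
  function(j) as.integer(table(factor(data[, j], levels = vals))),
  integer(length(vals))
))
colnames(counts) <- as.character(vals)
print(counts)
```

Each row of counts corresponds to one column of data, and each row sums to the number of rows in data.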
[R] Question on R
Good day. I work at a public entity that handles millions of data records covering several variables and spread across several topics. For statistical analysis we use a statistical package that lets us pull the data directly from the database (Oracle) and run the analysis. All of this takes little time. However, we want to know whether R can do the same as the package we currently use; that is, can R import data from a database with millions of records at high speed, and then carry out statistical analysis on the imported data? I would be grateful for a prompt response, since we urgently need this information. Luis Eduardo Castillo Méndez.
Re: [R] R-excel
I am running R 2.5.1 on Mac OS X 10.4.10. xlsReadWrite is a Windows binary. Instead, install and load two packages: (1) gtools and (2) gdata. Both are available as Windows and Mac binaries. gdata depends on gtools, so be sure to load gtools first, or set the dependencies option when installing. Then you can use read.xls. Thus, on a Mac:

  data <- read.xls("/Users/your name/Documents/data.xls", sheet=1)

For Windows, substitute the appropriate file path and file name in the first argument of read.xls, e.g.:

  data <- read.xls("A:/filename.xls", sheet=1)

Thanks to correspondents for their advice; I hope that this may alleviate some of the frustration (referred to in the R Import/Export Manual) associated with dealing with Excel files in R.

Erika Frigo wrote:

  Good morning to everybody, I have a problem: how can I import Excel files into R? Thank you very much.

  Dr.sa. Erika Frigo
  Università degli Studi di Milano
  Facoltà di Medicina Veterinaria
  Dipartimento di Scienze e Tecnologie Veterinarie per la Sicurezza Alimentare (VSA)
  Via Grasselli, 7
  20137 Milano
  Tel. 02/50318515
  Fax 02/50318501