Re: [R] ggplot2: proper use of facet_grid inside a function
Thanks, Thierry, for the workaround. I was out of ideas. I had looked for a facet_grid() analog of aes_string() and concluded there wasn't one. The only thing I found was the facet_grid(...) form, but apparently it is intended for some other use, as it doesn't behave the way a hypothetical facet_grid_string() would.

Thanks so much.
Bryan

On 10/5/09 4:12 AM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

Dear Bryan,

In the ggplot() function you can choose between aes() and aes_string(). With the first you need to hardwire the variable names; with the latter you can use objects that contain the variable names. So in your case you need aes_string(). Unfortunately, facet_grid() works like aes() and not like aes_string(). That is why you are getting errors. A workaround is to add a dummy column to your data.

library(ggplot2)
data <- mpg
fac1 <- "cty"
fac2 <- "drv"
res <- "displ"
# copy the faceting variable into a column with a fixed, known name
data$dummy <- data[[fac2]]
ggplot(data, aes_string(x = fac1, y = res)) + geom_point() + facet_grid(. ~ dummy)

HTH,
Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bryan Hanson
Sent: Friday, 2 October 2009 17:21
To: R Help
Subject: [R] ggplot2: proper use of facet_grid inside a function

> I want to use ggplot2 with facet_grid inside a function with user-specified variables, for instance:
> p <- ggplot(data, aes_string(x = fac1, y = res)) + facet_grid(. ~ fac2)
> where data, fac1, fac2 and res are arguments to the function.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2: proper use of facet_grid inside a function
Hello Again R Folk:

I have found items about this in the archives, but I'm still not getting it right. I want to use ggplot2 with facet_grid inside a function with user-specified variables, for instance:

p <- ggplot(data, aes_string(x = fac1, y = res)) + facet_grid(. ~ fac2)

where data, fac1, fac2 and res are arguments to the function. I have tried

p <- ggplot(data, aes_string(x = fac1, y = res)) + facet_grid(. ~ as.name(fac2))

and

p <- ggplot(data, aes_string(x = fac1, y = res)) + facet_grid(". ~ fac2")

But all of these produce the same error:

Error in `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) :
  undefined columns selected

If I hardwire the true identity of fac2 into the function, it works as desired, so I know this is a problem of connecting the name with the proper value. I'm up to date on everything:

R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid datasets tools utils stats graphics grDevices methods
[9] base

other attached packages:
 [1] Hmisc_3.6-0 ggplot2_0.8.3 reshape_0.8.3 proto_0.3-8
 [5] mvbutils_2.2.0 ChemoSpec_1.1 lattice_0.17-25 mvoutlier_1.4
 [9] plyr_0.1.8 RColorBrewer_1.0-2 chemometrics_0.4 som_0.3-4
[13] robustbase_0.4-5 rpart_3.1-45 pls_2.1-0 pcaPP_1.7
[17] mvtnorm_0.9-7 nnet_7.2-48 mclust_3.2 MASS_7.2-48
[21] lars_0.9-7 e1071_1.5-19 class_7.2-48

loaded via a namespace (and not attached):
[1] cluster_1.12.0

Thanks for any help!
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
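Besides the dummy-column workaround from Thierry's reply, one way to get the effect of a hypothetical facet_grid_string() is to build the faceting formula from the string before handing it to facet_grid(). This is only a sketch: it assumes facet_grid() accepts a formula object, facet_formula() and plot_by() are made-up helper names, and the mpg columns are the ones from Thierry's example. Column names that are not syntactically valid (spaces, for instance) would need backticks, and newer ggplot2 releases deprecate aes_string(), so that part may need adapting.

```r
# Build ". ~ <column>" from a column name held in a character string.
# facet_formula() is a hypothetical helper, not part of ggplot2.
facet_formula <- function(col) {
  stats::as.formula(paste(". ~", col))
}

# Hypothetical wrapper in the spirit of the original question:
# every variable is passed in as a string.
plot_by <- function(data, fac1, fac2, res) {
  ggplot2::ggplot(data, ggplot2::aes_string(x = fac1, y = res)) +
    ggplot2::geom_point() +
    ggplot2::facet_grid(facet_formula(fac2))
}

# Only build the plot if ggplot2 is installed; print(p) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- plot_by(ggplot2::mpg, fac1 = "cty", fac2 = "drv", res = "displ")
}
```

The formula is assembled once, inside the function, so the caller never has to touch the data frame.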
[R] Teasing out logrank differences *between* groups using survdiff or something else?
R Folk:

Please forgive what I'm sure is a fairly naïve question; I hope it's clear. A colleague and I have been doing a really simple one-off survival analysis, but this is an area with which we are not very familiar; we just happen to have gathered some data that needs this type of analysis. We've done quite a bit of reading, but answers escape us, even though the question below seems simple.

Considering the following example from ?survdiff:

> survdiff(Surv(time, status) ~ pat.karno, data=lung)
Call:
survdiff(formula = Surv(time, status) ~ pat.karno, data = lung)

n=225, 3 observations deleted due to missingness.

               N Observed Expected (O-E)^2/E (O-E)^2/V
pat.karno=30   2        1    0.658    0.1774     0.179
pat.karno=40   2        1    1.337    0.0847     0.086
pat.karno=50   4        4    1.079    7.9088     8.013
pat.karno=60  30       27   15.237    9.0808    10.148
pat.karno=70  41       31   26.264    0.8540     1.027
pat.karno=80  51       39   40.881    0.0865     0.117
pat.karno=90  60       38   49.411    2.6354     3.853
pat.karno=100 35       21   27.133    1.3863     1.684

 Chisq= 22.6 on 7 degrees of freedom, p= 0.00202

The p value here is for the entire group (right?). How do we go about determining the p value for the comparison of any four arbitrary groups in all combinations, say pat.karno = 40, 60, 80, and 100?

We know (we think) that we can't just run the coxph analysis for only the groups of interest, as the hazard ratio for any one group in an analysis with several groups is computed by holding the other groups at their average value, so the hazard ratio varies by the context. Seems like we need some sort of t-test or chi-squared test, but being mere chemists and molecular biologists, we don't quite see it and wouldn't trust ourselves anyway, given the special nature of survival analysis. Manual instructions or a function suggestion would be great.

Thanks in Advance,
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
Re: [R] Teasing out logrank differences *between* groups using survdiff or something else?
Thomas, thanks for your comments. We weren't entirely sure we were even framing the question right, so your comments are encouraging. Here are our results:

Call:
coxph(formula = Surv(lifespan, status) ~ group, data = four)

  n= 573
              coef exp(coef) se(coef)      z Pr(>|z|)
groupT1     4.1371   62.6224   0.2472 16.734  < 2e-16 ***
groupT1.U1  3.8921   49.0122   0.2367 16.442  < 2e-16 ***
groupU1    -0.5177    0.5959   0.1232 -4.201 2.65e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           exp(coef) exp(-coef) lower .95 upper .95
groupT1      62.6224    0.01597    38.574  101.6637
groupT1.U1   49.0122    0.02040    30.819   77.9462
groupU1       0.5959    1.67824     0.468    0.7586

Rsquare= 0.697   (max possible= 1 )
Likelihood ratio test= 683.6  on 3 df,   p=0
Wald test            = 348.9  on 3 df,   p=0
Score (logrank) test = 646.6  on 3 df,   p=0

Which shows that there are huge differences in our treatments. Here's the survdiff output on the object created by the call above:

              N Observed Expected (O-E)^2/E (O-E)^2/V
group=WISO  145      145    213.2      21.8      39.2
group=T1    152      152     52.9     185.5     248.1
group=T1.U1 144      144     52.1     162.0     209.3
group=U1    132      132    254.8      59.2     130.7

 Chisq= 618 on 3 degrees of freedom, p= 0

To make sure I understand, the null hypothesis here is that these all have the same survival and censoring functions, and we have shown here that they do not. But we are particularly interested in comparing the differential effect of treatments (these are actually genes inserted into Drosophila that are generally toxic to various degrees). What's the proper way to show/prove that:

T1.U1 compared to U1 is more hazardous than T1 vs WISO

if in fact it is true? Maybe the answer is already in our output, in the sense that the CIs don't overlap much? Maybe we are wrong to seek a p value as well?

Thanks again,
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

On 9/15/09 10:43 PM, Thomas Lumley tlum...@u.washington.edu wrote:

I think you do in fact want to just run the analysis for the four groups you are interested in. The logrank chi-squared test would then be of the hypothesis that these four groups have the same survival and censoring distributions, with the greatest power for detecting proportional-hazards differences between the groups.

You are correct in noting that the results you get for comparing these four groups would change depending on what other groups are in the analysis. This is a seriously underappreciated property of rank-based analyses. However, because of this dependence I think you can make a good case that restricting the analysis to the groups of interest is the best way to run the test.

-thomas

On Tue, 15 Sep 2009, Bryan Hanson wrote:

> The p value here is for the entire group (right?). How do we go about determining the p value for the comparison of any four arbitrary groups in all combinations, say pat.karno = 40, 60, 80, and 100?
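Thomas's suggestion, restricting the log-rank test to the groups of interest, can be sketched with the lung data from the original question. The four-group test is what the thread describes; the pairwise loop and the Holm adjustment are additions of mine rather than anything proposed in the thread, and logrank_p() is a made-up helper name.

```r
library(survival)

keep <- c(40, 60, 80, 100)
sub <- subset(lung, pat.karno %in% keep)

# Log-rank test across the four selected groups only.
fit4 <- survdiff(Surv(time, status) ~ pat.karno, data = sub)
print(fit4)

# Hypothetical helper: log-rank p-value for one pair of groups,
# computed from the chi-squared statistic that survdiff() returns.
logrank_p <- function(pair, data) {
  d <- data[data$pat.karno %in% pair, ]
  fit <- survdiff(Surv(time, status) ~ pat.karno, data = d)
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

# All six pairs among the four groups, one pair per column.
pairs <- combn(keep, 2)
p_raw <- apply(pairs, 2, logrank_p, data = sub)
names(p_raw) <- apply(pairs, 2, paste, collapse = " vs ")

# Holm adjustment for the six comparisons (my assumption, not the thread's).
p_adj <- p.adjust(p_raw, method = "holm")
print(p_adj)
```

With only two observations in the pat.karno = 40 group, any comparison involving it will have very little power, so the numbers are for illustration only.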
[R] xyplot {lattice} are different types possible for each panel?
Hello R Folks...

Using the example below, I'd like two of the panels to be plotted with type = "p" but the third to be done with type = "h". I can't use type = c("p", "p", "h") because this syntax applies all given types to every panel. I don't think I can use groups and distribute.type because these are intended for different styles of plotting within a single panel.

As you can see, I tried to do a panel function following something I saw in the Lattice book, but this has no effect at all. Looks like it may have to be more elaborate, but I'm stuck. Any suggestions appreciated!

Thanks,
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

y <- rnorm(100)
x <- rnorm(100)
names <- rep(c("Set 1", "Set 2", "Set 3"), 4)
df <- data.frame(y = y, x = y, names = as.factor(names))
p <- xyplot(y ~ x | names, layout = c(1, 3),
    panel = function(...) {
        panel.xyplot(...)
        if (panel.number() == 1) type = "h"
    })
plot(p)
Re: [R] xyplot {lattice} are different types possible for each panel?
Thanks Baptiste, your suggestion works wonderfully.

Bryan

For anyone following along, the following line needs to replace the similar one in my original example, or the data lengths will be wrong:

names <- rep(c("Set 1", "Set 2", "Set 3", "Set 4"), 25)

On 9/7/09 4:19 PM, baptiste auguie baptiste.aug...@googlemail.com wrote:

Hi,

Something like this perhaps,

p <- xyplot(y ~ x | names, layout = c(1, 3),
    panel = function(..., type = "p") {
        if (panel.number() == 1) {
            panel.xyplot(..., type = "h")
        } else {
            panel.xyplot(..., type = type)
        }
    })
plot(p)

HTH,
baptiste

2009/9/7 Bryan Hanson han...@depauw.edu

> Using the example below, I'd like two of the panels to be plotted with type = "p" but the third to be done with type = "h". I can't use type = c("p", "p", "h") because this syntax applies all given types to every panel.
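Baptiste's panel function generalizes naturally when more than one panel needs its own type: keep one type per panel in a vector and index it with panel.number(). A minimal sketch, assuming four panels as in Bryan's corrected data; panel_types is a made-up name and the choice of types is arbitrary.

```r
library(lattice)

set.seed(42)
df <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  names = factor(rep(c("Set 1", "Set 2", "Set 3", "Set 4"), 25))
)

# One plot type per panel, in panel order (an arbitrary choice here).
panel_types <- c("h", "p", "p", "p")

p <- xyplot(y ~ x | names, data = df, layout = c(1, 4),
  # `type` is captured as a formal so that a type passed to xyplot()
  # does not collide with the per-panel one set below.
  panel = function(..., type) {
    panel.xyplot(..., type = panel_types[panel.number()])
  })
print(p)
```

The lookup vector keeps the panel function free of if/else branches, at the cost of depending on the panel order.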
[R] Matrix as input to xyplot {lattice} - proper extended formula syntax
Hello R Folks...

I have a list with the following structure:

> str(df)
List of 3
 $ y    : num [1:4, 1:1242] -0.005379 0.029874 -0.023274 0.000655 -0.004537 ...
 $ x    : num [1:1242] 501 503 505 507 509 ...
 $ names: Factor w/ 4 levels "PC Loading 1",..: 1 2 3 4

I want to plot each row of df$y against df$x, and have each plot in its own panel according to the levels of df$names. The following works in the sense that the layout is right, but the y values have clearly been recycled or skipped in some fashion (and an error is thrown for each panel that the lengths of x and y aren't the same):

p <- xyplot(y ~ x | names, data = df, main = title, layout = c(1, dim(y)[1]))

In reviewing the extended formula interface in the Lattice Book, what I want to happen is

y1 + y2 + y3 + y4 ~ x | names, outer = TRUE

I see two options: figure out a way to create the extended formula on the fly (the actual number of rows in y may vary), which seems potentially tricky, or create a data frame by stacking each row of y and repeating x and names to match. This seems like a waste of memory.

I've looked through the archives and haven't come across something quite like this, or at least I don't recognize it if I have! Is there a more elegant way to tell xyplot I want to use each row of y repeatedly with the same x, in a loop-like fashion?

TIA.
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
Re: [R] Matrix as input to xyplot {lattice} - proper extended formula syntax
Thanks David, your way of constructing df is much more compact than what I was using, so I've incorporated it. I also had my rows and columns transposed relative to how xyplot wanted them (I had tested for that, but other problems interfered).

In my case, I may have varying numbers of y columns, from y.1 to y.n let's say. Is there an easy way of creating the phrase y.1+y.2+...y.n to pass to xyplot, or even better, some sort of syntax that says take all y.n and plot them against x?

Thanks,
Bryan

On 9/6/09 12:51 AM, David Winsemius dwinsem...@comcast.net wrote:

I'm not exactly sure what structure df has. Here's my effort to duplicate it:

df <- data.frame(y=matrix(rnorm(24), nrow=6), x=1:6)
df
         y.1        y.2        y.3        y.4 x
1  0.1734636  0.2348417 -1.2375648 -1.3246439 1
2  1.9551669 -1.1027262 -0.7307332  0.3953752 2
3 -0.7645778  1.6297861  0.4743805 -0.4476145 3
4 -0.5308756 -0.5246534 -0.3854609 -1.609     4
5  0.7406525 -0.8691720 -0.8194084  1.6122059 5
6 -0.9625619 -1.0774165  1.0760829  0.3659436 6

And this seems to accomplish the desired task. Presumably you have assigned off-stage the value of title to a meaningful character string?

p <- xyplot(y.1+y.2+y.3+y.4 ~ x |1:4, data = df, main = title, layout=c(1,4))
p

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

On Sep 5, 2009, at 11:52 PM, Bryan Hanson wrote:

> I want to plot each row of df$y against df$x, and have each plot in its own panel according to the levels of df$names.
> Is there a more elegant way to tell xyplot I want to use each row of y repeatedly with the same x, in a loop-like fashion?
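On Bryan's follow-up question, how to create the phrase y.1+y.2+...y.n for a varying number of columns, one option is to paste the column names together and convert the result with as.formula(). A minimal sketch using David's data frame; it assumes the response columns are exactly those whose names start with "y.", and ycols and f are made-up names.

```r
library(lattice)

set.seed(1)
df <- data.frame(y = matrix(rnorm(24), nrow = 6), x = 1:6)

# All response columns, however many there are (y.1, y.2, ..., y.n).
ycols <- grep("^y\\.", names(df), value = TRUE)

# Assemble "y.1 + y.2 + ... + y.n ~ x" as a formula object.
f <- as.formula(paste(paste(ycols, collapse = " + "), "~ x"))

# outer = TRUE puts each response in its own panel.
p <- xyplot(f, data = df, outer = TRUE, type = "l",
            layout = c(1, length(ycols)))
print(p)
```

Because the layout is derived from length(ycols), the same lines work unchanged when the number of y columns changes.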
Re: [R] Google's R Style Guide (has become S3 vs S4, in part)
Looks like the discussion is no longer about R Style, but S3 vs S4?

To that end, I asked more or less the same question a few weeks ago, arising from much the same motivations. The discussion was helpful; here's the link:

http://www.nabble.com/Need-Advice%3A-Considering-Converting-a-Package-from-S3-to-S4-tc24901482.html#a24904049

For what it's worth, I decided, with some ambivalence, to stay with S3 for now and possibly move to S4 later. In the spirit of S4, I did write a function that is nearly the equivalent of validObject for my S3 object of interest. Overall, it looked like I would have to spend a lot of time moving to S4, while staying with S3 would allow me to get the project done and get results much faster (see Frank Harrell's comment in the thread above).

As a concrete example (concrete for us non-programmers, non-statisticians), I recently decided that I wanted to add a descriptive piece of text to a number of my plots, and it made sense to include the text with the object. So I just added a list element to the existing S3 object, e.g. Myobject$descrip. No further work was necessary; I could use it right away. If instead I had made Myobject an S4 object, I would have had to go back, redefine the object, update validObject, and possibly write some new accessor and definitely constructor functions. At least, that's how I understand the way one uses S4 classes.

Back to trying to get something done!

Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

On 9/1/09 6:16 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:

Corrado wrote:

> Thanks Duncan, Spencer,
> To clarify, the situation is:
> 1) I have no reason to choose S3 over S4 or vice versa, or any other coding convention.
> 2) Our group has not done any OO development in R and I would be the first, so I can set the standards.
> 3) I am starting from scratch with a new package, so I do not have any code I need to re-use.
> 4) I am an R OO newbie, so whatever I can learn from the beginning is good for me.
> So the questions would be two:
> 1) What coding style guide should we / I follow? Is the Google style guide good, or is there something better / more prescriptive which makes our research group's life easier?

I don't think I can answer that. I'd recommend planning to spend some serious time on the decision, and then going by your personal impression. S4 is definitely harder to learn but richer, so don't make the decision too quickly. Take a look at John Chambers' new book, try small projects in each style, etc.

> 2) What class type should I use? From what you two say, I should use S3 because it is easier to use. What are the disadvantages? Is there an advantages / disadvantages table for S3 and S4 classes?

S3 is much more limited than S4. It dispatches on just one argument; S4 can dispatch on several. S3 allows you to declare things to be of a certain class with no checks that anything will actually work; S4 makes it easier to be sure that if you say something is of a certain class, it really is. S4 hides more under the hood: if you understand how regular R functions work, learning S3 is easy, but there's still a lot to learn before you'll be able to use S4 properly.

Duncan Murdoch
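The trade-off described above can be seen side by side in a few lines. This is a sketch under my own assumptions: the class names Spectrum and SpectrumS4, their fields, and check_spectrum() are all made up for illustration and are not taken from Bryan's package.

```r
library(methods)

# S3: an ordinary list with a class attribute; nothing is checked.
spec3 <- structure(list(x = c(1, 2, 3), y = c(0.1, 0.2, 0.3)),
                   class = "Spectrum")
spec3$descrip <- "demo spectrum"   # new element, usable right away

# Hand-written check, roughly the role validObject() plays for S4.
check_spectrum <- function(obj) {
  stopifnot(inherits(obj, "Spectrum"),
            is.numeric(obj$x), is.numeric(obj$y),
            length(obj$x) == length(obj$y))
  invisible(TRUE)
}
check_spectrum(spec3)

# S4: slots are declared up front and checked when the object is built.
# Adding descrip later would mean editing this definition.
setClass("SpectrumS4",
  representation(x = "numeric", y = "numeric", descrip = "character"),
  validity = function(object) {
    if (length(object@x) != length(object@y)) {
      "x and y differ in length"
    } else {
      TRUE
    }
  })
spec4 <- new("SpectrumS4", x = c(1, 2, 3), y = c(0.1, 0.2, 0.3),
             descrip = "demo spectrum")
```

Calling new("SpectrumS4", ...) with x and y of different lengths stops with the validity message, whereas the S3 object is only checked when check_spectrum() is called explicitly.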
Re: [R] Lattice in a loop does not produce output
Lattice plots are objects; inside a loop, a function, or a sourced script they must be explicitly printed (or plotted):

png("test.png")
p <- xyplot(y ~ x | z)
plot(p)
dev.off()

Should fix both problems.

Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

On 8/18/09 8:13 AM, Alex van der Spek am...@xs4all.nl wrote:

I cannot understand why xyplot does not work within a simple for loop. This works up to the for loop; inside the for loop the png files are opened and closed, but nothing is plotted. No error messages are written to the console either. This is the case on both Windows and Linux.

By the way, running the script below on Linux using source() does not even produce the first xyplot. This is less of an issue for me though.

#! usr/bin/env R
# Test lattice loop
rm(list=ls())
x <- 1:16
y <- 2*x-1
z <- rep(c('A','B','C','D'),4)
xyz <- data.frame(x=x,y=y,z=z)
require(lattice)
png('Test.png')
xyplot(y~x|z)
dev.off()
for (i in 1:5) {
  f <- paste('Test',i,'.png',sep='')
  png(f)
  xyplot(y~x|z)
  dev.off()
}
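Applied to Alex's loop, the fix amounts to wrapping the xyplot() call in print(). A minimal sketch; it writes to tempdir() and uses pdf() rather than png() only because pdf() is available on every R build, both of which are my choices and not part of the thread. png() is used the same way.

```r
library(lattice)

xyz <- data.frame(x = 1:16, y = 2 * (1:16) - 1,
                  z = rep(c("A", "B", "C", "D"), 4))

# Output files Test1.pdf ... Test5.pdf in a temporary directory.
files <- file.path(tempdir(), paste0("Test", 1:5, ".pdf"))

for (f in files) {
  pdf(f)
  # Inside a loop nothing is auto-printed, so print() is required.
  print(xyplot(y ~ x | z, data = xyz))
  dev.off()
}
```

At the top level of an interactive session the same call draws without print() because the result is auto-printed, which is why the first plot in Alex's script works when pasted at the console but not under source().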
[R] Selecting/Accessing the last vector in a list of a list of data.frames
Hello Again R Folks:

I'm trying to clean up some code. Suppose I have an object like this:

> str(test)
List of 2
 $ G:List of 2
  ..$ cls:'data.frame': 101 obs. of 2 variables:
  .. ..$ V1: num [1:101] -0.0019 -0.0019 -0.00189 -0.00188 -0.00186 ...
  .. ..$ V2: num [1:101] 0.000206 0.000247 0.000288 0.000329 0.000371 ...
  ..$ rob:'data.frame': 101 obs. of 2 variables:
  .. ..$ V1: num [1:101] -0.00142 -0.00141 -0.0014 -0.00139 -0.00137 ...
  .. ..$ V2: num [1:101] 0.000424 0.000456 0.000487 0.000517 0.000546 ...
 $ T:List of 2
  ..$ cls:'data.frame': 101 obs. of 2 variables:
  .. ..$ V1: num [1:101] -0.00222 -0.00222 -0.00221 -0.00219 -0.00216 ...
  .. ..$ V2: num [1:101] -0.00077 -0.000742 -0.000712 -0.000681 -0.000648 ...
  ..$ rob:'data.frame': 101 obs. of 2 variables:
  .. ..$ V1: num [1:101] -0.000981 -0.000979 -0.000972 -0.000961 -0.000946 ...
  .. ..$ V2: num [1:101] -0.000332 -0.000303 -0.000274 -0.000245 -0.000216 ...

I need to perform some operations on each value of V1 in turn, then each value of V2 in turn (so for instance I want test$G$cls$V1). The structure of this object is nearly constant except the first elements of the list (G, T in the example) may vary in number and name, so I need something that accommodates this.

I can do this with loops, but it seems like a job for lapply or rapply, and these don't quite work. I've played with quite a few variations, searched the help archives and found a number of useful ideas, but not quite what I need. The only thing that nearly works is do.call(cbind, object) enough times to bring V1 and V2 to the surface, but then I've lost my carefully constructed naming.

Any suggestions appreciated. It seems like there might be a simple approach, but I may be too tired right now to see it!

Thanks,
Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
Re: [R] Selecting/Accessing the last vector in a list of a list of data.frames
Thanks Henrique. I would not have thought of the syntax you suggest, though it embodies the sort of multilevel (not quite recursive) application of lapply I was thinking of. However, it returns "test" with V2 missing, everything else intact. Strange; I can't really state in words what I think it should do, much less what it does do!

I think an easier approach for me will be to re-write the function that generates test so it is simpler to extract what I need. I will think on it.

Thanks,
Bryan

> temp <- lapply(test, lapply, '[', 'V1')
> str(temp)
List of 2
 $ G:List of 4
  ..$ cls:'data.frame': 101 obs. of 1 variable:
  .. ..$ V1: num [1:101] -0.0019 -0.0019 -0.00189 -0.00188 -0.00186 ...
  ..$ rob:'data.frame': 101 obs. of 1 variable:
  .. ..$ V1: num [1:101] -0.00142 -0.00141 -0.0014 -0.00139 -0.00137 ...
  ..$ c  : num NA
  ..$ r  : num NA
 $ T:List of 4
  ..$ cls:'data.frame': 101 obs. of 1 variable:
  .. ..$ V1: num [1:101] -0.00222 -0.00222 -0.00221 -0.00219 -0.00216 ...
  ..$ rob:'data.frame': 101 obs. of 1 variable:
  .. ..$ V1: num [1:101] -0.000981 -0.000979 -0.000972 -0.000961 -0.000946 ...
  ..$ c  : num NA
  ..$ r  : num NA

On 8/11/09 7:28 PM, Henrique Dallazuanna www...@gmail.com wrote:

If I understand your question correctly, you can try something like this:

# Access all elements named 'V1' in your list
lapply(test, lapply, '[', 'V1')

On Tue, Aug 11, 2009 at 3:49 PM, Bryan Hanson han...@depauw.edu wrote:

> I need to perform some operations on each value of V1 in turn, then each value of V2 in turn (so for instance I want test$G$cls$V1). The structure of this object is nearly constant except the first elements of the list (G, T in the example) may vary in number and name, so I need something that accommodates this.
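Henrique's nested lapply() keeps one-column data frames because it subsets with '['. A variant that uses '[[' returns the bare vectors instead, while preserving the G/T and cls/rob names. The sketch below is built on a toy object of the same shape; get_col() and make_df() are made-up helpers, and the extra element c is included only to mimic the non-data-frame entries visible in Bryan's str(temp) output.

```r
# Toy object with the same shape as `test` in the thread.
set.seed(7)
make_df <- function() data.frame(V1 = rnorm(5), V2 = rnorm(5))
test <- list(
  G = list(cls = make_df(), rob = make_df(), c = NA_real_),
  T = list(cls = make_df(), rob = make_df(), c = NA_real_)
)

# Hypothetical helper: pull one column out of every data frame,
# keeping the list names and skipping non-data-frame elements.
get_col <- function(obj, col) {
  lapply(obj, function(grp) {
    lapply(Filter(is.data.frame, grp), function(d) d[[col]])
  })
}

v1 <- get_col(test, "V1")
v2 <- get_col(test, "V2")
str(v1)

# Apply an operation to every V1 vector, names preserved.
v1_means <- rapply(v1, mean, how = "list")
```

Because the top-level names are never hard-coded, the same call works however many groups the list contains.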
[R] Need Advice: Considering Converting a Package from S3 to S4
Hello R Folks...

Not a technical question, but I need some advice and perspective. I've got a set of functions I'm planning to put together into a package. The main hunk of data that gets used by different functions is currently an S3 list. I've been reading about S4 objects, and I see the (numerous) advantages of them. I have seen the recommendation that all new packages be done with S4. Before I get much farther, I need to decide if I will go to S4 for this central hunk of data.

My questions are about making the conversion, whether it is worth the trouble and what pitfalls I might encounter. I can easily (re)define my key list as an S4 object. But after that...

1. It seems the simplest/minimalist approach is to update all the functions so that where I use "data$element" I replace it with "data@slot". Is it really this easy, or have I missed something? Easy or not, this by itself doesn't take advantage of much, except the ability to define subclasses at a later date (maybe that is sufficient reason though).

2. I also see in my reading that I should consider writing accessor functions for my object. What I can't quite see is why I would want to do this, if I can get the contents with data@slot? What am I missing here?

3. At this point, I'm not sure that I would write specific methods for this proposed S4 object. It would not be necessary in the short run. Making it S4 would mainly allow for future expansion, as they say. If methods are not critical, does it make sense to spend the time making the change?

Any perspective and advice would be welcomed. Thanks in advance,

Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
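To make questions 1 and 2 concrete, here is a small sketch of direct slot access next to an accessor function. The class name `Spectra`, its slots, and the accessor `freq` are all hypothetical; nothing here is taken from the package being discussed.

```r
library(methods)

# Hypothetical class standing in for the package's central data object.
setClass("Spectra",
         representation(freq = "numeric", intensity = "numeric"))

# Accessor generic plus method: callers write freq(x) rather than x@freq,
# so the internal slot layout can change later without breaking their code.
setGeneric("freq", function(object) standardGeneric("freq"))
setMethod("freq", "Spectra", function(object) object@freq)

s <- new("Spectra", freq = c(1, 2, 3), intensity = c(10, 20, 30))
s@freq     # direct slot access, the S4 analog of data$element
freq(s)    # the same values through the accessor
```

The usual argument for the accessor is insulation: if the slot is later renamed or computed on demand, only the method body changes, whereas every `@` in user code would have to be edited.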
[R] str(data.frame) after subsetting reflects original structure, not subsetted structure?
I find that after subsetting (you may prefer "conditional selection") a data frame and assigning it to a new object, str(new object) reflects the original data frame, not the new one:

    A <- rnorm(20)
    B <- factor(rep(c("t", "g"), 10))
    C <- factor(rep(c("h", "l"), 10))
    D <- data.frame(A, B, C)
    str(D)      # reports correctly
    E <- D[D$C == "h", ]
    str(E)      # reports that D$C still has 2 levels, but E
                # or E$C shows that subsetting worked properly
    summary(E)  # shows the original structure and that subsetting worked

Is this the expected behavior, and if so, is there a particular rationale? I would be pretty certain that the information about E was inherited from D, but why wasn't it updated to reflect the revised object? Is there an argument that I can use to force the updating? For better or worse, I use str() a lot to check my work, and in this case, it seems to have misled me.

Thanks as always, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
Re: [R] str(data.frame) after subsetting reflects original structure, not subsetted structure?
Thanks Marc and Ben... Your answers were most helpful. I suspected something had been written about it, but was having trouble formulating a reasonable search query. I was looking in the help page for str(), which was sort of a dead end.

Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

On 7/24/09 9:46 AM, Marc Schwartz marc_schwa...@me.com wrote:

> On Jul 24, 2009, at 8:17 AM, Bryan Hanson wrote:
>
>> I find that after subsetting (you may prefer "conditional selection") a data frame and assigning it to a new object, str(new object) reflects the original data frame, not the new one:
>>
>>     A <- rnorm(20)
>>     B <- factor(rep(c("t", "g"), 10))
>>     C <- factor(rep(c("h", "l"), 10))
>>     D <- data.frame(A, B, C)
>>     str(D)      # reports correctly
>>     E <- D[D$C == "h", ]
>>     str(E)      # reports that D$C still has 2 levels, but E
>>                 # or E$C shows that subsetting worked properly
>>     summary(E)  # shows the original structure and that subsetting worked
>>
>> Is this the expected behavior, and if so, is there a particular rationale? I would be pretty certain that the information about E was inherited from D, but why wasn't it updated to reflect the revised object? Is there an argument that I can use to force the updating? For better or worse, I use str() a lot to check my work, and in this case, it seems to have misled me.
>>
>> Thanks as always, Bryan
>
> See ?"[.factor", which is the extract (subset) method for factors. Note that the 'drop' argument is FALSE by default. It is this argument that controls the retention of unused factor levels.
>
> The reason that it is FALSE by default is to ensure that if you are comparing factors from more than one data source, the comparisons of, or the use of, the factor levels are consistent.
>
> For one approach to dropping unused factor levels from a data frame, see:
>
> http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
>
> HTH,
> Marc Schwartz
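As a short illustration of Marc's point, the sketch below reproduces the retained level and then drops it in three ways. It reuses the variable names from the question; note that `droplevels()` was added to base R after this thread was written, so it would not have been available in R 2.9.

```r
set.seed(1)
D <- data.frame(A = rnorm(20),
                B = factor(rep(c("t", "g"), 10)),
                C = factor(rep(c("h", "l"), 10)))
E <- D[D$C == "h", ]
nlevels(E$C)                 # still 2: the unused level "l" is retained

# Option 1: the 'drop' argument of the factor subset method
levels(E$C[, drop = TRUE])

# Option 2: re-create the factor for a single column
E1 <- E
E1$C <- factor(E1$C)

# Option 3: drop unused levels from every factor in the data frame
E2 <- droplevels(E)

levels(E1$C)
levels(E2$C)
```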
[R] panel.lmline - are m, b, and r^2 accessible somehow?
Hi R Folks...

Are the results of a fit carried out by panel.lmline readily available for use in a lattice plot? I'd like to put r^2, m, and b on each panel. I can certainly write something that does this manually and then use it with panel.text, but if it's already available, that would be preferable, especially as lattice permits conditioning and subsetting so readily.

Looking at panel.abline, I see that it returns invisibly (which makes sense as one doesn't assign panel.abline to a variable), so my guess is the answer is no. The only thing I could find in the archives had the calculations done manually within panel.groups ( http://www.nabble.com/add-trend-line-to-each-group-of-data-in%3A-xyplot(y1%2By2-~-x-|-grp...-td3344023.html#a3382909 ) but that was a few versions back. Other suggestions?

Thanks, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
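For reference, the manual route mentioned above can be sketched as follows: refit the line inside the panel function and write the numbers with panel.text. The helper names `lm_label` and `panel_fit` are made up for this example, and the built-in `mtcars` data is used only as a stand-in.

```r
library(lattice)

# Hypothetical helper: fit y ~ x and format slope, intercept and R^2.
lm_label <- function(x, y) {
  fit <- lm(y ~ x)
  cf <- coef(fit)
  sprintf("m = %.2f, b = %.2f, r2 = %.2f",
          cf[["x"]], cf[["(Intercept)"]], summary(fit)$r.squared)
}

# Panel function: points, the fitted line, and the label near the top left.
# It runs once per panel, so conditioning is handled automatically.
panel_fit <- function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, ...)
  lims <- current.panel.limits()
  panel.text(lims$xlim[1], lims$ylim[2], lm_label(x, y),
             adj = c(-0.05, 1.5), cex = 0.8)
}

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars, panel = panel_fit)
print(p)
```

The fit is computed twice per panel (once by panel.lmline, once by the helper), which is wasteful but keeps the sketch short.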
[R] Managing Packages: Which functions call other functions in package?
R Colleagues:

I'm moving toward building my own package, and it occurs to me that it might be useful to have some method of listing or, better, graphically displaying which functions call other functions within the package. In other words, I'm seeking some means of seeing how the functions relate to each other.

I've looked around a bit, and one way to do this might be via one of the network graphing approaches used, inter alia, in Bioconductor. But I suspect someone has created such a tool already and I'm just lacking the proper key words. It seems like something a version control system might provide; maybe that's where I should be looking? Does such a thing exist?

Thanks, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
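One possible starting point, sketched here under assumptions, is the codetools package that ships with R: `findGlobals()` lists the functions a function refers to, and intersecting that with the package's own function names gives a crude "who calls whom" table. The toy environment `pkg` and the helper `internal_calls` are invented for the example; this is static analysis only and will miss calls built at run time.

```r
library(codetools)

# Toy functions standing in for a package's contents (names are invented).
pkg <- new.env()
pkg$read_data  <- function(path) readLines(path)
pkg$pkg_trim   <- function(x) trimws(x)
pkg$clean_data <- function(path) toupper(pkg_trim(read_data(path)))
pkg$report     <- function(path) length(clean_data(path))

# For each function, keep only the callees that live in the same environment.
internal_calls <- function(env) {
  fnames <- sort(ls(env))
  calls <- lapply(fnames, function(f) {
    used <- findGlobals(get(f, envir = env), merge = FALSE)$functions
    intersect(used, fnames)
  })
  names(calls) <- fnames
  calls
}

str(internal_calls(pkg))
```

For an installed package, the same helper could presumably be pointed at `asNamespace("somePackage")` instead of the toy environment, and the resulting list fed to any graph-drawing tool.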
Re: [R] How do I define the method for gcheckboxgroup in gWidgets?
Thanks Michael... I was working by analogy to the gbuttons, so I was trying to "add" the gcheckboxgroup, which is apparently not necessary (due to, I guess, the intrinsic differences between the widgets). The index thing I was just screwed up on! I have it working now. Nice package.

Bryan

On 6/27/09 8:25 AM, Michael Lawrence mflaw...@fhcrc.org wrote:

> On Thu, Jun 25, 2009 at 8:29 AM, Bryan Hanson han...@depauw.edu wrote:
>
>> Hi All...
>>
>> I'm trying to build a small demo using gWidgets which permits interactive scaling and selection among different things to plot. I can get the widgets for scaling to work just fine. I am using gcheckboxgroup to make the (possibly multiple) selections. However, I can't seem to figure out how to properly define the gcheckboxgroup; I can draw the widget properly, and I think my handler would use the svalue right if it actually received it. Part of the problem is using the index of the possible values rather than the values themselves, but I'm pretty sure this is not all of the problem. I've been unable to find an example like this in any of the various resources I've come across.
>>
>> BTW, report.which is really only there for troubleshooting. It works to return the values; I can't get it to return the indices, which are probably what I need in this case.
>>
>> A demo script is at the bottom and the error is just below.
>>
>>     tmp <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
>>         checked = c(TRUE, FALSE, FALSE, FALSE, FALSE), container = leftPanel)
>
> The above code should define the gcheckboxgroup.
>
>>     add(tmp, value = 1, expand = TRUE)
>
> I'm not sure what you are trying to add here.
>
>>     Error in function (classes, fdef, mtable) :
>>       unable to find an inherited method for function ".add", for signature
>>       "gCheckboxgroupRGtk", "guiWidgetsToolkitRGtk2", "numeric"
>>
>> This error suggests that I don't have a method - I agree, but I don't know what goes into the method for gcheckboxgroup. For the sliders, it's clear to me how the actions and drawing of the widgets differ, but not so for gcheckboxgroup.
>>
>> A big TIA, Bryan
>>
>> Bryan Hanson
>> Professor of Chemistry & Biochemistry
>> DePauw University, Greencastle IN USA
>>
>> Full Script:
>>
>>     x <- 1:10
>>     y1 <- x
>>     y2 <- x^2
>>     y3 <- x^0.5
>>     y4 <- y^3
>>     df <- as.data.frame(cbind(x, y1, y2, y3, y4))
>>     stuff <- c("y = x", "y = x^2", "y = x^0.5", "y = x^3")
>>     which.y <- 2 # initial value, to be changed later by the widget
>>
>>     # Define a function for the widget handlers
>>     update.Plot <- function(h, ...) {
>>       plot(df[,1], df[,svalue(which.y)], type = "l",
>>            ylim = c(0, svalue(yrange)), main = "Interactive Selection & Scaling",
>>            xlab = "x values", ylab = "y values")
>>     }
>>     report.which <- function(h, ...) {
>>       print(svalue(h$obj), index = TRUE)
>>     }
>
> In the above handler, do you mean to pass the 'index' parameter to the svalue() function?
>
>>     # Define the actions & type of widget, along with returned values.
>>     # Must be done before packing widgets.
>>     yrange <- gslider(from = 0, to = max(y), by = 1.0, value = max(y),
>>                       handler = update.Plot)
>>     which.y <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
>>                               checked = c(TRUE, FALSE, FALSE, FALSE, FALSE))
>>
>>     # Assemble the graphics window & groups of containers
>>     mainWin <- gwindow("Interactive Plotting")
>>     bigGroup <- ggroup(cont = mainWin)
>>     leftPanel <- ggroup(horizontal = FALSE, container = bigGroup)
>>
>>     # Format and pack the widgets, link to their actions/type
>>     tmp <- gframe("y range", container = leftPanel)
>>     add(tmp, yrange, expand = TRUE)
>>     tmp <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
>>                           checked = c(TRUE, FALSE, FALSE, FALSE, FALSE),
>>                           container = leftPanel)
>>     add(tmp, value = 1, expand = TRUE)
>>
>>     # Put it all together
>>     add(mainWin, ggraphics()) # puts the active graphic window w/i mainWin
[R] How do I define the method for gcheckboxgroup in gWidgets?
Hi All...

I'm trying to build a small demo using gWidgets which permits interactive scaling and selection among different things to plot. I can get the widgets for scaling to work just fine. I am using gcheckboxgroup to make the (possibly multiple) selections. However, I can't seem to figure out how to properly define the gcheckboxgroup; I can draw the widget properly, and I think my handler would use the svalue right if it actually received it. Part of the problem is using the index of the possible values rather than the values themselves, but I'm pretty sure this is not all of the problem. I've been unable to find an example like this in any of the various resources I've come across.

BTW, report.which is really only there for troubleshooting. It works to return the values; I can't get it to return the indices, which are probably what I need in this case.

A demo script is at the bottom and the error is just below.

    tmp <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
        checked = c(TRUE, FALSE, FALSE, FALSE, FALSE), container = leftPanel)
    add(tmp, value = 1, expand = TRUE)
    Error in function (classes, fdef, mtable) :
      unable to find an inherited method for function ".add", for signature
      "gCheckboxgroupRGtk", "guiWidgetsToolkitRGtk2", "numeric"

This error suggests that I don't have a method - I agree, but I don't know what goes into the method for gcheckboxgroup. For the sliders, it's clear to me how the actions and drawing of the widgets differ, but not so for gcheckboxgroup.

A big TIA, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

Full Script:

    x <- 1:10
    y1 <- x
    y2 <- x^2
    y3 <- x^0.5
    y4 <- y^3
    df <- as.data.frame(cbind(x, y1, y2, y3, y4))
    stuff <- c("y = x", "y = x^2", "y = x^0.5", "y = x^3")
    which.y <- 2 # initial value, to be changed later by the widget

    # Define a function for the widget handlers
    update.Plot <- function(h, ...) {
      plot(df[,1], df[,svalue(which.y)], type = "l",
           ylim = c(0, svalue(yrange)), main = "Interactive Selection & Scaling",
           xlab = "x values", ylab = "y values")
    }
    report.which <- function(h, ...) {
      print(svalue(h$obj), index = TRUE)
    }

    # Define the actions & type of widget, along with returned values.
    # Must be done before packing widgets.
    yrange <- gslider(from = 0, to = max(y), by = 1.0, value = max(y),
                      handler = update.Plot)
    which.y <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
                              checked = c(TRUE, FALSE, FALSE, FALSE, FALSE))

    # Assemble the graphics window & groups of containers
    mainWin <- gwindow("Interactive Plotting")
    bigGroup <- ggroup(cont = mainWin)
    leftPanel <- ggroup(horizontal = FALSE, container = bigGroup)

    # Format and pack the widgets, link to their actions/type
    tmp <- gframe("y range", container = leftPanel)
    add(tmp, yrange, expand = TRUE)
    tmp <- gcheckboxgroup(stuff, handler = report.which, index = TRUE,
                          checked = c(TRUE, FALSE, FALSE, FALSE, FALSE),
                          container = leftPanel)
    add(tmp, value = 1, expand = TRUE)

    # Put it all together
    add(mainWin, ggraphics()) # puts the active graphic window w/i mainWin
Re: [R] Questíon regarding the use of write.csv2 , write.table ...
write.table will give you all the control you need to get exactly what you want.

Bryan

On 6/18/09 7:50 AM, xavier.char...@free.fr xavier.char...@free.fr wrote:

> Hi,
>
> It sounds like the first column that is added is actually the row names. That's why a previous answer pointed to this argument. The default for write.csv is to write the row names along with the data. So, this should work:
>
>     write.csv2(exampleDataframe, file = "exampleDataframe.csv", row.names = FALSE)
>
> Xavier
>
> ----- Original Message -----
> From: Lavri Labi lavri.l...@tu-dortmund.de
> To: Jorge Ivan Velez jorgeivanve...@gmail.com
> Cc: r-help@r-project.org
> Sent: Thursday, 18 June 2009 12:35:31 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [R] Questíon regarding the use of write.csv2, write.table ...
>
>> Dear Jorge,
>>
>> Thank you for the quick answer. But I am afraid you didn't understand my problem. I want to write the following data frame exampleDataframe to a csv2 file.
>>
>>     a;b;c;d
>>     1 ; 2 ; 3 ; 4
>>     5 ; 6 ; 7 ; 8
>>     9 ; 0 ; 1 ; 2
>>
>> After sending the command:
>>
>>     write.csv2(exampleDataframe, file = "exampleDataframe.csv")
>>
>> I get the following file:
>>
>>     ;a;b;c;d
>>     1 ; 1 ; 2 ; 3 ; 4
>>     2 ; 5 ; 6 ; 7 ; 8
>>     3 ; 9 ; 0 ; 1 ; 2
>>
>> How can I delete the first column that was added, which I do not need? The row.names you suggested is not really helpful in this case.
>>
>> Cheers, Lavri
>>
>>> Dear Lavri,
>>>
>>> Take a look at the row.names argument in ?write.table.
>>>
>>> HTH, Jorge
>>>
>>> On Thu, Jun 18, 2009 at 4:09 AM, Lavri Labi lavri.l...@tu-dortmund.de wrote:
>>>
>>>> Hi all,
>>>>
>>>> I use write.csv and write.table to write a data frame to a file like the following:
>>>>
>>>>     write.csv2(allRandomTestCase_XDroped, "allRandomTestCase.csv")
>>>>
>>>> But in the created file allRandomTestCase.csv an additional column with consecutive numbers is automatically added to the columns of the data frame allRandomTestCase_XDroped. Hence my question: how can I write data to a file without this added column?
>>>>
>>>> Cheers, Lavri
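A quick way to confirm what Xavier and Jorge describe is to write the example data frame to a temporary file and read the raw lines back. The data values are the ones from Lavri's message; the temporary file name is just for the demonstration.

```r
exampleDataframe <- data.frame(a = c(1, 5, 9), b = c(2, 6, 0),
                               c = c(3, 7, 1), d = c(4, 8, 2))
out <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an extra, unnamed first column
write.csv2(exampleDataframe, file = out)
with_rownames <- readLines(out)

# row.names = FALSE suppresses that leading column
write.csv2(exampleDataframe, file = out, row.names = FALSE)
without_rownames <- readLines(out)

with_rownames
without_rownames
```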
Re: [R] R in the NY Times
I believe the SAS person shot themselves in the foot in more ways than one. In my mind, the reason you would pay, as Frank said, for "non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others" would be so that you can sue them later when a software problem in the designing of the engine makes your plane fall out of the sky!

Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

> "I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
>
> Thanks for posting. Does anyone else find the statement by SAS to be humorous yet arrogant and short-sighted?
>
> Kevin
Re: [R] Are there any guis out there, which will allow editing of the graph?
A colleague of mine, quite by accident, discovered that Adobe Illustrator can manipulate plots made by base graphics, and when you do, many pieces of the plot are separate items that can be manipulated with Illustrator. He cuts and pastes from a Quartz window on his Mac into Illustrator. Apparently Illustrator has two kinds of selection arrows, one of which selects groups of things, while the other selects individual things. He confirms that text from R may be changed, colors in polygonal areas may be changed, objects may be moved, etc., once they are selected. Apparently he saw a notation that this was possible on some R code that went with a Wikipedia entry, and he tried it, and it works. YMMV.

Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

On 8/4/08 1:09 PM, Bert Gunter [EMAIL PROTECTED] wrote:

> No. Can't be. Editable graphs require that the graph be produced via code that produces changeable components. All R graphs are essentially static.
>
> That said, caveats: graphs drawn via the grid package functionality -- for example lattice graphs -- **are** produced via changeable code. If you read the lattice docs carefully, you'll see that there are a few features there that allow some graph editing. There may be other packages that also have some editing capabilities.
>
> R's base graphics also allow a little interaction via identify() and locator(), which can be useful (e.g. for positioning legends). One can also simulate interactivity by recording various components of graph construction and then modifying and redrawing them. But this is just manually doing what you're looking for, so probably a dumb suggestion.
>
> While graph editing certainly can be a nice feature, it is very difficult to implement without severely constraining graphing flexibility (IMO, of course). Graphs are very complex beasties, so it's hard to write clean code that allows flexible editing capabilities. Look at S-Plus's graph editing, which I always found harder to use (and more buggy) than just issuing the commands. (To be fair, it's been some years since I tried.)
>
> Again, just my 2 bits. Others may well disagree (and perhaps point you to what you seek).
>
> Cheers, Bert
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arthur Roberts
> Sent: Monday, August 04, 2008 9:51 AM
> To: [EMAIL PROTECTED]
> Subject: [R] Are there any guis out there, which will allow editing of the graph?
>
> Hi, all,
>
> I would like to know if there is any GUI out there (academic or commercial) that allows one to edit R-language generated graphs (e.g. positioning x axis labels). It would be nice to have something like the user interface of Igor or Origin. I have already used JGR and R-gui. These are good, but they don't allow one to easily edit graphs. I have also tried locator() and the package iplots. Your input is greatly appreciated.
>
> Best wishes,
> Art Roberts
> University of Washington
Re: [R] Properly Parsing Pre-Superscripts Displaying Them With grid.text
Thanks Gavin, that nicely solved one problem. On a fresh look at the archives, I see my other problem was trying to paste expressions, a bad idea. So, I'm writing each line separately. All problems are fixed!

By the way, I discovered from the archives that to get a % in the final output, you have to quote it in the expression: "%", which I suppose is a general feature. I may have missed it, but that behavior doesn't seem to be mentioned in the plotmath help page - perhaps it's too obvious?

Thanks, Bryan

On 8/2/08 2:19 PM, Gavin Simpson [EMAIL PROTECTED] wrote:

> On Fri, 2008-08-01 at 17:23 -0400, Bryan Hanson wrote:
>
>> Hi all...
>>
>> I'm making a chart dealing with frequencies of isotopes of various elements. For instance, I'd like the following text to appear on a chart with the 35 and 37 as superscripts:
>>
>>     Based upon:
>>     35Cl: 75%
>>     37Cl: 25%
>>
>> I am having problems properly parsing the superscript that precedes the Cl, since there is no character ahead of the superscript (I saw examples in the archives where there was a preceding character). Also, the construction of the string seems to not be working as I expect either. So, I think there are two problems here. Here is a sample of what doesn't quite work:
>
>     expression(phantom()^{35}*Cl[1])
>
> works, if I understand what you want. phantom() is documented on ?plotmath (?phantom is an alias for this help page also) and allows you to leave space as though its argument were there. I use it here with no object, so no space is left, but this has the side effect of allowing the superscript for this space.
>
> Note that you need to wrap multiple-character superscripts in {} ([] for subscripts). Also, you need to produce a valid expression, so the * achieves this between the two components (the phantom()^{35} and the Cl[1] bits). You could also achieve the same result by pasting the bits together:
>
>     expression(paste(phantom()^{35}, Cl[1]))
>
> but the former seems more familiar and intuitive to me now after grappling with plotmath for a while.
>
> G
>
>>     Cl1 <- rbinom(1000, size = 1, prob = 0.25)
>>     pCl1 <- histogram(Cl1, main = expression(Cl[1]), xlab = "", ylab = "",
>>                       scales = list(draw = FALSE), ylim = c(0:80))
>>     plot(pCl1)
>>
>>     # This works fine but doesn't have everything I want:
>>     leg.txt1 <- paste("Based upon:\n", ": 75%\n", ": 25%", sep = "")
>>     grid.text(leg.txt1, 0.5, 0.5)
>>
>>     # This paste doesn't work due to the expression statements:
>>     leg.txt2 <- paste("Based upon:\n", expression(^35*Cl), ": 75%\n",
>>                       expression(^37*Cl), ": 25%", sep = "")
>>
>>     # This doesn't produce an error, but doesn't produce what is wanted either,
>>     # as the expression is taken (almost) literally:
>>     leg.txt3 <- paste("Based upon:\n", expression(^35*Cl), ": 75%\n",
>>                       expression(^37*Cl), ": 25%", sep = "")
>>     grid.text(leg.txt3, 0.5, 0.3)
>>
>> From watching the help list, I know parsing things can be tricky.
>>
>> TIA, Bryan
>>
>>     sessionInfo()
>>     R version 2.7.1 (2008-06-23)
>>     i386-apple-darwin8.10.1
>>     locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>     attached base packages:
>>     [1] datasets grid grDevices graphics stats utils methods
>>     [8] base
>>     other attached packages:
>>     [1] fastICA_1.1-9 DescribeDisplay_0.1.3 ggplot_0.4.2
>>     [4] RColorBrewer_1.0-2 reshape_0.8.0 MASS_7.2-42
>>     [7] pcaPP_1.5 mvtnorm_0.9-0 hints_1.0.1-1
>>     [10] mvoutlier_1.3 robustbase_0.2-8 lattice_0.17-8
>>     [13] rggobi_2.1.9 RGtk2_2.12.5-3
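Putting the two findings of this thread together (phantom() for the leading superscript, and one grid.text call per line rather than pasting expressions), a minimal sketch might look like the following. The object names and the y positions are arbitrary choices for the example, not taken from the thread.

```r
library(grid)

# phantom() gives the superscript something (empty) to attach to.
# Text containing ":" or "%" is quoted and joined to the rest with *.
line1 <- "Based upon:"
line2 <- expression(phantom()^{35} * "Cl: 75%")
line3 <- expression(phantom()^{37} * "Cl: 25%")

# One call per line, because a plotmath expression cannot hold "\n" line breaks
grid.newpage()
grid.text(line1, x = 0.5, y = 0.60)
grid.text(line2, x = 0.5, y = 0.50)
grid.text(line3, x = 0.5, y = 0.40)
```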
[R] Properly Parsing Pre-Superscripts Displaying Them With grid.text
Hi all...

I'm making a chart dealing with frequencies of isotopes of various elements. For instance, I'd like the following text to appear on a chart with the 35 and 37 as superscripts:

    Based upon:
    35Cl: 75%
    37Cl: 25%

I am having problems properly parsing the superscript that precedes the Cl, since there is no character ahead of the superscript (I saw examples in the archives where there was a preceding character). Also, the construction of the string seems to not be working as I expect either. So, I think there are two problems here. Here is a sample of what doesn't quite work:

    Cl1 <- rbinom(1000, size = 1, prob = 0.25)
    pCl1 <- histogram(Cl1, main = expression(Cl[1]), xlab = "", ylab = "",
                      scales = list(draw = FALSE), ylim = c(0:80))
    plot(pCl1)

    # This works fine but doesn't have everything I want:
    leg.txt1 <- paste("Based upon:\n", ": 75%\n", ": 25%", sep = "")
    grid.text(leg.txt1, 0.5, 0.5)

    # This paste doesn't work due to the expression statements:
    leg.txt2 <- paste("Based upon:\n", expression(^35*Cl), ": 75%\n",
                      expression(^37*Cl), ": 25%", sep = "")

    # This doesn't produce an error, but doesn't produce what is wanted either,
    # as the expression is taken (almost) literally:
    leg.txt3 <- paste("Based upon:\n", expression(^35*Cl), ": 75%\n",
                      expression(^37*Cl), ": 25%", sep = "")
    grid.text(leg.txt3, 0.5, 0.3)

From watching the help list, I know parsing things can be tricky.

TIA, Bryan

    sessionInfo()
    R version 2.7.1 (2008-06-23)
    i386-apple-darwin8.10.1
    locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
    attached base packages:
    [1] datasets grid grDevices graphics stats utils methods
    [8] base
    other attached packages:
    [1] fastICA_1.1-9 DescribeDisplay_0.1.3 ggplot_0.4.2
    [4] RColorBrewer_1.0-2 reshape_0.8.0 MASS_7.2-42
    [7] pcaPP_1.5 mvtnorm_0.9-0 hints_1.0.1-1
    [10] mvoutlier_1.3 robustbase_0.2-8 lattice_0.17-8
    [13] rggobi_2.1.9 RGtk2_2.12.5-3
[R] Conditionally Updating Lattice Plots
Hi All...

I can't seem to find an answer to this in the help pages, archives, or Deepayan's Lattice book.

I want to do a Lattice plot, and then update it, possibly more than once, depending upon some logical options. Code is below; it produces a second plot page when the second update is called, from which I would infer that you can't update the update, or I'm not calling it correctly. I have a nagging sense too that the real way to do this is with a non-standard use of panel.superpose, but I don't quite see how to do that from available examples.

TIA for any suggestions, Bryan

Example: a function, then the call to the function

    fancy.lm <- function(x, y, fit = TRUE, resid = TRUE){
      model <- lm(y ~ x)
      y.pred <- predict(model)
      # Compute residuals for plotting
      res.x <- as.vector(rbind(x, x, rep(NA, length(x)))) # NAs induce breaks in line
      res.y <- as.vector(rbind(y, y.pred, rep(NA, length(x)))) # after Fig 5.1 of DAAG (clever!)
      p <- xyplot(y ~ x, pch = 20,
        panel = function(...) {
          panel.xyplot(...) # not strictly necessary if I understand correctly
        })
      plot(p, more = TRUE)
      if (fit) {
        plot(update(p, more = TRUE,
          panel = function(...){
            panel.xyplot(...)
            panel.abline(model, col = "red")
          }))}
      if (resid) {
        plot(update(p, more = TRUE,
          panel = function(...){
            panel.xyplot(res.x, res.y, col = "lightblue", type = "l")
          }))}
    }

    x <- jitter(c(1:10), factor = 5)
    y <- jitter(c(1:10), factor = 10)
    fancy.lm(x, y, fit = TRUE, resid = TRUE)

Session Info

    sessionInfo()
    R version 2.7.1 (2008-06-23)
    i386-apple-darwin8.10.1
    locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
    attached base packages:
    [1] datasets grid grDevices graphics stats utils methods
    [8] base
    other attached packages:
    [1] fastICA_1.1-9 DescribeDisplay_0.1.3 ggplot_0.4.2
    [4] RColorBrewer_1.0-2 reshape_0.8.0 MASS_7.2-42
    [7] pcaPP_1.5 mvtnorm_0.9-0 hints_1.0.1-1
    [10] mvoutlier_1.3 robustbase_0.2-8 lattice_0.17-8
    [13] rggobi_2.1.9 RGtk2_2.12.5-3
    loaded via a namespace (and not attached):
    [1] tools_2.7.1
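One alternative to repeated update() calls, offered here only as a sketch and not as an answer given in the thread, is to build a single panel function and put the logical options inside it. The name `fancy_lm` is a hypothetical variant of the poster's function; the sketch assumes one unconditioned panel, so the fitted values line up with x and y.

```r
library(lattice)

# Hypothetical variant of fancy.lm: one trellis object, with the optional
# layers chosen inside the panel function instead of by repeated update().
fancy_lm <- function(x, y, fit = TRUE, resid = TRUE) {
  model  <- lm(y ~ x)
  y_pred <- predict(model)
  xyplot(y ~ x, pch = 20,
         panel = function(x, y, ...) {
           # residual segments from each point to its fitted value
           if (resid) panel.segments(x, y, x, y_pred, col = "lightblue")
           # fitted line taken from the lm object
           if (fit) panel.abline(reg = model, col = "red")
           panel.xyplot(x, y, ...)
         })
}

set.seed(42)
x <- jitter(1:10, factor = 5)
y <- jitter(1:10, factor = 10)
p <- fancy_lm(x, y, fit = TRUE, resid = TRUE)
print(p)
```

Returning the trellis object rather than plotting inside the function leaves the caller free to print it, update it, or save it.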
[R] Lattice Version of grconvertX or variant on panel.text?
Still playing with Lattice... I want to use panel.text(x, y etc) but with x and y in plot coordinates (0,1), not user coordinates. I think if I had this problem with traditional graphics, I could use grconvertX to make the change. I did come across convertX {grid} but this doesn't seem to be what I need. Is there a function like grconvertX in Lattice, or is there a flag or some other method of making panel.text use plot coordinates?

Thanks, Bryan
Re: [R] Lattice Version of grconvertX or variant on panel.text?
Never mind, I just hard-coded it using ratios. Simpler than I thought.

Thanks, Bryan

On 7/20/08 9:03 PM, Bryan Hanson [EMAIL PROTECTED] wrote:

> Still playing with Lattice... I want to use panel.text(x, y etc) but with x and y in plot coordinates (0,1), not user coordinates. I think if I had this problem with traditional graphics, I could use grconvertX to make the change. I did come across convertX {grid} but this doesn't seem to be what I need. Is there a function like grconvertX in Lattice, or is there a flag or some other method of making panel.text use plot coordinates?
>
> Thanks, Bryan
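For anyone landing on this thread later, another route (a sketch, not what was used above) relies on the fact that lattice panels are grid viewports, so grid.text with "npc" units places text in 0-1 panel coordinates directly. The label text and the `mtcars` data are placeholders.

```r
library(lattice)
library(grid)

p <- xyplot(mpg ~ wt, data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              # npc units run from 0 to 1 across the panel,
              # whatever the data range on the axes
              grid.text("top left",
                        x = unit(0.05, "npc"), y = unit(0.95, "npc"),
                        just = c("left", "top"))
            })
print(p)
```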
Re: [R] .First and .Rprofile won't run on startup
I accomplish this a little differently. On the Mac, in your home directory (e.g. /Users/susanamrose) there is/could be a hidden file called .Rprofile. You can edit it with vi, for instance, by getting a terminal window and typing

vi .Rprofile

It will be created if it doesn't exist. I keep all my local functions in a particular directory, then load them via the .Rprofile by putting the following lines in .Rprofile:

funcdir <- "/Users/susanamrose/Functions"
z <- paste(funcdir, "/LoadFunctions.R", sep = "")
source(z, chdir = TRUE)

This will source/execute whatever you put in the file LoadFunctions.R in the specified directory when R starts up. So, for instance, LoadFunctions.R could be a bunch of source("func.R") statements. This also gives you a shortcut to get to your functions directory via setwd(funcdir). I actually have a number of commonly used directories defined this way for convenience.

HTH, Bryan

On 7/14/08 2:34 PM, Susan Amrose [EMAIL PROTECTED] wrote:

I'm trying to source a file automatically every time I start R. I tried adding the following .First function in a file Rprofile.site in my $R_HOME/etc/ directory (verified $R_HOME by Sys.getenv()) as well as in a file .Rprofile in my $HOME directory and .Rprofile in the working directory:

.First <- function(){
  source(file.path(Sys.getenv("HOME"), "R", "functions", "standard.r"))
  cat("Actually read your file")
}

but no luck. I'm using a Mac (OS 10.4). It never runs (the file is not sourced and the text does not appear). Does anyone have any suggestions? Ideally, I would like to have a directory and source all the files in the directory at startup, but this is just a first step (is that possible?).

Thanks in advance!! -Susan
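On the follow-up question of sourcing every file in a directory at startup, the following is a minimal sketch. The function name source_dir is hypothetical (it is not part of base R), and it assumes every script in the directory ends in .R or .r and can be sourced in any order.

```r
# Hypothetical helper: source every *.R / *.r file found in 'dir'.
# Files are sourced in the order returned by list.files().
source_dir <- function(dir, pattern = "\\.[Rr]$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  for (f in files) {
    source(f, chdir = TRUE)  # chdir so relative paths inside each file resolve
  }
  invisible(files)           # return the file names without printing them
}
```

Placed in .Rprofile, a call such as source_dir("/Users/susanamrose/Functions") would then take the place of a hand-maintained LoadFunctions.R.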
Re: [R] Coloring Stripchart Points, or Better, Lattice Equivalent
If anyone remains interested, the solution in base graphics is to modify stripchart.default, specifically the last couple of lines, where the coloring of points defaults in a way that depends on groups. In my example, the groups are being handled collectively with the coloring. Code is below. Deepayan has noted that stacking of this type is not possible in Lattice graphics, and would have to be coded directly (probably not too much of a modification of what I give here, but I'm a novice!).

Thanks, Bryan

stripchart.colsym <- function(x, method = "overplot", jitter = 0.1, offset = 1/3,
    vertical = FALSE, group.names, add = FALSE, at = NULL,
    xlim = NULL, ylim = NULL, ylab = NULL, xlab = NULL, dlab = "", glab = "",
    log = "", pch = 0, col = par("fg"), cex = par("cex"), axes = TRUE,
    frame.plot = axes, ...)
{
    method <- pmatch(method, c("overplot", "jitter", "stack"))[1]
    if(is.na(method) || method == 0)
        stop("invalid plotting method")
    groups <- if(is.list(x)) x else if(is.numeric(x)) list(x)
    if(0 == (n <- length(groups)))
        stop("invalid first argument")
    if(!missing(group.names))
        attr(groups, "names") <- group.names
    else if(is.null(attr(groups, "names")))
        attr(groups, "names") <- 1:n
    if(is.null(at)) at <- 1:n
    else if(length(at) != n)
        stop(gettextf("'at' must have length equal to the number %d of groups", n),
             domain = NA)
    if (is.null(dlab)) dlab <- deparse(substitute(x))
    if(!add) {
        dlim <- c(NA, NA)
        for(i in groups)
            dlim <- range(dlim, i[is.finite(i)], na.rm = TRUE)
        glim <- c(1, n) # in any case, not range(at)
        if(method == 2) { # jitter
            glim <- glim + jitter * if(n == 1) c(-5, 5) else c(-2, 2)
        } else if(method == 3) { # stack
            glim <- glim + if(n == 1) c(-1, 1) else c(0, 0.5)
        }
        if(is.null(xlim)) xlim <- if(vertical) glim else dlim
        if(is.null(ylim)) ylim <- if(vertical) dlim else glim
        plot(xlim, ylim, type = "n", ann = FALSE, axes = FALSE, log = log, ...)
        if (frame.plot) box()
        if(vertical) {
            if (axes) {
                if(n > 1) axis(1, at = at, labels = names(groups), ...)
                Axis(x, side = 2, ...)
            }
            if (is.null(ylab)) ylab <- dlab
            if (is.null(xlab)) xlab <- glab
        } else {
            if (axes) {
                Axis(x, side = 1, ...)
                if(n > 1) axis(2, at = at, labels = names(groups), ...)
            }
            if (is.null(xlab)) xlab <- dlab
            if (is.null(ylab)) ylab <- glab
        }
        title(xlab = xlab, ylab = ylab)
    }
    csize <- cex * if(vertical) xinch(par("cin")[1]) else yinch(par("cin")[2])
    for(i in 1:n) {
        x <- groups[[i]]
        y <- rep.int(at[i], length(x))
        if(method == 2) ## jitter
            y <- y + stats::runif(length(y), -jitter, jitter)
        else if(method == 3) { ## stack
            xg <- split(x, factor(x))
            xo <- lapply(xg, seq_along)
            x <- unlist(xg, use.names = FALSE)
            y <- rep.int(at[i], length(x)) +
                (unlist(xo, use.names = FALSE) - 1) * offset * csize
        }
        if(vertical) points(y, x, col = col, pch = pch, cex = cex)
        else points(x, y, col = col, pch = pch, cex = cex)
    }
}

samples <- 100 # must be even
index <- round(runif(samples, 1, 100)) # set up data
resp <- rbinom(samples, 1, 0.5)
yr <- rep(c(2005, 2006), samples/2)
all <- data.frame(index, resp, yr)
all$sym <- ifelse(all$resp == 1, 3, 1)
all$col <- ifelse(all$yr == 2005, "red", "blue")
all$count <- rep(1, length(all$index))
all <- all[order(all$index, all$yr, all$resp),] # for easier inspection
row.names(all) <- c(1:samples) # for easier inspection

one <- all[(all$yr == 2005 & all$resp == 0),] # First 2005/0 at bottom
two <- all[(all$yr == 2005 & all$resp == 1),] # Then 2005/1
three <- all[(all$yr == 2006 & all$resp == 0),] # Now 2006/0
four <- all[(all$yr == 2006 & all$resp == 1),] # Finally 2006/1

par(mfrow = c(5, 1))
par(plt = c(0.1, 0.9, 0.25, 0.75))
stripchart(one$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = one$col, pch = one$sym)
mtext("2005/0 only", side = 3)
stripchart(two$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = two$col, pch = two$sym)
mtext("2005/1 only", side = 3)
stripchart(three$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = three$col, pch = three$sym)
mtext("2006/0 only", side = 3)
stripchart(four$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = four$col, pch = four$sym)
mtext("2006/1 only", side = 3)
stripchart.colsym(all$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = all$col, pch = all$sym)
mtext("all data, colored and symbolized as above", side = 3)
Re: [R] Coloring Stripchart Points, or Better, Lattice Equivalent
Thanks Deepayan. That's the conclusion I have been gradually reaching!

Bryan

On 6/23/08 5:57 PM, Deepayan Sarkar [EMAIL PROTECTED] wrote:

On 6/22/08, Bryan Hanson [EMAIL PROTECTED] wrote:

Thanks Gabor, I'm getting closer. Is there a way to spread out resp values vertically for a given value of index? In base graphics, stripchart does this with method = "stack". But in lattice, stack = TRUE does something rather different, and I don't see a combination of lattice arguments that does it like base graphics.

Right, the default lattice panel function doesn't support stacking. I think your best bet, if you want to retain vectorization of col and pch, is to compute the y-coordinates yourself and use xyplot() to plot.

-Deepayan
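Deepayan's suggestion (compute the stacking positions yourself, then plot with xyplot()) could be sketched roughly as follows. This is only one possible reading of that advice: the data frame d, the column name stack, and the use of ave() to number repeated index values are all assumptions made for illustration.

```r
library(lattice)

set.seed(1)
# Simulated data in the same shape as the example in this thread
d <- data.frame(index = round(runif(100, 1, 100)),
                resp  = rbinom(100, 1, 0.5),
                yr    = rep(c(2005, 2006), 50))
d <- d[order(d$index, d$yr, d$resp), ]

# Stack height: 1, 2, 3, ... within each distinct value of index
d$stack <- ave(d$index, d$index, FUN = seq_along)

# Per-point colour and symbol, computed in the same row order as d
pt.col <- ifelse(d$yr == 2005, "red", "blue")
pt.pch <- ifelse(d$resp == 1, 3, 1)

# Single panel, so col and pch line up with the rows of d
p <- xyplot(stack ~ index, data = d, col = pt.col, pch = pt.pch,
            ylab = "count", xlim = c(0, 101))
print(p)
```

With more than one panel, the colour and symbol vectors would need to be subset inside a panel function (for example via the subscripts argument) rather than passed whole.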
Re: [R] Coloring Stripchart Points, or Better, Lattice Equivalent
Below is a revised set of code that demonstrates my question a little more clearly, I hope. When plotting all the data (5th panel), col & sym don't seem to be passed correctly, as the (random) first values for col & sym are used for all points (run the code, then run it again, you'll see how the 5th panel changes depending upon col & sym for the first data point). The 5th panel should ideally be the sum of the 4 panels above, keeping col & sym intact. Also, I would rather have this in lattice or ggplot2, if anyone sees how to convert it. Thanks once again, several of you have made very useful suggestions off list.

Bryan

samples <- 100 # must be even
index <- round(runif(samples, 1, 100)) # set up data
resp <- rbinom(samples, 1, 0.5)
yr <- rep(c(2005, 2006), samples/2)
all <- data.frame(index, resp, yr)
all$sym <- ifelse(all$resp == 1, 1, 3)
all$col <- ifelse(all$yr == 2005, "red", "blue")
all$count <- rep(1, length(all$index))
all <- all[order(all$index, all$yr, all$resp),] # for easier inspection
row.names(all) <- c(1:samples) # for easier inspection

one <- all[(all$yr == 2005 & all$resp == 0),] # First 2005/0 at top
two <- all[(all$yr == 2005 & all$resp == 1),] # Then 2005/1
three <- all[(all$yr == 2006 & all$resp == 0),] # Now 2006/0
four <- all[(all$yr == 2006 & all$resp == 1),] # Finally 2006/1

par(mfrow = c(5, 1))
par(plt = c(0.1, 0.9, 0.25, 0.75))
stripchart(one$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = one$col, pch = one$sym)
mtext("2005/0", side = 3)
stripchart(two$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = two$col, pch = two$sym)
mtext("2005/1", side = 3)
stripchart(three$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = three$col, pch = three$sym)
mtext("2006/0", side = 3)
stripchart(four$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = four$col, pch = four$sym)
mtext("2006/1", side = 3)
stripchart(all$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = all$col, pch = all$sym)
mtext("col & sym always taken from 1st data point when all data is plotted!", side = 3)
Re: [R] Coloring Stripchart Points, or Better, Lattice Equivalent
Thanks Gabor, I'm getting closer. Is there a way to spread out resp values vertically for a given value of index? In base graphics, stripchart does this with method = "stack". But in lattice, stack = TRUE does something rather different, and I don't see a combination of lattice arguments that does it like base graphics.

Thanks, Bryan

On 6/22/08 12:48 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:

Actually I am not sure if my prior answer was correct. I think it's OK with one panel but you might have to use a panel function if there are several. With one panel it seems OK:

stripplot(~ index, all, col = all$col, pch = all$sym)

On Sun, Jun 22, 2008 at 12:28 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:

Try this:

library(lattice)
all$resp <- as.factor(all$resp)
stripplot(~ index | resp * yr, all, col = all$col, pch = all$sym, layout = c(1, 4))

On Sun, Jun 22, 2008 at 10:43 AM, Bryan Hanson [EMAIL PROTECTED] wrote:

Below is a revised set of code that demonstrates my question a little more clearly, I hope. When plotting all the data (5th panel), col & sym don't seem to be passed correctly, as the (random) first values for col & sym are used for all points (run the code, then run it again, you'll see how the 5th panel changes depending upon col & sym for the first data point). The 5th panel should ideally be the sum of the 4 panels above, keeping col & sym intact. Also, I would rather have this in lattice or ggplot2, if anyone sees how to convert it. Thanks once again, several of you have made very useful suggestions off list.

Bryan

samples <- 100 # must be even
index <- round(runif(samples, 1, 100)) # set up data
resp <- rbinom(samples, 1, 0.5)
yr <- rep(c(2005, 2006), samples/2)
all <- data.frame(index, resp, yr)
all$sym <- ifelse(all$resp == 1, 1, 3)
all$col <- ifelse(all$yr == 2005, "red", "blue")
all$count <- rep(1, length(all$index))
all <- all[order(all$index, all$yr, all$resp),] # for easier inspection
row.names(all) <- c(1:samples) # for easier inspection

one <- all[(all$yr == 2005 & all$resp == 0),] # First 2005/0 at top
two <- all[(all$yr == 2005 & all$resp == 1),] # Then 2005/1
three <- all[(all$yr == 2006 & all$resp == 0),] # Now 2006/0
four <- all[(all$yr == 2006 & all$resp == 1),] # Finally 2006/1

par(mfrow = c(5, 1))
par(plt = c(0.1, 0.9, 0.25, 0.75))
stripchart(one$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = one$col, pch = one$sym)
mtext("2005/0", side = 3)
stripchart(two$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = two$col, pch = two$sym)
mtext("2005/1", side = 3)
stripchart(three$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = three$col, pch = three$sym)
mtext("2006/0", side = 3)
stripchart(four$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = four$col, pch = four$sym)
mtext("2006/1", side = 3)
stripchart(all$index, method = "stack", ylim = c(0,10), xlim = c(1,100), col = all$col, pch = all$sym)
mtext("col & sym always taken from 1st data point when all data is plotted!", side = 3)
[R] Coloring Stripchart Points, or Better, Lattice Equivalent
Hi All. I have the commands below to create a stripchart/plot. I was hoping to color the points in the plot by yr, and use a symbol that varied with resp. However, the outcome makes it appear as though the point-by-point col and pch data is not being passed properly. Any suggestions? And truthfully, I'd rather be doing this with Lattice, but I've tried several variations of stripplot and can't even get something with the general layout of the stripchart version.

Thanks, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

index <- round(runif(100, 1, 100))
resp <- rbinom(100, 1, 0.5)
yr <- rep(c(2005, 2006), 50)
all <- data.frame(index, resp, yr)
for (n in 1:length(all$index)) {
  if (all$yr[n] == 2005) {all$col[n] <- "red"} else {all$col[n] <- "blue"}
}
for (n in 1:length(all$index)) {
  if (all$resp[n] == 1) {all$sym[n] <- 1} else {all$sym[n] <- 3}
}
stripchart(all$index, method = "stack", ylim = c(0,10), col = all$col, pch = all$sym)
Re: [R] Pointwise Confidence Bounds on Logistic Regression
[I've omitted some of the conversation so far...]

E.g. in a logistic model, with (say) eta = beta_0 + beta_1*x one may find, on the linear predictor scale, A and B (say) such that P(A <= eta <= B) = 0.95. Then P(expit(A) <= expit(eta) <= expit(B)) = 0.95, which is exactly what is wanted.

I think I follow the above conceptually, but I don't know how to implement it, though I fooled around (unsuccessfully) with some of the variations on predict(). I'm trying to learn this in response to a biology colleague who did something similar in SigmaPlot. I can already tell that SigmaPlot did a lot of stuff for him in the background. The responses are 0/1 of a particular observation by date. The following code simulates what's going on (note that I didn't use 0/1 since this leads to a recognized condition/warning of fitting 1's and 0's. I've requested Brian's Pattern Recognition book so I know what the problem is and how to solve it). My colleague is looking at two populations in which the LD50 would differ. I'd like to be able to put the pointwise confidence bounds on each curve so he can tell if the populations are really different. By the way, this code does give a (minor?) error from glm (which you will see). Can you make a suggestion about how to get those confidence bounds on there? Also, is a probit link more appropriate here?

Thanks, Bryan

x <- c(1:40)
y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
data <- as.data.frame(cbind(x,y1,y2))
plot(x, y1, pch = 1, ylim = c(0,1), col = "red")
points(x, y2, pch = 3, col = "blue")
abline(h = 0.5, col = "gray")
fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action = na.omit)
fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action = na.omit)
lines(fit1$model$x, fit1$fitted.values, col = "red")
lines(fit2$model$x, fit2$fitted.values, col = "blue")
Re: [R] Pointwise Confidence Bounds on Logistic Regression
Thanks so much to all who offered assistance. I have to say it would have taken me a long time to figure this out, so I am most grateful. Plus, studying your examples greatly improves my understanding. As a follow-up, the fit process gives the following warning:

Warning messages:
1: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Is this something I should worry about? It doesn't arise from glm or glm.fit.

Thanks, Bryan

For the record, a final complete working code is appended below.

x <- c(1:40)
y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
data <- as.data.frame(cbind(x,y1,y2))
plot(x, y1, pch = 1, ylim = c(0,1), col = "red", main = "Logistic Regression w/Confidence Bounds", ylab = "y values", xlab = "x values")
points(x, y2, pch = 3, col = "blue")
abline(h = 0.5, col = "gray")
fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action = na.omit)
fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action = na.omit)
lines(fit1$model$x, fit1$fitted.values, col = "red")
lines(fit2$model$x, fit2$fitted.values, col = "blue")
## predictions on scale of link function
pred1 <- predict(fit1, se.fit = TRUE)
pred2 <- predict(fit2, se.fit = TRUE)
## constant for 95% confidence bands
## getting two t values is redundant here as fit1 and fit2
## have same residual df, but the real world may be different
res.df <- c(fit1$df.residual, fit2$df.residual)
## 0.975 because we want 2.5% in upper and lower tail
const <- qt(0.975, df = res.df)
## confidence bands on scale of link function
upper1 <- pred1$fit + (const[1] * pred1$se.fit)
lower1 <- pred1$fit - (const[1] * pred1$se.fit)
upper2 <- pred2$fit + (const[2] * pred2$se.fit)
lower2 <- pred2$fit - (const[2] * pred2$se.fit)
## bind together into a data frame
bands <- data.frame(upper1, lower1, upper2, lower2)
## transform on to scale of response
bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))
## plot confidence bands
lines(fit1$model$x, bands$upper1, col = "pink")
lines(fit1$model$x, bands$lower1, col = "pink")
lines(fit2$model$x, bands$upper2, col = "lightblue")
lines(fit2$model$x, bands$lower2, col = "lightblue")

On 6/19/08 12:28 PM, Gavin Simpson [EMAIL PROTECTED] wrote:

On Thu, 2008-06-19 at 10:42 -0400, Bryan Hanson wrote:

[I've omitted some of the conversation so far...]

E.g. in a logistic model, with (say) eta = beta_0 + beta_1*x one may find, on the linear predictor scale, A and B (say) such that P(A <= eta <= B) = 0.95. Then P(expit(A) <= expit(eta) <= expit(B)) = 0.95, which is exactly what is wanted.

I think I follow the above conceptually, but I don't know how to implement it, though I fooled around (unsuccessfully) with some of the variations on predict(). I'm trying to learn this in response to a biology colleague who did something similar in SigmaPlot. I can already tell that SigmaPlot did a lot of stuff for him in the background. The responses are 0/1 of a particular observation by date. The following code simulates what's going on (note that I didn't use 0/1 since this leads to a recognized condition/warning of fitting 1's and 0's. I've requested Brian's Pattern Recognition book so I know what the problem is and how to solve it). My colleague is looking at two populations in which the LD50 would differ. I'd like to be able to put the pointwise confidence bounds on each curve so he can tell if the populations are really different. By the way, this code does give a (minor?) error from glm (which you will see). Can you make a suggestion about how to get those confidence bounds on there? Also, is a probit link more appropriate here?

Thanks, Bryan

x <- c(1:40)
y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
data <- as.data.frame(cbind(x,y1,y2))
plot(x, y1, pch = 1, ylim = c(0,1), col = "red")
points(x, y2, pch = 3, col = "blue")
abline(h = 0.5, col = "gray")
fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action = na.omit)
fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action = na.omit)
lines(fit1$model$x, fit1$fitted.values, col = "red")
lines(fit2$model$x, fit2$fitted.values, col = "blue")

The point is to get predictions on the scale of the link function, generate 95% confidence bands in the normal way and then transform the confidence bands onto the scale of the response using the inverse of the link function used to fit the model. [Note: I am doing this from memory, so best to check this is right. I'm sure someone will tell me very quickly if I have gone wrong anywhere!]

## predictions on scale of link function
pred1 <- predict(fit1, se.fit = TRUE)
pred2 <- predict(fit2, se.fit = TRUE)
## constant for 95% confidence bands
## getting two t values is redundant here as fit1 and fit2
## have same residual df, but the real world may be different
res.df <- c(fit1$df.residual, fit2$df.residual)
## 0.975 because we want 2.5% in upper and lower tail
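The recipe Gavin describes (predict on the link scale, build the band there, then back-transform with the inverse link) can be condensed into a small self-contained sketch. The simulated 0/1 responses and the object names (fit, bands, crit) are assumptions made for illustration; using genuine 0/1 data here should also avoid the "non-integer #successes" warning that the 0.1/0.9 values trigger. The t quantile follows the thread; qnorm(0.975) is a common alternative for a binomial model.

```r
set.seed(42)
x <- 1:40
# Simulated 0/1 responses (an assumption; the thread used 0.1/0.9 placeholders)
y <- rbinom(length(x), 1, plogis((x - 20) / 4))

fit <- glm(y ~ x, family = binomial(link = "logit"))

# Predictions and standard errors on the scale of the linear predictor
pred <- predict(fit, type = "link", se.fit = TRUE)

# Multiplier for a pointwise 95% band, as in the thread
crit <- qt(0.975, df = fit$df.residual)

# Inverse link taken from the fitted model, so a probit fit would work too
linkinv <- family(fit)$linkinv

bands <- data.frame(
  x     = x,
  fit   = linkinv(pred$fit),
  lower = linkinv(pred$fit - crit * pred$se.fit),
  upper = linkinv(pred$fit + crit * pred$se.fit)
)

plot(x, y, ylim = c(0, 1), xlab = "x values", ylab = "y values")
lines(bands$x, bands$fit,   col = "red")
lines(bands$x, bands$lower, col = "pink")
lines(bands$x, bands$upper, col = "pink")
```

Because the band is built before back-transforming, it always stays inside (0, 1), which would not be guaranteed if the standard errors were applied on the response scale.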
[R] Pointwise Confidence Bounds on Logistic Regression
Hi all. I hope I have my terminology right here... For a simple lm, one can add "pointwise confidence bounds" to a fitted line using something like

predict(results.lm, newdata = something, interval = "confidence")

(I'm following DAAG pages 154-155 for this.) I would like to do the same thing for a glm of the logistic regression type, for instance, the example in MASS pp. 190-192 (available in the help page for predict.glm). However, predict.glm does not have the same kind of features as plain old predict, i.e. one cannot specify interval = "confidence". From what I've read, pointwise confidence bounds are computed from the SE's for each point. However, I don't see quite where to extract this information with a glm. So, is there an existing function that does what I am describing for a glm, or can someone point me in the right direction to start writing my own?

TIA as always, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
[R] Suggestions: Terminology & Pkgs for following spectra over time
Hi Folks... No code to troubleshoot here. I need some suggestions about the right terminology to use in further searching, and any suggestions about R pkgs that might be appropriate.

I am in the planning stages of a project in which IR, NMR and other spectra (I'm a chemist) would be collected on various samples, and individual samples would be followed over time. The spectra will be feature rich/complex, so one can't see the changes by visual inspection. The spectra are basically 2D matrices: peaks as a function of frequencies. So the data set is in the form of spectra of a single sample over time, for multiple samples.

I am wondering about methods & R pkgs that can be used to analyze changes in the spectra over time. For instance, I would like to find specific peaks that are changing over time, sets of peaks that are changing in a correlated way over time, etc. I'd like to do this in an efficient and statistically valid way. What I am thinking of is somewhat like a time series, somewhat like image analysis (but only 2D), but it's not quite either of those and I need to know what it's really called to investigate further. Any suggestions as to R pkgs and key words/phrases will be appreciated.

TIA, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle Indiana USA
[R] Definition of wrapper?
I think I more or less understand what a "wrapper" is, but I'd like to hear how more experienced R users define it, and especially I'd like to know if there is a formal definition. In my reading, it seems like there is a fairly wide range of meanings, but they are all conceptually similar. I've looked in a couple of the classic R texts, the extensions and developers' manuals, and R help archives, and didn't find a definition. Of course, I may have missed it.

Thanks in advance. Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University
602 S. College Avenue
Greencastle, IN 46135
PHONE 765-658-4602
FAX 765-658-6084
[EMAIL PROTECTED]
http://academic.depauw.edu/~hanson/deadpezsociety.html
http://www.depauw.edu/acad/chemistry/
http://academic.depauw.edu/~hanson/UMP/index.html
Re: [R] Problem with graphics device in Mac OS X
For whatever reason, on the Mac, you have to open a new Quartz device window before making the graphics call. So, from the menu, pull down under Window to New Quartz Device Window. Then all graphics calls go to that (initially empty) window, and any further calls replace the previous contents of the window. This window doesn't print so well, but your students can divert their output to a pdf easily for really nice plots.

BTW, people were reporting problems with OS 10.5 and R. These may have been fixed, but if you have trouble, it's discussed in the archives.

Bryan

On 12/10/07 3:37 PM, WAYNE KING [EMAIL PROTECTED] wrote:

Hello List, I am teaching a basic course where students are encouraged to use R. There are a few students using Mac OS X. As a test we downloaded and installed the latest .dmg file (R-2.6.1.dmg) onto an Intel Mac running 10.5.1. A device query yields

getOption("device")
quartz

But any plot command does not bring up a plot (e.g. plot(), boxplot(), hist()). I found a thread concerning X11 windows under Mac OS X but I feel these users will most likely be just using the native quartz device. Invoking a call to quartz() first does not seem to help, e.g.

quartz()
plot(rnorm(100,0,1))

produces no output and no error message (nothing happens). A call to dev.cur() seems to indicate a device is active.

quartz()
dev.cur()
quartz
     2

but again a plot command produces no figure. Sorry, I am not a Mac OS user and I did check the archives but found mostly discussions on X11() under Mac OS X.

Wayne
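As an illustration of "diverting output to a pdf", here is a minimal sketch using the base pdf() device. The file name and the plot are placeholders chosen for the example, not anything from the original exchange.

```r
# Write the plot to a PDF file instead of an on-screen device.
out_file <- file.path(tempdir(), "example-plot.pdf")  # placeholder location

pdf(out_file, width = 6, height = 4)  # open the PDF device (sizes in inches)
plot(rnorm(100, 0, 1), main = "Written to a PDF device")
dev.off()                             # close the device so the file is complete
```

Nothing appears on screen while the pdf device is active; the file is only finalized when dev.off() is called.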
[R] Graphical Manova: Fails When There Are Three Factors
Hi R Gurus & Lurkers... Thanks in advance to anyone who is willing to tackle this! Bryan

I have been implementing the graphical manova method described in "An Introduction to Ggobi" (from the Ggobi web site). A stand-alone working code is appended below. The code is almost the same as described in the Introduction document, with one bug fix. A quick summary of the code is that it takes one's data, and fits an ellipsoid to it at a requested confidence level, then color codes everything for display. If you don't have ggobi installed, remove the ggobi from the last line and just use graphic_manova(response,class). You will probably have to comment out the last 4 lines of the graphic_manova function as well to avoid trivial errors.

Here's the R question: If the variable class has more than two levels (factors) in it, the code executes but runs into an error because

cis = lapply(sub.groups, combined, cl=cl)

creates cis with a bunch of NA's, which then cause havoc when one tries to do any matrix operations on it (not surprisingly). The NA's follow an interesting pattern: the ellipsoid points are generated for the first two dimensions (pc1 and pc2), but NA's are generated for the third dimension (pc3). So cis contains the 3 original data dimensions, 1000 added ellipsoid points to go with pc1, 1000 added ellipsoid points to go with pc2, and 1000 NA's to go with pc3. I don't see why the third set of data is any different than the first two, and the first two execute correctly.

# generate sample data
pc1=rnorm(20, sd=1)
pc2=rnorm(20, mean = 10, sd=2)
pc3=rnorm(20, sd=3)
class=factor(c("group 1","group 1","group 1","group 2","group 2",
  "group 2","group 2","group 2","group 2","group 2",
  "group 1","group 1","group 1","group 1","group 2",
  "group 2","group 2","group 2","group 1","group 2"), ordered=TRUE)
response=cbind(pc1, pc2, pc3)

# Now generate confidence ellipsoids using the method described
# in An Introduction to RGGOBI with minor modifications

# Define 3 functions to do the heavy lifting

# First: a function that generates a random set of points on a sphere
# centered on the mean of the passed data, skewed to match the variance
# of the passed data (which turns the sphere into an ellipsoid),
# and adjusted in size to match the desired confidence level.
ellipse = function(data, npoints=1000, cl=0.95, mean=colMeans(data), cov=var(data), n=nrow(data)) {
  norm.vec = function(x) x/sqrt(sum(x^2))
  p = length(mean)
  ev = eigen(cov)
  sphere = matrix(rnorm(npoints*p), ncol=p)
  cntr = t(apply(sphere, 1, norm.vec)) # normalized sphere
  cntr = cntr %*% diag(sqrt(ev$values)) %*% t(ev$vectors) # ellipsoid of correct shape
  Fcrit = qf(cl, p, n-p)
  scalefactor = sqrt((p*(n-1))/(n*(n-p)))*Fcrit
  cntr = cntr*scalefactor # ellipsoid of correct size
  if (!missing(data)) # only relevant when no data passed
    colnames(cntr) = colnames(data)
  cntr+rep(mean, each=npoints)
}

# Next a function that combines the original data with the generated ellipsoid
combined = function(data, cl=0.95) {
  dm = data.matrix(data)
  ellipse = as.data.frame(ellipse(dm, npoints=1000, cl=cl))
  both = rbind(data, ellipse)
  both$SIM = factor(rep(c(FALSE,TRUE),c(nrow(data),1000)))
  both
}

# Now a function to separate the dataset into categories
graphic_manova = function(data, catvar, cl=0.68) {
  sub.groups = data.frame(cbind(data,catvar))
  sub.groups = split(sub.groups,catvar)
  cis = lapply(sub.groups, combined, cl=cl)
  df = as.data.frame(do.call(rbind, cis))
  df$var = factor(rep(names(cis), sapply(cis, nrow)))
  g = ggobi(df)
  glyph_type(g[1]) = c(6,1)[df$SIM] # makes dots of ellipsoids tiny
  glyph_color(g[1]) = df$var # properly colors the two groups
  invisible(g)
}

# Now actually do the computations & plot the data!
# ggobi(combined(response)) # This is a debugging check point
ggobi(graphic_manova(response,class))
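For readers following the geometry in the ellipse() listing, here is a small stand-alone sketch of generating points on a confidence ellipsoid for a multivariate mean. It is an assumption-laden variant, not the code from the GGobi document: the function name conf_ellipsoid is hypothetical, and it uses the textbook Hotelling T^2 scaling, with the F quantile inside the square root, which differs from the scalefactor line in the listing above.

```r
# Hypothetical helper: points on the surface of a confidence ellipsoid
# for the mean of 'data' (rows = observations, columns = variables).
conf_ellipsoid <- function(data, npoints = 1000, cl = 0.95) {
  data <- as.matrix(data)
  n <- nrow(data)
  p <- ncol(data)
  ctr <- colMeans(data)
  ev <- eigen(var(data), symmetric = TRUE)

  # Random directions, normalized row by row onto the unit sphere
  sphere <- matrix(rnorm(npoints * p), ncol = p)
  sphere <- sphere / sqrt(rowSums(sphere^2))

  # Hotelling T^2 radius (textbook form; an assumption, see lead-in)
  radius <- sqrt(p * (n - 1) / (n * (n - p)) * qf(cl, p, n - p))

  # Stretch and rotate the sphere to the covariance shape, then scale
  pts <- sphere %*% diag(sqrt(ev$values), p) %*% t(ev$vectors) * radius

  # Shift every point so the ellipsoid is centered on the sample mean
  sweep(pts, 2, ctr, "+")
}

set.seed(7)
X <- cbind(pc1 = rnorm(20, sd = 1),
           pc2 = rnorm(20, mean = 10, sd = 2),
           pc3 = rnorm(20, sd = 3))
pts <- conf_ellipsoid(X, npoints = 500, cl = 0.95)
```

A quick sanity check on any such construction is that every generated point has the same Mahalanobis distance from the sample mean; a constant column in the input (such as a group label within one group) gives a zero eigenvalue and is worth ruling out when NA's appear.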
Re: [R] Reading a file with read.csv: two character rows not interpreted as I hope
OK, I fixed it myself! Here's the code. Of course, it mostly seems simple once one gets it working... Thanks Jim. Bryan

# get the first three lines with sample info in character format
sample.info = read.table(input.file.name, sep=",", as.is=TRUE, nrows=3)
sample.names = sample.info[1,]
sample.colors = sample.info[2,]; sample.colors = as.character(sample.colors[-1])
sample.class = sample.info[3,]; sample.class = as.character(sample.class[-1])
# read the numeric data, skipping the three character lines
data = read.table(input.file.name, sep=",", skip=3)
colnames(data) = sample.names
[R] Reading a file with read.csv: two character rows not interpreted as I hope
Hi Folks... been playing with this for a while, with no luck, so I'm hoping someone knows it off the top of their head... Difficult to find this nuance in the archives, as so many msgs deal with read.csv!

I'm trying to read a data file with the following structure (a little piece of the actual data; they are actually csv, they just didn't paste with the commas):

wavelength  SampleA   SampleB   SampleC   SampleD
color       green     black     black     green
class       Class 1   Class 2   Class 2   Class 1
403         1.94E-01  2.14E-01  2.11E-01  1.83E-01
409         1.92E-01  1.89E-01  2.00E-01  1.82E-01
415         1.70E-01  1.99E-01  1.94E-01  1.86E-01
420         1.59E-01  1.91E-01  2.16E-01  1.74E-01
426         1.50E-01  1.66E-01  1.72E-01  1.58E-01
432         1.42E-01  1.50E-01  1.62E-01  1.48E-01

Columns after the first one are sample names. 2nd row is the list of colors to use in later plotting. 3rd row is the class for later manova. The rest of it is x data in the first column with y1, y2... following for plotting.

I can read the file w/o the color or class rows with read.csv just fine; it makes a nice data frame with proper data types. The problem comes when parsing the 2nd and 3rd rows. Here's the code:

data = read.csv(filename, header=TRUE) # read in data
color = data[1,]; color = data[-1] # capture color info and throw out 1st value
class = data[2,]; class = class[-1] # capture category info and throw out 1st value
cleaned.data = data[-1,] # remove color and category info for matrix operations
cleaned.data = data[-1,]
freq = data[,1] # capture frequency info

What happens is that freq is parsed as factors, and the color and class are parsed as data frames of factors. I need color and class to be characters which I can pass to functions in the typical way one uses colors and levels. I need the freq and the cleaned.data info as numeric for plotting.

I don't feel I'm far off from things working, but that's where you all come in! Seems like an argument of as.something is needed, but the ones I've tried don't work. Would it help to put color and class above the x,y data in the file, then clean it off?

Btw, I'm on a Mac using R 2.6.0.

Thanks in advance, Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
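A likely reason for the factor columns described above: read.csv assigns one type per column, so a column that holds both words (color, class) and numbers is read as character, or as a factor under the stringsAsFactors = TRUE default of R versions before 4.0. The sketch below illustrates this on a hypothetical two-sample excerpt of the data and shows that skipping the three text lines yields numeric columns. The names csv_text, mixed and numeric.part are made up for illustration, and the text= argument is used only to keep the example self-contained.

```r
# Hypothetical excerpt of the file, with the commas restored.
csv_text <- "wavelength,SampleA,SampleB
color,green,black
class,Class 1,Class 2
403,1.94E-01,2.14E-01
409,1.92E-01,1.89E-01"

# Reading everything at once: the text rows force every column to character.
mixed <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
print(sapply(mixed, class))

# Skipping the header and the two text rows leaves purely numeric columns.
numeric.part <- read.csv(text = csv_text, header = FALSE, skip = 3,
                         col.names = names(mixed))
print(sapply(numeric.part, class))
```

Reading the text rows and the numeric rows in separate calls avoids any after-the-fact conversion with as.numeric or as.character.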
Re: [R] Reading a file with read.csv: two character rows not interpreted as I hope
Jim, thanks for the suggestion. There is still something subtle and non-intuitive going on here. I adapted your code with minor changes as follows (I had to add the sep argument) but get different behavior:

c.names <- scan(file.csv, what='', nlines=1, sep=",") # read column names
c.options <- read.table(file.csv, as.is=TRUE, nrows=2, sep=",") # get lines 2-3
c.data <- read.table(file.csv, sep=",") # rest of the data
colnames(file.csv) <- c.names

Your code works perfectly (you knew that!). My adaptation runs, but c.options contains the first two lines, not lines 2 and 3, and c.data contains the contents of the entire file as *factors* (the data type of c.names and c.options is correct: character). How strange!

Also, and this is an observation rather than a question: in your code, you call scan and get the first line as characters, then you call read.table, which gets lines 2 and 3, presumably because the first line, from read.table's perspective, is a hidden label (?). Then the second time you use read.table, the hidden first line is ignored, as are the two lines with character data. I really don't understand these behaviors, which is probably why I'm having trouble parsing the file!

Thanks, Bryan

On 10/30/07 8:40 PM, jim holtman [EMAIL PROTECTED] wrote:

Here is one way. You will probably use 'file' instead of textConnection

> x.in <- textConnection('wavelength SampleA SampleB SampleC SampleD
+ color green black black green
+ class "Class 1" "Class 2" "Class 2" "Class 1"
+ 403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
+ 409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
+ 415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
+ 420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
+ 426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
+ 432 1.42E-01 1.50E-01 1.62E-01 1.48E-01')
> c.names <- scan(x.in, what='', nlines=1) # read column names
Read 5 items
> c.options <- read.table(x.in, as.is=TRUE, nrows=2) # get lines 2-3
> c.data <- read.table(x.in) # rest of the data
> colnames(c.data) <- c.names
> close(x.in)
> c.options # here are lines 2-3
     V1      V2      V3      V4      V5
1 color   green   black   black   green
2 class Class 1 Class 2 Class 2 Class 1
> c.data # your data
  wavelength SampleA SampleB SampleC SampleD
1        403   0.194   0.214   0.211   0.183
2        409   0.192   0.189   0.200   0.182
3        415   0.170   0.199   0.194   0.186
4        420   0.159   0.191   0.216   0.174
5        426   0.150   0.166   0.172   0.158
6        432   0.142   0.150   0.162   0.148
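One plausible explanation for the difference Bryan reports in this thread: a textConnection is already open, so each call continues reading where the previous one stopped, whereas passing a file name makes scan and read.table open the file afresh and start again at line 1. The sketch below, which writes a hypothetical excerpt of the data to a temporary file (tmp and con are illustrative names, not from the thread), shows both behaviors side by side.

```r
# Hypothetical temporary file standing in for the poster's CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "wavelength,SampleA,SampleB,SampleC,SampleD",
  "color,green,black,black,green",
  "class,Class 1,Class 2,Class 2,Class 1",
  "403,1.94E-01,2.14E-01,2.11E-01,1.83E-01",
  "409,1.92E-01,1.89E-01,2.00E-01,1.82E-01"
), tmp)

# By file name: every call starts at the top of the file again.
from.top <- read.table(tmp, as.is = TRUE, nrows = 2, sep = ",")
print(from.top$V1)   # first column of the first two lines of the file

# By open connection: the read position carries over between calls.
con <- file(tmp, open = "r")
c.names   <- scan(con, what = "", nlines = 1, sep = ",", quiet = TRUE)  # line 1
c.options <- read.table(con, as.is = TRUE, nrows = 2, sep = ",")        # lines 2-3
c.data    <- read.table(con, sep = ",")                                 # the rest
close(con)
colnames(c.data) <- c.names

print(c.options)
print(sapply(c.data, class))
unlink(tmp)
```

When working from a file name instead of a connection, the skip argument of read.table gives the same effect as the moving read position.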
[R] Accessing scripts in a different directory on a Mac
Hi all. A question for knowledgeable folks using R on an Intel Mac running OS X 10.4.10.

For ease of maintenance, I have broken a large R script into a main script which "oversees" things by calling other scripts, using "source". Let's call the secondary scripts "sub-scripts." I'd like for the sub-scripts to reside in a different directory (again, for ease of maintenance, and so I can access them from many other directories). I've looked all over the documentation about paths and filenames, but I'm having trouble deciding which of the many functions is the one I need.

As a more specific example, my main script currently contains source("test.R") and I need to do something like source(pathtest.R). Ideally, I'd like to specify path early in the file one time, and have it apply automatically later. Stuff in the documentation only seems to tease!

Thanks in advance, Bryan
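One common way to get what is asked for here is to store the directory once in a variable and build each script path with file.path before handing it to source. The sketch below is only an illustration under assumed names: script.dir is a made-up variable, and the directory and the test.R it contains are created inside tempdir() so that the example runs anywhere; in practice script.dir would point at the real sub-script directory.

```r
# Set the sub-script directory once, near the top of the main script.
# Here it is a throwaway directory; substitute the real location.
script.dir <- file.path(tempdir(), "subscripts")
dir.create(script.dir, showWarnings = FALSE)

# Stand-in for an existing sub-script.
writeLines("helper.loaded <- TRUE", file.path(script.dir, "test.R"))

# Later in the main script: build the full path and source it.
source(file.path(script.dir, "test.R"))
print(helper.loaded)
```

file.path joins its arguments with the platform's separator, so the same main script works unchanged if the directory moves; only the single assignment to script.dir needs editing.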