Re: [R] How to read plain text documents into a vector?
Richard Liu wrote:
> There are actually two vignettes. Both have examples of a vector of characters being made into a tm corpus, but neither shows how to read documents on the file system into the vectors. I tried the other two suggestions, but paste seemed not to glue the separate lines together into one character string. Perhaps I missed something (collapse?). Perhaps I'll have another look.

I admit that an example of reading in external data is missing. Maybe inform the author.

Try whether this works. I have not used the special functions in tm, so there might be another problem, but readPlain looks like a good place to continue.

Dieter

library(tm)
filenames = list.files(path = ".", pattern = "\\.txt")
docs = character(0)   # start empty so the first document is not preceded by a blank entry
for (filename in filenames) {
  # one element per document: all lines glued together with newlines
  docs = c(docs, paste(readLines(file(filename)), collapse = "\n"))
}
docs
## continue as in example
vs = VectorSource(docs)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Error from termplot() with make.panel.svysmooth() for complex survey data
Greetings,

I am using library(survey) to analyze some complex sample data. After fitting a model I tried to use termplot() with make.panel.svysmooth(), but I received an error (see below). Could someone help me interpret the error message so I can make the necessary corrections?

The make.panel.svysmooth() function seems to work fine, and termplot() worked fine after I dropped the smoother. This led me to believe that the error is coming from panel.smooth(), but the code for that function does not contain the rowsum() and findInterval() functions shown in the error message. I should also note that the error does not depend on the inclusion or omission of missing data, as I tried it both ways.

I'm analyzing private data, so I can't provide a reproducible example, but here's the output:

> design <- svydesign(ids=~PSU, weights=~W, strat=~STR, nest=T, data=data)
> model <- svyglm(fmla, design=design) #works fine as evidenced by summary (not shown)
> termplot(model, data=model.frame(design), partial.resid=T, se=T, smooth=make.panel.svysmooth(design))
Waiting to confirm page change...
Error in rowsum.default(c(rep(0, ngrid), w), c(1:ngrid, findInterval(mm[, :
  incorrect length for 'group'

> make.panel.svysmooth(design)
function (x, y, span = 2/3, col.smooth = "red", col = par("col"),
    bg = NA, pch = par("pch"), cex = 1, ...)
{
    if (is.null(bandwidth))
        bandwidth <- range(x) * span/3
    s <- svysmooth(y ~ x, design = design, bandwidth = bandwidth)
    points(x, y, pch = pch, bg = bg, col = col)
    lines(s[[1]], col = col.smooth, ...)
}
<environment: 0x0225a5d4>

> termplot(model, data=model.frame(design), partial.resid=T, se=T) #works but without smoothing

> panel.smooth
function (x, y, col = par("col"), bg = NA, pch = par("pch"),
    cex = 1, col.smooth = "red", span = 2/3, iter = 3, ...)
{
    points(x, y, pch = pch, col = col, bg = bg, cex = cex)
    ok <- is.finite(x) & is.finite(y)
    if (any(ok))
        lines(stats::lowess(x[ok], y[ok], f = span, iter = iter),
            col = col.smooth, ...)
}
<environment: namespace:graphics>

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] quantreg_4.38 SparseM_0.80 KernSmooth_2.23-3 survey_3.16 car_1.2-15 foreign_0.8-37

Thanks in advance for any help you can provide.

Regards,
Chris

--
Christopher Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
44.9785°N, 93.2396°W
moor0...@umn.edu
http://umn.edu/~moor0554
Re: [R] attach
Christophe Dutang1 wrote:
> I would like to know what happens on the memory side when I use attach(inputdata). Is there a second allocation of memory for inputdata?

No, it just guides the syntax.

Christophe Dutang1 wrote:
> Is it better not to use attach function?

A qualified yes, in the sense of "do not use it". I think it is used too much in old documentation, presumably because of some S eggshells. I use with() if I have a nasty formula to unclutter; it acts locally only and you don't get unwanted side effects.

Dieter
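As a small illustration of the with() suggestion, here is a sketch only; the data frame contents and column names are made up and merely stand in for the poster's inputdata:

```r
# Hypothetical data frame standing in for the poster's 'inputdata'.
inputdata <- data.frame(dose = c(1, 2, 4, 8),
                        response = c(2.1, 3.9, 8.2, 15.8))

# with() evaluates the expression using the columns of inputdata,
# without putting anything on the search path.
ratio <- with(inputdata, mean(response / dose))

# Nothing was attached, so there is nothing to detach() afterwards and
# no column can silently mask another object later in the session.
attached <- "inputdata" %in% search()

print(ratio)
print(attached)
```

The effect is local to the single call, which is the "no unwanted side effects" point made above.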
Re: [R] reference on permutation test
It's always worthwhile to look at the articles by Pitman (and maybe the textbook by Fisher, if you have access to it); Welch is a nice paper, too, but might be pretty technical as a way to learn about the area.

I don't know any of the textbooks except Edgington (now in its 4th edition, with co-author P. Onghena), a book I can wholeheartedly recommend. The authors explain the basic concepts in a way that should be accessible even to non-statisticians, I believe. They also cover a lot of special cases and give exhaustive theoretical background for those interested (you can easily skip these parts if you are not). It comes with a CD with some programs to run these tests, though I have not used it so far -- you can do it yourself in R, of course :-)

I am not sure whether they cover any other resampling methods at all, but if so, only very briefly, so you'd need another reference for that. Efron & Tibshirani is a classic, and I have heard some people recommend Davison & Hinkley -- I have not yet found the time to look more closely into the latter.

Just my 2 cents,
Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peng Yu
Sent: Wednesday, 14 October 2009 04:58
To: r-h...@stat.math.ethz.ch
Subject: [R] reference on permutation test

> I want to learn permutation tests, resampling, etc. There are a few references listed below. I'm wondering what is the best book on this topic. Can somebody give me some advice? Thank you!
>
> http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Permutation_test
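For the "you can do it yourself in R" remark, a minimal sketch of an exact two-sample permutation test on the difference in means might look as follows. The function name perm_test and the data are invented for illustration and are not taken from any of the books mentioned; full enumeration is only feasible for small samples.

```r
# Exact two-sided permutation test for a difference in means.
# Enumerates every way of assigning length(x) of the pooled values to group 1.
perm_test <- function(x, y) {
  pooled   <- c(x, y)
  observed <- mean(x) - mean(y)
  idx      <- combn(length(pooled), length(x))   # one column per relabelling
  stats    <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  # small tolerance so that relabellings tied with the observed value are counted
  mean(abs(stats) >= abs(observed) - 1e-12)
}

# Hypothetical measurements for two small groups.
x <- c(12.1, 14.3, 13.8, 15.2, 14.9)
y <- c(10.2, 11.5,  9.8, 12.0, 11.1)

p <- perm_test(x, y)
print(p)
```

For larger samples one would usually draw a random subset of relabellings with sample() instead of enumerating them all.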
Re: [R] heatmap.2
John Celniker wrote:
> ... when I try to change the breaks to:
> br2
> [1] 0.000 0.5337751 1.0675502 1.6013253 2.1351003 3.000 3.500 4.000 4.500
> [10] 4.8039758
> I get the correct heatmap representation but the color key does not update correctly to reflect changes in breaks even though the superimposed histogram is correct.

This looks like the notorious FAQ 7.31, and the same thing has already been asked in the same context on this list (but the author did not believe it).

7.31 Why doesn't R think these numbers are equal?

Try replacing your 3.5 and 4.5 with 3.5001 and 4.5001 (or 3.4999 and 4.4999); if things work after that, 7.31 has hit again. If not, my guess was wrong.

Dieter
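The effect behind FAQ 7.31 can be seen without heatmap.2 at all. The following sketch only shows the general floating-point issue; it does not claim to reproduce the colour-key problem itself:

```r
# Two ways of arriving at "0.3" that are not bit-for-bit identical.
a <- 0.1 + 0.2

exact_equal <- (a == 0.3)                 # exact comparison of doubles
near_equal  <- isTRUE(all.equal(a, 0.3))  # comparison up to a small tolerance

print(exact_equal)
print(near_equal)

# Printing with more digits shows the difference that == sees.
print(format(a, digits = 17))
```

Code that bins values against break points with exact comparisons can therefore put a value on the "wrong" side of a break, which is why nudging the breaks slightly is a quick diagnostic.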
Re: [R] update.formula drop interaction terms
Eleni Rapsomaniki-2 wrote:
> How do I drop multiplication terms from a formula using update? e.g.
> forml=as.formula(Surv(time, status) ~ x1+x2+A*x3+A*x4+B*x5+strata(sex))

The easiest way is to write the formula again without the A's.

Dieter
[R] post-hoc test with kruskal.test()
Dear R users,

I would like to know if there is a way in R to run a post-hoc test (comparison of factor levels, like Tukey for ANOVA) after a non-parametric analysis of variance with the kruskal.test() function.

I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each pair of groups.

Do you have any suggestions?

Many thanks in advance!

Robert

___
Robert M. Kalicki, MD
Postdoctoral Fellow
Department of Nephrology and Hypertension
Inselspital
University of Bern
Switzerland

Address:
Klinik und Poliklinik für Nephrologie und Hypertonie
KiKl G6
Freiburgstrasse 15
CH-3010 Inselspital Bern
Tel +41(0)31 632 96 63
Fax +41(0)31 632 14 58
Re: [R] histogram
On 10/13/2009 10:06 PM, Dmitry Gospodaryov wrote:
> Dear R developers,
> How can I build a histogram from this matrix:
>
>          0   0.5    1
> 0.25    34    43   65
> 1       23    35   54
> 4       22    29   42
> 10      21    22   29
> 20      15    17   20
>
> (the first line holds the column names, the first column holds the row names), where the column names should be the x-axis labels? Accordingly, I want to have three groups of bars (5 bars in each group). The y values should be the values given in the body of the matrix. The row names should be in a legend and should name each of the 5 bars in a group.

dgd<-read.table("dg.dat",header=TRUE)
names(dgd)<-c("blank","0","0.5","1")
library(plotrix)
barp(dgd[,2:4],names.arg=names(dgd)[2:4],col=2:6)
legend(1,65,dgd[,1],fill=2:6)

> I would also try to build a filled contour; however, I can't ask the program to treat the column and row names as true values, not only as labels. So, the column names should be the y-values, while the row names should be the x-values. The values in the body of the matrix should be the z-values.

filled.contour(dgd[,1],as.numeric(names(dgd)[2:4]),
   as.matrix(dgd[,2:4]))

Jim
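For comparison, the same two plots can be sketched with base graphics only. This is an alternative to the plotrix code above, not a correction of it; the matrix is typed in directly from the question rather than read from dg.dat, and the object names are arbitrary.

```r
# Values copied from the question: rows become the legend entries,
# columns become the three groups on the x-axis.
m <- matrix(c(34, 43, 65,
              23, 35, 54,
              22, 29, 42,
              21, 22, 29,
              15, 17, 20),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("0.25", "1", "4", "10", "20"),
                            c("0", "0.5", "1")))

# beside = TRUE draws one group of 5 bars per column instead of stacking.
mids <- barplot(m, beside = TRUE, col = 2:6, legend.text = rownames(m))

# Filled contour: row names as numeric x, column names as numeric y.
filled.contour(x = as.numeric(rownames(m)),
               y = as.numeric(colnames(m)),
               z = m)
```

barplot() returns the bar midpoints (here a 5 x 3 matrix), which is handy if labels or error bars need to be added afterwards.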
Re: [R] svy / weighted regression
Dear Thomas & David,

That makes sense! If I wanted to use survey on the summarized data, I suppose that I could 'de-summarize' or 're-individualize' the data to give the design object the correct information on the number of observations. Or I could revert to using the actual individual-level data.

Thanks a lot, your input has been very helpful.

Laust

Post doc. Laust Mortensen, PhD
Epidemiology Unit
University of Southern Denmark

2009/10/13 Thomas Lumley tlum...@u.washington.edu:
> I think there is a much simpler explanation.
>
> The survey design object has eight observations, two per country. With a sample size of two per country it is hardly surprising that country-specific estimates are not very precise. The actual data has hundreds of thousands of observations per country, so it will have more precise estimates.
>
> Grouping the data doesn't make a difference for model-based glm estimation, where it is simply a computational convenience. It *does* make a difference for design-based estimation, because it changes the design.
>
> -thomas
>
> On Tue, 13 Oct 2009, Laust wrote:
>> Dear David,
>>
>> Thanks again for your input! I realize that I did a bad job of explaining this in my first email, but the setup is that in Finland persons who die are sampled with a different probability (1) from those who live (.5). This was done by the Finnish data protection authorities to protect individuals against identification. In the rest of the countries everyone is sampled with a probability of 1. The data that I am supplying to R is summarized data for each country stratified by case status. Another way of organizing the data would be:
>>
>> # creating data
>> listc <- c("Denmark","Finland","Norway","Sweden")
>> listw <- c(1,2,1,1)
>> listd <- c(1000,1000,1000,2000)
>> listt <- c(755000,505000,905000,191)
>> list.cwdt <- c(listc, listw, listd, listt)
>> country2 <- data.frame(country=listc,weight=listw,deaths=listd,time=listt)
>>
>> I hope that it is clearer now that for no value of the independent variable 'country' is the rate going to be zero. I think this was also not the case in my original example, but this was obscured by my poor communication- & R-skills. But if data is organized this way then sampling weight of 2 for Finland should only be applied to the time-variable that contains person years at risk and *not* to the number of deaths, which would complicate matters further. I would know how to get this to work in R or in any other statistical package.
>>
>> Perhaps it is - as Peter Dalgaard suggested - the estimation of the dispersion parameter by the survey package that is causing trouble, not the data example eo ipso. Or perhaps I am just using survey in a wrong way.
>>
>> Best
>> Laust
>>
>> Post doc. Laust Mortensen, PhD
>> Epidemiology Unit
>> University of Southern Denmark
>>
>> On Mon, Oct 12, 2009 at 3:32 PM, David Winsemius dwinsem...@comcast.net wrote:
>>> I think you are missing the point. You have 4 zero death counts associated with much higher person years of exposure followed by 4 death counts in the thousands associated with lower degrees of exposures. It seems unlikely that these are real data as there are no cohorts that would exhibit such low death rates. So it appears that in setting up your test case, you have created an impossibly unrealistic test problem.
>>>
>>> -- David
>>>
>>> On Oct 12, 2009, at 9:12 AM, Laust wrote:
>>>> Dear Peter,
>>>>
>>>> Thanks for the input. The zero rates in some strata occur because sampling depended on case status: In Finland only 50% of the non-cases were sampled, while all others were sampled with 100% probability.
>>>>
>>>> Best
>>>> Laust
>>>>
>>>> On Sat, Oct 10, 2009 at 11:02 AM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote:
>>>>> Sorry, forgot to reply all...
>>>>>
>>>>> Laust wrote:
>>>>>> Dear list,
>>>>>>
>>>>>> I am trying to set up a propensity-weighted regression using the survey package. Most of my population is sampled with a sampling probability of one (that is, I have the full population). However, for a subset of the data I have only a 50% sample of the full population. In previous work on the data, I analyzed these data using SAS and STATA. In those packages I used a propensity weight of 1/[sampling probability] in various generalized linear regression-procedures, but I am having trouble setting this up. I bet the solution is simple, but I'm an R newbie. Code to illustrate my problem below.
>>>>>
>>>>> Hi Laust,
>>>>>
>>>>> You probably need the package author to explain fully, but as far as I can see, the crux is that a dispersion parameter is being used, based on Pearson residuals, even in the Poisson case (i.e. you effectively get the same result as with quasipoisson()).
>>>>>
>>>>> I don't know what the rationale is for this, but it is clear that with your data, an estimated dispersion parameter is going to be large. E.g. the data has both 0 cases in 75 person-years and 1000 cases in 5000 person-years for Denmark, and in your model they are supposed to have the same Poisson rate.
>>>>>
>>>>> summary.svyglm starts off with
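The remark about the dispersion parameter can be illustrated with plain glm(), without the survey package. The sketch below uses invented grouped data (two strata per country that the model forces to share one rate) and is not the poster's data; it only shows that poisson and quasipoisson give the same coefficients while the quasipoisson standard errors are inflated by the square root of the Pearson-based dispersion.

```r
# Hypothetical grouped data: within each country one stratum has no deaths
# over many person-years and the other has many deaths over few person-years.
grouped <- data.frame(
  country = factor(c("A", "A", "B", "B")),
  deaths  = c(0, 1000, 0, 1500),
  time    = c(75000, 5000, 60000, 5000)
)

fit_pois  <- glm(deaths ~ country + offset(log(time)),
                 family = poisson, data = grouped)
fit_quasi <- glm(deaths ~ country + offset(log(time)),
                 family = quasipoisson, data = grouped)

# Point estimates agree; only the standard errors differ.
same_coef  <- isTRUE(all.equal(coef(fit_pois), coef(fit_quasi)))
dispersion <- summary(fit_quasi)$dispersion
se_ratio   <- coef(summary(fit_quasi))[, "Std. Error"] /
              coef(summary(fit_pois))[, "Std. Error"]

print(same_coef)
print(dispersion)
print(se_ratio)
```

With strata this heterogeneous the estimated dispersion is very large, which is consistent with the imprecise country-specific estimates discussed in the thread.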
Re: [R] S4 tutorial
Peng,

the Brobdingnag package includes a vignette that gives a step-by-step guide to creating a simple package that uses S4.

best wishes
Robin

Peng Yu wrote:
> I'm looking for some tutorial on S4. I only find the following one, which is not in English. Can somebody let me know if there is any introductory material? I'm very familiar with OO and C++. If there is some material that suits my background, it will be great.
>
> https://stat.ethz.ch/pipermail/r-help/2009-January/184108.html

--
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877
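For readers coming from C++, a very small S4 sketch may help as a first orientation before working through the vignette. The class name Account and the generic deposit are invented for this example and have nothing to do with the Brobdingnag package.

```r
library(methods)

# A class with typed slots and a validity check
# (roughly: data members plus a class invariant).
setClass("Account",
         representation(owner = "character", balance = "numeric"),
         validity = function(object) {
           if (length(object@balance) != 1 || object@balance < 0)
             "balance must be a single non-negative number"
           else
             TRUE
         })

# Methods belong to generic functions, not to the class:
# dispatch happens on the class of the argument.
setGeneric("deposit", function(account, amount) standardGeneric("deposit"))

setMethod("deposit", "Account", function(account, amount) {
  account@balance <- account@balance + amount
  validObject(account)   # re-check the invariant after modification
  account                # value semantics: return the modified copy
})

acc <- new("Account", owner = "Peng", balance = 100)
acc <- deposit(acc, 50)
print(acc@balance)
```

The main surprise for C++ users is usually the last line of the method: objects are copied, so the caller has to assign the result back.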
Re: [R] How to read plain text documents into a vector?
kenhorvath wrote:
> Dieter Menne wrote:
>> library(tm)
>> filenames = list.files(path = ".", pattern = "\\.txt")
>> docs = character(0)
>> for (filename in filenames) {
>>   docs = c(docs, paste(readLines(file(filename)), collapse = "\n"))
>> }
>> docs
>> ## continue as in example
>> vs = VectorSource(docs)
>
> If in any way possible I would recommend to do the whole procedure via lists, not recursively.
>
> Ken

While I agree that the appending could be done more efficiently with a list as an intermediate, the docs = c(doc, "ljljljl") construct is not recursive, even if it is not efficient.

Dieter
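One way to avoid growing the vector inside a loop is to let vapply() build it in one go. The following is a sketch under the assumption that each file should become exactly one string; the temporary directory and the two small files exist only to make the example self-contained.

```r
# Create two small text files in a temporary directory (illustration only).
tmp_dir <- tempfile("docs_")
dir.create(tmp_dir)
writeLines(c("first line", "second line"), file.path(tmp_dir, "a.txt"))
writeLines("only line", file.path(tmp_dir, "b.txt"))

# "\\.txt$" anchors the pattern at the end of the file name.
filenames <- list.files(path = tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# One string per file; vapply() allocates the result once and checks
# that every file really yields a single character value.
docs <- vapply(filenames,
               function(f) paste(readLines(f), collapse = "\n"),
               character(1),
               USE.NAMES = FALSE)

print(docs)
```

The resulting character vector can be passed on to VectorSource() in the same way as in the loop version.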
Re: [R] post-hoc test with kruskal.test()
Robert,

you can do the corresponding pairwise comparisons using wilcox.test. As far as I know, there is no general correction like Tukey's HSD for the Kruskal-Wallis test. However, if you do indeed have only 3 groups (resulting in 3 pairwise comparisons), the intersection-union principle and the theory of closed test procedures should allow you to do these tests without further correction, given that the global test was statistically significant.

HTH,
Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
Sent: Wednesday, 14 October 2009 09:17
To: r-help@r-project.org
Subject: [R] post-hoc test with kruskal.test()

> Dear R users,
>
> I would like to know if there is a way in R to execute a post-hoc test (factor levels comparison, like Tukey for ANOVA) of a non-parametric analysis of variance with kruskal.test() function.
>
> I am comparing three different groups. The preliminary analysis using the kruskal-wallis-test show significance, but I still don't know the relationship and the significance level between each group?
>
> Do you have any suggestion?
>
> Many thanks in advance!
>
> Robert
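The pairwise comparisons can be run in one call with pairwise.wilcox.test() from the stats package. The data below are invented for three groups; whether to leave the p-values unadjusted (as argued above for exactly three groups after a significant global test) or to adjust them is a judgment the analyst has to make, so both variants are shown.

```r
# Hypothetical measurements for three groups (no ties).
values <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.4,
            6.2, 7.1, 6.8, 7.5, 6.0, 7.9,
            8.3, 9.4, 8.8, 9.9, 9.1, 8.0)
group  <- factor(rep(c("A", "B", "C"), each = 6))

# Global test first.
global <- kruskal.test(values ~ group)

# All pairwise Wilcoxon rank-sum tests, without and with Holm adjustment.
pairs_raw  <- pairwise.wilcox.test(values, group, p.adjust.method = "none")
pairs_holm <- pairwise.wilcox.test(values, group, p.adjust.method = "holm")

print(global$p.value)
print(pairs_raw$p.value)
print(pairs_holm$p.value)
```

The p.value component is a lower-triangular matrix with one entry per pair of groups.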
Re: [R] update.formula drop interaction terms
Eleni Rapsomaniki wrote:
> Dear R users,
>
> How do I drop multiplication terms from a formula using update? e.g.
>
> forml=as.formula(Surv(time, status) ~ x1+x2+A*x3+A*x4+B*x5+strata(sex))
>
> #I would like to drop all instances of variable A (the main effect and its interactions). The following:
> updated.forml=update(forml, ~ . -A)

To drop all terms with A:

update(lmo, . ~ . - A:.)

Uwe Ligges

> #gives me this:
> #Surv(time, status) ~ x1 + x2 + x3 + x4 + B + x5 + strata(sex) + A:x3 + A:x4 + B:x5
>
> #but I want this:
> #updated.forml=as.formula(Surv(time, status) ~ x1+x2+x3+x4+B*x5+strata(sex))
>
> Any ideas?
>
> Thanks in advance
>
> Eleni Rapsomaniki
> Research Associate
> Strangeways Research Laboratory
> Department of Public Health and Primary Care
> University of Cambridge
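Applied to a formula rather than a fitted model, the `- A:.` idiom can be tried out as follows. To keep the sketch free of the survival package, a plain response y replaces Surv(time, status) and the strata() term is left out; that simplification is mine, not part of the original question.

```r
# Simplified stand-in for the formula in the question.
forml <- y ~ x1 + x2 + A*x3 + A*x4 + B*x5

# Removing only the main effect leaves the A interactions in place ...
only_main <- update(forml, . ~ . - A)

# ... whereas "A:." expands to A crossed with every existing term
# (A:A collapses to A), so the main effect and all interactions go.
no_A <- update(forml, . ~ . - A:.)

labels_main <- attr(terms(only_main), "term.labels")
labels_noA  <- attr(terms(no_A), "term.labels")

print(only_main)
print(no_A)
```

Inspecting the term labels, as done here, is a convenient way to check what update() actually kept.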
Re: [R] Request update on A (Not So) Short Introduction to S4
Peng Yu wrote:
> There are several '?'s on the last page of the following document. Apparently, they are not correct. Could somebody correct it?
>
> cran.r-project.org/doc/contrib/Genolini-S4tutorialV0-5en.pdf

Please ask the author.

Best,
Uwe Ligges
[R] Linear Regression Question
Dear Sir or Madam,

I am a student at MSc Probability and Finance at Paris 6 University/ Ecole Polytechnique. I am using R and I can't find an answer to the following question. I will be very thankful if you can answer it.

I have two vectors rendements_CAC40 and rendements_AlcatelLucent. I use the lm function as follows, and then the sumarry function:

regression=lm(rendements_CAC40 ~ rendements_AlcatelLucent);
sum=summarry(regression);

I obtain:

Call:
lm(formula = rendements_CAC40 ~ rendements_AlcatelLucent)

Residuals:
     Min       1Q   Median       3Q      Max
-6.43940 -0.84170 -0.01124  0.76235  9.08087

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              -0.03579    0.07113  -0.503    0.615
rendements_AlcatelLucent  0.33951    0.01732  19.608   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.617 on 515 degrees of freedom
Multiple R-squared: 0.4274, Adjusted R-squared: 0.4263
F-statistic: 384.5 on 1 and 515 DF, p-value: < 2.2e-16

I would like to access the p-value field, but I can't find its name, as can be seen below:

names(sum)
 [1] "call"          "terms"         "residuals"     "coefficients"  "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

I thought that I could find it in the fstatistic field, but it is not there:

sum$fstatistic
   value    numdf    dendf
384.4675   1.0000 515.0000

Thanks in advance for your time,

Kind regards,

Alexandre Cohen
Re: [R] R commander.
Thank you so much for your help. I have tried it before and succeeded in doing so. The problem is that in the next stage of the package I am facing a problem which I don't know how to fix. I am using cnvpack (http://www.meb.ki.se/~yudpaw/) and in the following command I get an error:

> cnvr <- setreg(out, anno.list = myann, pheno.data = mypheno, high.conf = NA, LIM = 2, method = "COVER", cnv.abnormality = "both")
Error in setreg(out, anno.list = myann, pheno.data = mypheno, high.conf = NA, :
  NAs in foreign function call (arg 4)

I then manipulated my data (out) and set high.conf=0; then it shows me this error:

> cnvr <- setreg(out, anno.list = myann, pheno.data = mypheno, high.conf = 0, LIM = 2, method = "COVER", cnv.abnormality = "both")
Error in st[ii]:en[ii] : NA/NaN argument

Ilyas

On Tue, Oct 13, 2009 at 1:32 AM, joris meys jorism...@gmail.com wrote:
> As the error says, you have different row numbers in your variables. The variable $Chromosome has no values.
>
> try:
> ann <- data.frame( ann [-3] )
>
> Cheers
> Joris
>
> On Mon, Oct 12, 2009 at 8:29 AM, Ilyas . mykh...@gmail.com wrote:
>> I have two RData files. I want to print them to check the format of the tables in these files. I can load both files and can read them as well:
>>
>> load('ann.RData')
>> str(ann)
>> List of 4
>>  $ Name      : chr [1:561466] "rs3094315" "rs12562034" "rs3934834" "rs9442372" ...
>>  $ Position  : int [1:561466] 742429 758311 995669 1008567 1011278 1011521 1020428 1021403 1038818 1039813 ...
>>  $ Chromosome: chr(0)
>>  $ Chr.num   : num [1:561466] 1 1 1 1 1 1 1 1 1 1 ...
>>
>> But when I try to display the whole table using the R Commander, I get the display of the 'pheno.RData' file, but the other file, 'ann.RData', shows me an error, i.e.
>>
>> ann <- as.data.frame(ann)
>> Error in data.frame(Name = c("rs3094315", "rs12562034", "rs3934834", "rs9442372", :
>>   arguments imply differing number of rows: 561466, 0
>>
>> I am sending you both files; I hope you will help me solve this problem.
>>
>> Ilyas
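The data.frame error quoted in this thread is easy to reproduce on a small scale. The list below is a three-row stand-in for the real ann object (the values are shortened, and the generic selection by lengths() is my addition to the ann[-3] advice):

```r
# Small stand-in for 'ann': one component is empty, as in the str() output.
ann <- list(Name       = c("rs3094315", "rs12562034", "rs3934834"),
            Position   = c(742429L, 758311L, 995669L),
            Chromosome = character(0),
            Chr.num    = c(1, 1, 1))

# Converting the list as it is fails because the components differ in length.
res <- try(as.data.frame(ann), silent = TRUE)
failed <- inherits(res, "try-error")

# Dropping the empty third component, as suggested, makes it work.
ann_df <- data.frame(ann[-3])

# The same without hard-coding the position: keep full-length components only.
keep    <- lengths(ann) == max(lengths(ann))
ann_df2 <- data.frame(ann[keep])

print(failed)
print(ann_df)
```

This only addresses the conversion error; it says nothing about the later setreg() errors, which depend on what cnvpack expects in its arguments.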
[R] ggplot2 scale_shape question
Is there a way to have some points solid and some points hollow? I have two classes of points and there are so many points that it's hard to see just the difference in shapes. I'd like to have one of the classes be hollow in addition to being a different shape.

Any help would be grand.

Thanks,
Jon
Re: [R] How to read plain text documents into a vector?
Richard Liu wrote:
> I tried the other two suggestions, but paste seemed not to glue the separate lines together into one character string. Perhaps I missed something (collapse?). Perhaps I'll have another look.

Yes, that is what 'collapse' should do! If you read text using readLines, R makes every line of the original document into an element of a character vector, so a text with 30 lines would end up as a vector with 30 elements. To have one vector element per document, you need to collapse these, say, 30 elements into a single one - that is what collapse does. The value you assign to collapse is the character (sequence) R puts between the single elements. If you do not need to preserve paragraph structure, a single white space is the logical choice (collapse = " ").

(Paste just turns an object into a character object - so using paste alone on the vector produced by readLines would be meaningless; using collapse is the whole point here.)

Worked fine with me - did you get an error message or did it just not yield the result you'd expected?

Dieter Menne wrote:
> library(tm)
> filenames = list.files(path = ".", pattern = "\\.txt")
> docs = character(0)
> for (filename in filenames) {
>   docs = c(docs, paste(readLines(file(filename)), collapse = "\n"))
> }
> docs
> ## continue as in example
> vs = VectorSource(docs)

If in any way possible I would recommend to do the whole procedure via lists, not recursively. Since readLines produces a vector and a list is, in this case, a vector of vectors, it should be no problem.

Ken
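The difference between paste() with and without collapse is quickly seen on a toy vector (the three strings are made up and stand in for the output of readLines):

```r
# Pretend these are the lines returned by readLines() for one document.
txt <- c("First line.", "Second line.", "Third line.")

unchanged   <- paste(txt)                   # still one element per line
one_string  <- paste(txt, collapse = " ")   # a single element, lines joined by spaces
with_breaks <- paste(txt, collapse = "\n")  # a single element, line breaks preserved

print(length(unchanged))
print(length(one_string))
print(one_string)
```

Using "\n" instead of " " as the separator keeps the original line structure inside the single string, which matters if later processing looks at paragraphs.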
Re: [R] How to read plain text documents into a vector?
Dieter Menne wrote:
> While I agree that the appending could be more efficiently be done by a list as an intermediate, the docs = c(doc, "ljljljl") construct is not recursive, even if not efficient.

Yes, of course, that was hastily written, sorry ... but in my experience a list really is more efficient.

Ken
[R] default borders in boxplot and barplot
This is my first post so hopefully I haven't mucked up the rules.

I'm trying to change the default borders in either boxplot or barplot so that, at the request of a journal, all of my figures have the same type of border. I've successfully used par(bty="o") using plot(1:10, bty="o"), but it seems that barplot and boxplot have their own defaults that override this. I've tried both

par(bty="o")
barplot(stuff)

and

barplot(stuff, bty="o")

Does anyone know a trick that doesn't involve using abline() to force borders?

Thanks
Jen Young, MSc
[R] currency conversion function?
Dear all

Is there any R function that would perform currency conversion using up-to-date exchange rates? I would be looking for a function that allows to download recent exchange rates (say, from Yahoo!) and then use these in converting currencies (say, USD to EUR). I am not sure whether r-sig-finance would be more appropriate, but the (off-)topic feels general enough to me.

Thank you
Liviu

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
Re: [R] default borders in boxplot and barplot
On Wed, Oct 14, 2009 at 2:21 AM, Jennifer Young jennifer.yo...@math.mcmaster.ca wrote:
> This is my first post so hopefully I haven't mucked up the rules.
>
> I'm trying to change the default borders in either boxplot or barplot so that, at the request of a journal, all of my figures have the same type of border. I've successfully used par(bty="o") using plot(1:10, bty="o"), but it seems that barplot and boxplot have their own defaults that override this. I've tried both
>
> par(bty="o")
> barplot(stuff)
>
> and
>
> barplot(stuff, bty="o")
>
> Does anyone know a trick that doesn't involve using abline() to force borders?

Just do box() to draw a box round your plot area? Using the example from ?barplot:

require(grDevices) # for colours
tN <- table(Ni <- stats::rpois(100, lambda=5))
r <- barplot(tN, col=rainbow(20))
box()

Barry
Re: [R] Linear Regression Question
On Tue, Oct 13, 2009 at 11:17:11PM +0200, Alexandre Cohen wrote:
> I have two vectors rendements_CAC40 and rendements_AlcatelLucent. I use the lm function as follows, and then the sumarry function:
>
> regression=lm(rendements_CAC40 ~ rendements_AlcatelLucent);
> sum=summarry(regression);
> [...]
> I would like to access to the p-value field, but I can't find the name of it, as we can see it below:
> [...]

sum is the name of an R function, so in the example below I'll use another name:

  x <- summary(regression)
  pf(x$fstatistic[1], x$fstatistic[2], x$fstatistic[3], lower.tail = FALSE)

Reference: https://stat.ethz.ch/pipermail/r-help/2009-April/194123.html
Re: [R] Linear Regression Question
On 13-Oct-09 21:17:11, Alexandre Cohen wrote:
> Dear Sir or Madam,
>
> I am a student at MSc Probability and Finance at Paris 6 University/Ecole Polytechnique. I am using R and I can't find an answer to the following question. I will be very thankful if you can answer it.
>
> I have two vectors rendements_CAC40 and rendements_AlcatelLucent. I use the lm function as follows, and then the sumarry function:
>
> regression=lm(rendements_CAC40 ~ rendements_AlcatelLucent);
> sum=summarry(regression);
>
> I obtain:
>
> Call:
> lm(formula = rendements_CAC40 ~ rendements_AlcatelLucent)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -6.43940 -0.84170 -0.01124  0.76235  9.08087
>
> Coefficients:
>                          Estimate Std. Error t value Pr(>|t|)
> (Intercept)              -0.03579    0.07113  -0.503    0.615
> rendements_AlcatelLucent  0.33951    0.01732  19.608   <2e-16 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.617 on 515 degrees of freedom
> Multiple R-squared: 0.4274, Adjusted R-squared: 0.4263
> F-statistic: 384.5 on 1 and 515 DF, p-value: < 2.2e-16
>
> I would like to access to the p-value field, but I can't find the name of it, as we can see it below:
>
> names(sum)
>  [1] "call"          "terms"         "residuals"     "coefficients"
>  [5] "aliased"       "sigma"         "df"            "r.squared"
>  [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
>
> I thought that I could find it in the fstatistic field, but it is not:
>
> sum$fstatistic
>    value    numdf    dendf
> 384.4675   1.0000 515.0000
>
> Thank in advance for your time,
> Kind regards,
> Alexandre Cohen

Assuming you have executed your code with "summary" correctly spelled (i.e. not "summarry" or "sumarry" as you have written above), then the information you require can be found in

  sum$coefficients

which you can as well write as

  sum$coef

You will find that sum$coef is an array with 4 columns ("Estimate", "Std. Error", "t value" and "Pr(>|t|)"), so the P-values are in the final column sum$coef[,4].

Emulating your calculation above with toy regression data:

  X <- (0:10) ; Y <- 1.0 + 0.25*X + 2.5*rnorm(11)
  regression <- lm(Y~X)
  sum <- summary(regression)
  sum
  # Call:
  # lm(formula = Y ~ X)
  # Residuals:
  #     Min      1Q  Median      3Q     Max
  # -5.7182 -1.5383  0.2989  1.9806  3.9364
  # Coefficients:
  #             Estimate Std. Error t value Pr(>|t|)
  # (Intercept)  2.10035    1.81418   1.158    0.277
  # X           -0.03147    0.30665  -0.103    0.921
  #
  # Residual standard error: 3.216 on 9 degrees of freedom
  # Multiple R-squared: 0.001169, Adjusted R-squared: -0.1098
  # F-statistic: 0.01053 on 1 and 9 DF, p-value: 0.9205

  sum$coef
  #               Estimate Std. Error    t value  Pr(>|t|)
  # (Intercept)  2.1003505  1.8141796  1.1577412 0.2767698
  # X           -0.0314672  0.3066523 -0.1026152 0.9205184

  sum$coef[,4]
  # (Intercept)           X
  #   0.2767698   0.9205184

[And, by the way, although it in fact works, it is not a good idea to use a function name ("sum") as the name of a variable.]

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Date: 14-Oct-09 Time: 10:53:28
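The two answers in this thread (pf() on the F statistic, and the last column of the coefficient table) can be compared side by side. This is a small sketch on simulated data; the names `fit`, `s`, `p_overall` and `p_slope` are illustrative only.

```r
set.seed(1)
x <- 0:10
y <- 1 + 0.25 * x + rnorm(11)

fit <- lm(y ~ x)
s <- summary(fit)   # stored as 's', not 'sum', to avoid masking base::sum

# Overall F-test p-value, rebuilt from the stored F statistic and its
# numerator/denominator degrees of freedom.
f <- s$fstatistic
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Per-coefficient p-values live in the fourth column of the coefficient table.
p_slope <- coef(s)["x", "Pr(>|t|)"]

print(c(overall = unname(p_overall), slope = p_slope))
```

With a single predictor the F statistic is the square of the slope's t statistic, so the two p-values coincide; with several predictors they answer different questions.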
Re: [R] General means of matching a color specification to an official R color name
Hi Bryan,

You can get a near match with the color.id function in the plotrix package.

Jim
Re: [R] General means of matching a color specification to an official R color name
On 10/14/2009 12:05 AM, Barry Rowlingson wrote:
> On Tue, Oct 13, 2009 at 10:58 PM, Bryan Hanson wrote:
>> Works perfectly! Thanks Barry. I had actually seen some suggestions on using a distance, but by then I was thinking about hcl spaces and distance isn't as simple there. I'm too tired I think. Anyway, you've got me running again! Thanks, Bryan
>
> There's a CPAN module for Perl that does hcl colour similarity:
> http://search.cpan.org/~mbarbon/Color-Similarity-HCL-0.04/lib/Color/Similarity/HCL.pm
> the Perl code is pretty neat, looks easy to R-ify - released under the perl license.
> Barry

There are a few unexported functions in the xterm256 package to deal with this.

  colors()[ xterm256:::closest.character( "#aabbcc" ) ]
  [1] "gold4"

The package pretends it writes the text using the background or foreground color as usually represented in R, but it actually first grabs the closest color (in the RGB space according to the Euclidean metric; I have no idea whether a different space or a different metric would be better).

This presentation might give a clue:
http://www.agrocampus-ouest.fr/math/useR-2009//slides/Zeileis+Hornik+Murrell.pdf

Romain

--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr
Re: [R] General means of matching a color specification to an official R color name
On 10/14/2009 12:26 PM, Romain Francois wrote:
> There are a few unexported functions in the xterm256 package to deal with this.
>
> colors()[ xterm256:::closest.character( "#aabbcc" ) ]
> [1] "gold4"

Actually that is wrong: closest.character gives you the index of the closest xterm256 color (as in http://frexx.de/xterm-256-notes/), not an index into colors(). Sorry for the misleading answer ...

> The package pretends it writes the text using the background or foreground color as usually represented in R, but it actually first grabs the closest color (in the RGB space according to the Euclidean metric; I have no idea whether a different space or a different metric would be better).
> [...]

Romain

--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr
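For the original question (mapping an arbitrary colour specification to the nearest entry of colors()), the Euclidean-distance-in-RGB idea mentioned above can be sketched in base R alone. The helper name `nearest_color` is hypothetical, and plain RGB distance is an assumption; as noted in the thread, a perceptual space such as HCL may match the eye better.

```r
# Return the name in colors() whose RGB triple is closest (squared
# Euclidean distance) to the colour given in 'spec'.
nearest_color <- function(spec) {
  named  <- colors()
  ref    <- col2rgb(named)             # 3 x length(named) matrix
  target <- as.vector(col2rgb(spec))   # length-3 vector: red, green, blue
  # 'target' is recycled down each column, i.e. subtracted per channel.
  d2 <- colSums((ref - target)^2)
  named[which.min(d2)]                 # first name wins when there are ties
}

nearest_color("#FF0000")
nearest_color("#aabbcc")
```

Several named colours share identical RGB values (for example "red" and "red1"), so the function simply returns whichever appears first in colors().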
Re: [R] rect function
Dear all,

I have a question about how to load data (or enter data) into each cell of a rectangle created by rect.

E.g. I have a matrix

  rbind(1:2,1:2)

I have created a 2x2 rectangle by using:

  a <- 0:1/10
  b <- 0:1/10
  kk <- expand.grid(a,b)
  plot.new()
  rect(kk[, 1], kk[, 2], kk[, 1] + .1, kk[, 2] + .1)

So how do we put the values of rbind(1:2,1:2) into the relevant cells of the rectangle created above? If it is not possible to do so, is there any way to plot the matrix as a table with a grid?

Thanks a million times!
Rene
Re: [R] rect function
On 10/14/2009 10:20 PM, Rene wrote:
> I have a question about how to load data (or enter data) into each cell of a rectangle created by rect. [...]
> If it is not possible to do so, is there any way to plot the matrix as a table with a grid?

Hi Rene,

Have a look at the color2D.matplot function in the plotrix package, in particular the show.values argument.

Jim
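If an extra package is not wanted, the same effect can be sketched with base graphics by drawing the cells with rect() and writing each value at its cell centre with text(). The unit-sized cells and the variable names below are illustrative assumptions, not Rene's original coordinates.

```r
m <- rbind(1:2, 1:2)

pdf(NULL)   # null device so the sketch runs non-interactively
plot.new()
plot.window(xlim = c(0, ncol(m)), ylim = c(0, nrow(m)), asp = 1)

# One row of 'cells' per matrix entry.
cells <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))

# Matrix row 1 is drawn at the top, so flip the vertical coordinate.
xleft <- cells$col - 1
ytop  <- nrow(m) - cells$row + 1

vals <- m[cbind(cells$row, cells$col)]   # values in the same order as 'cells'

rect(xleft, ytop - 1, xleft + 1, ytop)
text(xleft + 0.5, ytop - 0.5, labels = vals)
dev.off()
```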
[R] RCMD Rdconv debugging output
I am trying (on Windows XP, with R 2.10.0beta) to use

  RCMD Rdconv -t html myfile.Rd > myfile.html

to convert some Rd files to html. I get a message that there are warnings. How can I tell Rdconv to show me these warnings?

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Re: [R] Selecting initial numerals
On Tue, Oct 13, 2009 at 6:48 PM, PDXRugger wrote:
> I just want to create a new object with the first two numerals of the data. Not sure why this isn't working, consider the following:
>
> EmpEst$naics=c(238321, 624410, 484121 ,238911, 81, 531110, 621399, 541613, 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320)
> EmpEst$naics2<-formatC(EmpEst$naics %% 1e2, width=2, flag="", mode ="integer")
>
> #RESULT: Warning message:
> #In Ops.factor(EmpEst$naics, 100) : %% not meaningful for factors

Wild guess: you get this warning because EmpEst$naics is a factor? Quite a few errors and warnings mean mostly what they say. If you see similar errors or warnings, please use the function str() first to check your data structure. For example:

  str(EmpEst$naics)

You should also make sure you provide us with self-contained, reproducible code. As we don't have the data frame EmpEst, I cannot run the code you sent. If I change it, I don't get the error. Below are a few code snippets to illustrate how the problem arises, and how to make it go away:

  naics=c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613,
          524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320)
  naics2<-formatC(naics %% 1e2, width=2, flag="", mode="integer")
  naics2
  [1] 21 10 21 11 11 10 99 13 10 15 21 15 15 10 10
  [16] 20

No error, as the vector naics is a numerical vector. Now I make it a factor:

  naics=factor(c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613,
                 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320))
  naics2<-formatC(naics %% 1e2, width=2, flag="", mode="integer")
  Warning message:
  In Ops.factor(naics, 100) : %% not meaningful for factors
  naics2
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  [16] NA

Which is what you see. You can transform a factor to a numerical vector with a combination of as.numeric(as.character()). This is necessary as you would otherwise get the internal values for the factor levels (i.e. the numbers 1, 2, ... n with n being the number of levels).

  naics=factor(c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613,
                 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320))
  naics2<-formatC(as.numeric(as.character(naics)) %% 1e2, width=2, flag="", mode="integer")
  naics2
  [1] 21 10 21 11 11 10 99 13 10 15 21 15 15 10 10
  [16] 20
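One detail worth spelling out: `%% 1e2` returns the last two digits, whereas the question asked for the first two. A short sketch of both, assuming the codes arrive as a factor as diagnosed above; the four sample codes are taken from the post and the variable names are illustrative.

```r
naics <- factor(c(238321, 624410, 484121, 531110))

# as.character() recovers the labels; as.numeric() applied directly to the
# factor would return the internal level codes 1, 2, ... instead.
codes <- as.character(naics)

first2 <- substr(codes, 1, 2)        # leading two digits, kept as text
last2  <- as.numeric(codes) %% 100   # trailing two digits, what %% 1e2 computes

print(first2)
print(last2)
```

Working on the character labels also avoids any rounding or formatting surprises when the codes have different lengths.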
Re: [R] RCMD Rdconv debugging output
On 10/14/2009 7:45 AM, Erich Neuwirth wrote:
> I am trying (on Windows XP, with R 2.10.0beta) to use
> RCMD Rdconv -t html myfile.Rd > myfile.html
> to convert some Rd files to html. I get a message that there are warnings. How can I tell Rdconv to show me these warnings?

You can do the same conversion within R as

  library(tools)
  Rd2HTML("myfile.Rd", out="myfile.html")

and any warnings will show up in the usual way in the console. For more extensive checks, you can use

  checkRd("myfile.Rd")

Duncan Murdoch
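A self-contained sketch of the in-R route described above, run on a throwaway Rd file written to a temporary directory. The file contents and the names `rd_file` and `html_file` are made up for illustration; only Rd2HTML() and checkRd() from the tools package come from the reply.

```r
library(tools)

rd_file   <- tempfile(fileext = ".Rd")
html_file <- tempfile(fileext = ".html")

# A deliberately tiny help page, just enough to convert.
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{A demo help page}",
  "\\description{Placeholder text for the sketch.}"
), rd_file)

# Conversion inside R: parser warnings appear in the console as usual.
Rd2HTML(rd_file, out = html_file)

# Stricter consistency checks on the same file.
problems <- checkRd(rd_file)
print(problems)
```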
Re: [R] currency conversion function?
You can try something like this:

  foo <- function(from, to, date){
    url <- "http://www.oanda.com/convert/classic?script=..%2Fconvert%2Fclassic&language=en&value=1"
    params <- sprintf("%s&date=%s&exch=%s&exch2=&margin_fixed=0&expr=%s&expr2=&SUBMIT=Convert+Now&lang=en&date_fmt=us",
                      url, format(as.Date(date), "%m/%d/%y"), from, to)
    Lines <- readLines(params)
    value <- gsub(".*([0-9]\\.+[0-9]+).*", "\\1",
                  grep("nl", grep(from, grep(to, Lines, value = TRUE), value = TRUE), value = TRUE))
    as.numeric(value)
  }

  foo('BRL', 'USD', '2009-10-14')

On Wed, Oct 14, 2009 at 6:40 AM, Liviu Andronic wrote:
> Is there any R function that would perform currency conversion using up-to-date exchange rates? [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] cdf
On Oct 13, 2009, at 6:53 PM, Duncan Murdoch wrote:
> On 13/10/2009 6:43 PM, David Winsemius wrote:
>> On Oct 13, 2009, at 5:12 PM, maram salem wrote:
>>> Dear all,
>>> I have the cdf of the following power function distribution:
>>> F(y)=(y/350)^a, 0<y<350, where a is some parameter with range a>0.
>>> I want to use it as the argument of the discretize function of the actuar package. So I think I need to define this function to R so that if I entered a=1, I get F(y)=(y/350), and if I entered a=4.5, I get F(y)=(y/350)^4.5 ... and so on.
>>> I've tried
>>>   a<-vector(mode="numeric",length=1)
>>>   powercdf<-function(a,y) (y/350)^a
>>> But when I typed powercdf(10,y), instead of getting (y/350)^10 (which is what I want) I got: object "y" not found ??
>>> I want y to remain as it is, a continuous variable, not for example seq(0,350).
>>> Thank you in advance.
>>
>> If you want symbolic algebra then use a system designed for such. If you invoke a function in R you need to give it arguments for evaluation ... to numerical values. If you want a function that returns a function, that is also possible.
>>
>>   cdffn <- function(y, arg) return( function(y) {y^arg} )
>
> But don't do it like that. If you do, you'll see things like this:
>
>   power <- 10
>   cdf10 <- cdffn(arg=power) # don't need y as an argument.
>   power <- 1
>   cdf10(1:10)
>   [1] 1 2 3 4 5 6 7 8 9 10
>
> See my other post for a correct implementation using force().

Thank you, Duncan. I had seen your post (after hitting send) but had not realized how far out of itself a function might look for arguments. You did mention the crucial aspect of force but I didn't really get it until this further clue. Sometimes I'm a bit dense.

--
David

> Duncan Murdoch
>
>>   cdf10 <- cdffn(y, 10)
>>   cdf10(1:10)
>>   [1]           1        1024       59049     1048576     9765625    60466176   282475249  1073741824
>>   [9]  3486784401 10000000000

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
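The force() version referred to above is not quoted in this message, so the following is only a sketch of the general pattern, not Duncan Murdoch's actual code. The factory name `make_power_cdf` and the `upper` argument are assumptions chosen to match the power-function cdf F(y) = (y/350)^a from the question.

```r
# Function factory: returns the cdf F(y) = (y / upper)^a for a fixed 'a'.
make_power_cdf <- function(a, upper = 350) {
  # Evaluate the arguments now. Without force(), 'a' stays an unevaluated
  # promise and would pick up whatever value the caller's variable has
  # when the returned function is first called.
  force(a)
  force(upper)
  function(y) (y / upper)^a
}

power <- 10
cdf10 <- make_power_cdf(power)
power <- 1                      # changing 'power' later has no effect on cdf10

cdf10(c(0, 175, 350))
```

The returned function keeps `y` as its free argument, so it can be handed to other functions that expect a cdf of one variable.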
[R] plot discriminant analysis
I'm confused about the right way to plot a discriminant analysis made by the lda function (MASS package). (I have attached my data for reproduction.) When I plot an lda object:

  X <- read.table("data", header=T)
  lda_analysis <- lda(formula(X), data=X)
  plot(lda_analysis)

the plot is completely different from:

  plot(predict(lda_analysis)$x, col=palette()[predict(lda_analysis)$class])

Shouldn't that be the same graph as the first? In the second case, I use the predict function to obtain the LD1 and LD2 coordinates of lda_analysis (predict(lda_analysis)$x) and their respective classes (predict(lda_analysis)$class), but it seems that the classes are different:

  table(X$G3, predict(lda_analysis)$class)

       B  G  M
    B 29  0  3
    G  0 26  2
    M  4  0 46

Any clues?

Regards,
[R] Strange characters that block import
Dear useRs,

I am trying to import a text file that contains some strange characters, which come from another program misinterpreting foreign-language characters (see below). Here is an example of the text, with one line containing the characters that break the import:

  name;number
  zdsfbg;2
  ;3
  dtryjh;4

R does not import the lines after those strange characters (i.e. it imports only the first two lines: the header and the first line of data). I have already tried importing with other encodings such as latin1 or UTF-8, but that does not solve the problem. Replacing those characters in a text editor before importing solves the problem, but I do not want the users of my script to have to edit the text before the analysis in R.

Any hints?

Thanks
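The offending bytes did not survive in the post, so what follows is a guess rather than a diagnosis. One common cause of an import that stops early is a control character such as 0x1A (Ctrl-Z), which text-mode reads on Windows can treat as end-of-file. Under that assumption, a sketch is to read the file as raw bytes, drop control characters, and parse the remaining text. The helper name `read_skipping_controls` and the demo file are hypothetical.

```r
# Read a delimited file while discarding ASCII control bytes
# (everything below 0x20 except tab, line feed and carriage return).
read_skipping_controls <- function(path, sep = ";") {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  codes <- as.integer(bytes)
  keep  <- codes >= 32L | codes %in% c(9L, 10L, 13L)
  txt   <- rawToChar(bytes[keep])
  read.table(text = txt, sep = sep, header = TRUE, stringsAsFactors = FALSE)
}

# Demo file mimicking the post: a stray 0x1A byte at the start of line 3.
demo <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("name;number\nzdsfbg;2\n"),
           as.raw(0x1a),
           charToRaw(";3\ndtryjh;4\n")),
         demo)

dat <- read_skipping_controls(demo)
print(dat)
```

Bytes above 0x7F are kept untouched, so genuine accented characters still need the right encoding declared afterwards.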
Re: [R] post-hoc test with kruskal.test()
There is a post hoc test along the lines of the Kruskal-Wallis test. It is implemented on the help page of oneway_test from package coin. The authors of the package, Hothorn, Hornik, van de Wiel, and Zeileis, cite Hollander and Wolfe (1999) for details and say it is called the Nemenyi-Damico-Wolfe-Dunn test.

Or see the nparcomp function in package nparcomp.

There is also a post hoc test for the situation where a Friedman test has been done, and that is seen on the help page for SymmetryTests in package coin: the Wilcoxon-Nemenyi-McDonald-Thompson test:
http://finzi.psych.upenn.edu/R/library/coin/html/SymmetryTests.html

There is also an option of using the MTP function in the multtest package.
http://finzi.psych.upenn.edu/R/library/multtest/html/MTP.html

--
David Winsemius

On Oct 14, 2009, at 3:17 AM, Robert Kalicki wrote:
> Dear R users,
> I would like to know if there is a way in R to execute a post-hoc test (factor levels comparison, like Tukey for ANOVA) of a non-parametric analysis of variance with the kruskal.test() function. I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each group. Do you have any suggestion?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
[R] pairs
Dear all,

I have two sets of data (say set1 and set2) as follows:

set1
  x1   x2   x3
  0.30 0.43 3.88
  0.38 0.59 3.53
  0.30 0.42 2.12
  0.33 0.53 2.12
  0.30 0.47 3.76

set2
  y1   y2   y3
  0.32 0.47 5.18
  0.23 0.26 1.06
  0.42 0.65 3.88
  0.28 0.38 3.76
  0.35 0.47 1.41

The pairs function (such as pairs(~x1+x2+x3, data=set1, main="Simple Scatterplot Matrix")) produces a scatterplot matrix where both the lower and the upper panels show scatter plots of the set1 variables. I want to produce a scatterplot matrix where the upper panels (above the diagonal) show plots of the set1 variables and the lower panels (below the diagonal) show plots of the set2 variables. Is there a way that I can do this?

Any help is deeply appreciated.

Kind Regards
Seyit Ali

--
Dr. Seyit Ali KAYIS
Selcuk University, Faculty of Agriculture
Kampus, Konya, TURKEY
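No reply appears in this section, so the following is only one possible approach, sketched under the assumption that both sets have the same number of columns. pairs() hands every panel function the same data, so instead the grid is laid out by hand with par(mfrow = ...) and each cell picks its data set depending on whether it lies above or below the diagonal.

```r
set1 <- data.frame(x1 = c(0.30, 0.38, 0.30, 0.33, 0.30),
                   x2 = c(0.43, 0.59, 0.42, 0.53, 0.47),
                   x3 = c(3.88, 3.53, 2.12, 2.12, 3.76))
set2 <- data.frame(y1 = c(0.32, 0.23, 0.42, 0.28, 0.35),
                   y2 = c(0.47, 0.26, 0.65, 0.38, 0.47),
                   y3 = c(5.18, 1.06, 3.88, 3.76, 1.41))

p <- ncol(set1)

pdf(NULL)   # null device so the sketch runs non-interactively
op <- par(mfrow = c(p, p), mar = c(2, 2, 1, 1))
for (i in seq_len(p)) {        # grid row
  for (j in seq_len(p)) {      # grid column
    if (i == j) {
      # Diagonal: label the cell with both variable names.
      plot.new()
      text(0.5, 0.5, paste(names(set1)[i], names(set2)[i], sep = " / "))
    } else if (i < j) {
      plot(set1[[j]], set1[[i]])   # upper triangle: set1
    } else {
      plot(set2[[j]], set2[[i]])   # lower triangle: set2
    }
  }
}
par(op)
dev.off()
```

Unlike pairs(), this layout does not share axes across a row or column, which is arguably appropriate here because the two sets are on different scales.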
Re: [R] post-hoc test with kruskal.test()
Robert,

What do you mean by "not symmetric"? If you mean unbalanced in terms of sample size, that's not a problem if you choose the right specifications for wilcox.test. The Kruskal-Wallis test is a generalization of the unpaired Wilcoxon test for more than two groups. Not sure whether kruskal.test works with just two groups, but if so, it should give the same results as wilcox.test if you set the arguments accordingly.

Having said that, I should mention that unlike some normality-based post-hoc tests, the proposed approach is not based on a common error term. The pairwise comparisons will ignore the fact that you had a third group, and this will in particular result in (possibly quite) different power of the three comparisons, depending on the sample sizes and the noise given in just these two groups. I wouldn't know what to do about that, though.

Michael

-----Original Message-----
From: Robert Kalicki
Sent: Wednesday, 14 October 2009 14:11
To: Meyners,Michael,LAUSANNE,AppliedMathematics
Subject: RE: [R] post-hoc test with kruskal.test()

Hi Michael,
Thank you very much for your clear and prompt answer. Is it still valid if I use an unpaired comparison with wilcox.test(), since my groups are not symmetric?
Many thanks
Robert

-----Original Message-----
From: Meyners,Michael,LAUSANNE,AppliedMathematics
Sent: Wednesday, 14 October 2009 10:30
To: Robert Kalicki; r-help@r-project.org
Subject: RE: [R] post-hoc test with kruskal.test()

Robert,
you can do the corresponding pairwise comparisons using wilcox.test. As far as I know, there is no such general correction as Tukey's HSD for the Kruskal-Wallis test. However, if you have indeed only 3 groups (resulting in 3 pairwise comparisons), the intersection-union principle and the theory of closed test procedures should allow you to do these tests without further correction, given the global test was statistically significant.
HTH, Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki
Sent: Wednesday, 14 October 2009 09:17
To: r-help@r-project.org
Subject: [R] post-hoc test with kruskal.test()

Dear R users,
I would like to know if there is a way in R to execute a post-hoc test (factor levels comparison, like Tukey for ANOVA) of a non-parametric analysis of variance with the kruskal.test() function. I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each group. Do you have any suggestion?
Many thanks in advance!
Robert

Robert M. Kalicki, MD
Postdoctoral Fellow
Department of Nephrology and Hypertension
Inselspital, University of Bern, Switzerland
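The sequence discussed in this thread (global Kruskal-Wallis test, then unpaired Wilcoxon tests between each pair of groups) can be sketched with the stats package alone. The simulated data and group sizes are made up; the Holm adjustment is a cautious default chosen for the sketch, whereas the closed-testing argument above suggests p.adjust.method = "none" may be defensible with exactly three groups.

```r
set.seed(42)

# Three unbalanced groups with shifted locations (illustrative data only).
d <- data.frame(
  value = c(rnorm(12, mean = 0), rnorm(15, mean = 1), rnorm(10, mean = 2)),
  group = factor(rep(c("A", "B", "C"), times = c(12, 15, 10)))
)

# Global test first.
kw <- kruskal.test(value ~ group, data = d)
print(kw)

# All pairwise unpaired Wilcoxon rank-sum tests; unequal group sizes are fine.
pw <- pairwise.wilcox.test(d$value, d$group, p.adjust.method = "holm")
print(pw)
```

As Michael notes, each pairwise comparison uses only the two groups involved, so the power of the three comparisons can differ considerably.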
Re: [R] RCMD Rdconv debugging output
Thanks Duncan, this solved my problem.

Here is another thingy I noticed:

  \title{RExcel - Using \R from within Excel}

produces

  RExcel - Using list() from within Excel

So the \R macro cannot be used in titles. Is this intentional?

Duncan Murdoch wrote:
> You can do the same conversion within R as
>
>   library(tools)
>   Rd2HTML("myfile.Rd", out="myfile.html")
>
> and any warnings will show up in the usual way in the console. For more extensive checks, you can use
>
>   checkRd("myfile.Rd")
>
> Duncan Murdoch

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Re: [R] SPSS long variable names
> The problem is the limit of 8 characters long on variable names.

And again, my answer is that one approach would be to map SHORT names to long variable LABELS. This was a common use of labels before variable names supported 64 bytes. After reading into R with read.spss() you can easily replace the short R names with the long LABELS to form long R names. If for some reason you are unwilling to give up some existing LABELS that are not, you could create some dummy variables for just this mapping purpose.

----- Original Message -----
From: Orvalho Augusto
To: Robert Baer
Cc: r-help@r-project.org
Sent: Tuesday, October 13, 2009 10:39 AM
Subject: Re: [R] SPSS long variable names

> No! That is variable labels.
> Caveman
>
> On Tue, Oct 13, 2009 at 4:52 PM, Robert Baer wrote:
>>> I am wondering if there is a patch for the SPSS reading code on the foreign package, in order to be able to read long variable names. Right now read.spss() just truncates the names to 8 characters.
>>
>> This sequence seems to access the long filenames for me if I know what you are asking for:
>>
>>   library('foreign')
>>   a<-read.spss('fil.sav')
>>   lnames <- attr(a,"variable.labels",exact=FALSE)
>>
>> Rob
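The renaming step described above might look like the sketch below. Because no .sav file is available here, a small mock data frame carrying a "variable.labels" attribute stands in for the result of read.spss(); the variable names and labels are invented for illustration.

```r
# Mock stand-in for the data read.spss() would return: short column names
# plus a named character vector of labels in the "variable.labels" attribute.
dat <- data.frame(V1 = c(23, 35), V2 = c(1, 0))
attr(dat, "variable.labels") <- c(V1 = "age_at_first_visit",
                                  V2 = "smoker_status")

labels <- attr(dat, "variable.labels")

# Look the labels up by short name, then make them syntactically valid and
# unique before using them as column names.
names(dat) <- make.names(unname(labels[names(dat)]), unique = TRUE)

print(names(dat))
```

make.names() matters in practice because real labels often contain spaces or punctuation that are awkward in R column names.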
Re: [R] currency conversion function?
There was such a function in the form of getFX in package quantmod, but testing it makes me think there might have been a change in how the website on which it depended expects to get requests.

  library(quantmod)
  getFX("EUR/USD", from=as.Date("2008-01-01"))
  Error: oanda.com limits data to 500 days per request

You might look at how they implemented it and see if it could be modified to work with your selected target web-server. Or you could see if Rowlingson's reply to "james" in the archives was helpful:
http://finzi.psych.upenn.edu/Rhelp08/2009-June/202979.html

--
David

On Oct 14, 2009, at 5:40 AM, Liviu Andronic wrote:
> Is there any R function that would perform currency conversion using up-to-date exchange rates? [...]

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] ggplot2 scale_shape question
> Is there a way to have some points solid and some points hollow? I have two classes of points and there are so many points that it's hard to see just the difference in shapes. I'd like to have one of the classes be hollow in addition to being a different shape. Any help would be grand.

You'll need to do it yourself with scale_shape_manual - see the appendix (http://had.co.nz/ggplot2/book/appendices.pdf) for specification of point shapes.

Hadley

--
http://had.co.nz/
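A small sketch of the scale_shape_manual suggestion, assuming ggplot2 is installed (the plotting code is skipped otherwise). The data frame, the class names "control" and "treated", and the choice of shapes 1 (hollow circle) and 17 (solid triangle) are illustrative only.

```r
# Shape codes follow the standard R 'pch' numbering: 1 is a hollow circle,
# 17 is a filled triangle.
shape_values <- c(control = 1, treated = 17)

set.seed(7)
d <- data.frame(x = rnorm(40),
                y = rnorm(40),
                class = rep(names(shape_values), each = 20))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y, shape = class)) +
    geom_point() +
    scale_shape_manual(values = shape_values)
  # print(p)   # uncomment to draw the plot
}
```

Naming the elements of `values` ties each shape to a class explicitly instead of relying on factor-level order.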
Re: [R] plot discriminant analysis
Hi,

I did it with

  Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]), Sp = rep(c("s","c","v"), rep(50,3)))
  train <- sample(1:150, 75)
  table(Iris$Sp[train])
  z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

Then I did

  plot(z, xlim=c(-10,10), ylim=c(-10,10))

before drawing

  points(predict(z)$x, col=palette()[predict(z)$class], xlim=c(-10,10), ylim=c(-10,10))

and all the points are superimposed. The only difference I found was the different x- and y-axes when I drew them separately, i.e.

  plot(z)
  plot(predict(z)$x, col=palette()[predict(z)$class])

Alain

Alejo C.S. wrote:
> I'm confused about the right way to plot a discriminant analysis made by the lda function (MASS package). [...]
> it seems that the classes are different:
> table(X$G3, predict(lda_analysis)$class)
> any clues?

--
Alain Guillet
Statistician and Computer Scientist
SMCS - Institut de statistique - Université catholique de Louvain
Bureau c.316, Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve, Belgium
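Two things seem to explain the apparent mismatch, and both can be checked on the built-in iris data: plot() on an lda fit labels points by their actual group and uses equal axis scales, while predict()$class holds the predicted group, so the cross-table in the question is a confusion matrix rather than a sign of an error. The sketch below assumes the MASS package is available; the object names are illustrative.

```r
library(MASS)

fit  <- lda(Species ~ ., data = iris)
pred <- predict(fit)

# Actual versus predicted class: off-diagonal counts are misclassified
# observations, not a plotting problem.
confusion <- table(actual = iris$Species, predicted = pred$class)
print(confusion)

pdf(NULL)   # null device so the sketch runs non-interactively
plot(fit)   # points labelled by actual group, equal axis scales
# Same discriminant scores drawn by hand with equal scales, coloured by
# predicted class.
eqscplot(pred$x, col = as.integer(pred$class), xlab = "LD1", ylab = "LD2")
dev.off()
```

Colouring by `iris$Species` instead of `pred$class` in the second plot would reproduce the grouping shown by plot(fit).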
Re: [R] using mapply to avoid loops
On Oct 14, 2009, at 12:58 AM, Stephen Samaha wrote:

> Hello, I would like to use mapply to avoid using a loop, but for some reason I can't seem to get it to work. I've included copies of my code below. The first set of code uses a loop (and it works fine), and the second set of code attempts to use mapply, but I get a "subscript out of bounds" error. Any guidance would be greatly appreciated. Xj, Yj, and Wj are also lists, and s2, TAU, and GAMMA are scalars. Thank you.
>
> # THIS WORKS USING THE LOOP
> for (j in 1:J) {
>   V.tilde.j <- solve((1/s2)*t(Xj[[j]])%*%Xj[[j]] + solve(TAU))
>   # Not singular case:
>   if(round(det(t(Xj[[j]])%*%Xj[[j]]),8)!=0) {
>     Beta.hat.j <- solve(t(Xj[[j]])%*%Xj[[j]])%*%t(Xj[[j]])%*%Yj[[j]]
>     V.j <- s2*solve(t(Xj[[j]])%*%Xj[[j]])
>     Lambda.j <- solve(solve(V.j) + solve(TAU))%*%solve(V.j)
>     Beta.tilde.j <- Lambda.j%*%Beta.hat.j + (diag(P) - Lambda.j)%*%Wj[[j]]%*%GAMMA
>   }
>   # Singular case
>   else {
>     Beta.tilde.j <- V.tilde.j%*%((1/s2)*t(Xj[[j]])%*%Yj[[j]] + solve(TAU)%*%Wj[[j]]%*%GAMMA)
>   }
>   BETA.Js[[j]] <- t(rmnorm(1, mean=as.vector(Beta.tilde.j), V.tilde.j))
> }
>
> # THIS DOESN'T WORK USING MAPPLY
> update.betas <- function(s2,Xj,Yj,TAU,Wj,GAMMA) {
>   V.tilde.j <- solve((1/s2)*t(Xj[[j]])%*%Xj[[j]] + solve(TAU))
>   # Not singular case:
>   if(round(det(t(Xj[[j]])%*%Xj[[j]]),8)!=0) {
>     Beta.hat.j <- solve(t(Xj[[j]])%*%Xj[[j]])%*%t(Xj[[j]])%*%Yj[[j]]
>     V.j <- s2*solve(t(Xj[[j]])%*%Xj[[j]])
>     Lambda.j <- solve(solve(V.j) + solve(TAU))%*%solve(V.j)
>     Beta.tilde.j <- Lambda.j%*%Beta.hat.j + (diag(P) - Lambda.j)%*%Wj[[j]]%*%GAMMA
>   }
>   # Singular case
>   else {
>     Beta.tilde.j <- V.tilde.j%*%((1/s2)*t(Xj[[j]])%*%Yj[[j]] + solve(TAU)%*%Wj[[j]]%*%GAMMA)
>   }
>   BETA.Js[[j]] <- t(rmnorm(1, mean=as.vector(Beta.tilde.j), V.tilde.j))
>   return(Beta.tilde.j)
> }
> BETA.Js <- mapply(update.betas,s2,Xj,Yj,TAU,Wj,GAMMA,SIMPLIFY=FALSE)

(It would be more courteous to offer the error messages you are currently keeping secret.)

Could it be because you are offering a mix of vectors and scalars to mapply without properly segregating them? See the help page for mapply and the MoreArgs argument.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
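A minimal sketch of the MoreArgs idea, under stated assumptions: the data are simulated, TAU and GAMMA are taken to be a P x P matrix and a length-P vector, and update_beta is a hypothetical, simplified stand-in that keeps only the "singular case" formula from the post. The per-group lists go through the vectorised arguments of mapply, the shared quantities go through MoreArgs, and the function works on one group's matrices, so it does no [[j]] indexing of its own. Indexing Xj[[j]] inside the function, where Xj is already a single matrix, may well be what triggers the "subscript out of bounds" error, though the post does not show enough to be sure.

```r
set.seed(42)
J <- 3; P <- 2

# Hypothetical inputs: one design matrix, response and W matrix per group.
Xj <- lapply(1:J, function(j) matrix(rnorm(10 * P), 10, P))
Yj <- lapply(1:J, function(j) rnorm(10))
Wj <- lapply(1:J, function(j) diag(P))
s2 <- 1.5; TAU <- diag(P); GAMMA <- c(0.5, -0.5)

# Works on ONE group's X, Y and W; the shared values arrive unchanged.
update_beta <- function(X, Y, W, s2, TAU, GAMMA) {
  V <- solve(crossprod(X) / s2 + solve(TAU))
  V %*% (crossprod(X, Y) / s2 + solve(TAU) %*% W %*% GAMMA)
}

# Lists are iterated element by element; MoreArgs are passed whole.
beta_mapply <- mapply(update_beta, X = Xj, Y = Yj, W = Wj,
                      MoreArgs = list(s2 = s2, TAU = TAU, GAMMA = GAMMA),
                      SIMPLIFY = FALSE)

# Reference result from the explicit loop.
beta_loop <- vector("list", J)
for (j in 1:J) {
  beta_loop[[j]] <- update_beta(Xj[[j]], Yj[[j]], Wj[[j]], s2, TAU, GAMMA)
}
```

Passing TAU as a vectorised argument would make mapply iterate over its individual elements, which is why anything that must stay whole belongs in MoreArgs.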
Re: [R] currency conversion function?
Hi Liviu,

try yahooSeries from fImport. Example:

  library(fImport)
  yahooSeries("EURUSD=X")

Best,
andreas

--
www.er.ethz.ch/people/ahuesler
Re: [R] time grid for survfit Survival function outputs
> ... is it possible we could make survival function outputs on the pre-specified time grid with fixed increment and fixed length.

Look at the help file for summary.survfit. Interpolating the raw data is somewhat harder than you might think for the number-at-risk component.

  fit <- survfit( ...
  summary(fit, times=c(0,10,20,30, ...

Terry Therneau
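The truncated lines above can be fleshed out roughly as follows. This is only a sketch: the lung data set shipped with the survival package and the grid of 0 to 1000 days in steps of 100 are stand-ins chosen for illustration, not the original poster's data.

```r
library(survival)

# Kaplan-Meier fit on an example data set from the survival package.
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Pre-specified grid with fixed increment and fixed length.
grid <- seq(0, 1000, by = 100)

# summary() reports the survival curve and number at risk at each grid time.
s <- summary(fit, times = grid)
out <- data.frame(time = s$time, n.risk = s$n.risk, surv = s$surv)
out
```

Grid times beyond the last observed time are dropped unless extend = TRUE is given, so the output can be shorter than the grid.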
[R] Handle lot of variables - Regression
Hey,

I've got a data set (e.g. named Data) which contains a lot of variables, for example: s1, s2, ..., s50.

My first question: it is possible to do this:

  Data$s1

But is it also possible to do something like this:

  Data$s1:s50

(I've tried a lot of versions of those without a result.)

My second question: I want to do a stepwise logistic regression. For this purpose I use the following procedures:

  result <- glm(...)
  step(result, direction="forward")

Now the problem I have is that I have to include all my 50 variables (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4... (furthermore it has to be implemented in a loop, so I really need it). I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried this:

  result <- glm(y ~ list[[1]], ...)

This works! But if I try to do it stepwise

  result2 <- step(result)

I always get the same results as from glm without a stepwise approach. So obviously R can't handle this if you put a list in. How can I make this work?

Thanks in advance,
Anna

--
View this message in context: http://www.nabble.com/Handle-lot-of-variables---Regression-tp25889056p25889056.html
Sent from the R help mailing list archive at Nabble.com.
[R] problem when resizing graphics devices
Dear R users,

When I try to resize a graphics device in R, I often get this warning message (mostly when I already have several other graphics devices open): "Not enough memory to modify the size. Alpha blending is deactivated" (translated from the French message: "Mémoire insuffisante pour modifier la taille. L'alpha blending est désactivé").

Following this message I sometimes succeed in resizing the device on a second attempt, but sometimes I then receive this new message: "Not enough memory to modify the size. The device will be closed." (translated from the French message: "Mémoire insuffisante pour modifier la taille. Le périphérique va être fermé"), and then the graphics device shuts, followed by the shutting down of my whole R session!

I'm currently working with R version 2.9.0 on Windows (but I had the same problem with my previous R version).

Do you know what the problem is and how I can fix it?

Thanks a lot for your help
Lucie

--
Lucie Büchi
PhD Student
Department of Ecology and Evolution
University of Lausanne
1015 Dorigny
Switzerland
[R] R version of MATLAB symbolic toolbox (variable substitution)
I'm translating some MATLAB code into R and have not found a simple equivalent of the function R = subs(S,old,new).

I have, for example, a matrix such as this

  mx <- function(){ matrix( c(0, f1, f2, s1, 0, 0, 0, s2, 0), 3,3, byrow=T) }

and a matrix of data

  dat <- matrix(c(1,2,3,4,2,3,4,5),2,4, byrow=T, dimnames=list(NULL, c("f1","f2","s1","s2")))

I want to do two things with this matrix that seem to require different formats.

1. Evaluate this matrix many times using data from a matrix (for stochastic simulation). In the function form above, I can use attach(as.data.frame(dat)) and the correct variables are fed to mx, but I'd rather avoid using attach if possible.

2. I also want to manipulate the matrix (i.e., take the derivative of each element with respect to a certain parameter). If I use

  mx <- c(0, expression(f1), expression(f2))

etc., then I can use

  deriv(mx[2], c("f1","f2"))

etc. to take the derivatives. BUT, I can't find how to then evaluate this version (in one line) for a row of data in dat.

  f1 <- 2
  f2 <- 4
  eval(mx)

gives the scalar 4 (the last element of mx) rather than the vector.

I haven't come up with a form for mx that achieves both goals, while the symbolic toolbox can do each in one line of code. Does a clone package exist? I didn't see anything useful in R's Matlab package. In lieu of such a package I'll settle for being able to evaluate a vector of expressions. Probably I'm missing simple syntax here.

Thanks in advance,
Jen Young
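One way the "evaluate a vector of expressions" part might be approached in base R, offered only as a sketch and not as an equivalent of the MATLAB toolbox: keep mx as an expression vector, evaluate each element against a named list built from one row of dat, and differentiate element by element with D(). The helper name eval_mx is hypothetical.

```r
# The matrix entries stored row by row as unevaluated expressions.
mx <- expression(0, f1, f2,
                 s1, 0,  0,
                 0,  s2, 0)

dat <- matrix(c(1, 2, 3, 4,
                2, 3, 4, 5), 2, 4, byrow = TRUE,
              dimnames = list(NULL, c("f1", "f2", "s1", "s2")))

# Evaluate every element using one row of values; no attach() needed.
eval_mx <- function(exprs, values) {
  vals <- vapply(exprs,
                 function(e) as.numeric(eval(e, as.list(values))),
                 numeric(1))
  matrix(vals, 3, 3, byrow = TRUE)
}

m1    <- eval_mx(mx, dat[1, ])                      # matrix for the first row
all_m <- lapply(seq_len(nrow(dat)),
                function(i) eval_mx(mx, dat[i, ]))  # one matrix per row

# Element-wise derivatives with respect to f1, still as expressions.
d_f1 <- lapply(mx, D, "f1")
```

eval(mx) on the whole expression vector returns only the value of its last element, which is why each element is evaluated separately here.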
Re: [R] problem when resizing graphics devices
Can you give us a reproducible example of R commands that cause these messages on your system? For instance, by creating simulated data.frames, opening several devices, etc.?

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lucie Buchi
Sent: Wednesday, October 14, 2009 6:08 AM
To: r-help@r-project.org
Subject: [R] problem when resizing graphics devices

Dear R users,

When I try to resize a graphics device in R, I often get this warning message (mostly when I already have several other graphics devices open): "Not enough memory to modify the size. Alpha blending is deactivated" (translated from the French message: "Mémoire insuffisante pour modifier la taille. L'alpha blending est désactivé").

Following this message I sometimes succeed in resizing the device on a second attempt, but sometimes I then receive this new message: "Not enough memory to modify the size. The device will be closed." (translated from the French message: "Mémoire insuffisante pour modifier la taille. Le périphérique va être fermé"), and then the graphics device shuts, followed by the shutting down of my whole R session!

I'm currently working with R version 2.9.0 on Windows (but I had the same problem with my previous R version).

Do you know what the problem is and how I can fix it?

Thanks a lot for your help
Lucie

--
Lucie Büchi
PhD Student
Department of Ecology and Evolution
University of Lausanne
1015 Dorigny
Switzerland
Re: [R] Handle lot of variables - Regression
anna0102 wrote:
> I've got a data set (e.g. named Data) which contains a lot of variables, for example: s1, s2, ..., s50. My first question is: it is possible to do this: Data$s1. But is it also possible to do something like this: Data$s1:s50 (I've tried a lot of versions of those without a result)

Use the [] notation. For example

  Data[,c("s1","s2","s3")]

or even better

  Data[,grep("s.*",names(Data),value=TRUE)]

anna0102 wrote:
> I want to do a stepwise logistic regression. For this purpose I use the following procedures: result <- glm(...) step(result, direction="forward"). Now the problem I have is that I have to include all my 50 variables (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4... (furthermore it has to be implemented in a loop, so I really need it).

Construct the formula dynamically. But please, start with only 3 or 4 variables and check that it works. Sometimes deep inside functions things can go wrong with this method, requiring Ripley's game-like workarounds. See http://finzi.psych.upenn.edu/R/Rhelp02a/archive/16599.html

  a = data.frame(s=1:10, s2=1:10, s4=1:10)
  form = paste("z~", paste(grep("s.*",names(a),value=TRUE), collapse="+"))
  glm(form, ...)

And be aware of the nonsense you can (replace "can" by "will certainly") get with stepwise regression and so many parameters. If I were to be treated by a cure created by stepwise regression, I would prefer voodoo. Search for "Harrell stepwise" and read Frank's well-justified soapboxes.

Dieter

--
View this message in context: http://www.nabble.com/Handle-lot-of-variables---Regression-tp25889056p25892047.html
Sent from the R help mailing list archive at Nabble.com.
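As a sketch of "construct the formula dynamically": reformulate() builds the formula from a character vector of names, and forward selection can start from the intercept-only model with the full formula as the upper scope. The simulated Data, the five-variable size and the coefficients are assumptions made for illustration, and the caveats about stepwise selection given above apply in full.

```r
set.seed(1)
n <- 200

# Hypothetical data: predictors s1..s5 and a binary response y.
Data <- as.data.frame(matrix(rnorm(n * 5), n, 5,
                             dimnames = list(NULL, paste0("s", 1:5))))
Data$y <- rbinom(n, 1, plogis(0.8 * Data$s1 - 0.6 * Data$s3))

# Pick the predictor names by pattern and build y ~ s1 + ... + s5.
vars <- grep("^s[0-9]+$", names(Data), value = TRUE)
full <- reformulate(vars, response = "y")

# Forward selection needs a small starting model and an upper scope.
null_fit <- glm(y ~ 1, family = binomial, data = Data)
fwd <- step(null_fit, scope = list(lower = ~ 1, upper = full),
            direction = "forward", trace = 0)
formula(fwd)
```

Starting forward selection from a model that already contains every term leaves nothing to add, which may be why step() returned the original fit unchanged.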
Re: [R] Strange characters that block import
On 10/14/2009 8:25 AM, arnaud Mosnier wrote:
> Dear useRs,
>
> I am trying to import a text file that contains some strange characters coming from the misinterpretation of foreign-language characters by another piece of software (see below). Here is an example of text with a line containing characters that break the import:
>
>   name;number
>   zdsfbg;2
>   ;3
>   dtryjh;4
>
> R does not want to import lines after those strange characters (i.e. it imports only the first two lines: one is the header, the second is the first line of data). I have already tried importing with other encodings such as latin1 or UTF-8, but that does not solve the problem. Replacing those characters in a text editor before importing solves the problem, but I don't want the users of my script to have to edit the text before the analysis in R. Any hint ??

Those funny characters are octal 032, Ctrl-Z. Years ago that was defined on DOS/Windows as an end-of-file marker, and I guess our code still honours that.

You can work around it by stating that you're reading from a binary file, not a text file:

  f <- file("text.txt", "rb")

Then read.csv2(f) fails, but readLines(f) succeeds, so this works:

  f <- file("c:/temp/test.txt", "rb")
  read.csv2(textConnection(readLines(f)))
                 name number
  1            zdsfbg      2
  2 \032\032 \032\032      3
  3            dtryjh      4
  close(f)

I don't know if there are any characters that would cause readLines to fail, but there might be, so I'd suggest replacing the buggy software that caused all the problems in the first place.

Duncan Murdoch

> Thanks
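The workaround can be tried end to end on a throwaway file. In this sketch the file contents are made up to mimic the example (two Ctrl-Z bytes at the start of the third line), and stripping the control characters before parsing is an optional extra step that the reply above does not include.

```r
# Write a small test file containing two Ctrl-Z (octal 032) bytes.
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("name;number\nzdsfbg;2\n\032\032;3\ndtryjh;4\n"), tmp)

# Open in binary mode so Ctrl-Z is not treated as end of file on Windows.
con <- file(tmp, "rb")
lines <- readLines(con)
close(con)

# Optional: drop the control characters before parsing.
lines <- gsub("\032", "", lines, fixed = TRUE)

# Parse the cleaned lines as semicolon-separated data.
df <- read.csv2(text = lines)
df
```

Using read.csv2(text = ...) avoids leaving an unclosed textConnection behind.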
Re: [R] RCMD Rdconv debugging output
On 10/14/2009 8:42 AM, Erich Neuwirth wrote:
> Thanks Duncan, this solved my problem. Here is another thingy I noticed:
>
>   \title{RExcel - Using \R from within Excel}
>
> produces
>
>   RExcel - Using list() from within Excel
>
> So the \R macro cannot be used in titles. Is this intentional?

Yes, and it is documented that way. There was some talk about removing that restriction, but I don't think it will make it into 2.10.0.

Duncan Murdoch

> Duncan Murdoch wrote:
>> On 10/14/2009 7:45 AM, Erich Neuwirth wrote:
>>> I am trying (on Windows XP, with R 2.10.0beta) to use
>>
>>   library(tools)
>>   Rd2HTML("myfile.Rd", out="myfile.html")
>>
>> and any warnings will show up in the usual way in the console. For more extensive checks, you can use
>>
>>   checkRd("myfile.Rd")
>>
>> Duncan Murdoch
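For anyone wanting to try the two calls without an existing help file, here is a small sketch. The Rd content is a made-up placeholder written to a temporary file; only checkRd() and Rd2HTML() from the tools package come from the thread.

```r
library(tools)

# A minimal, hypothetical Rd file written to a temporary location.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c("\\name{demo}",
             "\\alias{demo}",
             "\\title{Demo page}",
             "\\description{A placeholder help page.}"),
           rd_file)

# More extensive checks; printing the result lists any problems found.
chk <- checkRd(rd_file)
print(chk)

# Convert to HTML; warnings, if any, appear in the console.
html_file <- tempfile(fileext = ".html")
Rd2HTML(rd_file, out = html_file)
```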
[R] Taking specific/timed differences in a zoo timeseries
Hello everyone.

I have a specific problem that I have difficulty solving. Assume I have a zoo object:

  set.seed(12345)
  data <- round(runif(27)*10+runif(27)*5, 0)
  dates <- as.Date(c("09/03/09", "09/04/09", "09/07/09", "09/09/09", "09/10/09", "09/11/09", "09/14/09", "09/16/09", "09/17/09", "09/18/09", "09/21/09", "09/22/09", "09/23/09", "09/24/09", "09/25/09", "09/28/09", "09/29/09", "09/30/09", "10/01/09", "10/02/09", "10/05/09", "10/06/09", "10/07/09", "10/08/09", "10/09/09", "10/13/09", "10/14/09"), "%m/%d/%y")
  temp <- zoo(data, order.by=dates)

What I need to do is to take differences between, say, October 14th and September 14th, then October 13th and September 13th, that is, a 1-month difference independent of the number of days in between. And when there is no matching date in the earlier month, like here where there is no September 13th, the date should be the first preceding date, that is September 11th in this example. How can I do that?

The above is just an example; my zoo object is very big and I need to take differences between years, that is between October 14th, 2009 and October 14th, 2008, then Oct. 13, 2009 and Oct. 13, 2008, and so on.

Also, the time index of my zoo object has the format 10/14/09 (that is Oct. 14, 2009), and that is the format I need to operate with and do not want to change. In the example I reformatted just so that I can create a zoo object.

Could some friendly person please show me how to do such a calculation?

Thank you in advance!

Best,
Sergey
Re: [R] Linear Regression Question
Alexandre,

Let me add two small points to Ted's exposition:

1. You can use the extractor function coefficients(), or just coef(), on the summary:

  coef(summary(regression))

which will also give you the matrix of estimates, etc.

2. You will find that using the function str() is often most helpful:

  str(summary(regression))

or

  str(coef(summary(regression)))

-Peter Ehlers

(Ted Harding) wrote:
> On 13-Oct-09 21:17:11, Alexandre Cohen wrote:
>> Dear Sir or Madam,
>>
>> I am a student at MSc Probability and Finance at Paris 6 University / Ecole Polytechnique. I am using R and I can't find an answer to the following question. I will be very thankful if you can answer it.
>>
>> I have two vectors rendements_CAC40 and rendements_AlcatelLucent. I use the lm function as follows, and then the sumarry function:
>>
>>   regression=lm(rendements_CAC40 ~ rendements_AlcatelLucent);
>>   sum=summarry(regression);
>>
>> I obtain:
>>
>>   Call:
>>   lm(formula = rendements_CAC40 ~ rendements_AlcatelLucent)
>>
>>   Residuals:
>>        Min       1Q   Median       3Q      Max
>>   -6.43940 -0.84170 -0.01124  0.76235  9.08087
>>
>>   Coefficients:
>>                            Estimate Std. Error t value Pr(>|t|)
>>   (Intercept)              -0.03579    0.07113  -0.503    0.615
>>   rendements_AlcatelLucent  0.33951    0.01732  19.608   <2e-16 ***
>>   ---
>>   Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>>   Residual standard error: 1.617 on 515 degrees of freedom
>>   Multiple R-squared: 0.4274, Adjusted R-squared: 0.4263
>>   F-statistic: 384.5 on 1 and 515 DF, p-value: < 2.2e-16
>>
>> I would like to access the p-value field, but I can't find its name, as we can see below:
>>
>>   names(sum)
>>    [1] "call"          "terms"         "residuals"     "coefficients"
>>    [5] "aliased"       "sigma"         "df"            "r.squared"
>>    [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
>>
>> I thought that I could find it in the fstatistic field, but it is not there:
>>
>>   sum$fstatistic
>>      value    numdf    dendf
>>   384.4675   1.0000 515.0000
>>
>> Thanks in advance for your time,
>> Kind regards,
>> Alexandre Cohen
>
> Assuming you have executed your code with "summary" correctly spelled (i.e. not "summarry" or "sumarry" as you have written above), then the information you require can be found in
>
>   sum$coefficients
>
> which you can as well write as
>
>   sum$coef
>
> You will find that sum$coef is an array with 4 columns ("Estimate", "Std. Error", "t value" and "Pr(>|t|)"), so the P-values are in the final column, sum$coef[,4].
>
> Emulating your calculation above with toy regression data:
>
>   X <- (0:10) ; Y <- 1.0 + 0.25*X + 2.5*rnorm(11)
>   regression <- lm(Y~X)
>   sum <- summary(regression)
>   sum
>   # Call:
>   # lm(formula = Y ~ X)
>   # Residuals:
>   #     Min      1Q  Median      3Q     Max
>   # -5.7182 -1.5383  0.2989  1.9806  3.9364
>   # Coefficients:
>   #             Estimate Std. Error t value Pr(>|t|)
>   # (Intercept)  2.10035    1.81418   1.158    0.277
>   # X           -0.03147    0.30665  -0.103    0.921
>   #
>   # Residual standard error: 3.216 on 9 degrees of freedom
>   # Multiple R-squared: 0.001169, Adjusted R-squared: -0.1098
>   # F-statistic: 0.01053 on 1 and 9 DF, p-value: 0.9205
>
>   sum$coef
>   #               Estimate Std. Error    t value  Pr(>|t|)
>   # (Intercept)  2.1003505  1.8141796  1.1577412 0.2767698
>   # X           -0.0314672  0.3066523 -0.1026152 0.9205184
>
>   sum$coef[,4]
>   # (Intercept)           X
>   #   0.2767698   0.9205184
>
> [And, by the way, although it in fact works, it is not a good idea to use a function name ("sum") as the name of a variable.]
>
> Hoping this helps,
> Ted.
>
> E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
> Fax-to-email: +44 (0)870 094 0861
> Date: 14-Oct-09 Time: 10:53:28
> -- XFMail --
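To round this off, a short sketch that pulls out both kinds of p-value. The coefficient p-values come from the coefficient matrix as described above. The overall F-test p-value is not stored in the summary object, so it is recomputed here from the fstatistic component with pf(), which is an addition to what the replies say. The toy data mirror Ted's example with an arbitrary seed.

```r
set.seed(123)                       # arbitrary seed for reproducibility
X <- 0:10
Y <- 1.0 + 0.25 * X + 2.5 * rnorm(11)

fit <- lm(Y ~ X)
s   <- summary(fit)                 # avoid naming this object "sum"

# p-values of the individual coefficients (4th column of the matrix).
p_coef <- coef(s)[, "Pr(>|t|)"]

# p-value of the overall F test, rebuilt from the stored F statistic.
f <- s$fstatistic
p_model <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

p_coef
p_model
```

With a single predictor the F-test p-value equals the p-value of the slope, which gives a quick sanity check.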
[R] metaMDS NMDS: use of alternative distances?
Dear r-helpers!

How can I integrate other distances (in the form of a dist object) into the function metaMDS? The problem: metaMDS needs the original data.frame for the calculation and only the default distances of the function vegdist are allowed. Any suggestions are greatly appreciated!

Thank you,
Kim
Re: [R] Strange characters that block import
On Wed, 14 Oct 2009, Duncan Murdoch wrote:

> On 10/14/2009 8:25 AM, arnaud Mosnier wrote:
>> Dear useRs,
>>
>> I am trying to import a text file that contains some strange characters coming from the misinterpretation of foreign-language characters by another piece of software (see below). Here is an example of text with a line containing characters that break the import:
>>
>>   name;number
>>   zdsfbg;2
>>   ;3
>>   dtryjh;4
>>
>> R does not want to import lines after those strange characters (i.e. it imports only the first two lines: one is the header, the second is the first line of data). I have already tried importing with other encodings such as latin1 or UTF-8, but that does not solve the problem.

If these are control characters (that is, ^Z is Ctrl-Z, but we've no real information) then those are the same in every encoding that uses bytes (or at least those known to iconv).

>> Replacing those characters in a text editor before importing solves the problem, but I don't want the users of my script to have to edit the text before the analysis in R. Any hint ??
>
> Those funny characters are octal 032, Ctrl-Z. Years ago that was defined on DOS/Windows as an end-of-file marker, and I guess our code still honours that.

More to the point, the Windows C run-time does (AFAIK Ctrl-Z is still current as EOF under Windows, and Wikipedia says so too), but nothing in the original posting mentioned this was on Windows, and Ctrl-Z has no effect on the two other OSes I tried, which read such a file successfully. So without a single piece of the 'at a minimum' information requested in the posting guide, we are guessing (and I am guessing your example was done under Windows, too).

> You can work around it by stating that you're reading from a binary file, not a text file:
>
>   f <- file("text.txt", "rb")
>
> Then read.csv2(f) fails, but readLines(f) succeeds, so this works:
>
>   f <- file("c:/temp/test.txt", "rb")
>   read.csv2(textConnection(readLines(f)))
>                  name number
>   1            zdsfbg      2
>   2 \032\032 \032\032      3
>   3            dtryjh      4
>   close(f)
>
> I don't know if there are any characters that would cause readLines to fail, but there might be, so I'd suggest replacing the buggy software that caused all the problems in the first place.

This is all a function of the OS's C runtime: I suspect Ctrl-D (EOT) is interpreted as end-of-file on some OSes. Nul (\0) will terminate strings (that's standard in C, and enforced in recent versions of R).

> Duncan Murdoch

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Taking specific/timed differences in a zoo timeseries
Try this:

  library(zoo)
  # temp <- ... from post asking question

  # create a day sequence, dt, with no missing days
  # and create a 0 width series with those times.
  # merge that with original series giving original
  # series plus a bunch of times having NA values.
  # Use na.locf to fill in those values with the last
  # non-missing so far.
  rng <- range(time(temp))
  dt <- seq(rng[1], rng[2], "day")
  temp.m <- na.locf(merge(temp, zoo(, dt)))

  # create a lagged time scale and subtract the
  # lagged series from original
  dt.lag <- as.Date(as.yearmon(dt)+1/12) + as.numeric(format(dt, "%d")) - 1
  temp - zoo(coredata(temp.m), dt.lag)

Using your data the output from the last line is:

  > temp - zoo(coredata(temp.m), dt.lag)
  2009-10-05 2009-10-06 2009-10-07 2009-10-08 2009-10-09 2009-10-13 2009-10-14
          -5         -6          3          2         -2          2          1

On Wed, Oct 14, 2009 at 10:39 AM, Sergey Goriatchev serg...@gmail.com wrote:
> Hello everyone.
>
> I have a specific problem that I have difficulty solving. Assume I have a zoo object:
>
>   set.seed(12345)
>   data <- round(runif(27)*10+runif(27)*5, 0)
>   dates <- as.Date(c("09/03/09", "09/04/09", "09/07/09", "09/09/09", "09/10/09", "09/11/09", "09/14/09", "09/16/09", "09/17/09", "09/18/09", "09/21/09", "09/22/09", "09/23/09", "09/24/09", "09/25/09", "09/28/09", "09/29/09", "09/30/09", "10/01/09", "10/02/09", "10/05/09", "10/06/09", "10/07/09", "10/08/09", "10/09/09", "10/13/09", "10/14/09"), "%m/%d/%y")
>   temp <- zoo(data, order.by=dates)
>
> What I need to do is to take differences between, say, October 14th and September 14th, then October 13th and September 13th, that is, a 1-month difference independent of the number of days in between. And when there is no matching date in the earlier month, like here where there is no September 13th, the date should be the first preceding date, that is September 11th in this example. How can I do that?
>
> The above is just an example; my zoo object is very big and I need to take differences between years, that is between October 14th, 2009 and October 14th, 2008, then Oct. 13, 2009 and Oct. 13, 2008, and so on.
>
> Also, the time index of my zoo object has the format 10/14/09 (that is Oct. 14, 2009), and that is the format I need to operate with and do not want to change. In the example I reformatted just so that I can create a zoo object.
>
> Could some friendly person please show me how to do such a calculation?
>
> Thank you in advance!
>
> Best,
> Sergey
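The "use the first preceding date" rule can also be illustrated without zoo. The sketch below is a base-R alternative, not the zoo approach shown above: it shifts each date back one calendar month and uses findInterval() to locate the last observation on or before the shifted date. The five dates and values are made up, lag_one_month is a hypothetical helper, and month-end cases (for example October 31st) would need extra care.

```r
# Made-up observations; dates must be sorted in increasing order.
dates  <- as.Date(c("09/11/09", "09/14/09", "09/16/09", "10/13/09", "10/14/09"),
                  "%m/%d/%y")
values <- c(5, 7, 6, 9, 10)

# Hypothetical helper: same day of month, one calendar month earlier.
lag_one_month <- function(d) {
  lt <- as.POSIXlt(d)
  lt$mon <- lt$mon - 1L
  as.Date(lt)
}

lagged <- lag_one_month(dates)

# Index of the last observation on or before each lagged date (0 if none).
idx <- findInterval(as.numeric(lagged), as.numeric(dates))
idx[idx == 0] <- NA                 # nothing observed that early

# Current value minus the value one month earlier.
diffs <- values - values[idx]
data.frame(date = dates, matched = dates[idx], diff = diffs)
```

Here October 13th is matched with September 11th because there is no observation on September 12th or 13th. For year-over-year differences the same lookup applies with lt$year reduced by one instead of lt$mon.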
Re: [R] metaMDS NMDS: use of alternative distances?
On Wed, 2009-10-14 at 16:57 +0200, Kim Vanselow wrote:
> Dear r-helpers!
> How can I integrate other distances (in the form of a dist object) into the function metaMDS? The problem: metaMDS needs the original data.frame for the calculation and only the default distances of the function vegdist are allowed. Any suggestions are greatly appreciated!
> Thank you, Kim

Read the help page for metaMDS more closely? ;-)

The first argument of metaMDS is 'comm'; this is documented as:

  comm: Community data. Alternatively, dissimilarities either as a 'dist' structure or as a symmetric square matrix. In the latter case all other stages are skipped except random starts and centring and pc rotation of axes.

Notice the bit about dissimilarities, which can be either square symmetric matrices or objects of class 'dist'.

When you supply your own distances, not all the transformations and other options in metaMDS are turned on, so you may want to check the effects of transformations etc. yourself, which you would apply before computing the dissimilarity matrix.

HTH

G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
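In code, the advice amounts to computing the dissimilarities first and handing the dist object to metaMDS as its first argument. The sketch below makes several assumptions: the built-in USArrests data stand in for community data, Manhattan distance on scaled columns stands in for the custom dissimilarity, and the vegan call is guarded so the snippet still runs where vegan is not installed.

```r
# Any transformation has to be applied by hand before computing distances.
x <- scale(USArrests)                      # stand-in for community data
d <- dist(x, method = "manhattan")         # stand-in custom dissimilarity

if (requireNamespace("vegan", quietly = TRUE)) {
  # The dist object is passed in place of the community data.
  ord <- vegan::metaMDS(d, k = 2, trace = 0)
  print(head(ord$points))                  # site scores on the two axes
}
```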
Re: [R] Selecting initial numerals
Josh,

One way would be to convert the numeric vector to character and use the function substr(). The following code returns a numeric vector with the first two digits of every element.

  naics=c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613,524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320)
  First_Two <- as.numeric(substr(as.character(naics), 1, 2))
  First_Two

See also ?substr

Cheers
Joris

On Wed, Oct 14, 2009 at 5:05 PM, Josh Roll j_r...@hotmail.com wrote:
> Joris, I figured out that was my issue. Thanks for your insights. However, I need the first two digits of the numeral, not the last two. How do I coerce the code to get this outcome? Cheers
>
>> Date: Wed, 14 Oct 2009 13:50:54 +0200
>> Subject: Re: [R] Selecting initial numerals
>> From: jorism...@gmail.com
>> To: j_r...@hotmail.com
>> CC: r-help@r-project.org
>>
>> On Tue, Oct 13, 2009 at 6:48 PM, PDXRugger j_r...@hotmail.com wrote:
>>> I just want to create a new object with the first two numerals of the data. Not sure why this isn't working; consider the following:
>>>
>>>   EmpEst$naics=c(238321, 624410, 484121 ,238911, 81, 531110, 621399, 541613, 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320)
>>>   EmpEst$naics2 <- formatC(EmpEst$naics %% 1e2, width=2, flag="", mode="integer")
>>>   #RESULT: Warning message:
>>>   #In Ops.factor(EmpEst$naics, 100) : %% not meaningful for factors
>>
>> Wild guess: you get this warning because EmpEst$naics is a factor? Quite a few errors and warnings mean mostly what they say. If you see similar errors or warnings, please use the function str() first to check your data structure. For example:
>>
>>   str(EmpEst$naics)
>>
>> You should also make sure you provide us with self-contained, reproducible code. As we don't have the data frame EmpEst, I cannot run the code you sent. If I change it, I don't get the error. Below are a few code snippets to illustrate how the problem arises, and how to get rid of it:
>>
>>   naics=c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613, 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320)
>>   naics2 <- formatC(naics %% 1e2, width=2, flag="", mode="integer")
>>   naics2
>>    [1] "21" "10" "21" "11" "11" "10" "99" "13" "10" "15" "21" "15" "15" "10" "10"
>>   [16] "20"
>>
>> No error, as the vector naics is a numerical vector. I make it a factor:
>>
>>   naics=factor(c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613, 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320))
>>   naics2 <- formatC(naics %% 1e2, width=2, flag="", mode="integer")
>>   Warning message:
>>   In Ops.factor(naics, 100) : %% not meaningful for factors
>>   naics2
>>    [1] "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA"
>>   [16] "NA"
>>
>> Which is what you see. You can transform a factor to a numerical vector with a combination of as.numeric(as.character()). This is necessary as you would otherwise get the internal values for the factor levels (i.e. the numbers 1, 2, ... n with n being the number of levels).
>>
>>   naics=factor(c(238321, 624410, 484121 ,238911, 81, 531110, 621399,541613, 524210 ,236115 ,811121 ,236115 ,236115 ,621610 ,814110 ,812320))
>>   naics2 <- formatC(as.numeric(as.character(naics)) %% 1e2, width=2, flag="", mode="integer")
>>   naics2
>>    [1] "21" "10" "21" "11" "11" "10" "99" "13" "10" "15" "21" "15" "15" "10" "10"
>>   [16] "20"
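Putting the two messages together, here is a compact sketch of extracting the leading two digits from a numeric vector and from a factor, with an arithmetic variant added for comparison. The shortened vector of six codes and the arithmetic variant are illustrative additions, not part of the thread.

```r
naics <- c(238321, 624410, 484121, 238911, 531110, 621399)

# Text route: convert to character and take the first two characters.
first_two <- as.numeric(substr(as.character(naics), 1, 2))

# The same works for a factor, since as.character() returns the labels
# rather than the internal integer codes.
naics_f     <- factor(naics)
first_two_f <- as.numeric(substr(as.character(naics_f), 1, 2))

# Arithmetic route: integer-divide by 10^(number of digits - 2).
digits          <- floor(log10(naics)) + 1
first_two_arith <- naics %/% 10^(digits - 2)

first_two
```

The text route depends on as.character() printing the number in full; values that R would render in scientific notation (such as 1e+05) would need format(x, scientific = FALSE) first.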
[R] axis label
Hi all,

I want the y-axis label to be (in symbols) g(sigma given alpha), where "given" is the conditional sign. I've tried

  ylab=expression(g(sigma|alpha))

but it gave me g(|(sigma,alpha)), where the sigma and alpha are in Greek but the conditional sign is misplaced (before the bracket). Any help would be appreciated.

Maram.
Re: [R] axis label
Hi Maram, How about this?

plot(1, ylab = expression(sigma*"|"*alpha))

HTH, Jorge
Re: [R] axis label
Try this:

plot(0, main = ~ g(sigma * "|" * alpha))
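A small sketch of why the quoted bar helps, applied to the y-axis label asked about in the question. An unquoted `|` is parsed as R's "or" operator, which plotmath does not know how to draw as an infix symbol, whereas a quoted "|" is plain text juxtaposed with `*`. The null PDF device is only there so the sketch runs without opening a window.

```r
# Quoted "|" is treated as text; * juxtaposes the pieces without spaces
lab <- expression(g(sigma * "|" * alpha))

pdf(NULL)                 # null device, nothing is written to disk
plot(1, ylab = lab)       # label renders as g(sigma|alpha) with Greek letters
invisible(dev.off())
```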
Re: [R] SPSS long variable names
On Wed, Oct 14, 2009 at 2:45 PM, Robert Baer rb...@atsu.edu wrote:

The problem is the limit of 8 characters long on variable names.

And again, my answer is that one approach would be to map SHORT names to long variable LABELS. This was a common use of labels before variable names supported 64 bytes. After reading into R with read.spss(), you can easily replace the short R names with the long LABELS to form long R names. If for some reason you are unwilling to give up some existing LABELS that are not, you could create some dummy variables for just this mapping purpose.

Dear Robert, the problem with this approach is that:
1) if the first 8 characters of some variable names in the SPSS dataset are the same, you'd get confusing results. Mapping the labels onto the short names might get complicated too, although I'm not sure about that.
2) data labels can contain spaces (and often do), so they cannot be readily used as variable names in R.
Cheers Joris

----- Original Message ----- From: Orvalho Augusto orvaq...@gmail.com To: Robert Baer rb...@atsu.edu Cc: r-help@r-project.org Sent: Tuesday, October 13, 2009 10:39 AM Subject: Re: [R] SPSS long variable names

No! That is variable labels. Caveman

On Tue, Oct 13, 2009 at 4:52 PM, Robert Baer rb...@atsu.edu wrote:

I am wondering if there is a patch for the SPSS reading code on the foreign package, in order to be able to read long variable names. Right now read.spss() just truncates the names to 8 characters.

This sequence seems to access the long filenames for me, if I know what you are asking for:

library('foreign')
a <- read.spss('fil.sav')
lnames <- attr(a, "variable.labels", exact = FALSE)

Rob
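A minimal sketch of the label-to-name mapping discussed above, including the two objections (spaces in labels, and labels that collide). The data frame `a` and its labels are made up to stand in for the result of read.spss(); the sketch also assumes the labels come in the same order as the columns.

```r
# Stand-in for read.spss("fil.sav", to.data.frame = TRUE): short names plus a
# "variable.labels" attribute (hypothetical values)
a <- data.frame(INCOME_1 = c(10, 20), INCOME_2 = c(30, 40), AGE = c(31, 45))
attr(a, "variable.labels") <- c(INCOME_1 = "Household income 2008",
                                INCOME_2 = "Household income 2008",
                                AGE      = "")

labels    <- attr(a, "variable.labels")
# Fall back to the short name where a variable has no label
new_names <- ifelse(labels == "", names(labels), labels)
# make.names() replaces spaces and, with unique = TRUE, disambiguates duplicates
names(a)  <- make.names(new_names, unique = TRUE)
names(a)
```

Whether the dotted, de-duplicated names are acceptable depends on what the downstream export (MySQL, DBF, Stata) allows, so this is a starting point rather than a fix.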
Re: [R] Creating a list of empty lists
On 10/13/2009 10:06 AM, Henrique Dallazuanna wrote:

Try this: replicate(3, list())

Thanks! I now have three ways to achieve my goal:
1: rep(list(list()), 3)
2: replicate(3, list())
3: Due to the way R recycles arguments, I found that it is enough to construct a list(list()) and then perform an assignment using an argument of the length I want (using mapply()). The empty list is then recycled enough times to hold the corresponding values.
Best, Magnus
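The three approaches can be compared side by side. This is a sketch; the mapply() call is only one guess at what the third approach might look like, since the original code was not posted.

```r
# 1 and 2: a list of three empty lists
a <- rep(list(list()), 3)
b <- replicate(3, list(), simplify = FALSE)

# For contrast: vector("list", 3) gives three NULLs, not three empty lists
d <- vector("list", 3)

# 3: recycling inside mapply(); the single empty list is reused for each value
# (hypothetical usage, not the poster's actual code)
e <- mapply(function(l, v) c(l, v), list(list()), 1:3, SIMPLIFY = FALSE)

identical(a, b)
identical(a, d)
str(e)
```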
Re: [R] post-hoc test with kruskal.test()
On Wed, 14 Oct 2009, Meyners, Michael, LAUSANNE, AppliedMathematics wrote:

Robert, What do you mean by "not symmetric"? If you mean unbalanced in terms of sample size, that's not a problem if you choose the right specifications for wilcox.test. The Kruskal-Wallis test is a generalization of the unpaired Wilcoxon test for more than two groups. Not sure whether kruskal.test works with just two groups, but if so, it should give the same results as wilcox.test if you set the arguments accordingly. Having said that, I should mention that unlike some normality-based post-hoc tests, the proposed approach is not based on a common error term. The paired comparisons will ignore the fact that you had a third group, and this will in particular result in (possibly quite) different power of the three comparisons, depending on the sample sizes and the noise given in just these two groups. I wouldn't know what to do about that, though.

It's worse than that: you don't necessarily even get the test in the same *direction* when you ignore the third group, though it takes some effort to produce a good example. There's a nice paper by Brown & Hettmansperger in ANZ J Stat a few years ago where they look at the decomposition of the KW test into paired tests and 'non-transitivity' components. -thomas

Michael

-----Original Message----- From: Robert Kalicki Sent: Wednesday, 14 October 2009 14:11 To: Meyners,Michael,LAUSANNE,AppliedMathematics Subject: RE: [R] post-hoc test with kruskal.test()

Hi Michael, Thank you very much for your clear and prompt answer. Is it still valid if I use an unpaired comparison with wilcox.test(), since my groups are not symmetric? Many thanks Robert

-----Original Message----- From: Meyners,Michael,LAUSANNE,AppliedMathematics Sent: Wednesday, 14 October 2009 10:30 To: Robert Kalicki; r-help@r-project.org Subject: RE: [R] post-hoc test with kruskal.test()

Robert, you can do the corresponding paired comparisons using wilcox.test. As far as I know, there is no such general correction as Tukey's HSD for the Kruskal-Wallis test. However, if you have indeed only 3 groups (resulting in 3 paired comparisons), the intersection-union principle and the theory of closed test procedures should allow you to do these tests without further correction, given the global test was statistically significant. HTH, Michael

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Kalicki Sent: Wednesday, 14 October 2009 09:17 To: r-help@r-project.org Subject: [R] post-hoc test with kruskal.test()

Dear R users, I would like to know if there is a way in R to execute a post-hoc test (factor levels comparison, like Tukey for ANOVA) of a non-parametric analysis of variance with the kruskal.test() function. I am comparing three different groups. The preliminary analysis using the Kruskal-Wallis test shows significance, but I still don't know the relationship and the significance level between each pair of groups. Do you have any suggestion? Many thanks in advance! Robert

Robert M. Kalicki, MD Postdoctoral Fellow Department of Nephrology and Hypertension Inselspital University of Bern Switzerland Address: Klinik und Poliklinik für Nephrologie und Hypertonie KiKl G6 Freiburgstrasse 15 CH-3010 Inselspital Bern Tel +41(0)31 632 96 63 Fax +41(0)31 632 14 58

Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle
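The global test followed by pairwise unpaired Wilcoxon tests can be sketched as below. The built-in PlantGrowth data (three groups) is used purely as a stand-in for the poster's data, and the Holm adjustment is my own conservative choice; p.adjust.method = "none" would correspond to the closed-testing argument made for exactly three groups. The caveats raised in the thread (no common error term, possible non-transitivity) apply to this sketch as well.

```r
# Global Kruskal-Wallis test on three groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# Unpaired two-sample Wilcoxon tests for every pair of groups.
# exact = FALSE is passed on to wilcox.test() to avoid warnings about ties.
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm", exact = FALSE)
pw
```

Unequal group sizes are not a problem here, since each comparison is an ordinary unpaired two-sample test.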
Re: [R] SPSS long variable names
Hi,

The .dat file is a tab-delimited file with the long variable names in it. The .sps file has the instructions to read the .dat and place all the variable and value labels. The idea of reading the .dat directly is good, but I need the labels placed. Yes, I could read the .dat file and parse the .sps myself to get the labels, but that is a job I was hoping to avoid, and judging by these mails I will have to do it.

For now I made a little script to generate the .sav using PSPP; then I read the .sav with read.spss(); then I call read.delim() to read the long names from the .dat file. It works, but it is not beautiful and uses more resources. The great thing would be for read.spss() to support long variable names.

Thanks guys for everything. Caveman

On Wed, Oct 14, 2009 at 4:52 PM, joris meys jorism...@gmail.com wrote:

Hi Orvalho, question: where do the .dat files come from, and what do you have to do with the SPSS syntax files? I guess the syntax file is there to change the .dat file into SPSS format. But you could take the shortcut and read in the .dat file directly. If the SPSS syntax file is a text file (which it should be), you can construct your own function to read in all specifications from the syntax file. The function regexpr() can be a great help for that. If you have no clue how to do that, just send me an example, and I'll take a look. Cheers Joris

On Sat, Oct 10, 2009 at 6:14 PM, Orvalho Augusto orvaq...@gmail.com wrote:

Hello guys, I am new to this list and to R too. I am wondering if there is a patch for the SPSS reading code in the foreign package, in order to be able to read long variable names. Right now read.spss() just truncates the names to 8 characters. Or maybe someone could help me in another way: every day I have to process a lot of SPSS syntax files and .dat files that come from one system that can only export data that way. I use PSPP to generate the SPSS data file (.sav) that I read with R. From R I can export to MySQL, DBF and Stata to satisfy the needs of different guys here. The problem is the limit of 8 characters on variable names. Can someone help with that? Caveman
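The workaround described above (take the long names from the header row of the tab-delimited .dat file and put them on the data read from the .sav) might look roughly like this. File contents, column names and the `sav_data` object are all made up for illustration, and the sketch assumes both files list the variables in the same order.

```r
# Hypothetical tab-delimited export with long names in the header row
dat_file <- tempfile(fileext = ".dat")
writeLines(c("respondent_identifier\thousehold_income_2008",
             "1\t52000",
             "2\t61000"), dat_file)

# Stand-in for read.spss(..., to.data.frame = TRUE), where the names
# were truncated to 8 characters
sav_data <- data.frame(RESPONDE = c(1, 2), HOUSEHOL = c(52000, 61000))

# Only the header line is needed, so there is no need to parse the whole file
long_names <- strsplit(readLines(dat_file, n = 1), "\t", fixed = TRUE)[[1]]

# Guard against a mismatch before overwriting the names
stopifnot(length(long_names) == ncol(sav_data))
names(sav_data) <- long_names
names(sav_data)
```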
Re: [R] Time Dependent Cox Model
Does anyone have suggestions? Thanks!

quaildoc wrote:

I am having trouble formatting some survival data to use in a time-dependent Cox model. My time-dependent variable is habitat, and I have it recorded for every day (with some NAs). I think it is working properly except for calculating death.time. This column should contain 1s or 0s, and as I have it, it only produces 0s. Any help will be greatly appreciated.

http://www.nabble.com/file/p25881478/Survival_master2.csv Survival_master2.csv

Here is my code:

sum(!is.na(surv[,16:726]))
surv2 <- matrix(0, 12329, 19)
colnames(surv2) <- c('start', 'stop', 'death.time', names(surv)[1:15], 'habitat')
row <- 0 # set record counter to 0
for (i in 1:nrow(surv)) { # loop over individuals
  for (j in 16:726) { # loop over the daily habitat columns
    if (is.na(surv[i, j])) next # skip missing data
    else {
      row <- row + 1 # increment row counter
      start <- j - 11 # start time (previous day)
      stop <- start + 1 # stop time (day)
      death.time <- if (stop == surv[i, 4] && surv[i, 5] == 1) 1 else 0
      # construct record:
      surv2[row,] <- c(start, stop, death.time, unlist(surv[i, c(1:15, j)]))
    }
  }
}
surv2 <- as.data.frame(surv2)
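For comparison, here is a small self-contained sketch of the same wide-to-long (start, stop, event) conversion. It does not use the poster's file: the data frame `wide` and its columns `last_day` and `died` are assumptions standing in for columns 4 and 5 of `surv`. One thing the sketch makes explicit is that the event flag is only 1 when the interval's stop time is on the same time scale as the recorded death day; if `stop` and column 4 in the original code count days from different origins, the comparison would never be true, which is one possible (unconfirmed) reason for getting only 0s.

```r
# Hypothetical wide data: one row per animal, one habitat column per day
wide <- data.frame(
  id       = 1:2,
  last_day = c(3, 2),   # last day of follow-up (assumed column)
  died     = c(1, 0),   # 1 = died on last_day, 0 = censored (assumed column)
  day1 = c("A", "B"), day2 = c("A", NA), day3 = c("B", NA),
  stringsAsFactors = FALSE
)
day_cols <- grep("^day", names(wide), value = TRUE)

records <- lapply(seq_len(nrow(wide)), function(i) {
  habitat <- unname(unlist(wide[i, day_cols]))
  days    <- which(!is.na(habitat))            # skip days with missing habitat
  days    <- days[days <= wide$last_day[i]]    # nothing after follow-up ends
  data.frame(
    id      = rep(wide$id[i], length(days)),
    start   = days - 1,                        # interval is (day - 1, day]
    stop    = days,
    event   = as.integer(days == wide$last_day[i] & wide$died[i] == 1),
    habitat = habitat[days],
    stringsAsFactors = FALSE
  )
})
long <- do.call(rbind, records)
long
```

A data frame in this shape is what the counting-process form Surv(start, stop, event) in the survival package expects.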
Re: [R] Taking specific/timed differences in a zoo timeseries
Dear Gabor, Thank you very much for your help! I'm now using your suggestion with my data. May I ask a stupid question? The output's index now has the format 2009-10-14. How can I transform it back into the original 10/14/09 and use this in a zoo object? Regards, Sergey

On Wed, Oct 14, 2009 at 17:03, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

library(zoo)
# temp <- ... from post asking question

# create a day sequence, dt, with no missing days
# and create a 0 width series with those times.
# merge that with original series giving original
# series plus a bunch of times having NA values.
# Use na.locf to fill in those values with the last
# non-missing so far.
rng <- range(time(temp))
dt <- seq(rng[1], rng[2], "day")
temp.m <- na.locf(merge(temp, zoo(, dt)))

# create a lagged time scale and subtract the
# lagged series from original
dt.lag <- as.Date(as.yearmon(dt) + 1/12) + as.numeric(format(dt, "%d")) - 1
temp - zoo(coredata(temp.m), dt.lag)

Using your data the output from the last line is:

temp - zoo(coredata(temp.m), dt.lag)
2009-10-05 2009-10-06 2009-10-07 2009-10-08 2009-10-09 2009-10-13 2009-10-14
        -5         -6          3          2         -2          2          1

On Wed, Oct 14, 2009 at 10:39 AM, Sergey Goriatchev serg...@gmail.com wrote:

Hello everyone. I have a specific problem that I have difficulty solving. Assume I have a zoo object:

set.seed(12345)
data <- round(runif(27)*10 + runif(27)*5, 0)
dates <- as.Date(c("09/03/09", "09/04/09", "09/07/09", "09/09/09", "09/10/09", "09/11/09", "09/14/09", "09/16/09", "09/17/09", "09/18/09", "09/21/09", "09/22/09", "09/23/09", "09/24/09", "09/25/09", "09/28/09", "09/29/09", "09/30/09", "10/01/09", "10/02/09", "10/05/09", "10/06/09", "10/07/09", "10/08/09", "10/09/09", "10/13/09", "10/14/09"), "%m/%d/%y")
temp <- zoo(data, order.by = dates)

What I need to do is to take differences between, say, October 14th and September 14th, then October 13th and September 13th, that is, a 1-month difference independent of the number of days in between. And when there is no matching date in the earlier month, like here where there is no September 13th, the date should be the first preceding date, that is September 11th in this example. How can I do that?

The above is just an example; my zoo object is very big and I need to take differences between years, that is between October 14th, 2009 and October 14th, 2008, then Oct. 13, 2009 and Oct. 13, 2008, and so on. Also, the time index of my zoo object has the format 10/14/09 (that is, Oct. 14, 2009), and that is the format I need to operate with and do not want to change. In the example I reformatted it just so that I could create a zoo object. Could some friendly person please show me how to do such a calculation? Thank you in advance! Best, Sergey

-- I'm not young enough to know everything. /Oscar Wilde Experience is one thing you can't get for nothing. /Oscar Wilde When you are finished changing, you're finished. /Benjamin Franklin Tell me and I forget, teach me and I remember, involve me and I learn. /Benjamin Franklin Luck is where preparation meets opportunity. /George Patten The knife is sharpened only against the stone.
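On the question of getting the 10/14/09 form back: a sketch in base R, assuming the index is of class Date. The ISO form is just how Date objects print; format() produces any other rendering as character strings. Keeping the Date index for the calculations and formatting only when displaying or exporting is probably the safer route, since a character index would no longer order or subtract like dates.

```r
idx <- as.Date(c("10/13/09", "10/14/09"), format = "%m/%d/%y")

iso <- format(idx)               # how a Date prints by default
us  <- format(idx, "%m/%d/%y")   # same dates rendered as month/day/year text

iso
us
diff(idx)                        # date arithmetic still works on the Date index
```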
[R] Problem with NLSstClosestX; and suggested fix
The problem is demonstrated with this code, intended to find the approximate 'x' at which the 'y' is midway between the left and right asymptotes. This particular data set returns NA, which is a bit silly!

sXY <- structure(list(x = c(0, 24, 27, 48, 51, 72, 75, 96, 99),
                      y = c(4.98227, 6.38021, 6.90309, 7.77815, 7.64345, 7.23045, 7.27875, 7.11394, 6.95424)),
                 .Names = c("x", "y"), row.names = c(NA, 9L),
                 class = c("sortedXyData", "data.frame"))
a <- NLSstLfAsymptote(sXY)
d <- NLSstRtAsymptote(sXY)
NLSstClosestX(sXY, (a+d)/2)

I think the problem arises when the target y value is exactly equal to one of the y values in sXY, and it can be fixed by trapping that situation thus:

NLSstClosestX.sortedXyData <- function (xy, yval)
{
    deviations <- xy$y - yval
    if (any(deviations == 0)) xy$x[match(0, deviations)] else { # new line inserted
        if (any(deviations <= 0)) {
            dev1 <- max(deviations[deviations <= 0])
            lim1 <- xy$x[match(dev1, deviations)]
            if (all(deviations <= 0)) {
                return(lim1)
            }
        }
        if (any(deviations >= 0)) {
            dev2 <- min(deviations[deviations >= 0])
            lim2 <- xy$x[match(dev2, deviations)]
            if (all(deviations >= 0)) {
                return(lim2)
            }
        }
        dev1 <- abs(dev1)
        dev2 <- abs(dev2)
        lim1 + (lim2 - lim1) * dev1/(dev1 + dev2)
    } # new line inserted
}

Comments/corrections welcome. Keith Jewell

sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices datasets tcltk utils methods base

other attached packages:
[1] nlme_3.1-93 xlsReadWrite_1.3.3 svSocket_0.9-43 svMisc_0.9-48 TinnR_1.0.3 R2HTML_1.59-1
[7] Hmisc_3.6-1

loaded via a namespace (and not attached):
[1] cluster_1.12.0 grid_2.9.2 lattice_0.17-25 stats4_2.9.2 tools_2.9.2 VGAM_0.7-9
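The suspected mechanism can be reproduced on toy numbers. The function below is a simplified stand-in for the interpolation step, not the code in the stats package, and the data are invented. When the target equals an observed y, both deviations are zero and the interpolation weight becomes 0/0.

```r
# Simplified stand-in for the interpolation step (not the stats package code)
interpolate_x <- function(x, y, yval) {
  dev  <- y - yval
  dev1 <- max(dev[dev <= 0]); lim1 <- x[match(dev1, dev)]   # nearest point below
  dev2 <- min(dev[dev >= 0]); lim2 <- x[match(dev2, dev)]   # nearest point above
  lim1 + (lim2 - lim1) * abs(dev1) / (abs(dev1) + abs(dev2))
}

x <- c(0, 24, 48)
y <- c(5, 6, 7)

between <- interpolate_x(x, y, 5.5)  # target strictly between two observations
exact   <- interpolate_x(x, y, 6)    # target equals an observed y: weight is 0/0

between
exact
```

Returning the matching x directly when a deviation is exactly zero, as in the proposed patch, sidesteps the division altogether.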
[R] Getting indeices of intersecting elements.
Hi, Is there a command to get the indices of the intersecting elements of two vectors? intersect() gives the elements, not their indices. Thanks in advance. Praveen Surendran School of Medicine and Medical Sciences University College Dublin Belfield, Dublin 4 Ireland.
Re: [R] plot discriminant analysis
Hi Alain, thanks for the fast response. I get the same results with the iris data, but when I use my data (mentioned in the first message), I get different results. Regards, Alejo

2009/10/14 Alain Guillet alain.guil...@uclouvain.be

Hi, I did it with

Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]), Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

Then I did

plot(z, xlim=c(-10,10), ylim=c(-10,10))

before drawing

points(predict(z)$x, col=palette()[predict(z)$class], xlim=c(-10,10), ylim=c(-10,10))

and all the points are superimposed. The only difference I found was the different x- and y-axes when I drew them separately, i.e.

plot(z)
plot(predict(z)$x, col=palette()[predict(z)$class])

Alain

Alejo C.S. wrote:

I'm confused about the right way to plot a discriminant analysis made by the lda function (MASS package). (I had attached my data for reproduction.) When I plot an lda object:

X <- read.table("data", header=TRUE)
lda_analysis <- lda(formula(X), data=X)
plot(lda_analysis)
# the above plot is completely different to:
plot(predict(lda_analysis)$x, col=palette()[predict(lda_analysis)$class])

Shouldn't that be the same graph as the first? In the second case, I use the predict function to obtain the LD1 and LD2 coordinates of lda_analysis (predict(lda_analysis)$x) and its respective class (predict(lda_analysis)$class), but it seems that the classes are different:

table(X$G3, predict(lda_analysis)$class)
     B  G  M
  B 29  0  3
  G  0 26  2
  M  4  0 46

Any clues? Regards,

-- Alain Guillet Statistician and Computer Scientist SMCS - Institut de statistique - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50
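A sketch on the built-in iris data (not the poster's data) of why the true groups and predict()$class need not agree: the cross-table is a confusion matrix, and its off-diagonal cells are the misclassified observations. Colouring by true group and marking misclassified points with a different symbol shows both pieces of information on the same LD1/LD2 coordinates. The sketch is wrapped in a check for the MASS package and uses a null device so that it runs non-interactively.

```r
if (requireNamespace("MASS", quietly = TRUE)) {
  fit  <- MASS::lda(Species ~ ., data = iris)
  pred <- predict(fit)            # scores (pred$x) and predicted class

  # rows: true species, columns: predicted class
  confusion <- table(true = iris$Species, predicted = pred$class)
  print(confusion)

  pdf(NULL)                       # null device, nothing is written to disk
  plot(pred$x[, 1:2],
       col = as.integer(iris$Species),                    # colour = true group
       pch = ifelse(pred$class == iris$Species, 1, 4))    # x marks misclassified
  invisible(dev.off())
}
```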
Re: [R] Time Dependent Cox Model
Well, it might be wise to elaborate a bit more on the variables and on what exactly you want, e.g., death.time to be. I'd interpret it as the time of death, but the fact that it is 0/1 means it is a logical (?) binary variable of some sort. Please ask your question in such a way that somebody who doesn't know the dataset and your research can still understand what is inside the dataset and what exactly you're trying to obtain. I'd also suggest adding the command that reads in the data. I don't have the time to work out how exactly to read in the dataset so that it matches what you have in your workspace. Cheers Joris
[R] Survival and nonparametric
Hi all, Does anybody have experience with including a nonparametric component in a survival analysis using an R package? Can someone recommend some references? Thanks a lot, Ashta
Re: [R] plot discriminant analysis
On Oct 14, 2009, at 12:24 PM, Alejo C.S. wrote:

Hi Alain, thanks for the fast response. I've the same results with iris data, but when I use my data (mentioned in the first message),

You are apparently under the false impression that the data made it through the listserv. Read the Posting Guide to find out why that impression is false.

I have different results. Regards, Alejo

David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Getting indeices of intersecting elements.
On Oct 14, 2009, at 12:15 PM, Praveen Surendran wrote:

Hi, Is there a command to get the indices of the intersecting elements of two vectors? intersect() gives the elements, not their indices.

?which

samp1 <- sample(seq(3,198, by=3), 20); samp2 <- sample(seq(3,198, by=3), 20)
int <- intersect(samp1, samp2)
int
#[1] 48 87 9 159 36 6 39 105
which(seq(3,198, by=3) %in% int)
#[1] 2 3 12 13 16 29 35 53

Thanks in advance. Praveen Surendran School of Medicine and Medical Sciences University College Dublin Belfield, Dublin 4 Ireland.

David Winsemius, MD Heritage Laboratories West Hartford, CT
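Note that the example above returns positions within the full sequence seq(3, 198, by=3), not within samp1 or samp2. If the positions inside each of the two vectors are wanted, a sketch along these lines may help; the vectors `a` and `b` are made up for illustration.

```r
a <- c(10, 20, 30, 40)
b <- c(40, 25, 20)

common <- intersect(a, b)      # the shared values, in order of appearance in a

idx_a  <- which(a %in% b)      # positions in a of values that also occur in b
idx_b  <- which(b %in% a)      # positions in b of values that also occur in a

pos_a  <- match(common, a)     # position in a of each common value
pos_b  <- match(common, b)     # position in b of each common value, same order

common
idx_a
idx_b
pos_b
```

match() keeps the two position vectors aligned with each other, whereas which() returns positions in increasing order within each vector.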
Re: [R] currency conversion function?
Hello,

On 10/14/09, Henrique Dallazuanna www...@gmail.com wrote:

foo('BRL', 'USD', '2009-10-14')

Nice function, thank you. Two issues, though:

- It seems to provide reverse output. Example:

## how many dollars do you get from one euro?
foo('EUR', 'USD', '2009-10-14')
[1] 0.67544
## however, the equivalent of 1 Euro would be ..
1/foo('EUR', 'USD', '2009-10-14')
[1] 1.4805
## .. dollars
## 1 Euro = 1.48051 US Dollar
## taken from the on-line converter

The dirty hack is to use 1/as.numeric(value) as a return value. For describing the next issue I will use the improved version of the function.

- Second issue: for weaker currencies (that is, with more digits) the reported value is not necessarily correct. Example:

## fine
foo('EUR', 'RUB', '2009-10-14')
[1] 43.745
## fine
1/foo('EUR', 'RUB', '2009-10-14')
[1] 0.02286
## wrong
foo('RUB', 'EUR', '2009-10-14')
[1] 0.26878
## taken from the on-line converter
## 1 Euro = 43.75188 Russian Rouble
## 1 Russian Rouble (RUB) = 0.02286 Euro (EUR)

I am not sure how to fix this one. Thank you, Liviu
Re: [R] Survival and nonparametric
http://finzi.psych.upenn.edu/views/Survival.html

Kalbfleisch & Prentice, The Statistical Analysis of Failure Time Data
Therneau & Grambsch, Modeling Survival Data
Harrell, Regression Modeling Strategies

On Oct 14, 2009, at 12:35 PM, Ashta wrote:

Hi all, Does anybody have experience with including a nonparametric component in a survival analysis using an R package? Can someone recommend some references? Thanks a lot, Ashta

David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] SPSS long variable names
You could read the .sps file into R with readLines(), for example, and then use the regular-expression and substring tools to find the label statements. Then you can use R to add the labels, without having to pass through PSPP. So you could create an R script that takes both files as input and generates the output automatically. I'd prefer to do it that way rather than use YAP (yet another program) while waiting for read.spss to be adapted...

Cheers
Joris

On Wed, Oct 14, 2009 at 5:48 PM, Orvalho Augusto orvaq...@gmail.com wrote:

> Hi
>
> The .dat file is a tab-delimited file with the long variable names in it. The .sps file has the instructions to read the .dat file and apply all the variable and value labels.
>
> The idea of reading the .dat file directly is good, but I need the labels applied. Yes, I could read the .dat file and parse the .sps file myself, but that is quite a job; judging by these mails, I will have to do so.
>
> For now I have made a little script that generates the .sav file using PSPP. Then I read the .sav file with read.spss, and then I call read.delim to read the long names from the .dat file.
>
> It works, but it is not beautiful and uses more resources. The great thing would be for read.spss to support long variable names.
>
> Thanks guys for everything
> Caveman
>
> On Wed, Oct 14, 2009 at 4:52 PM, joris meys jorism...@gmail.com wrote:
>
>> Hi Orvalho,
>>
>> A question: where do the .dat files come from, and what do you have to do with the SPSS syntax files? I guess the syntax file is there to convert the .dat file into SPSS format. But you could take a shortcut and read in the .dat file directly. If the SPSS syntax file is a text file (which it should be), you can write your own function to read all the specifications from the syntax file. The function regexpr() can be a great help for that.
>>
>> If you have no clue how to do that, just send me an example, and I'll take a look.
>>
>> Cheers
>> Joris
>>
>> On Sat, Oct 10, 2009 at 6:14 PM, Orvalho Augusto orvaq...@gmail.com wrote:
>>
>>> Hello guys
>>>
>>> I am new to this list and to R too.
>>>
>>> I am wondering if there is a patch for the SPSS reading code in the foreign package, so that it can read long variable names. Right now read.spss() just truncates the names to 8 characters.
>>>
>>> Or perhaps someone could help me another way: every day I have to process a lot of SPSS syntax files and .dat files that come from a system that can only export data that way.
>>>
>>> I use PSPP to generate the SPSS data file (.sav) that I read with R. From R I can export to MySQL, DBF and STATA to satisfy the needs of different guys here.
>>>
>>> The problem is the limit of 8 characters on variable names.
>>>
>>> Can someone help with that?
>>>
>>> Caveman
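The approach Joris describes (read the .dat file directly, then pull the labels out of the .sps file with regular expressions) might look roughly like the sketch below. It is only an illustration under stated assumptions: the sample syntax lines, the variable names and the one-label-per-line layout are all hypothetical, and real SPSS syntax files can lay out VARIABLE LABELS differently. The labels are stored in a "variable.labels" attribute, mirroring what read.spss() does.

```r
# Hypothetical inputs; in practice these would come from
# readLines("file.sps") and read.delim("file.dat").
sps_lines <- c(
  "VARIABLE LABELS",
  "  household_income_total \"Total household income\"",
  "  respondent_age_years \"Age in years\".",
  "EXECUTE."
)
dat_text <- "household_income_total\trespondent_age_years\n1200\t34\n950\t51\n"

# The tab-delimited data keeps the long variable names as they are.
dat <- read.delim(text = dat_text, stringsAsFactors = FALSE)

# Assumed layout: one `name "label"` pair per line, optionally ending in a period.
pattern <- "^[[:space:]]*([A-Za-z_][A-Za-z0-9_.]*)[[:space:]]+[\"'](.*)[\"']\\.?[[:space:]]*$"
hits <- grep(pattern, sps_lines, value = TRUE)

# Named vector: names are variable names, values are labels.
labels <- setNames(sub(pattern, "\\2", hits), sub(pattern, "\\1", hits))

# Attach the labels in column order.
attr(dat, "variable.labels") <- labels[names(dat)]
```

Value labels would need a second, similar pass over the VALUE LABELS statements; that part is left out here.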
Re: [R] Survival and nonparametric
On Oct 14, 2009, at 12:47 PM, David Winsemius wrote:

> http://finzi.psych.upenn.edu/views/Survival.html

# lots of 404 link errors

The finzi.psych server does something weird to the task view pages, so you would get greater linkability with the actual CRAN version:

http://cran.r-project.org/web/views/Survival.html

> Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data
> Therneau and Grambsch, Modeling Survival Data
> Harrell, Regression Modeling Strategies
>
> On Oct 14, 2009, at 12:35 PM, Ashta wrote:
>
>> Hi all,
>>
>> Does anybody have experience including a nonparametric component in a survival analysis using an R package? Can someone recommend some references?
>>
>> Thanks a lot
>> Ashta

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] Tunnelling X for R graphics
In an ssh client, under the connections section, there is an option for tunneling. Please ensure that the tunneling options are turned on and, if applicable, that incoming/outgoing tunnels, listen/destination ports, etc. are set.

Thanks,
Santosh

On Tue, Feb 3, 2009 at 2:03 PM, Adam D. I. Kramer a...@ilovebacon.org wrote:

> Thanks very much for the reassurance. Really, I can just open a new X11 device on the same display, since the display (localhost:10) is effectively reconnected when I ssh in again.
>
> I'll reply again to this post if I find other parts of R working poorly after the disconnection.
>
> --Adam
>
> On Tue, 3 Feb 2009, Prof Brian Ripley wrote:
>
>> To answer your basic question, you do need to shut down everything involving X, that is X11() devices and the X11 data editor. If you do that (and graphics.off() will suffice for the first), you should be able to re-open an X11 device on another display (which is presumably what a new VNC connection gives you).
>>
>> The warning comes from any X error, and it is not possible to know how serious it is without external information.
>>
>> On Mon, 2 Feb 2009, Adam D. I. Kramer wrote:
>>
>>> On Tue, 3 Feb 2009, Patrick Connolly wrote:
>>>
>>>>> The problem, and maybe I'm just whining here, is that because the data sets are large this takes several minutes where I'm basically just sitting around. This happens once every other day, as the VPN software I'm using times out after about 24 hours and thus the ssh session dies.
>>>>
>>>> Is it possible to do anything about the VPN software? I use tightVNC to do something similar and it doesn't time out after 24 hours. Even closing the desktop machine down altogether does not lose the ssh connexion. Restarting the desktop a week later will still find the X session without loss.
>>>
>>> The VPN software is managed and maintained by the company I'm doing statistical computing work for...out of my control.
>>>
>>> Your comments about TightVNC are pretty impressive, though--I'm not really sure how that would work...though if you set your ssh connection to not push any data towards your computer, I gather the server would have no reason to believe you were unresponsive?
>>>
>>> In any case, this sadly doesn't help me, but many thanks! For now, I'm just trying my hardest to remember to dev.off() when I'm done using graphics.
>>>
>>> --Adam
>>
>> --
>> Brian D. Ripley, rip...@stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Error: testing 'stats' failed - R 2.9.2 on Linux
I've just built R 2.9.2 from source on Slackware Linux 13.0 - 32-bit (will try 64-bit also next) - and seen:

    Collecting examples for package 'stats'
    Running examples in package 'stats'
    Error: testing 'stats' failed
    Execution halted
    make[3]: *** [test-Examples-Base] Error 1

Looking at R-2.9.2/tests/Examples/stats-Ex.Rout.fail I see:

    ...
    > ### ** Examples
    >
    > ## see also JohnsonJohnson, Nile and AirPassengers
    > require(graphics)
    >
    > trees <- window(treering, start=0)
    > (fit <- StructTS(trees, type = "level"))

    Call:
    StructTS(x = trees, type = "level")

    Variances:
        level    epsilon
    0.0003700  0.0719877
    > plot(trees)
    > lines(fitted(fit), col = "green")
    > tsdiag(fit)
    >
    > (fit <- StructTS(log10(UKgas), type = "BSM"))
    Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  :
      non-finite value supplied by optim
    Calls: StructTS -> optim
    Execution halted

I'll be happy to supply any information I can upon request to help the developers solve this apparent problem. I've searched the R mailing lists - and done a general Google - but can't find any previous mention of the matter.

John A. Murdie
Re: [R] pairs
Does the pairs2 function in the TeachingDemos package do what you want?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Seyit Ali Kayis
> Sent: Wednesday, October 14, 2009 6:35 AM
> To: r-help@r-project.org
> Subject: [R] pairs
>
> Dear all,
>
> I have two sets of data (say set1 and set2) as follows:
>
> set1
>
>       x1   x2   x3
>     0.30 0.43 3.88
>     0.38 0.59 3.53
>     0.30 0.42 2.12
>     0.33 0.53 2.12
>     0.30 0.47 3.76
>
> set2
>
>       y1   y2   y3
>     0.32 0.47 5.18
>     0.23 0.26 1.06
>     0.42 0.65 3.88
>     0.28 0.38 3.76
>     0.35 0.47 1.41
>
> The pairs function, called for example as
>
>     pairs(~x1+x2+x3, data=set1, main="Simple Scatterplot Matrix")
>
> produces a scatterplot matrix in which both the lower and the upper triangles show scatter plots of the set1 variables. I want to produce a scatterplot matrix in which the upper panels (above the diagonal) show plots of the set1 variables and the lower panels (below the diagonal) show plots of the set2 variables. Is there a way that I can do this?
>
> Any help is deeply appreciated.
>
> Kind Regards
>
> Seyit Ali
>
> Dr. Seyit Ali KAYIS
> Selcuk University
> Faculty of Agriculture
> Kampus, Konya, TURKEY
> s_a_ka...@yahoo.com, s_a_ka...@hotmail.com
> Tel: +90 332 223 2830
> Mobile: +90 535 587 1139
> Fax: +90 332 241 0108
> Greetings from Konya, TURKEY
> http://www.ziraat.selcuk.edu.tr/skayis/
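For readers who want to see what Seyit Ali is asking for, here is a minimal base-graphics sketch. It does not use pairs2; it simply lays out a k x k grid by hand, drawing the panels above the diagonal from set1 and those below it from set2. The helper name two_set_pairs is hypothetical, and the data are the values quoted in the question.

```r
set1 <- data.frame(
  x1 = c(0.30, 0.38, 0.30, 0.33, 0.30),
  x2 = c(0.43, 0.59, 0.42, 0.53, 0.47),
  x3 = c(3.88, 3.53, 2.12, 2.12, 3.76)
)
set2 <- data.frame(
  y1 = c(0.32, 0.23, 0.42, 0.28, 0.35),
  y2 = c(0.47, 0.26, 0.65, 0.38, 0.47),
  y3 = c(5.18, 1.06, 3.88, 3.76, 1.41)
)

# Hypothetical helper: panels above the diagonal come from `upper`,
# panels below it from `lower`; the diagonal shows the variable names.
two_set_pairs <- function(upper, lower, ...) {
  k <- ncol(upper)
  stopifnot(ncol(lower) == k)
  old <- par(mfrow = c(k, k), mar = c(2, 2, 1, 1))
  on.exit(par(old))
  drawn <- character(0)
  for (i in seq_len(k)) {        # grid row
    for (j in seq_len(k)) {      # grid column
      if (i == j) {
        plot.new()
        text(0.5, 0.5, paste(names(upper)[i], names(lower)[i], sep = " / "))
        drawn <- c(drawn, "label")
      } else if (i < j) {
        plot(upper[[j]], upper[[i]], xlab = "", ylab = "", ...)
        drawn <- c(drawn, "upper")
      } else {
        plot(lower[[j]], lower[[i]], xlab = "", ylab = "", ...)
        drawn <- c(drawn, "lower")
      }
    }
  }
  # Return which data set each panel used, row by row.
  invisible(matrix(drawn, nrow = k, byrow = TRUE))
}

pdf(NULL)  # off-screen device so the sketch runs without a display
panels <- two_set_pairs(set1, set2, pch = 19)
dev.off()
```

As in pairs(), the panel in row i and column j puts variable j on the horizontal axis and variable i on the vertical axis. Axis scales are not shared between panels here, which pairs() would normally arrange.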
Re: [R] SPSS long variable names
Ok. I will try that then.

Caveman
[R] Large data sets with high dimensional fixed effects
Hi,

I have a data set that consists of about 2 million observations and several high-dimensional fixed effects (2 factors with around 1000 levels each, and others with a few hundred levels). I'm looking to run linear and logit regressions. I've tried packages such as filehash and biglm to store some of the big matrices on the hard drive, but I still get errors like "Cannot allocate vector of length".

I've read about some iterative methods for coefficient estimation in STATA that would probably work for this, but I'm wondering if there is an R package out there meant for situations like mine. I'm running an XP x64 machine with an AMD 2.8 GHz dual-core processor and 6 GB of RAM, and I don't mind memory- and time-intensive solutions as long as they work.

Thanks,
Dan
Re: [R] Getting indices of intersecting elements.
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Praveen Surendran
> Sent: Wednesday, October 14, 2009 9:15 AM
> To: r-help@r-project.org
> Subject: [R] Getting indices of intersecting elements.
>
> Hi,
>
> Is there a command to get the indices of the intersecting elements of two vectors? intersect() gives the elements and not their indices.

intersect() uses match() to do that. match(x,y,nomatch) gives the indices of elements of y that elements of x match and match(y,x,nomatch) gives the indices of the elements of x that elements of y match. In each case, the unmatched elements are marked by the value in nomatch. E.g.

    > x <- letters[1:5]
    > y <- letters[(1:5)*2]
    > x
    [1] "a" "b" "c" "d" "e"
    > y
    [1] "b" "d" "f" "h" "j"
    > match(y,x,nomatch=0) # x[c(2,4)] are in y
    [1] 2 4 0 0 0
    > match(x,y,nomatch=0) # y[c(1,2)] are in x
    [1] 0 1 0 2 0

(If the second argument contains duplicates the index is for the first of them.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> Thanks in advance.
>
> Praveen Surendran
> School of Medicine and Medical Sciences
> University College Dublin
> Belfield, Dublin 4
> Ireland.
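Building on Bill Dunlap's explanation, one way to get the positions of the common elements in both vectors at once is to match the intersection back against each vector. This is a small sketch using the same x and y as in his example; the variable names common, idx_x and idx_y are just illustrative.

```r
x <- letters[1:5]
y <- letters[(1:5) * 2]

# Elements present in both vectors.
common <- intersect(x, y)

# Positions of those elements in each vector.
idx_x <- match(common, x)
idx_y <- match(common, y)

# An equivalent way to get the positions in x, without intersect().
idx_x_alt <- which(x %in% y)
```

As noted in the reply, match() reports only the first position when a value occurs more than once, whereas which(x %in% y) reports every position in x whose value appears in y.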
[R] RPostgreSQL: unable to load shared library
Hello list,

I'm using R 2.9.2 on a WinXP system, and I installed the RPostgreSQL library using the package installer. When trying to load it, I get the following error:

    > library('RPostgreSQL')
    Error in inDL(x, as.logical(local), as.logical(now), ...) :
      unable to load shared library 'C:/PROGRA~1/R/R/library/RPostgreSQL/libs/RPostgreSQL.dll':
      LoadLibrary failure: The operating system cannot run %1.
    Error: package/namespace load failed for 'RPostgreSQL'

So one way or the other, the DLL is not found... Does anyone know how to fix this? I don't suppose the DLL should be directly in my PATH, right?

Thanks for any hints,
Arnout
[R] puzzle using gsub (and encodings maybe)
Hello,

Below is some output that shows my issue. I have a variable x that I read from a file (more on this below).

    > x
    [1] "NEW YORK NEW ENGLAND"
    > gsub(" -", "-", x)            # this does not work!
    [1] "NEW YORK NEW ENGLAND"
    > Encoding(x)                   # is x in a special encoding? no
    [1] "unknown"
    > y = "NEW YORK -NEW ENGLAND"   # I type in variable y
    > gsub(" -", "-", y)            # and gsub works as expected
    [1] "NEW YORK-NEW ENGLAND"

I'm sure the problem has to do with the way I read the variable x. But even if I change the encoding for x to ASCII, I still cannot do the sub. I get x by reading a pdf file with pdftotext, so you will not be able to replicate my issue.

Thanks for any suggestions,
Adrian
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:

> Hello,
>
> Below is some output that shows my issue. I have a variable x that I read from a file (more on this below).
>
>     > x
>     [1] "NEW YORK NEW ENGLAND"
>     > gsub(" -", "-", x)            # this does not work!
>     [1] "NEW YORK NEW ENGLAND"

It looks as though it worked, presumably because something got lost in your email. Could you post charToRaw(x) so we can see what's in x?

Duncan Murdoch
Re: [R] puzzle using gsub (and encodings maybe)
    > charToRaw(x)
     [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
    > charToRaw(y)
     [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44

So they are different.

Adrian

I use R 2.8.1 on WinXP
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 1:41 PM, Adrian Dragulescu wrote:

>     > charToRaw(x)
>      [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
>     > charToRaw(y)
>      [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>
> So they are different.
>
> Adrian
>
> I use R 2.8.1 on WinXP

But that's ancient. Please try again with the beta of 2.10.0, and let us know if you still see a problem.

Duncan Murdoch
[R] Scatter plot using icons (from a gif) instead of points - is it possible ?
Hello dear R-help group.

I wish to draw a scatter plot using icons (or images) instead of points. Is it possible, and if so, how?

Thanks,
Tal

--
My contact information:
Tal Galili
E-mail: tal.gal...@gmail.com
Phone number: 972-52-7275845
FaceBook: Tal Galili
My Blogs:
http://www.talgalili.com (Web and general, Hebrew)
http://www.biostatistics.co.il (Statistics, Hebrew)
http://www.r-statistics.com/ (Statistics, R, English)
Re: [R] Handle lot of variables - Regression
anna0102 wrote:

> Hey,
>
> I've got a data set (e.g. named Data) which contains a lot of variables, for example: s1, s2, ..., s50
>
> My first question is: it is possible to do this:
>
>     Data$s1
>
> But is it also possible to do something like this?
>
>     Data$s1:s50
>
> (I've tried a lot of versions of this without a result.)
>
> My second question: I want to do a stepwise logistic regression. For this purpose I use the following procedure:
>
>     result <- glm(...)
>     step(result, direction="forward")
>
> Now the problem I have is that I have to include all my 50 variables (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4... (furthermore it has to be implemented in a loop, so I really need it). I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried this:
>
>     result <- glm(y ~ list[[1]], ...)
>
> This works! But if I try to do it stepwise
>
>     result2 <- step(result)
>
> I always get the same results as from glm without a stepwise approach. So obviously R can't handle this if you put a list in. How can I make this work?
>
> Thanks in advance,
> Anna

Anna,

You might as well just take a random sample of your candidate predictors. Stepwise regression isn't much better than that. Note that if you don't have enough events (say 15 times 50) to fit a full model, then you don't have enough events to do stepwise regression without appropriate penalization.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
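On the purely mechanical side of Anna's two questions, a sketch along the following lines may help. It assumes a simulated data frame with five predictors instead of fifty (the names Data, s1 ... s5 and y are stand-ins), selects columns by a vector of names, and builds the model formula with reformulate() so nothing has to be typed out. Frank Harrell's caution about stepwise selection applies regardless; this only shows how the pieces fit together.

```r
set.seed(1)

# Simulated stand-in for Anna's data: predictors s1..s5 and a binary response y.
n <- 200
Data <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(Data) <- paste0("s", 1:5)
Data$y <- rbinom(n, 1, plogis(Data$s1 - 0.5 * Data$s2))

# Question 1: select many columns by name instead of Data$s1:s50.
vars <- paste0("s", 1:5)
subset_cols <- Data[, vars]

# Question 2: build the formula programmatically.
full_formula <- reformulate(vars, response = "y")   # y ~ s1 + s2 + s3 + s4 + s5

# Forward selection needs a starting model and a scope of candidate terms.
null_fit <- glm(y ~ 1, family = binomial, data = Data)
fwd_fit <- step(null_fit,
                scope = list(lower = ~ 1, upper = reformulate(vars)),
                direction = "forward", trace = 0)
```

The likely reason the list version showed no stepwise behaviour is that y ~ list[[1]] gives the model a single term, so step() has nothing to add or drop; with the formula spelled out term by term, each sN is a separate candidate.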
Re: [R] puzzle using gsub (and encodings maybe)
On Wed, 14 Oct 2009, Adrian Dragulescu wrote:

>     > charToRaw(x)
>      [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
>     > charToRaw(y)
>      [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>
> So they are different.

We really do need the 'at a minimum' information we asked you for in the posting guide. But in cp1252 (a guess as to what you might be using) \xad is a 'soft hyphen', and that is not the same thing as a hyphen -- you will get the same issues with 'non-breaking space'.

BDR

> Adrian
>
> I use R 2.8.1 on WinXP
>
> On Wed, 14 Oct 2009, Duncan Murdoch wrote:
>
>> On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:
>>
>>>     > x
>>>     [1] "NEW YORK NEW ENGLAND"
>>>     > gsub(" -", "-", x)            # this does not work!
>>>     [1] "NEW YORK NEW ENGLAND"

Well, I see no hyphen at all here, but then I am not on Windows.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] puzzle using gsub (and encodings maybe)
I get the same results (not working) using R 2.9.2 and R 2.10.0 beta.

Thank you for looking at this.
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 2:16 PM, Adrian Dragulescu wrote:

> I get the same results (not working) using R 2.9.2 and R 2.10.0 beta.

But it is working: the dash is an ad in x, not a 2d. You need to ask to substitute for the ad character, e.g. by

    spacelongdash <- rawToChar(as.raw(c(0x20, 0xad)))
    gsub(spacelongdash, "-", x)

Duncan Murdoch

> Thank you for looking at this.
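The diagnosis in this thread (byte ad is a soft hyphen in cp1252, not the ASCII hyphen 2d) can be reproduced without the original PDF. The sketch below rebuilds x from the bytes Adrian posted and replaces the space plus soft hyphen at byte level. Using fixed = TRUE together with useBytes = TRUE is an assumption made here so that the example behaves the same in any locale, including UTF-8 ones where a lone \xad byte is not a valid character; it is not what the posters ran on Windows.

```r
# Rebuild x from the bytes posted in the thread: "NEW YORK ", 0xad, "NEW ENGLAND".
x <- rawToChar(c(charToRaw("NEW YORK "), as.raw(0xad), charToRaw("NEW ENGLAND")))

# Inspecting the raw bytes is how the odd character was found.
bytes_x <- charToRaw(x)

# Pattern: a space followed by the soft-hyphen byte.
space_soft_hyphen <- rawToChar(as.raw(c(0x20, 0xad)))

# Byte-wise literal replacement, independent of the session encoding.
fixed_x <- gsub(space_soft_hyphen, "-", x, fixed = TRUE, useBytes = TRUE)
```

After the replacement, fixed_x contains only ASCII bytes, so it compares equal to a typed-in "NEW YORK-NEW ENGLAND".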
[R] Cacheing computationally expensive getter methods for S4 objects
Hi,

I was wondering if there was a way to store the results of a computationally expensive getter call on an S4 object, so that it is only calculated once for each object.

Trivial example: let's say I want to cache the expensive area calculation of a square object.

    setClass("Square",
      representation(
        length='numeric',
        width='numeric',
        area='numeric'
      ),
      prototype(
        length=0,
        width=0,
        area=-1
      )
    )

    setGeneric("area", function(x) standardGeneric("area"))
    setMethod("area", "Square", function(x) {
      if (x@area == -1) {
        x@area <- x@width * x@length
      }
      x@area
    })

Now the first time I call ``area(my.square)`` it computes ``my.square@width * my.square@length``, but each subsequent call returns ``x@area`` since the area computation has already been calc'd and set for this object.

Is this possible? I'm guessing the R pass-by-value semantics is going to make this one difficult ... is there some S4 reference I missed that has this type of info?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
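As Steve suspects, assigning to a slot inside the method only changes the method's local copy, so the cached value is lost when the call returns. One common workaround, sketched here and not something proposed in the thread itself, is to give each object a slot holding an environment: environments are not copied on modification, so a value stored there survives the call. The cache slot and the Square() constructor function are hypothetical additions for illustration.

```r
library(methods)

setClass("Square",
  representation(length = "numeric", width = "numeric", cache = "environment"))

# Hypothetical constructor: gives every object its own fresh cache environment.
Square <- function(length, width) {
  new("Square", length = length, width = width,
      cache = new.env(parent = emptyenv()))
}

setGeneric("area", function(x) standardGeneric("area"))

setMethod("area", "Square", function(x) {
  cache <- x@cache
  # Compute only if nothing has been stored for this object yet.
  if (!exists("area", envir = cache, inherits = FALSE)) {
    assign("area", x@length * x@width, envir = cache)
  }
  get("area", envir = cache, inherits = FALSE)
})

my.square <- Square(length = 3, width = 4)
first  <- area(my.square)   # computed and stored
second <- area(my.square)   # read back from the cache
```

Two caveats follow from the reference semantics: a copy of my.square shares the same cache, and changing a slot such as width afterwards leaves a stale cached area unless the cache is cleared. Creating objects through the constructor matters too, because an environment supplied via a class prototype would be shared by every object.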
Re: [R] puzzle using gsub (and encodings maybe)
Thank you. If I use

  gsub(" \xad", "-", x)
  [1] "NEW YORK-NEW ENGLAND"

I get what I want.

Adrian

  sessionInfo()
  R version 2.9.2 (2009-08-24)
  i386-pc-mingw32

  locale:
  LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

On Wed, 14 Oct 2009, Prof Brian Ripley wrote:

> On Wed, 14 Oct 2009, Adrian Dragulescu wrote:
>
>> charToRaw(x)
>>  [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
>> charToRaw(y)
>>  [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>
> So they are different. We really do need the 'at a minimum'
> information we asked you for in the posting guide.
>
> But in cp1252 (a guess as to what you might be using) \xad is a 'soft
> hyphen', and that is not the same thing as a hyphen -- you will get
> the same issues with 'non-breaking space'.
>
> BDR
>
>> Adrian
>>
>> I use R 2.8.1 on WinXP
>>
>> On Wed, 14 Oct 2009, Duncan Murdoch wrote:
>>
>>> On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:
>>>>
>>>> Hello,
>>>>
>>>> Below is some output that shows my issue. I have a variable x
>>>> that I read from a file (more on this below)
>>>>
>>>> x
>>>> [1] "NEW YORK NEW ENGLAND"
>>>> gsub(" -", "-", x)  # this does not work!
>>>> [1] "NEW YORK NEW ENGLAND"
>>>
>>> Well, I see no hyphen at all here, but then I am not on Windows. It
>>> looks as though it worked, presumably because something got lost in
>>> your email. Could you post charToRaw(x) so we can see what's in x?
>>>
>>> Duncan Murdoch
>>>
>>>> Encoding(x)  # is x in a special encoding? no
>>>> [1] "unknown"
>>>> y = "NEW YORK -NEW ENGLAND"  # I type in variable y
>>>> gsub(" -", "-", y)  # and gsub works as expected
>>>> [1] "NEW YORK-NEW ENGLAND"
>>>>
>>>> I'm sure the problem has to do with the way I read the variable x.
>>>> But even if I change the encoding for x to ASCII, I still cannot
>>>> do the sub.
>>>>
>>>> I get x by reading a pdf file with pdftotext so you will not be
>>>> able to replicate my issue.
>>>>
>>>> Thanks for any suggestions,
>>>> Adrian
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
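For readers who cannot reproduce the pdftotext input, the byte values posted in the thread are enough to rebuild both strings. The sketch below is an illustration, not code from the thread: it matches on raw bytes with fixed = TRUE and useBytes = TRUE, on the assumption that this sidesteps locale differences (for example, a UTF-8 session where a lone 0xad byte is not a valid character). The variable names are illustrative.

```r
# x as read from the file: space (0x20) followed by the cp1252 soft hyphen (0xad).
x <- rawToChar(as.raw(c(0x4e, 0x45, 0x57, 0x20, 0x59, 0x4f, 0x52, 0x4b,
                        0x20, 0xad,
                        0x4e, 0x45, 0x57, 0x20, 0x45, 0x4e, 0x47, 0x4c,
                        0x41, 0x4e, 0x44)))

# y as typed at the console: space followed by an ordinary hyphen (0x2d).
y <- "NEW YORK -NEW ENGLAND"

# The ordinary pattern only helps for y, because x contains no 0x2d byte.
fixed_y <- gsub(" -", "-", y, fixed = TRUE)

# For x, match the two bytes 0x20 0xad literally, ignoring the locale.
soft_hyphen_pattern <- rawToChar(as.raw(c(0x20, 0xad)))
fixed_x <- gsub(soft_hyphen_pattern, "-", x, fixed = TRUE, useBytes = TRUE)

# Compare at the byte level, as was done with charToRaw() in the thread.
identical(charToRaw(fixed_x), charToRaw(fixed_y))
```

The same approach should carry over to the non-breaking space mentioned above, which is byte 0xa0 in cp1252.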
Re: [R] RPostgreSQL: unable to load shared library
On Wed, 14 Oct 2009, Fanfaar wrote:

> Hello list,
>
> I'm using R 2.9.2 on a WinXP system, and I installed the RPostgreSQL
> library using the package installer. When trying to load it, I get the
> following error:
>
> library('RPostgreSQL')
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>   unable to load shared library
>   'C:/PROGRA~1/R/R/library/RPostgreSQL/libs/RPostgreSQL.dll':
>   LoadLibrary failure: The operating system cannot run %1.
> Error: package/namespace load failed for 'RPostgreSQL'
>
> So one way or the other, the DLL is not found... Does anyone know how

That is not what it says: it says it cannot *load* the DLL. You need the PostgreSQL client dll in your path, and I guess that (or its version) is the problem. (Usually Windows gives you a popup with more information, and indeed on my laptop it told me LIBPQ.DLL could not be found.) And pedump suggests that it is linked against entry points by number not name, a very fragile arrangement.

I always worry that packages that link to external DLLs can be very dependent on the version of that DLL (and see the above comment). I could not see a description of the version of PostgreSQL used on Uwe's ReadMe (assuming this is a binary from CRAN), and suggest (as did the rw-FAQ) that you install RPostgreSQL from source against your own PostgreSQL installation. (That's what I do on my Windows desktop, which does have PostgreSQL installed, and when I updated PostgreSQL I had to re-install RPostgreSQL.)

> to fix this? I don't suppose DLL should be directly in my PATH, right?
>
> Thanks for any hints,
> Arnout
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Yet another person who thinks that does not apply to them.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
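To check the point about the client DLL needing to be on the path, something along these lines could be run from R before calling library(). It is a rough sketch only: find_on_path is a hypothetical helper written for this illustration, not part of RPostgreSQL or base R, and it only tests whether a file of the given name exists in a PATH directory, not whether its version matches the one the package was built against.

```r
# Hypothetical helper: return the PATH directories that contain `file`.
find_on_path <- function(file, path = Sys.getenv("PATH")) {
  # PATH entries are separated by ";" on Windows and ":" elsewhere.
  dirs <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  dirs <- dirs[nzchar(dirs)]
  dirs[file.exists(file.path(dirs, file))]
}

# An empty result means no directory on PATH holds the client library.
find_on_path("libpq.dll")

# If nothing is found, the directory holding libpq.dll could be prepended
# for the current session. `pg_bin` is a placeholder to be set by hand.
# pg_bin <- "<directory containing libpq.dll>"
# Sys.setenv(PATH = paste(pg_bin, Sys.getenv("PATH"),
#                         sep = .Platform$path.sep))
```

Even when the DLL is found, a version mismatch can still stop the package from loading, which is why installing RPostgreSQL from source against the local PostgreSQL is the suggestion made above.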