Re: [R] matching observations and ranking
Hi, It is not that clear. If VAR1 is a match between columns AB001A, AB0002A, VAR2 between AB001A, AB362 and VAR3 between AB0002A and AB362: Also, I assume row8 match would be taken as 1. dat1- read.table(text= S.No AB001A AB0002A AB362 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17 -/- C/C A/A 18 C/C C/C A/A 19 C/C C/C A/A ,sep=,header=TRUE,stringsAsFactors=FALSE) library(plyr) res-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),MATCH=(SUM/3)*100,Rank=rank(MATCH) head(res) # S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM MATCH Rank #1 1 -/- C/C A/A 0 0 0 0 0.0 2.5 #2 2 C/C C/C A/A 1 0 0 1 33.3 11.0 #3 3 C/C C/C A/A 1 0 0 1 33.3 11.0 #4 4 C/C C/C A/A 1 0 0 1 33.3 11.0 #5 5 C/C C/C A/A 1 0 0 1 33.3 11.0 #6 6 C/C C/C A/A 1 0 0 1 33.3 11.0 #or res-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),MATCH=(SUM/3)*100,Rank=rank(MATCH,ties.method=min)) head(res) # S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM MATCH Rank #1 1 -/- C/C A/A 0 0 0 0 0.0 1 #2 2 C/C C/C A/A 1 0 0 1 33.3 5 #3 3 C/C C/C A/A 1 0 0 1 33.3 5 #4 4 C/C C/C A/A 1 0 0 1 33.3 5 #5 5 C/C C/C A/A 1 0 0 1 33.3 5 #6 6 C/C C/C A/A 1 0 0 1 33.3 5 A.K. Hi to all bloggers, my data looks like this, S. No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM %Match Rank 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17 -/- C/C A/A 18 C/C C/C A/A 19 C/C C/C A/A I want to match obs 3 with obs 2 if it exactly matched then score will be 1 else 0, that will be stored in var1 for AB001a, in var2 for ab0002a and in var3 for ab362 and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their %match and rank. My question is how can i do this in R? can i use match package for this? or other packages will help me? my data is so big with 5,15,567 obs. can any one guide me how to do this in sas because i want to reduce my time to analyze my data. Thanking you Regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression on stratified count data
On Apr 24, 2013, at 06:15 , meng wrote: Hi all: For stratified count data,how to perform regression analysis? My data: age case oc count 1 1 121 1 1 226 1 2 117 1 2 259 2 1 118 2 1 288 2 2 1 7 2 2 2 95 age: 1:40y 2:40y case: 1:patient 2:health oc: 1:use drug 2:not use drug My purpose: Anaysis whether case and oc are correlated, and age is a stratified variable. My solution: 1,Mantel-Haenszel test by using function mantelhaen.test 2,loglinear regression by using function glm(count~case*oc,family=poisson).But I don't know how to handle variable age,which is the stratified variable. The canonical way is to fit the model without 2nd order interaction: count~case*oc*age-case:oc:case . (It may take the back of an envelope or two to realize that this is equivalent to the common OR assumption of the MH test.) Alternatively, use logistic regression glm(case ~ oc + age, family=binomial, weight=count, data=dd) (NB: it is important that case is a factor here!) -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simulate user input in R
Dear all, Can we simulate user input in R ? for example if we have a function which needs an input from the user to continue its work, can we automate this step (simulate the input...) Here is the sample: choose between one of the grouping factor available : c(Village, Country) we need to enter Village or Country to continue.can I automatically pass one of the values without user input. Regards, Vahe [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Understanding how nls() works
I'm trying to understand what goes on in the process that nls() uses. This converges without much drama: rate.nls - nls(Dev ~ exp(rho * (T)) - exp(rho*Tmax - (Tmax - T)/del)+ + lam , data = xx, trace = TRUE, + start = c(rho = 0.15, del = 7, lam = .02, Tmax = 30)) 599.7841 : 0.15 7.00 0.02 30.00 4.69849 : 0.14673457 6.83443060 -0.01417782 29.94392535 0.06971343 : 0.14787121 6.75095966 -0.01281743 28.88851217 0.01197127 : 0.1460683 6.8334550 -0.0128118 30.1599472 0.008279834 : 0.14406976 6.92819920 -0.01281783 31.63914711 0.002267513 : 0.1437983 6.9426081 -0.0127973 31.9153204 0.002264977 : 0.14377323 6.94385235 -0.01279224 31.95091153 0.002264977 : 0.14378546 6.94326629 -0.01278643 31.95045828 0.002264977 : 0.14378124 6.94346882 -0.01278843 31.95065902 0.002264977 : 0.14378271 6.94339837 -0.01278772 31.95059050 If I make one parameter further away from optimum, I get this: rate.nls - nls(Dev ~ exp(rho * (T)) - exp(rho*Tmax - (Tmax - T)/del)+ + lam , data = xx, trace = TRUE, + nls.control(warnOnly = TRUE), + start = c(rho = 0.15, del = 7, lam = 1.02, Tmax = 30)) 59.80968 : 0.15 7.00 1.02 30.00 4.69849 : 0.14673457 6.83443060 -0.01417782 29.94392535 0.06973771 : 0.14787046 6.75099517 -0.01281668 28.88853959 0.0119916 : 0.1460709 6.8333430 -0.0128098 30.1600040 0.008369133 : 0.14406664 6.92834092 -0.01281926 31.63978393 0.002267664 : 0.1438008 6.9424901 -0.0127959 31.9151258 0.002264977 : 0.1437771 6.9436655 -0.0127904 31.9507300 0.002264977 : 0.14378253 6.94340678 -0.01278781 31.95059843 0.002264977 : 0.1437849 6.9432936 -0.0127867 31.9504860 Warning message: In nls(Dev ~ exp(rho * (T)) - exp(rho * Tmax - (Tmax - T)/del) + : step factor 0.000488281 reduced below 'minFactor' of 0.000976562 There's no surprise that the traces will be different, but I can't understand why the first iteration gives exactly the same result in both cases. *Then* they diverge. What is the explanation? (Maybe if I knew what the step factor refers to, it could be clearer but I suspect it doesn't come in at that stage.) I can't think of a way of making a minimal working example but I'd guess that it's not necessary to explain the procedure. I could supply my xx dataframe if it's indispensible to the explanation. TIA -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simulate user input in R
Your example is not reproducible, so it is open to misinterpretation. Normally the approach taken in R is not to wait for user input, but have the user call the function with the desired values. One interpretation of your question is that the function you are using is broken by design, and you should re-write it. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Vahe nr vne...@gmail.com wrote: Dear all, Can we simulate user input in R ? for example if we have a function which needs an input from the user to continue its work, can we automate this step (simulate the input...) Here is the sample: choose between one of the grouping factor available : c(Village, Country) we need to enter Village or Country to continue.can I automatically pass one of the values without user input. Regards, Vahe [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simulate user input in R
Thanks Jeff for your reply... The code is not written by me it is the NDVITS package which is used in R. TimeSeriesAnalysis {ndvits} And at that point it requires my input. Regards, Vahe On Wed, Apr 24, 2013 at 11:04 AM, Jeff Newmiller jdnew...@dcn.davis.ca.uswrote: Your example is not reproducible, so it is open to misinterpretation. Normally the approach taken in R is not to wait for user input, but have the user call the function with the desired values. One interpretation of your question is that the function you are using is broken by design, and you should re-write it. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Vahe nr vne...@gmail.com wrote: Dear all, Can we simulate user input in R ? for example if we have a function which needs an input from the user to continue its work, can we automate this step (simulate the input...) Here is the sample: choose between one of the grouping factor available : c(Village, Country) we need to enter Village or Country to continue.can I automatically pass one of the values without user input. Regards, Vahe [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] News package
Hi, Is there any package available in R to download news content? Thanks in advance. Ashy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] pglm package: fitted values and residuals
I'm using the package pglm and I'have estimated a random probit model. I need to save in a vector the fitted values and the residuals of the model but I can not do it. I tried with the command fitted.values using the following procedure without results: library(pglm) m1_S-pglm(Feed ~ Cons_PC_1 + imp_gen_1 + LGDP_PC_1 + lnEI_1 + SH_Ren_1,data,family=binomial(probit),model=random,method=bfgs,index=c(Year,IDCountry)) m1_S$fitted.values residuals(m1) Can someone help me about it? Thanks ** IL MERITO DEGLI STUDENTI VIENE RICONOSCIUTO Il 5 per mille all'Universita' degli Studi di Napoli Parthenope incrementa le borse di studio agli studenti - codice fiscale 80018240632 http://www.uniparthenope.it/index.php/5xmille http://www.uniparthenope.it/index.php/it/avvisi-sito-di-ateneo/2943-la-parthenope-premia-il-tuo-voto-di-diploma-ed-il-tuo-imegno-con-i-proventi-del-5-per-mille Questa informativa e' inserita in automatico dal sistema al fine esclusivo della realizzazione dei fini istituzionali dell'ente. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
Dear all, I've bumped into the: Error in loadNamespace(name) : there is no package called ‘R.utils’ error. I've already read a bit on this ( http://www.cybaea.net/Blogs/Data/A-warning-on-the-R-save-format.html ) but I have a follow-up question. Given a workspace that automatically loads a package that I don't really need/want (e.g. ‘R.utils’), how do I identify which object requires this package to load? I would like to avoid loading ‘R.utils’ every time I open an R session. Regards, Liviu sessionInfo() R version 2.15.3 (2013-03-01) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] datasets grDevices splines graphics utils stats methods base other attached packages: [1] R.utils_1.23.2R.oo_1.13.0 R.methodsS3_1.4.2 tables_0.7.57 reshape2_1.2.2 [6] car_2.0-15nnet_7.3-6MASS_7.3-23 Hmisc_3.10-1 survival_2.37-2 [11] foreign_0.8-53 loaded via a namespace (and not attached): [1] cluster_1.14.3 grid_2.15.3 lattice_0.20-13 plyr_1.8 rstudio_0.97.312 [6] stringr_0.6.2tools_2.15.3 -- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Looking for a better code for my problem.
Hello again, Let say I have following data: Dat - structure(list(AA = structure(c(3L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 1L), .Label = c(A, B, C), class = factor), BB = structure(c(2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 3L, 2L, 1L, 2L, 2L, 3L ), .Label = c(a, b, c), class = factor), CC = 1:20), .Names = c(AA, BB, CC), row.names = c(NA, -20L), class = data.frame) Now I want to select a subset of that 'Dat', for which: 1. First column will contain ALL A 2. First column will contain those B for which BB = b in second column. Therefore I tries following: Only_A - Dat[Dat[, 'AA'] == A, ] Only_B - Dat[Dat[, 'AA'] == B, ] rbind(Only_A, Only_B[Only_B[, 'BB'] == b, ]) AA BB CC 2 A c 2 4 A b 4 10 A a 10 11 A a 11 18 A b 18 19 A b 19 20 A c 20 3 B b 3 5 B b 5 8 B b 8 However I believe there must be some better code to achieve that which is tidier, i.e. there must be some 1-liner code. Can somebody suggest any better approach if possible? Thanks and regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Looking for a better code for my problem.
Hello, You can easily make of the following a one-liner. Note that the order of rows is not the same as in your code, so identical() will return FALSE. idx - Dat[, 'AA'] == A | (Dat[, 'AA'] == B Dat[, 'BB'] == b) res2 - Dat[idx, ] Hope this helps, Rui Barradas Em 24-04-2013 11:21, Christofer Bogaso escreveu: Hello again, Let say I have following data: Dat - structure(list(AA = structure(c(3L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 1L), .Label = c(A, B, C), class = factor), BB = structure(c(2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 3L, 2L, 1L, 2L, 2L, 3L ), .Label = c(a, b, c), class = factor), CC = 1:20), .Names = c(AA, BB, CC), row.names = c(NA, -20L), class = data.frame) Now I want to select a subset of that 'Dat', for which: 1. First column will contain ALL A 2. First column will contain those B for which BB = b in second column. Therefore I tries following: Only_A - Dat[Dat[, 'AA'] == A, ] Only_B - Dat[Dat[, 'AA'] == B, ] rbind(Only_A, Only_B[Only_B[, 'BB'] == b, ]) AA BB CC 2 A c 2 4 A b 4 10 A a 10 11 A a 11 18 A b 18 19 A b 19 20 A c 20 3 B b 3 5 B b 5 8 B b 8 However I believe there must be some better code to achieve that which is tidier, i.e. there must be some 1-liner code. Can somebody suggest any better approach if possible? Thanks and regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] extract function extracting only NA values
Hello, I have five raster files in ASCII format. With four of them I have no problem extracting values based on a set of X and Y coordinates. Unfortunately with one of the files all I managed to extract is NA values. To verify the problem I have opened the raster with ArcMap and there are no NA values where I am extracting. I have also plotted the spatial point class on top of the raster in R and it does correspond to the correct locations. These are some of the commands I am using, and as I already pointed out that works perfectly with other raster files. sp-SpatialPoints(xysp) xy$rasterimg-extract(rasterimg,sp) Can anyone help? At this point I am rather clueless about this. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression on stratified count data
On Wed, 24 Apr 2013, meng wrote: Hi all: For stratified count data,how to perform regression analysis? My data: age case oc count 1 1 121 1 1 226 1 2 117 1 2 259 2 1 118 2 1 288 2 2 1 7 2 2 2 95 age: 1:40y 2:40y case: 1:patient 2:health oc: 1:use drug 2:not use drug My purpose: Anaysis whether case and oc are correlated, and age is a stratified variable. My solution: 1,Mantel-Haenszel test by using function mantelhaen.test 2,loglinear regression by using function glm(count~case*oc,family=poisson).But I don't know how to handle variable age,which is the stratified variable. Instead of using glm(family = poisson) it is also convenient to use loglm() from package MASS for the associated convenience table. The code below shows how to set up the contingency table, visualize it using package vcd, and then fit two models using loglm. The models considered are conditional independence of case and drug given age and the no three-way interaction already suggested by Peter. Both models are also accompanied by visualizations of the residuals. ## contingency table with nice labels tab - expand.grid(drug = 1:2, case = 1:2, age = 1:2) tab$count - c(21, 26, 17, 59, 18, 88, 7, 95) tab$age - factor(tab$age, levels = 1:2, labels = c(40, 40)) tab$case - factor(tab$case, levels = 1:2, labels = c(patient, healthy)) tab$drug - factor(tab$drug, levels = 1:2, labels = c(yes, no)) tab - xtabs(count ~ age + drug + case, data = tab) ## visualize case explained by drug given age library(vcd) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE)) ## test wheter drug and case are independent given age m1 - loglm(~ age/(drug + case), data = tab) m1 ## visualize corresponding residuals from independence model mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = deviance, gp = shading_hcl, gp_args = list(interpolate = 1.2)) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = pearson, gp = shading_hcl, gp_args = list(interpolate = 1.2)) ## test whether there is no three-way interaction ## (i.e., dependence of case on drug is the same for both age groups) m2 - loglm(~ (age + drug + case)^2, data = tab) m2 ## visualization (with default pearson residuals) mosaic(case ~ drug | age, data = tab, expected = ~ (age + drug + case)^2, split_vertical = c(TRUE, TRUE, FALSE), gp = shading_hcl, gp_args = list(interpolate = 1.2)) Many thanks for your help. My best. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Looking for a better code for my problem.
Try subset(Dat, AA == A | (AA == B BB == b)) HTH, Jorge.- On Wed, Apr 24, 2013 at 8:21 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: Hello again, Let say I have following data: Dat - structure(list(AA = structure(c(3L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 1L), .Label = c(A, B, C), class = factor), BB = structure(c(2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 3L, 2L, 1L, 2L, 2L, 3L ), .Label = c(a, b, c), class = factor), CC = 1:20), .Names = c(AA, BB, CC), row.names = c(NA, -20L), class = data.frame) Now I want to select a subset of that 'Dat', for which: 1. First column will contain ALL A 2. First column will contain those B for which BB = b in second column. Therefore I tries following: Only_A - Dat[Dat[, 'AA'] == A, ] Only_B - Dat[Dat[, 'AA'] == B, ] rbind(Only_A, Only_B[Only_B[, 'BB'] == b, ]) AA BB CC 2 A c 2 4 A b 4 10 A a 10 11 A a 11 18 A b 18 19 A b 19 20 A c 20 3 B b 3 5 B b 5 8 B b 8 However I believe there must be some better code to achieve that which is tidier, i.e. there must be some 1-liner code. Can somebody suggest any better approach if possible? Thanks and regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extract function extracting only NA values
So I think I might have found what is causing this problem. The values I have in this raster summary(ps0011yme) layer Min.-9458.911 1st Qu. 1955256.000 Median 10618870.000 3rd Qu. 79577730.000 Max.167780500.000 NA's0.000 From ArcMap though I get different values min -69826224 and max 167780496 So maybe when I am importing the rest are in R something goes wrong and all the values below a certain threshold are considered NA. Is this a bug or do I get to use a specific parameter for the raster function? On 4/24/2013 13:10, Fabio Berzaghi wrote: Hello, I have five raster files in ASCII format. With four of them I have no problem extracting values based on a set of X and Y coordinates. Unfortunately with one of the files all I managed to extract is NA values. To verify the problem I have opened the raster with ArcMap and there are no NA values where I am extracting. I have also plotted the spatial point class on top of the raster in R and it does correspond to the correct locations. These are some of the commands I am using, and as I already pointed out that works perfectly with other raster files. sp-SpatialPoints(xysp) xy$rasterimg-extract(rasterimg,sp) Can anyone help? At this point I am rather clueless about this. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bootstrapping a 11x10 matrix
Herbejie Rose, You could use the boot() function in the R package boot. For example: # example data matrix m - matrix(sample(11*10), ncol=10) # function to calculate column means for indexed rows of matrix myfun - function(data, i) { apply(data[i, ], 2, mean) } # 1000 bootstrap samples library(boot) myboot - boot(m, myfun, R=1000) myboot dim(myboot$t) Jean On Tue, Apr 23, 2013 at 10:41 AM, Herbejie Rose Cuizon herbejier...@gmail.com wrote: Good evening! This is Herbejie Rose I need your help for me to compute the following: I just want to ask on how to bootstrap a 11x10 matrix to obtain 1000 bootstrap samples and compute the 10 dimensional mean per bootstrap sample. the 11x10 dimension of the matrix has 11 subjects and 10 variables. Just want to have 11x10 per bootrap sample with replacement and compute the 10 dimensional mean. Hope to have a positive response on this matter.. Thank you so much. Sincerely yours, Herbejie Rose [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News package
On Wed, Apr 24, 2013 at 9:02 AM, Aswathy Nair ashy4...@gmail.com wrote: Hi, Is there any package available in R to download news content? What news source are you looking for? You could, e.g., use the twitteR package, but to my knowledge for things like Google News or the BBC you'll need to likely roll your own with the XML and RCurl packages. MW __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On 13-04-24 5:46 AM, Liviu Andronic wrote: Dear all, I've bumped into the: Error in loadNamespace(name) : there is no package called ‘R.utils’ error. I've already read a bit on this ( http://www.cybaea.net/Blogs/Data/A-warning-on-the-R-save-format.html ) but I have a follow-up question. Given a workspace that automatically loads a package that I don't really need/want (e.g. ‘R.utils’), how do I identify which object requires this package to load? I would like to avoid loading ‘R.utils’ every time I open an R session. That's not easy, because the code in R that triggers that error has no idea of the name of the object it is loading. You could try a binary search to find out, but it will be tedious: 1. Install R.utils. 2. Load the workspace successfully. 3. Delete half the objects, and save it. 4. Uninstall R.utils, and see if you can load the workspace. At this point you'll know if there's an object needing R.utils still left or not, and you can repeat the steps until you find a single object that causes the problem. (But it might not be the only one, so deleting it from the original workspace might not solve your problem.) A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. Duncan Murdoch Regards, Liviu sessionInfo() R version 2.15.3 (2013-03-01) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] datasets grDevices splines graphics utils stats methods base other attached packages: [1] R.utils_1.23.2R.oo_1.13.0 R.methodsS3_1.4.2 tables_0.7.57 reshape2_1.2.2 [6] car_2.0-15nnet_7.3-6MASS_7.3-23 Hmisc_3.10-1 survival_2.37-2 [11] foreign_0.8-53 loaded via a namespace (and not attached): [1] cluster_1.14.3 grid_2.15.3 lattice_0.20-13 plyr_1.8 rstudio_0.97.312 [6] stringr_0.6.2tools_2.15.3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On 04/24/2013 06:03 AM, Duncan Murdoch wrote: On 13-04-24 5:46 AM, Liviu Andronic wrote: Dear all, I've bumped into the: Error in loadNamespace(name) : there is no package called ‘R.utils’ error. I've already read a bit on this ( http://www.cybaea.net/Blogs/Data/A-warning-on-the-R-save-format.html ) but I have a follow-up question. Given a workspace that automatically loads a package that I don't really need/want (e.g. ‘R.utils’), how do I identify which object requires this package to load? I would like to avoid loading ‘R.utils’ every time I open an R session. That's not easy, because the code in R that triggers that error has no idea of the name of the object it is loading. Maybe traceback() can provide some hints? I did, more or less arbitrarily library(rms) a = list(fun=ie.setup) save(a, file=/tmp/a.rda) remove.packages(rms) and then in a new session load(/tmp/a.rda) Error in loadNamespace(name) : there is no package called 'rms' traceback() 7: stop(e) 6: value[[3L]](cond) 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 4: tryCatchList(expr, classes, parentenv, handlers) 3: tryCatch(loadNamespace(name), error = function(e) stop(e)) 2: getNamespace(c(rms, 3.6-3)) 1: load(/tmp/a.rda) with the line numbered 2 giving me the necessary hint. Martin You could try a binary search to find out, but it will be tedious: 1. Install R.utils. 2. Load the workspace successfully. 3. Delete half the objects, and save it. 4. Uninstall R.utils, and see if you can load the workspace. At this point you'll know if there's an object needing R.utils still left or not, and you can repeat the steps until you find a single object that causes the problem. (But it might not be the only one, so deleting it from the original workspace might not solve your problem.) A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. Duncan Murdoch Regards, Liviu sessionInfo() R version 2.15.3 (2013-03-01) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] datasets grDevices splines graphics utils stats methods base other attached packages: [1] R.utils_1.23.2R.oo_1.13.0 R.methodsS3_1.4.2 tables_0.7.57 reshape2_1.2.2 [6] car_2.0-15nnet_7.3-6MASS_7.3-23 Hmisc_3.10-1 survival_2.37-2 [11] foreign_0.8-53 loaded via a namespace (and not attached): [1] cluster_1.14.3 grid_2.15.3 lattice_0.20-13 plyr_1.8 rstudio_0.97.312 [6] stringr_0.6.2tools_2.15.3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] about the loglm and glm---Re:Re: Regression on stratified count data
On Wed, 24 Apr 2013, meng wrote: Hi,Achim: Can all the analysis you mentioned via loglm be performed via glm(...,family=poisson)? Yes. ## transform table back to data.frame df - as.data.frame(tab) ## fit models: conditional independence, no-three way interaction, ## and saturated g1 - glm(Freq ~ age/(drug + case), data = df, family = poisson) g2 - glm(Freq ~ (age + drug + case)^2, data = df, family = poisson) g3 - glm(Freq ~ age * drug * case, data = df, family = poisson) ## likelihood-ratio tests (against saturated) anova(g1, g3, test = Chisq) anova(g2, g3, test = Chisq) ## compare fitted frequencies (which are essentially equal) all.equal(as.numeric(fitted(g1)), as.data.frame(as.table(fitted(m1)))$Freq) all.equal(as.numeric(fitted(g2)), as.data.frame(as.table(fitted(m2)))$Freq) The difference is mainly that loglm() has a specialized user interface and that it uses a different optimizer (iterative proportional fitting rather than iterative reweighted least squares). Best, Z Many thanks. At 2013-04-24 19:37:10,Achim Zeileis achim.zeil...@uibk.ac.at wrote: On Wed, 24 Apr 2013, meng wrote: Hi all: For stratified count data,how to perform regression analysis? My data: age case oc count 1 1 121 1 1 226 1 2 117 1 2 259 2 1 118 2 1 288 2 2 1 7 2 2 2 95 age: 1:40y 2:40y case: 1:patient 2:health oc: 1:use drug 2:not use drug My purpose: Anaysis whether case and oc are correlated, and age is a stratified varia ble. My solution: 1,Mantel-Haenszel test by using function mantelhaen.test 2,loglinear regression by using function glm(count~case*oc,family=poisson ).But I don't know how to handle variable age,which is the stratified vari able. Instead of using glm(family = poisson) it is also convenient to use loglm() from package MASS for the associated convenience table. The code below shows how to set up the contingency table, visualize it using package vcd, and then fit two models using loglm. The models considered are conditional independence of case and drug given age and the no three-way interaction already suggested by Peter. Both models are also accompanied by visualizations of the residuals. ## contingency table with nice labels tab - expand.grid(drug = 1:2, case = 1:2, age = 1:2) tab$count - c(21, 26, 17, 59, 18, 88, 7, 95) tab$age - factor(tab$age, levels = 1:2, labels = c(40, 40)) tab$case - factor(tab$case, levels = 1:2, labels = c(patient, healthy)) tab$drug - factor(tab$drug, levels = 1:2, labels = c(yes, no)) tab - xtabs(count ~ age + drug + case, data = tab) ## visualize case explained by drug given age library(vcd) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE)) ## test wheter drug and case are independent given age m1 - loglm(~ age/(drug + case), data = tab) m1 ## visualize corresponding residuals from independence model mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = deviance, gp = shading_hcl, gp_args = list(interpolate = 1.2)) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = pearson, gp = shading_hcl, gp_args = list(interpolate = 1.2)) ## test whether there is no three-way interaction ## (i.e., dependence of case on drug is the same for both age groups) m2 - loglm(~ (age + drug + case)^2, data = tab) m2 ## visualization (with default pearson residuals) mosaic(case ~ drug | age, data = tab, expected = ~ (age + drug + case)^2, split_vertical = c(TRUE, TRUE, FALSE), gp = shading_hcl, gp_args = list(interpolate = 1.2)) Many thanks for your help. My best. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.h tml and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extract function extracting only NA values
ok so i imported the raster as a dataframe and everything is fine Min. :-69826220 Max. :167780500 so it's something with values lower than - that are interpreted as NA by the raster function On 4/24/2013 13:52, Fabio Berzaghi wrote: So I think I might have found what is causing this problem. The values I have in this raster summary(ps0011yme) layer Min.-9458.911 1st Qu. 1955256.000 Median 10618870.000 3rd Qu. 79577730.000 Max.167780500.000 NA's0.000 From ArcMap though I get different values min -69826224 and max 167780496 So maybe when I am importing the rest are in R something goes wrong and all the values below a certain threshold are considered NA. Is this a bug or do I get to use a specific parameter for the raster function? On 4/24/2013 13:10, Fabio Berzaghi wrote: Hello, I have five raster files in ASCII format. With four of them I have no problem extracting values based on a set of X and Y coordinates. Unfortunately with one of the files all I managed to extract is NA values. To verify the problem I have opened the raster with ArcMap and there are no NA values where I am extracting. I have also plotted the spatial point class on top of the raster in R and it does correspond to the correct locations. These are some of the commands I am using, and as I already pointed out that works perfectly with other raster files. sp-SpatialPoints(xysp) xy$rasterimg-extract(rasterimg,sp) Can anyone help? At this point I am rather clueless about this. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to make a raster image in R from my own data set
Hi R-user, I was trying to make a raster map with WGS84 projection in R, but I could not make it. I found one data set in Google that data is almost the same format as of mine. I wanted to make a raster map of temperature with 1 degree spatial resolution for the global scale. I could make it in GIS software but I do have many variables (to be many raster images) and ultimately I am importing them to R for further analysis. Therefore, I wanted to make them in R, if possible. It would be great if you give some hints on how script look like in creating a raster map from my own data set (I have provided link for your references, this is an example data set). I am really appropriating for your help. #-- #create a raster map from scratch install.packages(raster, dependencies=TRUE) library(raster) # raster data install.packages(rgdal, dependencies=TRUE) library(rgdal) # input/output, projections install.packages(rgeos, dependencies=TRUE) library(rgeos) # geometry ops install.packages(spdep, dependencies=TRUE) library(spdep) # spatial dependence install.packages(pastecs, dependencies=TRUE) library(pastecs) pts-read.table.url(https://www.betydb.org//miscanthusyield.csv;, header=T, sep=,) proj4string(pts)=- CRS(+proj=longlat +datum=WGS84) #--- Cheers, Kristi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On 13-04-24 10:12 AM, Martin Morgan wrote: On 04/24/2013 06:03 AM, Duncan Murdoch wrote: On 13-04-24 5:46 AM, Liviu Andronic wrote: Dear all, I've bumped into the: Error in loadNamespace(name) : there is no package called ‘R.utils’ error. I've already read a bit on this ( http://www.cybaea.net/Blogs/Data/A-warning-on-the-R-save-format.html ) but I have a follow-up question. Given a workspace that automatically loads a package that I don't really need/want (e.g. ‘R.utils’), how do I identify which object requires this package to load? I would like to avoid loading ‘R.utils’ every time I open an R session. That's not easy, because the code in R that triggers that error has no idea of the name of the object it is loading. Maybe traceback() can provide some hints? I did, more or less arbitrarily library(rms) a = list(fun=ie.setup) save(a, file=/tmp/a.rda) remove.packages(rms) and then in a new session load(/tmp/a.rda) Error in loadNamespace(name) : there is no package called 'rms' traceback() 7: stop(e) 6: value[[3L]](cond) 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 4: tryCatchList(expr, classes, parentenv, handlers) 3: tryCatch(loadNamespace(name), error = function(e) stop(e)) 2: getNamespace(c(rms, 3.6-3)) 1: load(/tmp/a.rda) with the line numbered 2 giving me the necessary hint. That tells you that some object needs the rms package, but I don't see how you would know it is a that is the problem. We already knew that rms was needed from the error message. What I've done sometimes in debugging is to change that error to a warning in the getNamespace() function, and add some tracing code to the serialization code to print the names of objects as they are loaded. (This goes in ReadItem in src/main/serialize.c.) I wouldn't expect Liviu to make those changes, but perhaps a verbose option could be added to load(), so that it could be available to users. Duncan Murdoch Martin You could try a binary search to find out, but it will be tedious: 1. Install R.utils. 2. Load the workspace successfully. 3. Delete half the objects, and save it. 4. Uninstall R.utils, and see if you can load the workspace. At this point you'll know if there's an object needing R.utils still left or not, and you can repeat the steps until you find a single object that causes the problem. (But it might not be the only one, so deleting it from the original workspace might not solve your problem.) A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. Duncan Murdoch Regards, Liviu sessionInfo() R version 2.15.3 (2013-03-01) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] datasets grDevices splines graphics utils stats methods base other attached packages: [1] R.utils_1.23.2R.oo_1.13.0 R.methodsS3_1.4.2 tables_0.7.57 reshape2_1.2.2 [6] car_2.0-15nnet_7.3-6MASS_7.3-23 Hmisc_3.10-1 survival_2.37-2 [11] foreign_0.8-53 loaded via a namespace (and not attached): [1] cluster_1.14.3 grid_2.15.3 lattice_0.20-13 plyr_1.8 rstudio_0.97.312 [6] stringr_0.6.2tools_2.15.3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
Dear Duncan, On Wed, Apr 24, 2013 at 4:57 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: What I've done sometimes in debugging is to change that error to a warning in the getNamespace() function, and add some tracing code to the serialization code to print the names of objects as they are loaded. (This goes in ReadItem in src/main/serialize.c.) I wouldn't expect Liviu to make those changes, but perhaps a verbose option could be added to load(), so that it could be available to users. That would be useful, indeed. As it stands save()/load() workspaces in R seems very fragile and could easily trip users when working on multiple machines or sharing their workspace with a colleague. In such cases it is important to be able to quickly pinpoint the offending object. Regards, Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
Dear Duncan, On Wed, Apr 24, 2013 at 3:03 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. I must admit that I'm a bit surprised by this. I was always under the impression that saving/restoring workspaces was the proper workflow in R. If you use R interactively (e.g., not by running scripts), how else would you store your data, intermediary results, etc., while working on a project? Am I missing something? Regards, Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On Wed, Apr 24, 2013 at 4:08 PM, Liviu Andronic landronim...@gmail.com wrote: Dear Duncan, On Wed, Apr 24, 2013 at 3:03 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. I must admit that I'm a bit surprised by this. I was always under the impression that saving/restoring workspaces was the proper workflow in R. If you use R interactively (e.g., not by running scripts), how else would you store your data, intermediary results, etc., while working on a project? Am I missing something? I save the objects themselves (rather than the whole workspace) using saveRDS() and readRDS() -- which I *think* are considered better practice than save() and load() because they don't force names on you. Cheers, MW __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with raster function with values lower than -9999
As I have reported in a previous thread, I have been having problems importing a ASCII raster that has values that go from Min. :-69826220 to Max. :167780500. The problem I am encountering is that when I use the raster function to import the ASCII file then every value smaller than - is reported as NA and the minimum value is -9458. Is this a bug of the function and is there a workaround? When I import the same ASCII file as a data frame everything is fine and I get the whole range of values. Any help is appreciated __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
I must admit that I'm a bit surprised by this. I was always under the impression that saving/restoring workspaces was the proper workflow in R. If you use R interactively (e.g., not by running scripts), how else would you store your data, intermediary results, etc., while working on a project? Am I missing something? You don't _store_ intermediate results - you recreate them from your script. If they are time consuming, then you can use readRDS/saveRDS to cache the results. If you don't start with a clean workspace frequently, it's difficult to tell whether or not your code is reproducible. Hadley -- Chief Scientist, RStudio http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
In case you haven't noticed, this is making the rounds in the media, including a handful of references to R. See e.g. http://news.slashdot.org/story/13/04/17/0215211/excel-error-contributes-to-problems-with-austerity-study I suppose we can't fortune()'ify anonymous quotes, but I kind of like this exchange: Bacon Bits: SPSS and R are very good at statistical analysis. Quantrix, MapleSoft, IBM Algorithmics, and other software is for financial data modeling. None of those is particularly appropriate for sharing data in a useful format with peers. Excel is. Hatta: R is extremely appropriate for sharing data in a useful format with peers. It's completely free for one. But more importantly, it saves every single step of your analysis. Send someone an Excel file, and who knows what they've done to the data. Send someone your R project directory and they can see exactly what you did. The problem with sending R files to your peers isn't that the R files aren't useful. It's that your peers aren't. On Apr 16, 2013, at 19:25 , Sarah Goslee wrote: Given that we occasionally run into problems with comparing Excel results to R results, and other spreadsheet-induced errors, I thought this might be of interest. http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems The punchline: If this error turns out to be an actual mistake Reinhart-Rogoff made, well, all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel. Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
-Original Message- From: h.wick...@gmail.com Sent: Wed, 24 Apr 2013 10:48:01 -0500 To: landronim...@gmail.com Subject: Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’ I must admit that I'm a bit surprised by this. I was always under the impression that saving/restoring workspaces was the proper workflow in R. If you use R interactively (e.g., not by running scripts), how else would you store your data, intermediary results, etc., while working on a project? Am I missing something? You don't _store_ intermediate results - you recreate them from your script. If they are time consuming, then you can use readRDS/saveRDS to cache the results. If you don't start with a clean workspace frequently, it's difficult to tell whether or not your code is reproducible. Hadley -- Chief Scientist, RStudio http://had.co.nz/ Not to mention http://www.mail-archive.com/r-help@r-project.org/msg196997.html John Kane Kingston ON Canada GET FREE SMILEYS FOR YOUR IM EMAIL - Learn more at http://www.inbox.com/smileys Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most webmails __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need to replicate Boltzman Signmodial Curve fit from Graph Pad
Sorry. I left out a line: pHvals - seq(min(dta$pH), max(dta$pH), length.out=100) Should be inserted just before the lines() function. - David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77840-4352 From: J J [mailto:rnoob...@gmail.com] Sent: Tuesday, April 23, 2013 6:49 PM To: David Carlson Subject: Re: [R] Need to replicate Boltzman Signmodial Curve fit from Graph Pad Thank you David! That looks really close, I believe Graphpad gave a V50 of 7.244. One thing I missed on your example was where you obtained 'pHvals' from? On Tue, Apr 23, 2013 at 5:17 PM, David Carlson dcarl...@tamu.edu wrote: You may want to compare R's four point logistic with the results you get using Graphpad. For these data the Bottom parameter is not significantly different from zero (so the three point sigmoid might be adequate), but as Bert points out, especially with only 8 data points, it is probably overfitting the model. dta - read.table(text=pH counts + 3.8 968 + 5.0 1347 + 5.8 2867 + 6.6 9203 + 7.0 15817 + 7.4 20297 + 8.2 31916 + 9.2 35756, + header=TRUE) Sig.nls - nls(counts~SSfpl(pH, Bottom, Top, V50, Slope), dta) summary(Sig.nls) Formula: counts ~ SSfpl(pH, Bottom, Top, V50, Slope) Parameters: Estimate Std. Error t value Pr(|t|) Bottom 718.76871 579.25327 1.241 0.282463 Top 36983.98871 1008.31727 36.679 0.032986 *** V50 7.24930 0.05054 143.436 0.000142 *** Slope 0.55582 0.05023 11.065 0.000379 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 767.1 on 4 degrees of freedom Number of iterations to convergence: 0 Achieved convergence tolerance: 0.03808 options(scipen=5) confint(Sig.nls) Waiting for profiling to be done... 2.5% 97.5% Bottom -963.4106068 2243.7154750 Top 34499.9962176 40150.4114887 V50 7.1173103 7.4033391 Slope 0.4382893 0.7114438 coef(Sig.nls) Bottom Top V50 Slope 718.7687104 36983.9887074 7.2493032 0.5558236 plot(dta, pch=16) lines(pHvals, predict(Sig.nls, data.frame(pH=pHvals)), col=red) - David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Tuesday, April 23, 2013 12:04 PM To: John Kane Cc: r-help@r-project.org; J J Subject: Re: [R] Need to replicate Boltzman Signmodial Curve fit from Graph Pad I think you'll find that this is just an alternate parameterization of a 4 P logistic. ... and in any case there is no statistically detectable difference between the two. That is, this is just basically a Graphpad marketing ploy. Fit such data with anyone's logistic function ... and that is probably overfitting in many cases, anyway. Cheers, Bert On Tue, Apr 23, 2013 at 9:54 AM, John Kane jrkrid...@inbox.com wrote: No idea of the area but does this link help? http://bioinformatics.oxfordjournals.org/content/24/13/1549.full John Kane Kingston ON Canada -Original Message- From: rnoob...@gmail.com Sent: Tue, 23 Apr 2013 10:55:59 -0400 To: r-help@r-project.org Subject: [R] Need to replicate Boltzman Signmodial Curve fit from Graph Pad Hello useRs (please don't kill me), I've fairly new to R having only a few months of playing around with R. What little I've learned has been extremely useful. If someone could point me as to how to replicate the Boltzman Sigmodial curve fit as provided by Graphpad software I'd be eternally grateful. Where we currently use Graphpad for only this one function,its seems highly inefficient for a $100 piece of software to be used for only one function (which isn't nearly as bad as the company's pandemic reliance on Excel for nearly everything else). so anyway, what I'd normally do is take a set of data like this: pH counts 3.8 968 5.0 1347 5.8 2867 6.6 9203 7.0 15817 7.4 20297 8.2 31916 9.2 35756 then fit this to a Boltzman sigmodial in Graphpad. Graphpad spits out a much longer set of vectors and also information about v50 and confidence intervals etc (if anyone is familiar with that software). This is what i'd like to replicate. It might be that I just don't know what i'm doing,so feel free to call me an idiot but any help is greatly appreciated! -JJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On 13-04-24 11:08 AM, Liviu Andronic wrote: Dear Duncan, On Wed, Apr 24, 2013 at 3:03 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. I must admit that I'm a bit surprised by this. I was always under the impression that saving/restoring workspaces was the proper workflow in R. If you use R interactively (e.g., not by running scripts), how else would you store your data, intermediary results, etc., while working on a project? Am I missing something? I think Michael and Hadley have already given good advice. I'll just add one thing: If you use R interactively, you should still be running scripts, just running them within an interactive session. That way you have a record of what you did, and can back up (by starting over and running everything up to a known good spot) to make corrections. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Trouble Computing Type III SS in a Cox Regression using drop1 and Anova
Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: cox3grp - subset(survData, Treatment %in% c(DC, DA, DO), c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp - droplevels(cox3grp) str(cox3grp) coxCV - coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV drop1(coxCV, test=Chisq) require(car) Anova(coxCV, type=III) And here are my results: cox3grp - subset(survData, + Treatment %in% c(DC, DA, DO), + c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp - droplevels(cox3grp) str(cox3grp) 'data.frame': 227 obs. of 6 variables: $ PTNO: int 1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ... $ Treatment : Factor w/ 3 levels DC,DA,DO: 1 1 1 1 1 1 1 1 1 1 ... $ PFS_CENSORED: int 1 1 1 0 1 1 1 1 0 1 ... $ PFS_MONTHS : num 1.12 8.16 6.08 1.35 9.54 ... $ AGE : num 72 71 80 65 72 60 63 61 71 70 ... $ PS2 : Ord.factor w/ 2 levels YesNo: 2 2 2 2 2 2 2 2 2 2 ... coxCV - coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV Call: coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data = cox3grp, method = efron) coef exp(coef) se(coef) z p AGE0.00492 1.005 0.00789 0.624 0.530 PS2.L -0.34523 0.708 0.14315 -2.412 0.016 Likelihood ratio test=5.66 on 2 df, p=0.0591 n= 227, number of events= 198 drop1(coxCV, test=Chisq) Single term deletions Model: Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2 DfAICLRT Pr(Chi) none1755.2 AGE 1 1753.6 0.3915 0.53151 PS2 1 1758.4 5.2364 0.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 require(car) Anova(coxCV, type=III) Analysis of Deviance Table (Type III tests) LR Chisq Df Pr(Chisq) AGE 0.3915 10.53151 PS2 5.2364 10.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level ps2 variable and for age. This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values. Namely the ones I get from coxph. My sense is that I'm probably misusing R in some way but I'm not sure what I'm likely to be doing wrong. SAS prodcues Wald Chi-Square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated. Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RDA permutest envfit
Dear all, I did a RDA and when I looked to the signification of the test with permutest, the output was non-significant. But when I used the envfit function, some of the vectors are significant. All the test's conditions are respected. What it means? Is it an error in the script? Commands and output: permutest(rda.ind, perm=999, first=TRUE) Permutation test for rda Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs + VAM_frs, data = z) Permutation test for first constrained eigenvalue Pseudo-F:     4.093568 (with 1, 10 Degrees of Freedom) Significance:   0.413 Based on 999 permutations under reduced model. fit - envfit(rda.ind, z, perm = 999, display = lc) fit ***VECTORS        RDA1    RDA2   r2 Pr(r)   VAM_frs  0.145281 -0.989390 0.2388  0.147   ARH_frs -0.876494 -0.481413 0.6127  0.002 ** CON_frs  0.904278  0.426944 0.4846  0.013 *  PRP_frs -0.997634  0.068755 0.9433  0.001 *** RUI_frs -0.648512 -0.761204 0.6243  0.004 ** --- Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 P values based on 999 permutations.  Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sum up column values according to row id
Dear All, here a problem I think many of you can solve in few minutes. I have a dataframe which contains values of plot id, diameters, heigths and basal area of trees, thus columns names are: id | dbh | h | g head(ipso, n=10)id dbh h g 1 FPE0164 36 13.62 0.10178760 2 FPE0164 31 12.70 0.07547676 21 FPE1127 57 18.85 0.25517586 13 FPE1127 39 15.54 0.11945906 12 FPE1127 34 14.78 0.09079203 6 FPE1127 32 15.12 0.08042477 5 FPE1127 28 14.13 0.06157522 15 FPE1127 27 13.50 0.05725553 19 FPE1127 25 13.28 0.04908739 11 FPE1127 19 11.54 0.02835287 from here I need to calculate the mean of the six greater g_ith for each id_ith. The clauses are that: if length(id) =6 do the mean of the first six greaters g else do the mean of all the g_ith in the id_ith (in head print above e.g. for the id==FPE0164 do the mean of just these two values of g). The g are already ordered by id ascending and g descending using: ipso - ipso[with(ipso, order(ipso$id, -ipso$g)), ] # Order for id ascending and g descending I tried a lot of for loops and tapply() without results. Can anyone help me to solve this? Thanks for your attention Best whishes Matteo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R Interactive Mode
Dear all, We are doing some research about the time series analysis of NDVI, and we found the NDVITS package which is a very great tool. Unfortunately when we run it, after TimeSeriesAnalysis it asks to enter Village or Country. library(ndvits, lib.loc=/home/vahe/R/i686-pc-linux-gnu-library/2.15) ndvidirectory=paste(system.file(extdata/VITO_Mzimba, package=ndvits), /, sep=) region=Mzimba Ystart=2004 Yend=2006 shape=SLP_Mzimba shapedir=paste(system.file(extdata/shape, package=ndvits), /, sep=) outfile = mzimbaTS2.txt outfile2 = MzimbaTS2.pdf outfiel3 = my.pdf signal = TimeSeriesAnalysis(shape, shapedir, ndvidirectory, region, Ystart, Yend, outfile, outfile2) How it is possible to call the package by default indicating Village option /we don't want to enter the parameter and don't want to change anything in the code which is quite difficult/. Thank you in advance, Hrach -- Hrachya Astsatryan Head of HPC Laboratory, Institute for Informatics and Automation Problems, National Academy of Sciences of the Republic of Armenia 1, P. Sevak str., Yerevan 0014, Armenia t: 374 10 284780 f: 374 10 285812 e: hr...@sci.am skype: tighra [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] about the loglm and glm---Re:Re: Regression on stratified count data
Hi,Achim: Can all the analysis you mentioned via loglm be performed via glm(...,family=poisson)? Many thanks. At 2013-04-24 19:37:10,Achim Zeileis achim.zeil...@uibk.ac.at wrote: On Wed, 24 Apr 2013, meng wrote: Hi all: For stratified count data,how to perform regression analysis? My data: age case oc count 1 1 121 1 1 226 1 2 117 1 2 259 2 1 118 2 1 288 2 2 1 7 2 2 2 95 age: 1:40y 2:40y case: 1:patient 2:health oc: 1:use drug 2:not use drug My purpose: Anaysis whether case and oc are correlated, and age is a stratified variable. My solution: 1,Mantel-Haenszel test by using function mantelhaen.test 2,loglinear regression by using function glm(count~case*oc,family=poisson).But I don't know how to handle variable age,which is the stratified variable. Instead of using glm(family = poisson) it is also convenient to use loglm() from package MASS for the associated convenience table. The code below shows how to set up the contingency table, visualize it using package vcd, and then fit two models using loglm. The models considered are conditional independence of case and drug given age and the no three-way interaction already suggested by Peter. Both models are also accompanied by visualizations of the residuals. ## contingency table with nice labels tab - expand.grid(drug = 1:2, case = 1:2, age = 1:2) tab$count - c(21, 26, 17, 59, 18, 88, 7, 95) tab$age - factor(tab$age, levels = 1:2, labels = c(40, 40)) tab$case - factor(tab$case, levels = 1:2, labels = c(patient, healthy)) tab$drug - factor(tab$drug, levels = 1:2, labels = c(yes, no)) tab - xtabs(count ~ age + drug + case, data = tab) ## visualize case explained by drug given age library(vcd) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE)) ## test wheter drug and case are independent given age m1 - loglm(~ age/(drug + case), data = tab) m1 ## visualize corresponding residuals from independence model mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = deviance, gp = shading_hcl, gp_args = list(interpolate = 1.2)) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = pearson, gp = shading_hcl, gp_args = list(interpolate = 1.2)) ## test whether there is no three-way interaction ## (i.e., dependence of case on drug is the same for both age groups) m2 - loglm(~ (age + drug + case)^2, data = tab) m2 ## visualization (with default pearson residuals) mosaic(case ~ drug | age, data = tab, expected = ~ (age + drug + case)^2, split_vertical = c(TRUE, TRUE, FALSE), gp = shading_hcl, gp_args = list(interpolate = 1.2)) Many thanks for your help. My best. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matching observations and ranking
Hi, May be this helps: As you wanted to match only from row3 onwards to row2, the corresponding values on row1 and row2 were set to NA. dat1- read.table(text= S.No AB001A AB0002A AB362 P1 -/- C/C A/A P2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 15 C/C -/- A/A 16 -/- C/C A/A 17 A/A A/C A/A 18 C/A A/A A/A ,sep=,header=TRUE,stringsAsFactors=FALSE) dat2-cbind(dat1,(1*mapply(==,dat1[,-1],dat1[2,-1]))) names(dat2)[duplicated(names(dat2))]- paste0(names(dat2)[duplicated(names(dat2))],_1) library(plyr) dat3-mutate(dat2,SUM=rowSums(cbind(AB001A_1,AB0002A_1,AB362_1)), MATCH=(SUM/3)*100) dat3[1:2,5:9]-NA res-mutate(dat3,RANK=rank(MATCH,ties.method=min)) head(res) # S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK #1 P1 -/- C/C A/A NA NA NA NA NA 17 #2 P2 C/C C/C A/A NA NA NA NA NA 18 #3 3 C/C C/C A/A 1 1 1 3 100 7 #4 4 C/C C/C A/A 1 1 1 3 100 7 #5 5 C/C C/C A/A 1 1 1 3 100 7 #6 6 C/C C/C A/A 1 1 1 3 100 7 A.K. Hi Arun, Thank you very much for your help in solving my problem, S. No AB001A AB0002A AB362 AB001A AB0002A AB362 SUM %Match Rank P1 -/- C/C A/A P 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A Actually i want to match observation from 3 to 16 with the value in p2 (i.e 3 with p2, 4 with p2, 5 with p2 etc), if they match i would like to give value 1 and store it in corresponding dummy variable i.e. AB001A and i would like to do samething for remaining vars too and storing in their dummy vars. Finally i want make sum of all the matched (i.e. 1 score) in each row and calculate percentage of match and then rank. This what i want, sorry for not expressing my problem exactly in understandable way. Hi to all bloggers, my data looks like this, S. No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM %Match Rank 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17 -/- C/C A/A 18 C/C C/C A/A 19 C/C C/C A/A I want to match obs 3 with obs 2 if it exactly matched then score will be 1 else 0, that will be stored in var1 for AB001a, in var2 for ab0002a and in var3 for ab362 and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their
[R] R cannot find the path on my mac
Hi I am really sorry for this probably quite simple question. I am new to R, and I am running a pipeline that has already been made. All I have to do is give the paths for different folders, where the pipeline can find the files with my data. But every time I try to run the pipeline it returns with the message, that it cannot find the file. And I really don't know why. I have found the path by going to the folder and finding the info about the folder, where the location of this is stated. I copy-paste this into the source command. The path for the folder is: When found with the info about the folder: /Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis But I think it should be: /Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1.Normalization_raw_data/ as it is the folder 1.Normalization_raw_data where my data are in. I have tried both paths but it returns with this message every time: source('/Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1.Normalization_raw_data/pipelineIlluminaMethylation.main.R') Fejl i file(filename, r, encoding = encoding) : cannot open the connection In addition: Advarselsbesked: In file(filename, r, encoding = encoding) : cannot open file '/Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1. 450K_pipeline_Nov2012release./SRC/loadMethylumi2.R': No such file or directory Do you have any idea why? I know this is probably more a question of how to type the path on a mac, than it is a question of how R works. But I am really frustrated about this, and thought it might be possible for you to help? Thank you in advance, and again sorry for the quite stupid question. Best Gitte Brinch Andersen E-mail: gitt...@hum-gen.au.dk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression and FMMs with flexmix
I am repeating this because it seems that some people think it is important to reveal your identity I don;t understand why this is so important. Hopefuly now this list will be helpful. Could someone please assist with this I am trying to understand how to use the flexmix package, I have read the Leisch paper but am very unclear what is needed for the M-step driver. I am just fitting a simple linear regression model. The documentation is far from clear what the FLXmclust function does, but, it in principle could do all I need, however, I do not get sentible results as if I try the following the result is poor: x-c() for(i in 0:99){x$y[2*i]=(0+i);x$x[2*i]=i;x$x[2*i+1]=i;x$y[2*i+1]=i+1000;x$g[2*i]=1;x$g[2*i+1]=2} m1-flexmix(y~x ,data=x,k=2) table(x$g,m1@cluster) 1 2 1 25 74 2 67 33 It all depends on the randomised starting values. So I think I need a better driver, but, I cannot find a spec for what I have to do in the driver. Where is FLXmclust documented? can anyone assist? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matching observations and ranking
Just to add: If your original dataset have only few columns, then you can try this too: res1-within(mutate(dat1,AB001A_1=1*(AB001A==AB001A[2]),AB0002A_1=1*(AB0002A==AB0002A[2]),AB362_1=1*(AB362==AB362[2]),SUM=rowSums(cbind(AB001A_1,AB0002A_1,AB362_1)),MATCH=(SUM/3)*100),{MATCH[1:2]-NA;RANK=rank(MATCH,ties.method=min);SUM[1:2]-NA;AB001A_1[1:2]-NA;AB0002A_1[1:2]-NA;AB362_1[1:2]-NA}) identical(res,res1) #[1] TRUE A.K. - Original Message - From: arun smartpink...@yahoo.com To: R help r-help@r-project.org Cc: Sent: Wednesday, April 24, 2013 10:09 AM Subject: Re: matching observations and ranking Hi, May be this helps: As you wanted to match only from row3 onwards to row2, the corresponding values on row1 and row2 were set to NA. dat1- read.table(text= S.No AB001A AB0002A AB362 P1 -/- C/C A/A P2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 15 C/C -/- A/A 16 -/- C/C A/A 17 A/A A/C A/A 18 C/A A/A A/A ,sep=,header=TRUE,stringsAsFactors=FALSE) dat2-cbind(dat1,(1*mapply(==,dat1[,-1],dat1[2,-1]))) names(dat2)[duplicated(names(dat2))]- paste0(names(dat2)[duplicated(names(dat2))],_1) library(plyr) dat3-mutate(dat2,SUM=rowSums(cbind(AB001A_1,AB0002A_1,AB362_1)), MATCH=(SUM/3)*100) dat3[1:2,5:9]-NA res-mutate(dat3,RANK=rank(MATCH,ties.method=min)) head(res) # S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK #1 P1 -/- C/C A/A NA NA NA NA NA 17 #2 P2 C/C C/C A/A NA NA NA NA NA 18 #3 3 C/C C/C A/A 1 1 1 3 100 7 #4 4 C/C C/C A/A 1 1 1 3 100 7 #5 5 C/C C/C A/A 1 1 1 3 100 7 #6 6 C/C C/C A/A 1 1 1 3 100 7 A.K. Hi Arun, Thank you very much for your help in solving my problem, S. No AB001A AB0002A AB362 AB001A AB0002A AB362 SUM %Match Rank P1 -/- C/C A/A P 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A Actually i want to match observation from 3 to 16 with the value in p2 (i.e 3 with p2, 4 with p2, 5 with p2 etc), if they match i would like to give value 1 and store it in corresponding dummy variable i.e. AB001A and i would like to do samething for remaining vars too and storing in their dummy vars. Finally i want make sum of all the matched (i.e. 1 score) in each row and calculate percentage of match and then rank. This what i want, sorry for not expressing my problem exactly in understandable way. Hi to all bloggers, my data looks like this, S. No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM %Match Rank 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17
[R] puzzles of Finance data programming with R
Hi This is Candice from Newcastle University. I am now coming across some problems with the programming of R. It is like this. I am asked to run three models based on a sample of CPI: Random walk Recursive AR(4) Rolling AR(4) 1.Based on the papers I find, I think the random walk may be similar to an AR(1) model (if with drift). But I am not sure whether this is right? 2.For the recursive model, I found a command named filter. m4=filter(Z,filter=c(c1,c2,c3,c4),method=recursive,side=1) However, since it is not a simulating process, I cannot define the coefficients. But this type of command needs specific coefs. And I also try another code like m=filter(Z,rep(1,5),method=recursive,side=1). However, this command is applied to MA models.By doing rep(1,5), I think this means I define all the coefs to be 1. Additionally, I tried the ?fiter. But there are not that relevant examples. 3.For the rolling AR(4) model. In fact, I have done a rollapply command for 12 years rolling windows for previous questions. I think the methodology for the merge command should be used. Since rolling is a bit like a process of merging and moving on. But I do not know, in order to run a rolling AR(4) model, which command should I pick? PS. If any additional packages like the TSA, need to be installed, please tell me. Thank you very much! I am really looking forward to hear from you. Best Regards Candice [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fonts not rendering during plot
Hi, I am having a problem where sometimes fonts for text in plots don't get rendered-the text just shows up as boxes. If I call R from a certain directory the problem goes away, otherwise the fonts don't render (but plots get made). In both cases, my PANGO_RC and FONTCONFIG_FILE do not change… Thank you, -George __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Residuals for fracdiff
Hi, I am using the fracdiff package to estimate the parameters of an ARFIMA(1,d,1) model. I would also like to get the residuals of the series. I have seen another post about this (below). However, being still quite at the beginner level in terms of R, I did not quite understand how this worked. I also read through the fracdiff package manual with no success to find any help with the residuals. I would be very grateful for any help. Thank you! Best regards, François Messer HEC, Faculty of Business and Economics University of Lausanne Message-id: 3c3d6095.5eb01...@lmttrading.com Date: Wed, 9 Jan 2002 07:57:31 + From: Susana Barbosa susanabarb...@novalis.fc.up.pt mailto:susanabarb...@novalis.fc.up.pt?Subject=Re:%20[R]%20How%20to%20obtain %20the%20series%20of%20residuals%20from%20fracdiff Subject: [R] How to obtain the series of residuals from fracdiff Hi I'm using fracdiff package to estimate the parameters of a fractionally-differenced ARIMA (p,d,q) model, and it works fine, but I wanted to have also the filtered series and the series of residuals. I understand these are calculated in the subroutine fdfilt, in the program fdcore.f, but I can't manage to get them out. Any suggestion would be much appreciated Thanks Susana Barbosa Hi Susana For fractional differencing you can use, e.g., something like fracdiff - function (x, d, N = 100) { n - 0:N w - gamma(-d+n)/(gamma(-d)*gamma(n+1)) y - filter(x, w, sides=1) return (y) } Adrian [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] string size limits in RCurl
Hi All, I am running into what appears to be character size limit in a JSON string when trying retrieve data from either `curlPerform()` or `getURL()`. Here is non-reproducible code [1], but it should shed some light on the problem. # Note that .base.url is the basic url for the API, q is a query, user # is specified, etc. session = getCurlHandle() curl.opts - list(userpwd = paste(user, :, key, sep = ), httpheader = Content-Type: application/json) request - paste(.base.url, q, sep = ) txt - getURL(url = request, curl = session, .opts = curl.opts, write = basicTextGatherer()) or r = dynCurlReader() curlPerform(url = request, writefunction = r$update, curl = session, .opts = curl.opts) My guess is that the `update` or `value` functions in the `basicTextGather` or `dynCurlReader` text handler objects are having trouble with the large strings. In this example, `r$value()` will return a truncated string that is approximately 2 MB. The code given above will work fine for queries 2 MB. Note that I can easily do the following from the command line (or using `system()` in R), but writing to disc seems like a waste if I am doing the subsequent analysis in R. curl -v --header Content-Type: application/json --user username:register:passwd https://base.url.for.api/getdata/select+*+from+sometable stream.json where `file.json` is a roughly 14MB json string. I can read the string into R using either con - file(paste(.project.path, data/stream.json, sep = ), r) string - readLines(con) or directly to list as tmp - fromJSON(file = paste(.project.path, data/stream.json, sep = )) Any thoughts are very much appreciated. Note that I posted this same question/comment to StackOverflow and will happily provide any helpful suggestions to that list as well. Ryan [1] - Sorry for not providing reproducible code, but I'm dealing with a govt firewall. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] substitute rename.vars {gdata}
is there a equivalent to rename.vars {gdata}? As I do not use read.xls i am not using library gdata for a package except this function Knut __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
One might wonder if the Excel error was indeed THAT or perhaps a way to get the desired results, give the other issues in their analysis? On Wed, Apr 24, 2013 at 11:58 AM, peter dalgaard pda...@gmail.com wrote: In case you haven't noticed, this is making the rounds in the media, including a handful of references to R. See e.g. http://news.slashdot.org/story/13/04/17/0215211/excel-error-contributes-to-problems-with-austerity-study I suppose we can't fortune()'ify anonymous quotes, but I kind of like this exchange: Bacon Bits: SPSS and R are very good at statistical analysis. Quantrix, MapleSoft, IBM Algorithmics, and other software is for financial data modeling. None of those is particularly appropriate for sharing data in a useful format with peers. Excel is. Hatta: R is extremely appropriate for sharing data in a useful format with peers. It's completely free for one. But more importantly, it saves every single step of your analysis. Send someone an Excel file, and who knows what they've done to the data. Send someone your R project directory and they can see exactly what you did. The problem with sending R files to your peers isn't that the R files aren't useful. It's that your peers aren't. On Apr 16, 2013, at 19:25 , Sarah Goslee wrote: Given that we occasionally run into problems with comparing Excel results to R results, and other spreadsheet-induced errors, I thought this might be of interest. http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems The punchline: If this error turns out to be an actual mistake Reinhart-Rogoff made, well, all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel. Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Distance matrices Combinations
Dear UseRs, MY PROBLEM IS A SMALL PIECE OF A REAL BIG AND A COMPLICATED PROBLEM. IF I DELIBERATE IN A VERY SIMPLE WAY THEN ALL I WANT IS TO PUT ALL THE POSSIBLE COMBINATIONS OF 75 DISTANCE MATRICES (BY TAKING 4 MATRICES, MORE COMMONLY 75C4), in the following equation. t-as.matrix((MAT1)^2+(MAT2)^2+(MAT3)^2+(MAT4)^2+,upper=T,diag=T)) Then 1215450 values of t(one for each combination) should one by one be inculcated in the following loop(to calculate the o value) and in the end want those 10 combinations of distance matrices which have lowest o values. e - vector(list, 124) w-sqrt(t) mat1-w for (i in 1:124){ r-matrix(sort(mat1[i,],index.return=TRUE)$ix,ncol=1) u-r[c(2,3,4,5,6),1] mata-m[,c(u)] ##(shifted) amata-apply(mata,1,mean) amata-data.frame(amata) aavg-as.matrix(amata, ncol=1) b-aavg e[[i]]-print(sum(abs(b-m[,i]))) } x-do.call(rbind,e) Y-x z - apply(Y,2,sd) o-mean(Y) Does my question make any scene? Thanks in advance Elisa [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum up column values according to row id
Something like this? mean6 - function(x) { if (length(x) 6) { mn - mean(x) } else { mn - mean(x[1:6]) } return(mn) } aggregate(g~id, ipso, mean6) - David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matteo Mura Sent: Wednesday, April 24, 2013 7:57 AM To: r-help@r-project.org Subject: [R] Sum up column values according to row id Dear All, here a problem I think many of you can solve in few minutes. I have a dataframe which contains values of plot id, diameters, heigths and basal area of trees, thus columns names are: id | dbh | h | g head(ipso, n=10)id dbh h g 1 FPE0164 36 13.62 0.10178760 2 FPE0164 31 12.70 0.07547676 21 FPE1127 57 18.85 0.25517586 13 FPE1127 39 15.54 0.11945906 12 FPE1127 34 14.78 0.09079203 6 FPE1127 32 15.12 0.08042477 5 FPE1127 28 14.13 0.06157522 15 FPE1127 27 13.50 0.05725553 19 FPE1127 25 13.28 0.04908739 11 FPE1127 19 11.54 0.02835287 from here I need to calculate the mean of the six greater g_ith for each id_ith. The clauses are that: if length(id) =6 do the mean of the first six greaters g else do the mean of all the g_ith in the id_ith (in head print above e.g. for the id==FPE0164 do the mean of just these two values of g). The g are already ordered by id ascending and g descending using: ipso - ipso[with(ipso, order(ipso$id, -ipso$g)), ] # Order for id ascending and g descending I tried a lot of for loops and tapply() without results. Can anyone help me to solve this? Thanks for your attention Best whishes Matteo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
On 13-04-24 1:51 PM, Jens Olofsson wrote: Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. That's a command-line command, not something done with R. You can use it from your bash shell if you have R and the Rtools directories on your path, or from the Windows CMD shell. BTW, my comment wasn't trying to tell you to go to a Cygwin forum, it was telling you that Cygwin's gfortran is unsupported. You need to use the MinGW-64 one that we distribute if you want us to be able to help. Duncan Murdoch //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with logkda.pari (in package untb)
Hello- I'm looking for help using the function logkda.pari. logkda.pari needs to access the program pari/gp (which of course I've installed), but I don't understand what commands I should use so that it recognizes that it is installed. Any help with this would be much appreciated. Cheers Jen Chase [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression using drop1 and Anova
Dear Paul, -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Paul Miller Sent: Wednesday, April 24, 2013 1:18 PM To: r-help@r-project.org Subject: [R] Trouble Computing Type III SS in a Cox Regression using drop1 and Anova Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: . . . Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level ps2 variable and for age. This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values. Namely the ones I get from coxph. My sense is that I'm probably misusing R in some way but I'm not sure what I'm likely to be doing wrong. SAS prodcues Wald Chi-Square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated. You've answered your own question: The summary() output gives you Wald tests, and drop1() and Anova() give you LR tests. From ?Anova: test.statistic: ... for a Cox model, whether to calculate LR (partial-likelihood ratio) or Wald tests; ... Thus, if you want the Wald test, you can ask for it (though it escapes me why you prefer it to the LR test). I hope this helps, John --- John Fox Senator McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum up column values according to row id
Hello, Note also that in the with(9 instruction there's some redundancy, th OP could simply do with(ipso, order(id, -g)) Hope this helps, Rui Barradas Em 24-04-2013 19:10, David Carlson escreveu: Something like this? mean6 - function(x) { if (length(x) 6) { mn - mean(x) } else { mn - mean(x[1:6]) } return(mn) } aggregate(g~id, ipso, mean6) - David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matteo Mura Sent: Wednesday, April 24, 2013 7:57 AM To: r-help@r-project.org Subject: [R] Sum up column values according to row id Dear All, here a problem I think many of you can solve in few minutes. I have a dataframe which contains values of plot id, diameters, heigths and basal area of trees, thus columns names are: id | dbh | h | g head(ipso, n=10)id dbh h g 1 FPE0164 36 13.62 0.10178760 2 FPE0164 31 12.70 0.07547676 21 FPE1127 57 18.85 0.25517586 13 FPE1127 39 15.54 0.11945906 12 FPE1127 34 14.78 0.09079203 6 FPE1127 32 15.12 0.08042477 5 FPE1127 28 14.13 0.06157522 15 FPE1127 27 13.50 0.05725553 19 FPE1127 25 13.28 0.04908739 11 FPE1127 19 11.54 0.02835287 from here I need to calculate the mean of the six greater g_ith for each id_ith. The clauses are that: if length(id) =6 do the mean of the first six greaters g else do the mean of all the g_ith in the id_ith (in head print above e.g. for the id==FPE0164 do the mean of just these two values of g). The g are already ordered by id ascending and g descending using: ipso - ipso[with(ipso, order(ipso$id, -ipso$g)), ] # Order for id ascending and g descending I tried a lot of for loops and tapply() without results. Can anyone help me to solve this? Thanks for your attention Best whishes Matteo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
Ok. I apologise for not understanding. So, I have installed R-tools. It changed my PATH-variable. I didn't installed Cygwin dlls as stated by http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset. Instead my PATH-variable contains the path to the Cygwin dlls AFTER the path to R... So, I started RTerm (32-bit) and tried R CMD SHLIB mango.f95 and got the same error as earlier Error: unexpected symbol in R CMD. The same goes for RTerm (64-bit). Can you pls advice me on how to proceed? Sincerely Jens On 24 April 2013 20:08, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:51 PM, Jens Olofsson wrote: Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. That's a command-line command, not something done with R. You can use it from your bash shell if you have R and the Rtools directories on your path, or from the Windows CMD shell. BTW, my comment wasn't trying to tell you to go to a Cygwin forum, it was telling you that Cygwin's gfortran is unsupported. You need to use the MinGW-64 one that we distribute if you want us to be able to help. Duncan Murdoch //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
On Apr 24, 2013, at 20:01 , Thomas Adams wrote: One might wonder if the Excel error was indeed THAT or perhaps a way to get the desired results, give the other issues in their analysis? I think I'd reserve that suspicion for what they did with the NZ data: Growth for 1946-49: 7.7, 11.9, −9.9, and 10.8 --1951: -7.6 Those were the 5 years with Debt/GDP 90%. Obviously, the economy was going up and down like a yoyo. So they retain only the last value, miscode it as -7.9, and give that one year the same weight as decades of positive growth in other countries... On Wed, Apr 24, 2013 at 11:58 AM, peter dalgaard pda...@gmail.com wrote: In case you haven't noticed, this is making the rounds in the media, including a handful of references to R. See e.g. http://news.slashdot.org/story/13/04/17/0215211/excel-error-contributes-to-problems-with-austerity-study I suppose we can't fortune()'ify anonymous quotes, but I kind of like this exchange: Bacon Bits: SPSS and R are very good at statistical analysis. Quantrix, MapleSoft, IBM Algorithmics, and other software is for financial data modeling. None of those is particularly appropriate for sharing data in a useful format with peers. Excel is. Hatta: R is extremely appropriate for sharing data in a useful format with peers. It's completely free for one. But more importantly, it saves every single step of your analysis. Send someone an Excel file, and who knows what they've done to the data. Send someone your R project directory and they can see exactly what you did. The problem with sending R files to your peers isn't that the R files aren't useful. It's that your peers aren't. On Apr 16, 2013, at 19:25 , Sarah Goslee wrote: Given that we occasionally run into problems with comparing Excel results to R results, and other spreadsheet-induced errors, I thought this might be of interest. http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems The punchline: If this error turns out to be an actual mistake Reinhart-Rogoff made, well, all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel. Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R cannot find the path on my mac
The why is that you haven't got the path correct yet. Assuming you are running R.app or R64.app (running the R Console), one way to find the correct path is to type file.choose() Navigate to your file and choose it. It will then tell you the correct path. You can also do things like source( file.path( 'path.to.folder', 'filename' ) ) if you want to keep the folder name and the filename separate. You could also open Terminal (in the Utilities folder) and type find . -name filename and it should tell you the path relative to your home folder. This assumes that the file is somewhere below /Users/gban -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 4/24/13 8:46 AM, Gitte Brinch Andersen gitt...@hum-gen.au.dk wrote: Hi I am really sorry for this probably quite simple question. I am new to R, and I am running a pipeline that has already been made. All I have to do is give the paths for different folders, where the pipeline can find the files with my data. But every time I try to run the pipeline it returns with the message, that it cannot find the file. And I really don't know why. I have found the path by going to the folder and finding the info about the folder, where the location of this is stated. I copy-paste this into the source command. The path for the folder is: When found with the info about the folder: /Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis But I think it should be: /Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1.Normalization_raw_data/ as it is the folder 1.Normalization_raw_data where my data are in. I have tried both paths but it returns with this message every time: source('/Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1.Normalization_raw_data/pipelineIllumin aMethylation.main.R') Fejl i file(filename, r, encoding = encoding) : cannot open the connection In addition: Advarselsbesked: In file(filename, r, encoding = encoding) : cannot open file '/Users/gban/Desktop/Methylation analysis/MethylationDataAnalysis/1. 450K_pipeline_Nov2012release./SRC/loadMethylumi2.R': No such file or directory Do you have any idea why? I know this is probably more a question of how to type the path on a mac, than it is a question of how R works. But I am really frustrated about this, and thought it might be possible for you to help? Thank you in advance, and again sorry for the quite stupid question. Best Gitte Brinch Andersen E-mail: gitt...@hum-gen.au.dk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
On Apr 24, 2013, at 7:46 PM, Jens Olofsson jens.olofs...@gmail.com wrote: Ok. I apologise for not understanding. So, I have installed R-tools. It changed my PATH-variable. I didn't installed Cygwin dlls as stated by http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset. Instead my PATH-variable contains the path to the Cygwin dlls AFTER the path to R... So, I started RTerm (32-bit) and tried R CMD SHLIB mango.f95 Repeating what Duncan said -- R CMD SHLIB is to be done outside of R, not within R. Some recent discussion also suggests its perhaps easier to do this with RStudio + devtools as part of a package than as a standalone shared object. MW and got the same error as earlier Error: unexpected symbol in R CMD. The same goes for RTerm (64-bit). Can you pls advice me on how to proceed? Sincerely Jens On 24 April 2013 20:08, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:51 PM, Jens Olofsson wrote: Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. That's a command-line command, not something done with R. You can use it from your bash shell if you have R and the Rtools directories on your path, or from the Windows CMD shell. BTW, my comment wasn't trying to tell you to go to a Cygwin forum, it was telling you that Cygwin's gfortran is unsupported. You need to use the MinGW-64 one that we distribute if you want us to be able to help. Duncan Murdoch //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization
Hi Jan and Simon, If possible, could you attach the diagnostic plots. I would be curious to see them. Thanks, Juliet On Fri, Apr 19, 2013 at 4:39 AM, jholstei jan.holst...@awi.de wrote: Simon, that was very instructivevery special thanks to you. I already noticed that the model was bad, but it was not clear to me that transformation of predictors to, say a more centered distribution is helpful here. And thanks for pointing out Tweedie, I noticed that the error structure is far from normal and more like gamma or poisson, but Gamma made things worse. Best regards, Jan Am 18 Apr 2013 um 17:25 schrieb Simon Wood: Jan, Thanks for the data (off list). The p-value computations are based on the approximation that things are approximately normal on the linear predictor scale, but actually they are no where close to normal in this case, which is why the p-values look inconsistent. The reason that the approximate normality assumption doesn't hold is that the model is quite a poor fit. If you take a look at gam.check(fit) you'll see that the constant variance assumption of quasi(link=log) is violated quite badly, and the residual distribution is really quite odd (plot residuals against fitted as well). Also see plot(fit,pages=1,scale=0) - it shows ballooning confidence intervals and smooth estimates that are so low in places that they might as well be minus infinity (given log link) - clearly something is wrong with this model! I would be inclined to reset all the 0's to 0 (rather than 0.01), and then to try Tweedie(p=1.5,link=log) as the family. Also the predictor variables are very skewed which is giving leverage problems, so I would transform them to give less skew. e.g. Something like fit-gam(target~s(log(mgs))+s(I(gsd^.5))+s(I(mud^.25))+s(log(ssCmax)), family=Tweedie(p=1.6,link=log),data=df,method=REML) gives a model that is closer to being reasonable (p-values are then consistent between select=TRUE and FALSE). best, Simon On 18/04/13 14:24, Simon Wood wrote: Jan, Thanks for this. Is there any chance that you could send me the data off list and I'll try to figure out what is happening? (Under the understanding that I'll only use the data for investigating this issue, of course). best, Simon on 18/04/13 11:11, Jan Holstein wrote: Simon, thanks for the reply, I guess I'm pretty much up to date using mgcv 1.7-22. Upgrading to R 3.0.0 also didn't do any change. Unfortunately using method=REML does not make any difference: ### first with select=FALSE fit-gam(target ~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,method=REML,select=F) summary(fit) Family: quasi Link function: log Formula: target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax) Parametric coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) -4.724 7.462 -0.6330.527 Approximate significance of smooth terms: edf Ref.df F p-value s(mgs)3.118 3.492 0.099 0.974 s(gsd)6.377 7.044 15.596 2e-16 *** s(mud)8.837 8.971 18.832 2e-16 *** s(ssCmax) 3.886 4.051 2.342 0.052 . --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 R-sq.(adj) = 0.403 Deviance explained = 40.6% REML score = 33186 Scale est. = 8.7812e+05 n = 4511 Then using select=T fit2-gam(target ~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,method=REML,select=TRUE) summary(fit2) Family: quasi Link function: log Formula: target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax) Parametric coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) -6.406 5.239 -1.2230.222 Approximate significance of smooth terms: edf Ref.df F p-value s(mgs)2.844 8 25.43 2e-16 *** s(gsd)6.071 9 14.50 2e-16 *** s(mud)6.875 8 21.79 2e-16 *** s(ssCmax) 3.787 8 18.42 2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 R-sq.(adj) =0.4 Deviance explained = 40.1% REML score = 33203 Scale est. = 8.8359e+05 n = 4511 I played around with other families/link functions with no success regarding the select behaviour. Well, look at the structure of my data: http://r.789695.n4.nabble.com/file/n4664586/screen-capture-1.png All possible predictor variables in principle look like this, and taken alone, each and every is significant according to p-value (but not all can at the same time). In theory, the target variable should be a hypersurface in 11dim space with lots of noise, but interaction of more than 2 vars gets costly (not to think of 11) and often enough (also without interaction) the solution does not converge at minimal step size. If it does, results are usually not as good as without interaction. Any comment/advice on model setup is
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
Hello again. I may be a bit slow, but I'm learning. :-) Installed Rtools, added R to the path. Used CMD, moved to the folder of the source and wrote R CMD SHLIB Mango.f95 and voila I got a dll I loaded in R and I could call my current subroutines as intended. I am very grateful for ur help and ur patience. Sincerely, Jens 24 apr 2013 kl. 21:33 skrev R. Michael Weylandt michael.weyla...@gmail.com michael.weyla...@gmail.com: On Apr 24, 2013, at 7:46 PM, Jens Olofsson jens.olofs...@gmail.com wrote: Ok. I apologise for not understanding. So, I have installed R-tools. It changed my PATH-variable. I didn't installed Cygwin dlls as stated by http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset. Instead my PATH-variable contains the path to the Cygwin dlls AFTER the path to R... So, I started RTerm (32-bit) and tried R CMD SHLIB mango.f95 Repeating what Duncan said -- R CMD SHLIB is to be done outside of R, not within R. Some recent discussion also suggests its perhaps easier to do this with RStudio + devtools as part of a package than as a standalone shared object. MW and got the same error as earlier Error: unexpected symbol in R CMD. The same goes for RTerm (64-bit). Can you pls advice me on how to proceed? Sincerely Jens On 24 April 2013 20:08, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:51 PM, Jens Olofsson wrote: Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. That's a command-line command, not something done with R. You can use it from your bash shell if you have R and the Rtools directories on your path, or from the Windows CMD shell. BTW, my comment wasn't trying to tell you to go to a Cygwin forum, it was telling you that Cygwin's gfortran is unsupported. You need to use the MinGW-64 one that we distribute if you want us to be able to help. Duncan Murdoch //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] RDA permutest envfit
On Wed, 2013-04-24 at 10:38 -0700, Rémi Lesmerises wrote: Dear all, I did a RDA and when I looked to the signification of the test with permutest, the output was non-significant. But when I used the envfit function, some of the vectors are significant. All the test's conditions are respected. What it means? Is it an error in the script? The two functions are testing two fundamentally different things. `permutest` (as you invoked it --- `first = TRUE`) is testing the first axis of the RDA. That axis is a linear combination of the constraints. The `envfit` procedure is testing the individual correlations between the 2-d configuration of samples in the RDA space and the direction of maximal variance of the environmental data. Each variable is considered separately. I can easily imagine a case, where the variance explained on the first axis is not significant but variance over 2 axes is significant, as one where the vectors do not point solely along the first RDA axis but also point along the 2nd axis. By looking only at their contributions to the first axis and summing them you don't explain a whole lot, but when you look at the directions in 2D space each variable explains a significant amount of variance. HTH G Commands and output: permutest(rda.ind, perm=999, first=TRUE) Permutation test for rda Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs + VAM_frs, data = z) Permutation test for first constrained eigenvalue Pseudo-F:4.093568 (with 1, 10 Degrees of Freedom) Significance:0.413 Based on 999 permutations under reduced model. fit - envfit(rda.ind, z, perm = 999, display = lc) fit ***VECTORS RDA1 RDA2 r2 Pr(r) VAM_frs 0.145281 -0.989390 0.2388 0.147 ARH_frs -0.876494 -0.481413 0.6127 0.002 ** CON_frs 0.904278 0.426944 0.4846 0.013 * PRP_frs -0.997634 0.068755 0.9433 0.001 *** RUI_frs -0.648512 -0.761204 0.6243 0.004 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 P values based on 999 permutations. Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [r] Problems with Fortran calls when loaded a dll compiled with gfortran-4 Cygwin 4.5.3
On 13-04-24 2:46 PM, Jens Olofsson wrote: Ok. I apologise for not understanding. So, I have installed R-tools. It changed my PATH-variable. I didn't installed Cygwin dlls as stated by http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset. Instead my PATH-variable contains the path to the Cygwin dlls AFTER the path to R... So, I started RTerm (32-bit) and tried R CMD SHLIB mango.f95 and got the same error as earlier Error: unexpected symbol in R CMD. The same goes for RTerm (64-bit). Can you pls advice me on how to proceed? See my first paragraph below. Duncan Murdoch Sincerely Jens On 24 April 2013 20:08, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:51 PM, Jens Olofsson wrote: Dear Duncan, I know this isn't a forum for Cygwin, but for R. Pls treat me as a noob and also remember I am on Windows. How should I use R CMD SHLIB as if I write that at the prompt I get the error: unexpected symbol in R CMD. I have mango.f95 in the working directory. That's a command-line command, not something done with R. You can use it from your bash shell if you have R and the Rtools directories on your path, or from the Windows CMD shell. BTW, my comment wasn't trying to tell you to go to a Cygwin forum, it was telling you that Cygwin's gfortran is unsupported. You need to use the MinGW-64 one that we distribute if you want us to be able to help. Duncan Murdoch //Jens On 24 April 2013 19:46, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-04-24 1:36 PM, Jens Olofsson wrote: Dear users of R I have a subroutine in Fortran95, compiled to a DLL with gfortran in Cygwin 4.5.3. We don't support Cygwin. You should use the gfortran in Rtools, and get R to set the command line options for you, either by putting the code in a package, or by using R CMD SHLIB Mango.f95. Duncan Murdoch The subroutine is: subroutine MyPBP( S, p, N ) ! Expose subroutine rtest to users of this DLL !DEC$ ATTRIBUTES DLLEXPORT, C, REFERENCE, ALIAS: mypbp_ ::mypbp ! This function computes the Poisson-Binomial distribution ! of size N using p double precision, intent(inout) :: S(N+1) double precision, intent(in) :: p(N) integer, intent(in) :: N double precision :: X(N+1) integer i, j !X=0 !S=0 X(1) = 1 - p(1) X(2) = p(1) do i = 2, N S(1) = X(1)*(1-p(i)) do j = 2,i S(j) = X(j-1)*p(i) + X(j)*(1-p(i)) end do S(i+1) = X(i)*p(i) X = S if (i == N) then S = X end if end do end subroutine MyPBP and it is saved into Mango.f95 I compile it from the bash shell using: gfortran-4 c- Mango.f95 and gfortran-4 -shared -o Mango.dll Mango.o I am on a Windows machine running Windows 7 with Intel i7. I load the dll in a 32-bit R by dyn.load(Mango.dll). Using getLoadedDLLs I can see the DLL. However, is.loaded(Mango.dll) = FALSE. In addition, R stop responding when I try .Fortran(MyPBP, as.numeric(S), as.numeric(p), as.integer(N)), where N-5, S-array(0,N+1) and p- c(0.1, 0.2, 0.5, 0.8, 0.9). What am I doing wrong? Any ideas, thoughts and/or comments are highly appreciated. Jens [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tables package - remove NAs and NaN
Dear Rxperts, Sorry if I am posting a really really dumb request.. I am new to subversion and am trying to use subversion to download the tables package as suggested by Duncan. I installed subversion client(from collabnet) and tried to access tables package using the command below. svn checkout svn://scm.r-forge.r-project.org/svnroot/tables/ I get the following error message: C:\Users\santosh\tempsvn checkout svn:// scm.r-forge.r-project.org/svnroot/tables/ svn: E730060: Unable to connect to a repository at URL 'svn://scm.r-forge.r-proj ect.org/svnroot/tables' svn: E730060: Can't connect to host 'scm.r-forge.r-project.org': A connection at tempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. Is there anything additional I need to do with Subversion or with the commands? Regards, Santosh On Tue, Apr 23, 2013 at 5:13 AM, Duncan Murdoch murdoch.dun...@gmail.comwrote: On 13-04-23 6:31 AM, Duncan Murdoch wrote: On 13-04-22 10:40 PM, David Winsemius wrote: On Apr 22, 2013, at 5:49 PM, Santosh wrote: Dear Rxperts, q - data.frame(p=rep(c(A,B),**each=10,len=30), a=rep(c(1,2,3),each=10),id=**seq(30), b=round(runif(30,10,20)), c=round(runif(30,40,70))) The operation below... tabular(((p=factor(p))*(a=**factor(a))+1) ~ (N = 1) + (b + c)* (mean+sd),data=q) yields some rows of NAs and NaN as shown below b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 20 NaNNA NaN NA 3 10 15.60 2.716 60.30 8.001 B 10 NaNNA NaN NA 2 10 15.40 2.366 57.70 10.414 30 NaNNA NaN NA All 30 15.77 2.473 56.77 9.601 How do I remove the rows having N=0 ? I would like the resulting table look like.. b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 3 10 15.60 2.716 60.30 8.001 B 2 10 15.40 2.366 57.70 10.414 All 30 15.77 2.473 56.77 9.601 Here's a bit of a hack: tabular( (`p a`=interaction(p,a, drop=TRUE, sep= )) ~ (N = 1) + (b + c)* (mean+sd),data=q) b c p a N mean sd mean sd A 1 10 12.8 0.7888 52.1 8.020 B 2 10 16.3 3.0569 54.9 8.711 A 3 10 14.6 3.7771 56.5 6.980 I have been rather hoping that Duncan Murdoch would have noticed the earlier thread, but maybe he can comment on whether there is a more direct route/ This isn't something that the package is designed to handle: if you say p*a, it wants all combinations of p and a. If I wanted a table like that, I'd use a different hack. One possibility is to create that interaction column, but display it as just the initial letter, labelled p, and then add another column to contain the a values as data. It would be tricky to get the formatting right. Another possibility is to generate the whole table with the N=0 rows, and then post-process it to remove those rows, and adjust the row labels appropriately. This approach probably gives the nicer result, but the post-processing is quite messy: you need to delete some rows from the table, from its rowLabels attribute, and from the justification attributes of both the table and its rowLabels. (I should add a [ method to the package to hide this messiness.) I've done this now, in version 0.7.54 on R-forge. To leave out the rows with N=0, you can select a subset of the table where N (the first column) is non-zero: tab - tabular(((p=factor(p))*(a=**factor(a))+1) ~ (N = 1) + (b + c)*(mean+sd),data=q) tab[ tab[,1] 0, ] and it produces this: b c p a N mean sdmean sd A 1 10 16.20 3.458 56.3 10.155 3 10 13.60 2.119 58.1 8.075 B 2 10 14.40 2.547 51.2 9.438 All 30 14.73 2.888 55.2 9.419 Indexing of tables isn't as general as indexing of matrices, but most of the simple forms should work. I haven't tested yet, but I expect this will be fine in LaTeX or HTML (also new, not on CRAN yet) output as well. Duncan Murdoch [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pglm package: fitted values and residuals
On Wed, Apr 24, 2013 at 3:11 AM, alfonso.carf...@uniparthenope.it wrote: I'm using the package pglm and I'have estimated a random probit model. I need to save in a vector the fitted values and the residuals of the model but I can not do it. I tried with the command fitted.values using the following procedure without results: This is one of those ask the pglm authors questions. You should take it up with the authors of the package. There is a specialized email list R-sig-mixed where you will find more people working on this exact same thing. pglm looks like fun to me, but it is not quite done, so far as I can tell. Well, the authors have not gone the extra step to make their regression objects behave like other R regression objects. In case you need alternative software, ask in R-sig-mixed. You'll learn that most of these can be estimated with other packages. But I really like the homogeneous user interface that is spelled out in pglm, and I expect my students will run into the same questions that you have.. I just downloaded their source code, you probably ought to do that so you can understand what they are doing. They provide the fitting functions, but they do not do any of the other work necessary to make these functions fit together with the R class framework. There are no methods for predict, anova, and so forth. I'm in their R folder looking for implementations: pauljohn@pols110:/tmp/pglm/R$ grep summary * pauljohn@pols110:/tmp/pglm/R$ grep predict * pauljohn@pols110:/tmp/pglm/R$ grep class * pauljohn@pols110:/tmp/pglm/R$ grep fitted * pglm.R: # glm models can be fitted Run example(pglm) what can we do after that? plot(anb) Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y' ## Nothing. ## We do get a regression summary object, that's better than some packages provide: anbsum - summary(anb) ## And a coefficient table coef(anbsum) Estimate Std. error t value Pr( t) (Intercept) -6.933764e-01 0.061391429 -11.294351205 1.399336e-29 wage 1.517009e-02 0.006375966 2.379261231 1.734738e-02 exper1.314229e-03 0.007400129 0.177595444 8.590407e-01 ruralyes-8.594328e-05 0.051334716 -0.001674175 9.986642e-01 model.matrix(anb) Error in terms.default(object) : no terms component nor attribute anova(anb) Error in UseMethod(anova) : no applicable method for 'anova' applied to an object of class c('maxLik', 'maxim') predict(anb) Error in UseMethod(predict) : no applicable method for 'predict' applied to an object of class c('maxLik', 'maxim') So, if you want those features with these models, you'll have to get busy and do a lot of coding! While working on regression support lately, I've reached the conclusion that if an R package that claims to do regression but does not provide methods for summary, predict, anova, nobs, fitted, logLik, AIC, and so forth, then it is not done yet. Otherwise, users like you who expect to be able to run methods like fitted or such have a bad experience, as you are having now. Maybe somebody reading this will remind us where the common list of R regression methods is listed. I know for sure I've seen a document about these things, but I'm baffled now trying to find it. But I'm sure there is one. pj library(pglm) m1_S-pglm(Feed ~ Cons_PC_1 + imp_gen_1 + LGDP_PC_1 + lnEI_1 + SH_Ren_1,data,family=binomial(probit),model=random,method=bfgs,index=c(Year,IDCountry)) m1_S$fitted.values residuals(m1) Can someone help me about it? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tables package - remove NAs and NaN
On 13-04-24 3:23 PM, Santosh wrote: Dear Rxperts, Sorry if I am posting a really really dumb request.. I am new to subversion and am trying to use subversion to download the tables package as suggested by Duncan. I installed subversion client(from collabnet) and tried to access tables package using the command below. svn checkout svn://scm.r-forge.r-project.org/svnroot/tables/ I get the following error message: C:\Users\santosh\tempsvn checkout svn:// scm.r-forge.r-project.org/svnroot/tables/ svn: E730060: Unable to connect to a repository at URL 'svn://scm.r-forge.r-proj ect.org/svnroot/tables' svn: E730060: Can't connect to host 'scm.r-forge.r-project.org': A connection at tempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. Is there anything additional I need to do with Subversion or with the commands? The spacing looks funny there: You should have no blanks in the path. Maybe you didn't, it's only the email. In that case, R-forge is probably just responding very slowly. You could try this path instead: svn checkout svn://scm.r-forge.r-project.org/svnroot/tables/pkg/tables This is the subdirectory that contains everything in the package. And you should be able to install the package directly from R-forge by setting your repository to http://R-forge.r-project.org, but it is very slow on updates, so it hasn't got to the current version yet. Duncan Murdoch Regards, Santosh On Tue, Apr 23, 2013 at 5:13 AM, Duncan Murdoch murdoch.dun...@gmail.comwrote: On 13-04-23 6:31 AM, Duncan Murdoch wrote: On 13-04-22 10:40 PM, David Winsemius wrote: On Apr 22, 2013, at 5:49 PM, Santosh wrote: Dear Rxperts, q - data.frame(p=rep(c(A,B),**each=10,len=30), a=rep(c(1,2,3),each=10),id=**seq(30), b=round(runif(30,10,20)), c=round(runif(30,40,70))) The operation below... tabular(((p=factor(p))*(a=**factor(a))+1) ~ (N = 1) + (b + c)* (mean+sd),data=q) yields some rows of NAs and NaN as shown below b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 20 NaNNA NaN NA 3 10 15.60 2.716 60.30 8.001 B 10 NaNNA NaN NA 2 10 15.40 2.366 57.70 10.414 30 NaNNA NaN NA All 30 15.77 2.473 56.77 9.601 How do I remove the rows having N=0 ? I would like the resulting table look like.. b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 3 10 15.60 2.716 60.30 8.001 B 2 10 15.40 2.366 57.70 10.414 All 30 15.77 2.473 56.77 9.601 Here's a bit of a hack: tabular( (`p a`=interaction(p,a, drop=TRUE, sep= )) ~ (N = 1) + (b + c)* (mean+sd),data=q) b c p a N mean sd mean sd A 1 10 12.8 0.7888 52.1 8.020 B 2 10 16.3 3.0569 54.9 8.711 A 3 10 14.6 3.7771 56.5 6.980 I have been rather hoping that Duncan Murdoch would have noticed the earlier thread, but maybe he can comment on whether there is a more direct route/ This isn't something that the package is designed to handle: if you say p*a, it wants all combinations of p and a. If I wanted a table like that, I'd use a different hack. One possibility is to create that interaction column, but display it as just the initial letter, labelled p, and then add another column to contain the a values as data. It would be tricky to get the formatting right. Another possibility is to generate the whole table with the N=0 rows, and then post-process it to remove those rows, and adjust the row labels appropriately. This approach probably gives the nicer result, but the post-processing is quite messy: you need to delete some rows from the table, from its rowLabels attribute, and from the justification attributes of both the table and its rowLabels. (I should add a [ method to the package to hide this messiness.) I've done this now, in version 0.7.54 on R-forge. To leave out the rows with N=0, you can select a subset of the table where N (the first column) is non-zero: tab - tabular(((p=factor(p))*(a=**factor(a))+1) ~ (N = 1) + (b + c)*(mean+sd),data=q) tab[ tab[,1] 0, ] and it produces this: b c p a N mean sdmean sd A 1 10 16.20 3.458 56.3 10.155 3 10 13.60 2.119 58.1 8.075 B 2 10 14.40 2.547 51.2 9.438 All 30 14.73 2.888 55.2 9.419 Indexing of tables isn't as general as indexing of matrices, but most of the simple forms should work. I haven't tested yet, but I expect this will be fine in LaTeX or HTML (also new, not on CRAN yet) output as well. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RDA permutest envfit
Thank you! A last question: Is it still the same explanation if I remove the condition first=TRUE (and then testing for all axis) and permutest gives the same result?  Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca On Wed, 2013-04-24 at 10:38 -0700, Rémi Lesmerises wrote: Dear all, I did a RDA and when I looked to the signification of the test with permutest, the output was non-significant. But when I used the envfit function, some of the vectors are significant. All the test's conditions are respected. What it means? Is it an error in the script? The two functions are testing two fundamentally different things. `permutest` (as you invoked it --- `first = TRUE`) is testing the first axis of the RDA. That axis is a linear combination of the constraints. The `envfit` procedure is testing the individual correlations between the 2-d configuration of samples in the RDA space and the direction of maximal variance of the environmental data. Each variable is considered separately. I can easily imagine a case, where the variance explained on the first axis is not significant but variance over 2 axes is significant, as one where the vectors do not point solely along the first RDA axis but also point along the 2nd axis. By looking only at their contributions to the first axis and summing them you don't explain a whole lot, but when you look at the directions in 2D space each variable explains a significant amount of variance. HTH G Commands and output: permutest(rda.ind, perm=999, first=TRUE) Permutation test for rda Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs + VAM_frs, data = z) Permutation test for first constrained eigenvalue Pseudo-F:    4.093568 (with 1, 10 Degrees of Freedom) Significance:  0.413 Based on 999 permutations under reduced model. fit - envfit(rda.ind, z, perm = 999, display = lc) fit ***VECTORS        RDA1   RDA2  r2 Pr(r)  VAM_frs 0.145281 -0.989390 0.2388 0.147  ARH_frs -0.876494 -0.481413 0.6127 0.002 ** CON_frs 0.904278 0.426944 0.4846 0.013 * PRP_frs -0.997634 0.068755 0.9433 0.001 *** RUI_frs -0.648512 -0.761204 0.6243 0.004 ** --- Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 P values based on 999 permutations.  Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca    [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson      [t] +44 (0)20 7679 0522 ECRC, UCL Geography,     [f] +44 (0)20 7679 0565 Pearson Building,      [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London     [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT.        [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify object that causes Error in loadNamespace(name) : there is no package called ‘R.utils’
On 13-04-24 10:57 AM, Duncan Murdoch wrote: On 13-04-24 10:12 AM, Martin Morgan wrote: On 04/24/2013 06:03 AM, Duncan Murdoch wrote: On 13-04-24 5:46 AM, Liviu Andronic wrote: Dear all, I've bumped into the: Error in loadNamespace(name) : there is no package called ‘R.utils’ error. I've already read a bit on this ( http://www.cybaea.net/Blogs/Data/A-warning-on-the-R-save-format.html ) but I have a follow-up question. Given a workspace that automatically loads a package that I don't really need/want (e.g. ‘R.utils’), how do I identify which object requires this package to load? I would like to avoid loading ‘R.utils’ every time I open an R session. That's not easy, because the code in R that triggers that error has no idea of the name of the object it is loading. Maybe traceback() can provide some hints? I did, more or less arbitrarily library(rms) a = list(fun=ie.setup) save(a, file=/tmp/a.rda) remove.packages(rms) and then in a new session load(/tmp/a.rda) Error in loadNamespace(name) : there is no package called 'rms' traceback() 7: stop(e) 6: value[[3L]](cond) 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 4: tryCatchList(expr, classes, parentenv, handlers) 3: tryCatch(loadNamespace(name), error = function(e) stop(e)) 2: getNamespace(c(rms, 3.6-3)) 1: load(/tmp/a.rda) with the line numbered 2 giving me the necessary hint. That tells you that some object needs the rms package, but I don't see how you would know it is a that is the problem. We already knew that rms was needed from the error message. What I've done sometimes in debugging is to change that error to a warning in the getNamespace() function, and add some tracing code to the serialization code to print the names of objects as they are loaded. (This goes in ReadItem in src/main/serialize.c.) I wouldn't expect Liviu to make those changes, but perhaps a verbose option could be added to load(), so that it could be available to users. I have added this in R-devel. The format of the printed output may well change before this is ever released, but it should be enough to identify the bad item already. You'll need a build of R-devel from r62658 or newer to see this. Then load(/tmp/a.rda, verbose=TRUE) will print the names of objects as they are read (the names are read after the attributes and before the value). If you want to see reams of mostly useless information, you can try verbose=n (for some number n=2 or more); this prints names and component numbers to a greater depth. Duncan Murdoch Duncan Murdoch Martin You could try a binary search to find out, but it will be tedious: 1. Install R.utils. 2. Load the workspace successfully. 3. Delete half the objects, and save it. 4. Uninstall R.utils, and see if you can load the workspace. At this point you'll know if there's an object needing R.utils still left or not, and you can repeat the steps until you find a single object that causes the problem. (But it might not be the only one, so deleting it from the original workspace might not solve your problem.) A better approach is to *never* save and load workspaces unless you know exactly what is in them. Always reply no to the question about saving your workspace (or set that as the default). If you accidentally end up with a workspace being loaded, delete it. Duncan Murdoch Regards, Liviu sessionInfo() R version 2.15.3 (2013-03-01) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] datasets grDevices splines graphics utils stats methods base other attached packages: [1] R.utils_1.23.2R.oo_1.13.0 R.methodsS3_1.4.2 tables_0.7.57 reshape2_1.2.2 [6] car_2.0-15nnet_7.3-6MASS_7.3-23 Hmisc_3.10-1 survival_2.37-2 [11] foreign_0.8-53 loaded via a namespace (and not attached): [1] cluster_1.14.3 grid_2.15.3 lattice_0.20-13 plyr_1.8 rstudio_0.97.312 [6] stringr_0.6.2tools_2.15.3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RDA permutest envfit
On Wed, 2013-04-24 at 13:13 -0700, Rémi Lesmerises wrote: Thank you! A last question: Is it still the same explanation if I remove the condition first=TRUE (and then testing for all axis) and permutest gives the same result? No; again `permutest(, first = FALSE)` and `envfit()` are testing different models/fits but how they are different is not the same as with your previous question. `permutest(, first = FALSE)` is testing whether the variance explained by the k environmental variable (k = 4 I believe in your case) is sufficiently large as to be unusual when compared to the variance explained under a null model (achieved by permuting residuals, propagating those residuals plus fitted values to give new data and refitting the model on those new data). `envfit()` is doing what it always did; assessing whether the correlation between the vector and the 2-d configuration is unusually large. In the `permutest(, first = FALSE)` case, you have a model with 4df. In the `envfit()` case you have 4 models each with 1 degree of freedom. The two methods also differ in terms of the role played by the environmental data. In the `permutest` case, the environmental data are the predictors. In the `envfit` case the env data are the responses and the axis scores in the 2 dimensions chosen are the predictors. HTH G Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca On Wed, 2013-04-24 at 10:38 -0700, Rémi Lesmerises wrote: Dear all, I did a RDA and when I looked to the signification of the test with permutest, the output was non-significant. But when I used the envfit function, some of the vectors are significant. All the test's conditions are respected. What it means? Is it an error in the script? The two functions are testing two fundamentally different things. `permutest` (as you invoked it --- `first = TRUE`) is testing the first axis of the RDA. That axis is a linear combination of the constraints. The `envfit` procedure is testing the individual correlations between the 2-d configuration of samples in the RDA space and the direction of maximal variance of the environmental data. Each variable is considered separately. I can easily imagine a case, where the variance explained on the first axis is not significant but variance over 2 axes is significant, as one where the vectors do not point solely along the first RDA axis but also point along the 2nd axis. By looking only at their contributions to the first axis and summing them you don't explain a whole lot, but when you look at the directions in 2D space each variable explains a significant amount of variance. HTH G Commands and output: permutest(rda.ind, perm=999, first=TRUE) Permutation test for rda Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs + VAM_frs, data = z) Permutation test for first constrained eigenvalue Pseudo-F:4.093568 (with 1, 10 Degrees of Freedom) Significance:0.413 Based on 999 permutations under reduced model. fit - envfit(rda.ind, z, perm = 999, display = lc) fit ***VECTORS RDA1 RDA2 r2 Pr(r) VAM_frs 0.145281 -0.989390 0.2388 0.147 ARH_frs -0.876494 -0.481413 0.6127 0.002 ** CON_frs 0.904278 0.426944 0.4846 0.013 * PRP_frs -0.997634 0.068755 0.9433 0.001 *** RUI_frs -0.648512 -0.761204 0.6243 0.004 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 P values based on 999 permutations. Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski remilesmeri...@yahoo.ca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pglm package: fitted values and residuals
On Wed, 24 Apr 2013, Paul Johnson wrote: On Wed, Apr 24, 2013 at 3:11 AM, alfonso.carf...@uniparthenope.it wrote: I'm using the package pglm and I'have estimated a random probit model. I need to save in a vector the fitted values and the residuals of the model but I can not do it. I tried with the command fitted.values using the following procedure without results: This is one of those ask the pglm authors questions. You should take it up with the authors of the package. There is a specialized email list R-sig-mixed where you will find more people working on this exact same thing. pglm looks like fun to me, but it is not quite done, so far as I can tell. Well, the authors have not gone the extra step to make their regression objects behave like other R regression objects. In case you need alternative software, ask in R-sig-mixed. You'll learn that most of these can be estimated with other packages. But I really like the homogeneous user interface that is spelled out in pglm, and I expect my students will run into the same questions that you have.. I just downloaded their source code, you probably ought to do that so you can understand what they are doing. They provide the fitting functions, but they do not do any of the other work necessary to make these functions fit together with the R class framework. There are no methods for predict, anova, and so forth. This is only partially true. In fact, pglm employs the framework provided by the maxLik (by Ott Toomet and Arne Henningsen) and hence it inherits some of the methods that maxLik provides for all of its fitted model objects. So there is summary(), coef(), vcov(), AIC() work and you can leverage tools like coeftest() from lmtest, linearHypothesis() from car or sandwich() from sandwich do work. But it is certainly true that even more features would be desirable and I think that Yves always planned on enhancing pglm at some point. (After all it's still the initial version 0.1-0 on CRAN...) I'm in their R folder looking for implementations: pauljohn@pols110:/tmp/pglm/R$ grep summary * pauljohn@pols110:/tmp/pglm/R$ grep predict * pauljohn@pols110:/tmp/pglm/R$ grep class * pauljohn@pols110:/tmp/pglm/R$ grep fitted * pglm.R: # glm models can be fitted Run example(pglm) what can we do after that? plot(anb) Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y' ## Nothing. ## We do get a regression summary object, that's better than some packages provide: anbsum - summary(anb) ## And a coefficient table coef(anbsum) Estimate Std. error t value Pr( t) (Intercept) -6.933764e-01 0.061391429 -11.294351205 1.399336e-29 wage 1.517009e-02 0.006375966 2.379261231 1.734738e-02 exper1.314229e-03 0.007400129 0.177595444 8.590407e-01 ruralyes-8.594328e-05 0.051334716 -0.001674175 9.986642e-01 model.matrix(anb) Error in terms.default(object) : no terms component nor attribute anova(anb) Error in UseMethod(anova) : no applicable method for 'anova' applied to an object of class c('maxLik', 'maxim') predict(anb) Error in UseMethod(predict) : no applicable method for 'predict' applied to an object of class c('maxLik', 'maxim') So, if you want those features with these models, you'll have to get busy and do a lot of coding! While working on regression support lately, I've reached the conclusion that if an R package that claims to do regression but does not provide methods for summary, predict, anova, nobs, fitted, logLik, AIC, and so forth, then it is not done yet. Otherwise, users like you who expect to be able to run methods like fitted or such have a bad experience, as you are having now. Maybe somebody reading this will remind us where the common list of R regression methods is listed. I know for sure I've seen a document about these things, but I'm baffled now trying to find it. But I'm sure there is one. I'm sure that there are many. One of my attempts to write up a list is in Table 1 of vignette(betareg, package = betareg). Personally, I don't write anova() methods for my model objects because I can leverage lrtest() and waldtest() from lmtest and linearHypothesis() and deltaMethod() from car as long as certain standard methods are available, including coef(), vcov(), logLik(), etc. Similarly, an AIC() method is typically not needed as long as logLik() is available. And BIC() works if nobs() is available in addition. Best, Z pj library(pglm) m1_S-pglm(Feed ~ Cons_PC_1 + imp_gen_1 + LGDP_PC_1 + lnEI_1 + SH_Ren_1,data,family=binomial(probit),model=random,method=bfgs,index=c(Year,IDCountry)) m1_S$fitted.values residuals(m1) Can someone help me about it? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and
[R] getting started in parallel computing on a windows OS
Dear R help, I've what I think is a fairly simple parallel problem, and am getting bogged down in documentation and packages for much more complex situations. I have a big matrix (30^5,5]. I have a function that will act on each row of that matrix sequentially and output the 'best' result from the whole matrix (it compares the result from each row to the last and keeps the 'better' result). I would like to divide that first large matrix into chunks equal to the number of cores I have available to me, and work through each chunk, then output the results from each chunk. I'm really having trouble making head or tail of how to do this on a windows machine - lots of different false starts on several different packages now. Basically, I have the function, and I can of course easily divide the matrix into chunks. I just need a way to process each chunk in parallel (other than opening new R sessions for each core manually). Any help much appreciated - after two days of trying to get this to work I'm pretty burnt out. Thanks *Ben Caldwell* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tables package - remove NAs and NaN
yes, With Duncan's and Liviu's help, I was able to remove those NAs and NaNs from the tabular summary. svn .. thing has not worked for me yet.. would try this later.. Thanks so much! Regards, Santosh On Wed, Apr 24, 2013 at 1:46 PM, Santosh santosh2...@gmail.com wrote: Thanks so much.. I will try it out. Regards, Santosh On Wed, Apr 24, 2013 at 1:39 PM, Duncan Murdoch murdoch.dun...@gmail.comwrote: On 13-04-24 4:29 PM, Santosh wrote: Dear Duncan, Thanks for your email. For some reason svn does not seem to work on my machine.. not sure if it is due to firewall or due to my company's IS environment settings. I could, however, install the package after pointing the repos to r-forge site as you had suggested. But, yes, the versions installed were 0.7.47 and 0.7.51 and therefore is not the latest one. Could you, please, be able to send it to me as a zipped attachment that I can use in both 32-bit and 64-bit Windows? You can download it from http://www.stats.uwo.ca/**faculty/murdoch/temp/tables_0.**7.57.tar.gzhttp://www.stats.uwo.ca/faculty/murdoch/temp/tables_0.7.57.tar.gz for the source, or http://www.stats.uwo.ca/**faculty/murdoch/temp/tables_0.**7.57.ziphttp://www.stats.uwo.ca/faculty/murdoch/temp/tables_0.7.57.zip for the binary install. I think the latter should work in both 32 and 64 bit Windows, but I haven't tested it. Duncan Murdoch Thanks so much, Santosh On Wed, Apr 24, 2013 at 1:02 PM, Duncan Murdoch murdoch.dun...@gmail.com**wrote: On 13-04-24 3:23 PM, Santosh wrote: Dear Rxperts, Sorry if I am posting a really really dumb request.. I am new to subversion and am trying to use subversion to download the tables package as suggested by Duncan. I installed subversion client(from collabnet) and tried to access tables package using the command below. svn checkout svn://scm.r-forge.r-project.org/svnroot/tables/ http://**scm.r-forge.r-project.org/**svnroot/tables/http://scm.r-forge.r-project.org/svnroot/tables/ I get the following error message: C:\Users\santosh\tempsvn checkout svn:// scm.r-forge.r-project.org/svnroot/tables/http://scm.r-forge.r-project.org/**svnroot/tables/ http://scm.r-**forge.r-project.org/svnroot/**tables/http://scm.r-forge.r-project.org/svnroot/tables/ svn: E730060: Unable to connect to a repository at URL 'svn://scm.r-forge.r-proj ect.org/svnroot/tables' svn: E730060: Can't connect to host 'scm.r-forge.r-project.org': A connection at tempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. Is there anything additional I need to do with Subversion or with the commands? The spacing looks funny there: You should have no blanks in the path. Maybe you didn't, it's only the email. In that case, R-forge is probably just responding very slowly. You could try this path instead: svn checkout svn://scm.r-forge.r-project. org/svnroot/tables/pkg/tables**http://scm.r-forge.r-project.** org/svnroot/tables/pkg/tableshttp://scm.r-forge.r-project.org/svnroot/tables/pkg/tables This is the subdirectory that contains everything in the package. And you should be able to install the package directly from R-forge by setting your repository to http://R-forge.r-project.org, but it is very slow on updates, so it hasn't got to the current version yet. Duncan Murdoch Regards, Santosh On Tue, Apr 23, 2013 at 5:13 AM, Duncan Murdoch murdoch.dun...@gmail.com **wrote: On 13-04-23 6:31 AM, Duncan Murdoch wrote: On 13-04-22 10:40 PM, David Winsemius wrote: On Apr 22, 2013, at 5:49 PM, Santosh wrote: Dear Rxperts, q - data.frame(p=rep(c(A,B),**each=10,len=30), a=rep(c(1,2,3),each=10),id=**seq(30), b=round(runif(30,10,20)), c=round(runif(30,40,70))) The operation below... tabular(((p=factor(p))*(a=**factor(a))+1) ~ (N = 1) + (b + c)* (mean+sd),data=q) yields some rows of NAs and NaN as shown below b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 20 NaNNA NaN NA 3 10 15.60 2.716 60.30 8.001 B 10 NaNNA NaN NA 2 10 15.40 2.366 57.70 10.414 30 NaNNA NaN NA All 30 15.77 2.473 56.77 9.601 How do I remove the rows having N=0 ? I would like the resulting table look like.. b c p a N mean sdmean sd A 1 10 16.30 2.497 52.30 9.358 3 10 15.60 2.716 60.30 8.001 B 2 10 15.40 2.366 57.70 10.414 All 30 15.77 2.473 56.77 9.601 Here's a bit of a hack: tabular( (`p a`=interaction(p,a, drop=TRUE, sep= )) ~ (N = 1) + (b + c)* (mean+sd),data=q) b c p a N mean sd mean sd A 1 10 12.8 0.7888 52.1 8.020 B 2 10 16.3 3.0569 54.9 8.711 A 3 10 14.6 3.7771 56.5 6.980 I
Re: [R] Trouble Computing Type III SS in a Cox Regression
I should hope that there is trouble, since type III is an undefined concept for a Cox model. Since SAS Inc fostered the cult of type III they have recently added it as an option for phreg, but I am not able to find any hints in the phreg documentation of what exactly they are doing when you invoke it. If you can unearth this information, then I will be happy to tell you whether a. using the test (whatever it is) makes any sense at all for your data set b. if a is true, how to get it out of R I use the word cult on purpose -- an entire generation of users who believe in the efficacy of this incantation without having any idea what it actually does. In many particular instances the SAS type III corresponds to a survey sampling question, i.e., reweight the data so that it is balanced wrt factor A and then test factor B in the new sample. The three biggest problems with type III are that 1: the particular test has been hyped as better when in fact it sometimes is sensible and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately often works even when the underlying rationale does not hold and 3: they explain it using a notation that completely obscures the actual question. This last leads to the nonsense phrase test for main effects in the presence of interactions. There is a survey reweighted approach for Cox models, very closely related to the work on causal inference (marginal structural models), but I'd bet dollars to donuts that this is not what SAS is doing. (Per 2 -- type III was a particular order of operations of the sweep algorithm for linear models, and for backwards compatability that remains the core definition even as computational algorthims have left sweep behind. But Cox models can't be computed using the sweep algorithm). Terry Therneau On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote: Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: cox3grp- subset(survData, Treatment %in% c(DC, DA, DO), c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV drop1(coxCV, test=Chisq) require(car) Anova(coxCV, type=III) And here are my results: cox3grp- subset(survData, + Treatment %in% c(DC, DA, DO), + c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) 'data.frame': 227 obs. of 6 variables: $ PTNO: int 1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ... $ Treatment : Factor w/ 3 levels DC,DA,DO: 1 1 1 1 1 1 1 1 1 1 ... $ PFS_CENSORED: int 1 1 1 0 1 1 1 1 0 1 ... $ PFS_MONTHS : num 1.12 8.16 6.08 1.35 9.54 ... $ AGE : num 72 71 80 65 72 60 63 61 71 70 ... $ PS2 : Ord.factor w/ 2 levels YesNo: 2 2 2 2 2 2 2 2 2 2 ... coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV Call: coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data = cox3grp, method = efron) coef exp(coef) se(coef) z p AGE0.00492 1.005 0.00789 0.624 0.530 PS2.L -0.34523 0.708 0.14315 -2.412 0.016 Likelihood ratio test=5.66 on 2 df, p=0.0591 n= 227, number of events= 198 drop1(coxCV, test=Chisq) Single term deletions Model: Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2 DfAICLRT Pr(Chi) none 1755.2 AGE 1 1753.6 0.3915 0.53151 PS2 1 1758.4 5.2364 0.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 require(car) Anova(coxCV, type=III) Analysis of Deviance Table (Type III tests) LR Chisq Df Pr(Chisq) AGE 0.3915 10.53151 PS2 5.2364 10.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level ps2 variable and for age. This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values. Namely the ones I get from coxph. My sense is that I'm probably misusing R in some way but I'm not sure what I'm likely to be doing wrong. SAS prodcues Wald Chi-Square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated. Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide
[R] Floating point precision causing undesireable behaviour when printing as.POSIXlt times with microseconds?
Dear list, When using as.POSIXlt with times measured down to microseconds the default format.POSIXlt seems to cause some possibly undesirable behaviour: According to the code in format.POSIXlt the maximum accuracy of printing fractional seconds is 1 microsecond, but if I do; options( digits.secs = 6 ) as.POSIXlt( 1.02 , tz=, origin=1970-01-01) as.POSIXlt( 1.98 , tz=, origin=1970-01-01) as.POSIXlt( 1.99 , tz=, origin=1970-01-01) I return respectively: [1] 1970-01-01 01:00:01.02 BST [1] 1970-01-01 01:00:01.98 BST [1] 1970-01-01 01:00:01 BST If options( digits.secs = 6 ) should I not expect to be able to print 1.99 seconds? This seems to be caused by the following code fragment in format.POSIXlt: np - getOption(digits.secs) if (is.null(np)) np - 0L else np - min(6L, np) if (np = 1L) for (i in seq_len(np) - 1L) if (all(abs(secs - round(secs, i)) 1e-06)) { np - i break } Specifically for (i in seq_len(np) - 1L) if (all(abs(secs - round(secs, i)) 1e-06)) Which in the case of 1.99 seconds will give: options( scipen = 10 ) np - 6 sapply( seq_len(np) - 1L , function(x) abs(1.99 - round(1.99, x)) ) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.01 0.01 0.01 0.01 0.01 0.01 The logical test all( ... 1e-06) should evaluate to FALSE but due to floating point precision it evaluates TRUE: sprintf( %.20f , abs(1. 99 - round(1. 99,5))) [1] 0.00991773 If instead of: for (i in seq_len(np) - 1L) if (all(abs(secs - round(secs, i)) 1e-06)) in format.POSIXlt we had a comparison value that was half the minimum increment: for (i in seq_len(np) - 1L) if (all(abs(secs - round(secs, i)) 5e-07)) This behaviour disappears: mod.format.POSIXlt( as.POSIXlt( 1.99 , tz=, origin=1970-01-01) ) [1] 1970-01-01 01:00:01.99 But I am unsure if the original behaviour is what I should expect given the documentation (I have read it and I can't see a reason to expect 1.99 to round down to 1). And also if changing the formatting function would have other undesirable consequences? My sessionInfo(): R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 [2] LC_CTYPE=English_United Kingdom.1252 [3] LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base Thank you, Simon Simon O'Hanlon Postgraduate Researcher Helminth Ecology Research Group Department of Infectious Disease Epidemiology Imperial College London St. Mary's Hospital, Norfolk Place, London, W2 1PG, UK Office: +44 (0) 20 759 43229 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum up column values according to row id
HI Not sure whether this helps: ipso- read.table(text= id dbh h g 1 FPE0164 36 13.62 0.10178760 2 FPE0164 31 12.70 0.07547676 21 FPE1127 57 18.85 0.25517586 13 FPE1127 39 15.54 0.11945906 12 FPE1127 34 14.78 0.09079203 6 FPE1127 32 15.12 0.08042477 5 FPE1127 28 14.13 0.06157522 15 FPE1127 27 13.50 0.05725553 19 FPE1127 25 13.28 0.04908739 11 FPE1127 19 11.54 0.02835287 ,sep=,header=TRUE,stringsAsFactors=FALSE) library(plyr) ddply(ipso,.(id),summarize,g=mean(head(g,6))) # id g #1 FPE0164 0.08863218 #2 FPE1127 0.11078041 aggregate(g~id,ipso,function(x) mean(head(x,6))) # id g #1 FPE0164 0.08863218 #2 FPE1127 0.11078041 Dear All, here a problem I think many of you can solve in few minutes. I have a dataframe which contains values of plot id, diameters, heigths and basal area of trees, thus columns names are: id | dbh | h | g head(ipso, n=10) id dbh h g 1 FPE0164 36 13.62 0.10178760 2 FPE0164 31 12.70 0.07547676 21 FPE1127 57 18.85 0.25517586 13 FPE1127 39 15.54 0.11945906 12 FPE1127 34 14.78 0.09079203 6 FPE1127 32 15.12 0.08042477 5 FPE1127 28 14.13 0.06157522 15 FPE1127 27 13.50 0.05725553 19 FPE1127 25 13.28 0.04908739 11 FPE1127 19 11.54 0.02835287 from here I need to calculate the mean of the six greater g_ith for each id_ith. The clauses are that: if length(id) =6 do the mean of the first six greaters g else do the mean of all the g_ith in the id_ith (in head print above e.g. for the id==FPE0164 do the mean of just these two values of g). The g are already ordered by id ascending and g descending using: ipso - ipso[with(ipso, order(ipso$id, -ipso$g)), ] # Order for id ascending and g descending I tried a lot of for loops and tapply() without results. Can anyone help me to solve this? Thanks for your attention Best whishes Matteo __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] getting started in parallel computing on a windows OS
On 04/24/2013 02:50 PM, Benjamin Caldwell wrote: Dear R help, I've what I think is a fairly simple parallel problem, and am getting bogged down in documentation and packages for much more complex situations. I have a big matrix (30^5,5]. I have a function that will act on each row of that matrix sequentially and output the 'best' result from the whole matrix (it compares the result from each row to the last and keeps the 'better' result). I would like to divide that first large matrix into chunks equal to the number of cores I have available to me, and work through each chunk, then output the results from each chunk. I'm really having trouble making head or tail of how to do this on a windows machine - lots of different false starts on several different packages now. Basically, I have the function, and I can of course easily divide the matrix into chunks. I just need a way to process each chunk in parallel (other than opening new R sessions for each core manually). Any help much appreciated - after two days of trying to get this to work I'm pretty burnt out. Hi Ben -- in your code from this morning you had a function fitting - function(ndx.grd=two,dt.grd=one,ind.vr='ind',rsp.vr='res') { ## ... setup for(i in 1:length(ndx.grd[,1])){ ## ... do work } ## ... collate results } that you're trying to run in parallel. Obviously the ## ... represent lines I've removed. When you say something like y - foreach(icount(length(two))) %dopar% fitting() its saying that you want to run fitting() length(two) times. So you're actually doing the same thing length(two) times, whereas you really want to divide the work thats inside fitting() into chunks, and do those on separate cores! Conceptually what you'd like to do is fit_one - function(idx, ndx.grd, dt.grd, ind.vr, rsp.vr) { ## ... do work on row idx _ONLY_ } and then evaluate with ## ... setup y - foreach (idx = icount(nrow(two)) %dopar% one_fit(idx, two, one, ind, res) ## ... collate so that fit_one fits just one of your combinations. foreach will worry about distributing the work. Make sure that fit_one works first, before trying to run this in parallel; your use of try(), trying to fit different data types (character, integer, numeric) into a matrix rather than data.frame, and the type coercions all indicate that you're fighting with R rather than working with it. Hope that helps, Martin Thanks *Ben Caldwell* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question
On Tue, Apr 23, 2013 at 2:54 PM, nafiseh hagiaghamohammadi n_hajiaghamohammadi2...@yahoo.com wrote: Hi I fit one linear quantile regression with package quantreg and I want to khow this model is good or not.Is there method for checking it? Thanks your advice I ask this question because there is 2 models,f0 and f1 in (R1 - 1 - f1$rho/f0$rho ), is it true? but I fit 1 model and I want to check goodness of fit for 1 model . Please keep your responses on list so you can get a quick reply even when I'm otherwise busy. I think you could -- for a rough and ready comparison -- compare against a constant (empirical quantile) model (not unlike how basic OLS models compare against the constant mean predictor) but someone else might know if there's any subtleties about quantile regression that should be noted here. MW __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fonts not rendering during plot
On Apr 24, 2013, at 7:29 AM, George Dietz wrote: Hi, I am having a problem where sometimes fonts for text in plots don't get rendered-the text just shows up as boxes. If I call R from a certain directory the problem goes away, otherwise the fonts don't render (but plots get made). In both cases, my PANGO_RC and FONTCONFIG_FILE do not change… Offering the OS details is more courteous than omitting them. (And there are several other courtesies expected of you listed in the Fine Posting Guide.) If this happens to be on a Mac, it may mean your screen fonts are corrupt. You can see their names with: names(quartzFonts()). Using Fontbook.app will sometimes reveal that there are two copies of one of the screen graphics fonts and that one of them displays as blanks. These should be deleted with Fontbook.app. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] about the loglm and glm---Re:Re: Regression on stratified count data
Understand. Many thanks for your detailed explaination. At 2013-04-24 22:22:55,Achim Zeileis achim.zeil...@uibk.ac.at wrote: On Wed, 24 Apr 2013, meng wrote: Hi,Achim: Can all the analysis you mentioned via loglm be performed via glm(...,family=poisson)? Yes. ## transform table back to data.frame df - as.data.frame(tab) ## fit models: conditional independence, no-three way interaction, ## and saturated g1 - glm(Freq ~ age/(drug + case), data = df, family = poisson) g2 - glm(Freq ~ (age + drug + case)^2, data = df, family = poisson) g3 - glm(Freq ~ age * drug * case, data = df, family = poisson) ## likelihood-ratio tests (against saturated) anova(g1, g3, test = Chisq) anova(g2, g3, test = Chisq) ## compare fitted frequencies (which are essentially equal) all.equal(as.numeric(fitted(g1)), as.data.frame(as.table(fitted(m1)))$Freq) all.equal(as.numeric(fitted(g2)), as.data.frame(as.table(fitted(m2)))$Freq) The difference is mainly that loglm() has a specialized user interface and that it uses a different optimizer (iterative proportional fitting rather than iterative reweighted least squares). Best, Z Many thanks. At 2013-04-24 19:37:10,Achim Zeileis achim.zeil...@uibk.ac.at wrote: On Wed, 24 Apr 2013, meng wrote: Hi all: For stratified count data,how to perform regression analysis? My data: age case oc count 1 1 121 1 1 226 1 2 117 1 2 259 2 1 118 2 1 288 2 2 1 7 2 2 2 95 age: 1:40y 2:40y case: 1:patient 2:health oc: 1:use drug 2:not use drug My purpose: Anaysis whether case and oc are correlated, and age is a stratified varia ble. My solution: 1,Mantel-Haenszel test by using function mantelhaen.test 2,loglinear regression by using function glm(count~case*oc,family=poisson ).But I don't know how to handle variable age,which is the stratified vari able. Instead of using glm(family = poisson) it is also convenient to use loglm() from package MASS for the associated convenience table. The code below shows how to set up the contingency table, visualize it using package vcd, and then fit two models using loglm. The models considered are conditional independence of case and drug given age and the no three-way interaction already suggested by Peter. Both models are also accompanied by visualizations of the residuals. ## contingency table with nice labels tab - expand.grid(drug = 1:2, case = 1:2, age = 1:2) tab$count - c(21, 26, 17, 59, 18, 88, 7, 95) tab$age - factor(tab$age, levels = 1:2, labels = c(40, 40)) tab$case - factor(tab$case, levels = 1:2, labels = c(patient, healthy)) tab$drug - factor(tab$drug, levels = 1:2, labels = c(yes, no)) tab - xtabs(count ~ age + drug + case, data = tab) ## visualize case explained by drug given age library(vcd) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE)) ## test wheter drug and case are independent given age m1 - loglm(~ age/(drug + case), data = tab) m1 ## visualize corresponding residuals from independence model mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = deviance, gp = shading_hcl, gp_args = list(interpolate = 1.2)) mosaic(case ~ drug | age, data = tab, split_vertical = c(TRUE, TRUE, FALSE), residuals_type = pearson, gp = shading_hcl, gp_args = list(interpolate = 1.2)) ## test whether there is no three-way interaction ## (i.e., dependence of case on drug is the same for both age groups) m2 - loglm(~ (age + drug + case)^2, data = tab) m2 ## visualization (with default pearson residuals) mosaic(case ~ drug | age, data = tab, expected = ~ (age + drug + case)^2, split_vertical = c(TRUE, TRUE, FALSE), gp = shading_hcl, gp_args = list(interpolate = 1.2)) Many thanks for your help. My best. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.h tml and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help me make faster R code for Kennard-Stone algorithm [My code is so slow from Matlab]
Hi all, Can you help me change my Kennard-Stone algorithm to faster one? [The original code can run fast in matlab, but when I change matlab code to R code, it is so slow.] Since my code so crude and too many loops (changed from matlab code), it is too slow. I hope that you can help to improve the performance. Thanks. kevin # ## # Kennard-Stone algorithm for selection of samples ksFun = function(x, N) { # Initial the vector of minimum distance dminmax = sample(0, N, replace = TRUE) # Default: FALSE M = nrow(x) samples = 1:M # Initializes the matrix of distances D = matrix(0, nrow = M, ncol = M) for (i in 1:(M - 1)) { xa = x[i, ] for (j in (i + 1):M) { xb = x[j, ] # D: Upper Trianglar Matrix # D[i, j] = Euclidean distance between object i and j (j i) D[i, j] = max(svd(xa - xb)$d) # the largest singular value } } cat(for (i in 1:(M - 1)) done! \n) maxD = apply(D, 2, max) index_row = apply(D, 2, function(x) which(x == max(x))[1]) dummy = max(maxD) index_column = which(maxD == dummy) m = vector() m[1] = index_row[index_column] m[2] = index_column dminmax[2] = D[m[1], m[2]] for (i in 3:N) { pool = setdiff(samples, m) dmin = sample(0, M-i+1, replace = TRUE) for (j in 1:(M-i+1)) { indexa = pool[j] d = sample(0, i-i, replace = TRUE) for ( k in 1:(i-1)) { indexb = m[k] if (indexa indexb) { d[k] = D[indexa, indexb] } else { d[k] = D[indexb, indexa] } } dmin[j] = min(d) } #cat(for (j in 1:(M -i+1)) done! \n) dminmax[i] = max(dmin) index = which(dmin == max(dmin)) m[i] = pool[index] } cat(for (i in 3:N) done! \n) res = list(m = m, dminmax = dminmax) return(res) } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question
good usually means good relative to something else, in this case the comparison seems, as Michael has already said, f0 - rq(y ~ 1, tau = ?) and then one can compute the R1 version that I originally suggested. But since there is still no explicit way to evaluate this, it is all a bit pointless. Roger Koenker rkoen...@illinois.edu On Apr 24, 2013, at 6:37 PM, R. Michael Weylandt wrote: On Tue, Apr 23, 2013 at 2:54 PM, nafiseh hagiaghamohammadi n_hajiaghamohammadi2...@yahoo.com wrote: Hi I fit one linear quantile regression with package quantreg and I want to khow this model is good or not.Is there method for checking it? Thanks your advice I ask this question because there is 2 models,f0 and f1 in (R1 - 1 - f1$rho/f0$rho ), is it true? but I fit 1 model and I want to check goodness of fit for 1 model . Please keep your responses on list so you can get a quick reply even when I'm otherwise busy. I think you could -- for a rough and ready comparison -- compare against a constant (empirical quantile) model (not unlike how basic OLS models compare against the constant mean predictor) but someone else might know if there's any subtleties about quantile regression that should be noted here. MW __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stochastic Frontier: Finding the optimal scale/scale efficiency by frontier package
Hi, I am trying to find out the scale efficiency and optimal scale of banks by stochastic frontier analysis given the panel data of bank. I am free to choose any model of stochastic frontier analysis. The only approach I know to work with R is to estimate a translog production function by sfa or other related function in frontier package, and then use the Ray 1998 formula to find the scale efficiency. However, as the textbook Coelli et al 2005 point out that the concavity may not be satisfied, one needs to impose the nonpositive definiteness condition so that the scale efficiency 1. How can I do it with frontier package? Is there any other SFA model/function in R recommended to find out the scale efficiency and optimal scale? Thanks, Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
On 4/17/2013 5:18 AM, Kevin Wright wrote: On Tue, Apr 16, 2013 at 4:33 PM, Jim Lemon j...@bitwrit.com.au wrote: On 04/17/2013 03:25 AM, Sarah Goslee wrote: The final point does relate to Excel and any application that hides what is going on to the casual observer. I will treasure this URL to give to anyone who chastises my moaning when I have to perform some task in Excel. It is not an error in the application (although these certainly exist) but a salutory caution to those who think that if a reasonable looking number appears in a cell, it must be the correct answer. I have found not one, but two such errors in the simple calculation of a birthday age from the date of birth and date of death. Jim So there (maybe) was a bug in Excel. Maybe hidden from the casual observer. And since Excel is not R, and we are R snobs, Excel is evil, right? But, wait. Is it easier for a casual observer to detect a flaw in the formula in Excel, or to find an incorrect array index in an R script? If the person knows R, or can fake it, I think it is easier. You have to hunt around an Excel spreadsheet to see what the formulae are, and the cell references usually have no inherent meaning. Further, one of the errors they made, not including all the data in a range, is very easy to make in excel but would be very hard to make in R. As others have noted, the problem was not a bug in Excel the program (unless you consider the design a bug) but a bug induced by the use of Excel. I doubt the exclusion of the range was deliberate, although the other errors seem to have been. However, it is likely that if the result had not been to their liking the original authors would have rechecked their work and discovered the problem. One of the errors, equal weighting of countries regardless of how many years they spent in a given state, is arguably a judgement call. Selective exclusion and inclusion of data is also a judgement call, but that strikes me as less defensible. Someone wrote that the overall finding of a negative relation between debt and growth is intact. First of all, the headline summary was that if debt/GDP 90% you fall off a cliff. That is not intact; it is false. The remaining relation is quite weak. And the substantive conclusion that high debt *causes* weaker growth is a complete reading into a correlational finding. It is pretty hard to sort out causal ordering, but some evidence suggests it is more the reverse: http://krugman.blogs.nytimes.com/2013/04/18/correlation-causality-and-casuistry/. See Krugman and Delongs blogs generally for gleeful commentary, or the original critique in http://www.peri.umass.edu/236/hash/31e2ff374b6377b2ddec04deaa6388b1/publication/566/. At any rate, a policy-relevant conclusion would need to be based on a much more careful analysis than was done, careful not only in the mechanics but in using methods that at least attempted to sort out the causal relations. The irony is that the substantively most trivial mistake is also the most clearly an error, while the more important issues are at least a little less clear-cut. Ross __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to make a raster image in R
Hi R-User [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] creating raster image in R
Hi R-User I was trying to make a raster map with WGS84 projection in R, but I could not make it. I found one data set in Google that data is almost the same format as of mine. I wanted to make a raster map of temperature with 1 degree spatial resolution for the global scale. I could make it in the GIS software but I do have many variables (to be many raster images) and ultimately I am importing them into R for further analysis. Therefore, I wanted to make them in R, if possible. But, I am wondering how I can create a raster map in R. (I have provided link of data for your references, this is an example of data set). I am really appropriating for your help. #-- I downloaded the following packages #create a raster map from scratch install.packages(raster, dependencies=TRUE) library(raster) # raster data install.packages(rgdal, dependencies=TRUE) library(rgdal) # input/output, projections install.packages(rgeos, dependencies=TRUE) library(rgeos) # geometry ops install.packages(spdep, dependencies=TRUE) library(spdep) # spatial dependence install.packages(pastecs, dependencies=TRUE) library(pastecs) pts-read.table.url(https://www.betydb.org//miscanthusyield.csv;, header=T, sep=,) proj4string(pts)=- CRS(+proj=longlat +datum=WGS84) after that, what I am supposed to do for creating a raster map? any suggestions? #--- Cheers, Kristi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distance matrices Combinations
HI, Your code is not very clear. mata-m[,c(u)] # assume that m is mat1 or w (as m was not defined) #also assume that 124 as nrow of each matrix # For 10 distance matrices (#10C4) set.seed(25) lst1- lapply(1:10,function(i) dist(matrix(sample(1:50,5*124,replace=TRUE),nrow=124))) names(lst1)- seq_along(lst1) lst2-lapply(seq_len(ncol(combn(names(lst1),4))),function(i) {x-combn(names(lst1),4)[,i];as.matrix((lst1[[x[1]]])^2+(lst1[[x[2]]])^2+(lst1[[x[3]]])^2+(lst1[[x[4]]])^2)}) X- do.call(cbind,lapply(lst2,function(x) {w- sqrt(x);do.call(rbind,lapply(seq_len(nrow(w)),function(i) {r-matrix(sort(x[i,],index.return=TRUE)$ix,ncol=1); u- r[2:6,]; mata-w[,u];b- matrix(apply(mata,1,mean),ncol=1); e-sum(abs(b-w[,i]))}))})) o-mean(apply(X,2,sd)) o #[1] 268.7831 A.K. - Original Message - From: eliza botto eliza_bo...@hotmail.com To: r-help@r-project.org r-help@r-project.org Cc: Sent: Wednesday, April 24, 2013 2:07 PM Subject: [R] Distance matrices Combinations Dear UseRs, MY PROBLEM IS A SMALL PIECE OF A REAL BIG AND A COMPLICATED PROBLEM. IF I DELIBERATE IN A VERY SIMPLE WAY THEN ALL I WANT IS TO PUT ALL THE POSSIBLE COMBINATIONS OF 75 DISTANCE MATRICES (BY TAKING 4 MATRICES, MORE COMMONLY 75C4), in the following equation. t-as.matrix((MAT1)^2+(MAT2)^2+(MAT3)^2+(MAT4)^2+,upper=T,diag=T)) Then 1215450 values of t(one for each combination) should one by one be inculcated in the following loop(to calculate the o value) and in the end want those 10 combinations of distance matrices which have lowest o values. e - vector(list, 124) w-sqrt(t) mat1-w for (i in 1:124){ r-matrix(sort(mat1[i,],index.return=TRUE)$ix,ncol=1) u-r[c(2,3,4,5,6),1] mata-m[,c(u)] ##(shifted) amata-apply(mata,1,mean) amata-data.frame(amata) aavg-as.matrix(amata, ncol=1) b-aavg e[[i]]-print(sum(abs(b-m[,i]))) } x-do.call(rbind,e) Y-x z - apply(Y,2,sd) o-mean(Y) Does my question make any scene? Thanks in advance Elisa [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fonts not rendering during plot
I apologize for not going in to too much detail-I didn't want to confuse people with the gory details of what I'm trying to do (trying to make a portable R installation bundled in to a mac app, among other things). I am on a mac, Mountain Lion. Anyhow, I figured out the problem was with the paths pango looks uses to search for font configuration files. I solved it by writing a script that sets these paths at run-time. George On Apr 24, 2013, at 7:45 PM, David Winsemius dwinsem...@comcast.net wrote: On Apr 24, 2013, at 7:29 AM, George Dietz wrote: Hi, I am having a problem where sometimes fonts for text in plots don't get rendered-the text just shows up as boxes. If I call R from a certain directory the problem goes away, otherwise the fonts don't render (but plots get made). In both cases, my PANGO_RC and FONTCONFIG_FILE do not change… Offering the OS details is more courteous than omitting them. (And there are several other courtesies expected of you listed in the Fine Posting Guide.) If this happens to be on a Mac, it may mean your screen fonts are corrupt. You can see their names with: names(quartzFonts()). Using Fontbook.app will sometimes reveal that there are two copies of one of the screen graphics fonts and that one of them displays as blanks. These should be deleted with Fontbook.app. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Assigning a variable value based on multiple columns
Hi All, I'm hoping someone can help me with a relatively simple problem. Take the following dataset: IDDiabetesESRDHIVContact 100NA0 210NA0 3NA 100 40NA 01 51110 I want to generate a column called TSTcutoff based on the values in the row. TSTcutoff would be the lower of 15 (if Diabetes=ESRD=HIV=Contact=0), 10 (if Diabetes or ESRD=1 AND HIV=Contact=0), or 5 (if HIV OR Contact=1). I was thinking this could be done with a series of IFELSE statements, but the NA values make this more challenging. I want to ignore NA values when calculating TSTcutoff. So the final dataset should look like this: IDDiabetesESRDHIVContact TSTcutoff 100NA015 210NA0 10 3NA 10010 40NA 015 511105 Thanks for any suggestions. Jason Stout, MD, MHS Box 102359-DUMC Durham, NC 27710 FAX 919-681-7494 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Loop for main title in a plot
Hi all, I have a problem in including my plot in a loop. Here is a simple example for one plot: # Plot simple graph with super- and subscript a-c(1,2,3,4) b-c(1,2,3,4) plot(x=a,y=b, ylab=expression(paste(Apple[P])), xlab=expression(paste(Banana^th)), main=expression(paste(italic(i-)~4^th~choice))) Now I would like to include the titel (main) as a function of the number of trails for (trial in 1:nTrials) { plot( main=expression(paste(italic(i-)~trial^th~choice))) } e.g. nTrials = 5 The title should look like this: 5th plot: i ^th choice 4th plot: i-1 ^th choice 3th plot: i-2 ^th choice and so on I have problems to create that, could you please help me? Thank you!! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fonts not rendering during plot
On Apr 24, 2013, at 8:42 PM, George Dietz wrote: I apologize for not going in to too much detail-I didn't want to confuse people with the gory details of what I'm trying to do You've slighted us. Kept us from our due. We LOVE gory OS and code details. Chop that chicken up. Throw dem bones. Without the complete set of entrails our divinations are incomplete and we are unable to perform our wizardly deeds of daring-do. -- David. (trying to make a portable R installation bundled in to a mac app, among other things). I am on a mac, Mountain Lion. Anyhow, I figured out the problem was with the paths pango looks uses to search for font configuration files. I solved it by writing a script that sets these paths at run-time. George On Apr 24, 2013, at 7:45 PM, David Winsemius dwinsem...@comcast.net wrote: On Apr 24, 2013, at 7:29 AM, George Dietz wrote: Hi, I am having a problem where sometimes fonts for text in plots don't get rendered-the text just shows up as boxes. If I call R from a certain directory the problem goes away, otherwise the fonts don't render (but plots get made). In both cases, my PANGO_RC and FONTCONFIG_FILE do not change… Offering the OS details is more courteous than omitting them. (And there are several other courtesies expected of you listed in the Fine Posting Guide.) If this happens to be on a Mac, it may mean your screen fonts are corrupt. You can see their names with: names(quartzFonts()). Using Fontbook.app will sometimes reveal that there are two copies of one of the screen graphics fonts and that one of them displays as blanks. These should be deleted with Fontbook.app. -- David Winsemius Alameda, CA, USA David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loop for main title in a plot
On Apr 24, 2013, at 9:22 PM, Eva Günther wrote: Hi all, I have a problem in including my plot in a loop. Here is a simple example for one plot: # Plot simple graph with super- and subscript a-c(1,2,3,4) b-c(1,2,3,4) plot(x=a,y=b, ylab=expression(paste(Apple[P])), xlab=expression(paste(Banana^th)), main=expression(paste(italic(i-)~4^th~choice))) Now I would like to include the titel (main) as a function of the number of trails for (trial in 1:nTrials) { plot( main=expression(paste(italic(i-)~trial^th~choice))) } I have not quite figured out what the exact goal is but this shows how to loop with expression that have evaluated component: bquote is your friend: for (trial in 1:length(a)) { plot(x=a,y=b, # you were at the very least missing an x and y argument main=bquote( italic(i)-.(trial)^th~choice)) } ... and you had too many quoted elements. Plotmath `paste` is usually not needed. Learn to use * and ~. David. e.g. nTrials = 5 The title should look like this: 5th plot: i ^th choice 4th plot: i-1 ^th choice 3th plot: i-2 ^th choice and so on I have problems to create that, could you please help me? Thank you!! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Linear Interpolation : Missing rates
Dear R forum I have data.frame as df = data.frame(rate_name = c(USD_1w, USD_1w, USD_1w, USD_1w, USD_1m, USD_1m, USD_1m, USD_1m, USD_2m, USD_2m, USD_2m, USD_2m, GBP_1w, GBP_1w, GBP_1w, GBP_1w, GBP_1m, GBP_1m, GBP_1m, GBP_1m, GBP_2m, GBP_2m, GBP_2m, GBP_2m, EURO_1w, EURO_1w, EURO_1w, EURO_1w, EURO_2w, EURO_2w, EURO_2w, EURO_2w, EURO_2m, EURO_2m, EURO_2m, EURO_2m), rates = c(2.05, 2.07, 2.06, 2.06, 2.22, 2.24, 2.23, 2.23, 2.31, 2.33, 2.33, 2.31, 1.06, 1.08, 1.08, 1.08, 1.21, 1.21, 1.23, 1.21, 1.41, 1.39, 1.39, 1.37, 1.82, 1.82, 1.81, 1.80, 1.98, 1.98, 1.97, 1.97, 2.1, 2.09, 2.09, 2.11)) currency = c(EURO, GBP, USD) tenor = c(1w, 2w, 1m, 2m, 3m) # _ df rate_name rates rate_name rates 1 USD_1w 2.05 2 USD_1w 2.07 3 USD_1w 2.06 4 USD_1w 2.06 5 USD_1m 2.22 6 USD_1m 2.24 7 USD_1m 2.23 8 USD_1m 2.23 9 USD_2m 2.31 10 USD_2m 2.33 11 USD_2m 2.33 12 USD_2m 2.31 13 GBP_1w 1.06 14 GBP_1w 1.08 15 GBP_1w 1.08 16 GBP_1w 1.08 17 GBP_1m 1.21 18 GBP_1m 1.21 19 GBP_1m 1.23 20 GBP_1m 1.21 21 GBP_2m 1.41 22 GBP_2m 1.39 23 GBP_2m 1.39 24 GBP_2m 1.37 25 EURO_1w 1.82 26 EURO_1w 1.82 27 EURO_1w 1.81 28 EURO_1w 1.80 29 EURO_2w 1.98 30 EURO_2w 1.98 31 EURO_2w 1.97 32 EURO_2w 1.97 33 EURO_2m 2.10 34 EURO_2m 2.09 35 EURO_2m 2.09 36 EURO_2m 2.11 As can be seen that USD_2w, GBP_2w and EURO_1m are missing and I need to INTERPOLATE these rates, which can be done using approx or approxfun. In reality I can have many currencies with many tenors. Problem is when the data.frame df is read or accessed in R, I am not aware which tenor is missing. For a given currency, it is possible that mare than 1 consecutive tenors may be missing e.g. in case of EURO, I may have EURO_1w, EURO_2w and then EURO_4m. So EURO_1m, EURO_2m and EURO_3m are missing. I understand it's sort of vague question from me and do apologize for the same. Any suggestion please. Regards Katherine [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.