[R] p-Value
Hi Sir,

When we use the Kendall package to obtain Kendall's tau statistic, we also get a two-sided p-value. What does "two-sided p-value" mean? The term "two-sided" is confusing to me. Kindly provide help in this regard.

--
AMINA SHAHZADI
Department of Statistics
GC University Lahore, Pakistan.
Email: [EMAIL PROTECTED]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
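As a rough illustration of the question above, the sketch below compares one-sided and two-sided p-values for Kendall's tau. It uses cor.test() from base R's stats package rather than the Kendall package, on the assumption that both report the usual two-sided test; the data are made up. A two-sided p-value measures evidence against "tau = 0" in either direction (positive or negative association), while a one-sided p-value looks in one direction only.

```r
# Made-up data with a clear positive association and no ties
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

# H1: tau != 0 (association in either direction)
two  <- cor.test(x, y, method = "kendall", alternative = "two.sided")
# H1: tau > 0 (positive association only)
up   <- cor.test(x, y, method = "kendall", alternative = "greater")
# H1: tau < 0 (negative association only)
down <- cor.test(x, y, method = "kendall", alternative = "less")

# The two-sided p-value is never smaller than the smaller one-sided p-value
c(two.sided = two$p.value, greater = up$p.value, less = down$p.value)
```

Here tau is positive, so the "greater" p-value is small, the "less" p-value is large, and the two-sided value sits between them.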
Re: [R] (coxph, se) Obtaining standard errors of coefficients from coxph to store
David,

It would be helpful to give an example of what you would like to extract. I guess you know how to extract elements from vectors and lists. However, the objects returned by functions can sometimes be rather complex (the output of coxph() is one such case).

A general method for capturing printed output is capture.output(). It may not be fast, but it works if you have no other solution.

Joris

> a <- rnorm(10,1,1)
> b <- rnorm(10,1,1)
> mod <- lm(a~b)
> smod <- summary(mod)
> smod

Call:
lm(formula = a ~ b)

Residuals:
    Min      1Q  Median      3Q     Max
-1.7482 -0.5991  0.1211  0.8341  1.4975

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.6210     0.5332   3.040   0.0161 *
b            -0.7667     0.5037  -1.522   0.1664
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.142 on 8 degrees of freedom
Multiple R-Squared: 0.2246,     Adjusted R-squared: 0.1277
F-statistic: 2.317 on 1 and 8 DF,  p-value: 0.1664

> output <- capture.output(print(smod))
> output
 [1] ""
 [2] "Call:"
 [3] "lm(formula = a ~ b)"
 [4] ""
 [5] "Residuals:"
 [6] "    Min      1Q  Median      3Q     Max "
 [7] "-1.7482 -0.5991  0.1211  0.8341  1.4975 "
 [8] ""
 [9] "Coefficients:"
[10] "            Estimate Std. Error t value Pr(>|t|)  "
[11] "(Intercept)   1.6210     0.5332   3.040   0.0161 *"
[12] "b            -0.7667     0.5037  -1.522   0.1664  "
[13] "---"
[14] "Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 "
[15] ""
[16] "Residual standard error: 1.142 on 8 degrees of freedom"
[17] "Multiple R-Squared: 0.2246,\tAdjusted R-squared: 0.1277 "
[18] "F-statistic: 2.317 on 1 and 8 DF,  p-value: 0.1664 "
[19] ""

David Lloyd [EMAIL PROTECTED] wrote to r-help@stat.math.ethz.ch on 16/08/2007 11:31:
Subject: [R] (coxph, se) Obtaining standard errors of coefficients from coxph to store

> Hi all,
>
> I want to be able to find and store the z-scores of the coxph fit below:
>
> modz=coxph(Surv(TSURV,STATUS)~RAGE+DAGE+REG_WTIME_M+CLD_ISCH+POLY_VS,
>            data=kidneyT,method="breslow")
>
> I know summary(modz) will give me this, but how do I extract the
> standard errors or z-scores in a similar way to obtaining the
> coefficients with coef(modz)? I think it must have something to do
> with modz$var, but I'm having a complete mental blank.
>
> I need this information so I can write a function to use within a
> bootstrap, recording the proportion of bootstrap resamples in which
> each variable in the Cox PH model is actually significant.
>
> Any assistance is greatly appreciated.
>
> DL
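As an alternative to parsing printed output, the standard errors and z-scores asked about above can usually be computed from the fitted object directly. The sketch below is only an illustration: it uses the lung data set that ships with the survival package and an arbitrary model, not David's kidneyT data, which is not available here.

```r
library(survival)  # recommended package, normally installed with R

# Illustrative model on the 'lung' example data (not the model from the thread)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung, method = "breslow")

beta <- coef(fit)               # coefficients
se   <- sqrt(diag(vcov(fit)))   # standard errors from the variance matrix (fit$var)
z    <- beta / se               # Wald z-scores
p    <- 2 * pnorm(-abs(z))      # two-sided p-values

# The same values are stored in the coefficient matrix of the summary
tab <- summary(fit)$coefficients
tab[, c("coef", "se(coef)", "z")]

# Inside a bootstrap one could record, for each resample, which terms are significant
significant <- p < 0.05
```

Working from vcov() or summary()$coefficients avoids depending on how the printed table happens to be laid out.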
[R] Problem with save or/and if (I think but maybe not ...)
Hi,

I recently discovered the R program and I thought it could be useful to me.
I have to analyse data saved as .Px files (x between 0 and 8; .P0 files have 18 lines at the beginning that I have to skip). New files are generated every day.

This is my strategy:
In order to analyse the data, I first want to copy the new data into a MySQL database (which already contains the previous data).
So the first task is to compare the list of files in the directory (object: rfichiers) with the list of files already saved (object: tfichiers). The list of new files is then given by nfichiers<-setdiff(rfichiers, tfichiers).
It sounds easy ...

... but it doesn't work!

Up to now, I am able to connect to MySQL and, if the file tfichiers.r doesn't exist, I can copy the data files to the MySQL database.
But if tfichiers.r already exists and there is no new file to save, the script ignores the condition if (nfichiers!=0) and saves all the files of the directory to the database.

Is it a problem with the way I save tfichiers, or is it a problem with the condition if (nfichiers!=0)?

Could you please give me some advice on correcting my script (written with Tinn-R)?

I thank you in advance for your help.
Have a nice week,
Ptit Bleu.

PS: "Ptit Bleu" means something like "complete newbie" in French, so thanks for being patient :-)
PPS: I hope you understand my French English.

--

# Connect to the MySQL database "database"
library(DBI)
library(RMySQL)
drv<-dbDriver("MySQL")
con<-dbConnect(drv, username="user", password="password", dbname="database", host="localhost")

# Create the objects holding the lists of files (rel = relative path)
# - in the directory: object rfichiers
# - already processed: object tfichiers
# - new since the last connection: object nfichiers
# chemin is the directory where the data are stored
# RWork is R's working directory
# sep='' avoids adding a space after Mydata/
setwd("D:/RWork")
chemin<-"d:/Mydata/"
relrfichiers<-dir(chemin, pattern=".P")
rfichiers<-paste(chemin,relrfichiers, sep='')

if (file.exists("tfichiers.r"))
{
  tfichiers<-load("tfichiers.r")
  nfichiers<-setdiff(rfichiers,tfichiers)
} else {
  nfichiers<-rfichiers
}

# p0fichiers: files with extension .P0 (they contain information lines that must not be loaded)
# pxfichiers: files with extensions P1, ..., P8 (no information lines at the beginning)
if (nfichiers!=0)
{
  p0fichiers<-nfichiers[grep(".P0", nfichiers)]
  pxfichiers<-setdiff(nfichiers, p0fichiers)

  # Merge the day and time columns so that variations can be plotted against time
  # Each file in p0fichiers is read, dropping the first 18 lines,
  # and the object jourheure receives the day column (V1) pasted to the time column (V2)
  # jourheure is then copied into the first column of donnees
  # The second column (the times) is now redundant and is removed
  # donnees is copied into the MySQL database Mydata
  # Note: R understands the format day/month/year, MySQL year/month/day; stored as CHAR in MySQL
  for (i in 1:length(p0fichiers))
  {
    donnees<-read.table(p0fichiers[i], quote="\"", sep=";", dec=",", skip=18)
    jourheure<-paste(donnees$V1, donnees$V2, sep=" ")
    donnees[1]<-jourheure
    donnees<-donnees[,-2]
    # assignTable(con, "Datatable", donnees, append=TRUE) - does not work
    dbWriteTable(con, "Datatable", donnees, append=TRUE)
    rm(donnees, jourheure)
  }

  # Same for the .Px files, loading all lines (skip=0)
  # Possible improvement: write a function taking p0fichiers or pxfichiers as argument
  for (i in 1:length(pxfichiers))
  {
    donnees<-read.table(pxfichiers[i], quote="\"", sep=";", dec=",", skip=0)
    jourheure<-paste(donnees$V1, donnees$V2, sep=" ")
    donnees[1]<-jourheure
    donnees<-donnees[,-2]
    # assignTable(con, "Datatable", donnees, append=TRUE) - does not work
    dbWriteTable(con, "Datatable", donnees, append=TRUE)
    rm(donnees, jourheure)
  }
}

tfichiers<-rfichiers
save(rfichiers, file="tfichiers.r", ascii=TRUE)
rm(list=ls())

# Disconnect from MySQL
dbDisconnect(con)

--
View this message in context: http://www.nabble.com/Problem-with-save-or-and-if-%28I-think-but-maybe-not-...%29-tf4333945.html#a12343236
Sent from the R help mailing list archive at Nabble.com.
[R] I again with shorter message and script
Hi,

I realized that my first message and the script were (maybe) too long and difficult to read.
So I tested this shorter one:

setwd("D:/RWork")
chemin<-"d:/Mydata/"
relrfichiers<-dir(chemin, pattern=".P")
rfichiers<-paste(chemin,relrfichiers, sep='')

tfichiers<-rfichiers
save(tfichiers, file="tfichiers.r", ascii=TRUE)

if (file.exists("tfichiers.r"))
{
  tfichiers<-load("tfichiers.r")
  nfichiers<-setdiff(rfichiers,tfichiers)
}

The result is: nfichiers is equal to rfichiers, and when I print tfichiers I obtain ... "tfichiers" :-(

I read ?save and saw the warning about the arguments, but I have no idea how to solve this problem, which must be a basic one (but do not forget that I'm a newbie and that I'm French :-)

Thanks again for your comments and help,
Ptit Bleu.
Re: [R] Problem with save or/and if (I think but maybe not ...)
On Mon, 27 Aug 2007, Ptit_Bleu wrote:

> Hi,
>
> I recently discovered the R program and I thought it could be useful to me.
> I have to analyse data saved as .Px file (x between 0 and 8 - .P0 files
> have 18 lines at the beginning that I have to skip). New files are
> generated everyday.

relrfichiers<-dir(chemin, pattern=".P")

does not do that, though. Better to use

dir(chemin, pattern="\\.P[0-8]$", full.names=TRUE)

or

Sys.glob(file.path(chemin, "*.P[0-8]"))

> This is my strategy :
> In order to analyse the data, I first want to copy the new data in a
> database in MySQL (which already contains the previous data).
> So the first task is to compare the list of the files in the directory
> (object : rfichiers) to the list of the files already saved (object :
> tfichiers). The list containing the new files is then given by
> nfichiers<-setdiff(rfichiers, tfichiers).
> It sounds easy ...
>
> ... but it doesn't work !!!
>
> Up to now, I'm am able to connect to MySQL and, if the file tfichiers.r
> doesn't exist, I can copy data files to the MySQL database.
> But if tfichiers.r already exists and there is no new file to save, it
> ignores the condition if (nfichiers!=0) and save all the files of the
> directory to the database.

What did you intend there? It is not a test of no difference, but a test that each element of the difference is not 0, and furthermore if() expects a test of length one, not the length of nfichiers. I suspect you intended to test length(nfichiers) > 0.

It often helps to print (or use str on) the objects you create. Try this on nfichiers:

nfichiers!=0

> Is it a problem with the way I save tfichiers or is it a problem with the
> condition if (nfichiers!=0) ?

Saving in R save format with extension .r is going to confuse others. Extension .rda is conventional for save format (and I doubt you need an ascii save).

> Could you please give me some advices to correct my script (written with
> Tinn-R) ?
>
> I thank you in advance for your help.
> Have a nice week,
> Ptit Bleu.

--
Brian D. Ripley,
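The two points made in the reply above (the file-name pattern and the length-one condition) can be tried on a few made-up file names. This is a small sketch only; the paths are hypothetical and nothing is read from disk.

```r
# Hypothetical file names, for illustration only
files <- c("a.P0", "a.P9", "a.Pdf", "P0.txt")

# pattern=".P" is a regular expression ("any character followed by P"),
# so it matches far more than the .P0 ... .P8 extensions
loose  <- grepl(".P", files)
# anchored pattern: a literal dot, P, one digit 0-8, end of string
strict <- grepl("\\.P[0-8]$", files)

nfichiers <- c("d:/Mydata/a.P0", "d:/Mydata/a.P1")
elementwise <- nfichiers != 0          # one TRUE/FALSE per element, not a single test
has_new     <- length(nfichiers) > 0   # a single TRUE or FALSE, suitable for if()

# When nothing is new, setdiff() returns an empty vector
none <- setdiff(nfichiers, nfichiers)
if (length(none) > 0) {
  message("new files to load")
} else {
  message("nothing to do")
}
```

Note that a condition of length greater than one only produced a warning in the R versions of this thread; since R 4.2.0 it is an error, and an empty condition such as character(0) != 0 has always been an error in if().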
Re: [R] Calculating diameters of cirkels in a picture.
Hi All,

I would really like to thank you for the answers. While I was searching for edge detection and clustering algorithms, Moshe came up with a simple but effective solution: use the area to find the diameter!

I tried Moshe's solution, but I couldn't figure out what you mean by morphological closing and by labeling to split the image.
Could you please clarify this a bit?

Thanks for your support

Bart

Moshe Olshansky-2 wrote:

> Hi Bart,
>
> One more comment:
>
> You do not really need the morphological closing to close the holes
> inside the circles. Another possibility is to invert the
> black-and-white picture, i.e. make the holes and background 1 and the
> circles 0, then label the connected components. Only the component
> that touches the image boundary is the background; all other
> components are holes, and you can make them white (1) in the original
> black-and-white image.
>
> --- Moshe Olshansky [EMAIL PROTECTED] wrote:
>
>> Hi Bart,
>>
>> I have never used image processing software in R (I was doing this
>> with Matlab), but here is what I would have done algorithmically:
>>
>> 1) convert the picture to gray-scale
>> 2) find a threshold value which separates the circles from the
>>    background and convert your image to black and white
>> 3) if the circles are far apart, use morphological closing to fill in
>>    small holes inside the circles (maybe do this several times)
>> 4) use labeling to split the image into connected components
>> 5) for each connected component get its area (the number of pixels)
>>    and use the formula S = Pi*R^2 to find the approximate radii.
>>
>> Regards,
>>
>> Moshe.
>>
>> --- Julian Burgos [EMAIL PROTECTED] wrote:
>>
>>> Hi Bart,
>>>
>>> If you only have 36 circles, the fastest way would be to use some
>>> image processing software and measure the circles by hand. One
>>> option is to use ImageJ, which you can download here:
>>> http://rsb.info.nih.gov/ij/
>>>
>>> Julian
>>>
>>> Bart Joosen wrote:
>>>> Hi,
>>>>
>>>> Maybe this is more a programming question than a specific R
>>>> question, but maybe there is someone who can point me in the right
>>>> direction.
>>>> I have a picture of circles which I took with a digital camera.
>>>> Now I want to use the diameter of the circles in the picture for
>>>> analysis in R.
>>>> I can use pixmap to import the picture, but how do I find the
>>>> outlines of the circles and calculate the diameter?
>>>>
>>>> I found that I can use the edci package, but then I need to
>>>> preprocess the data to reduce the number of points; otherwise it
>>>> takes a long time, and my computer crashes.
>>>>
>>>> If you want to see such a picture, I cropped a larger one and
>>>> highlighted the circle which is of interest. In the real world,
>>>> this is a plate with 36 circles, which should all be measured.
>>>> www.users.skynet.be/fa244930/fotos/outlined.jpg
>>>>
>>>> Thanks for your time
>>>>
>>>> Bart
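Steps 4 and 5 of Moshe's outline (labeling, then area to diameter) can be sketched in base R on a synthetic black-and-white image. Everything below is an assumption made for illustration: the function name label_components, the 4-neighbour connectivity, and the two drawn discs stand in for a thresholded photograph; a dedicated image-processing package would normally do the labeling.

```r
# Label 4-connected components of TRUE pixels in a logical matrix (flood fill)
label_components <- function(bw) {
  nr <- nrow(bw)
  nc <- ncol(bw)
  labels <- matrix(0L, nr, nc)
  current <- 0L
  for (start in which(bw)) {
    if (labels[start] != 0L) next          # pixel already belongs to a component
    current <- current + 1L
    labels[start] <- current
    stack <- start
    while (length(stack) > 0) {
      idx <- stack[length(stack)]          # pop the last pixel
      stack <- stack[-length(stack)]
      rr <- (idx - 1L) %% nr + 1L          # row of the pixel
      cc <- (idx - 1L) %/% nr + 1L         # column of the pixel
      nb_r <- c(rr - 1L, rr + 1L, rr, rr)  # up, down, left, right neighbours
      nb_c <- c(cc, cc, cc - 1L, cc + 1L)
      ok <- nb_r >= 1L & nb_r <= nr & nb_c >= 1L & nb_c <= nc
      nb <- (nb_c[ok] - 1L) * nr + nb_r[ok]
      nb <- nb[bw[nb] & labels[nb] == 0L]  # foreground pixels not yet labeled
      labels[nb] <- current
      stack <- c(stack, nb)
    }
  }
  labels
}

# Synthetic "thresholded" image: two discs of radius 10 and 5 pixels
n  <- 60
yy <- row(matrix(0, n, n))
xx <- col(matrix(0, n, n))
bw <- (xx - 20)^2 + (yy - 20)^2 <= 10^2 | (xx - 45)^2 + (yy - 45)^2 <= 5^2

labels   <- label_components(bw)
area     <- tabulate(labels[labels > 0])   # pixel count per component
diameter <- 2 * sqrt(area / pi)            # from S = pi * R^2
diameter
```

The diameters come out in pixels, close to 20 and 10 here; converting to physical units needs a known scale in the photograph.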
Re: [R] Problem with save or/and if (I think but maybe not ...)
Dear Prof Ripley,

Thank you for your fast answer.

To follow your advice:
I deleted all the objects and the tfichiers.r file already created.
I changed every tfichiers.r in the script to tfichiers.rda.

Then I ran the script twice.
The first time, as tfichiers.rda didn't exist, it created one.
During the script, I got this warning:
1: la condition a une longueur > 1 et seul le premier élément est utilisé in: if (nfichiers != 0)
(in my own words: the condition has a length greater than 1 and only the first element is used in ...)
Below, you will find the results.

The second run gave the same results for nfichiers and rfichiers, but for tfichiers I obtained "tfichiers".

Do you have any ideas to help me (because I really have none ...)?

Again thank you,
Ptit Bleu.

--
FIRST LAUNCH

> nfichiers
[1] "d:/Mydata/31_07_07.P0"   "d:/Mydata/31_07_2007.P0"
[3] "d:/Mydata/31_07_2007.P1" "d:/Mydata/31_07_2007.P2"
[5] "d:/Mydata/31_07_2007.P3"
> nfichiers!=0
[1] TRUE TRUE TRUE TRUE TRUE
> rfichiers
[1] "d:/Mydata/31_07_07.P0"   "d:/Mydata/31_07_2007.P0"
[3] "d:/Mydata/31_07_2007.P1" "d:/Mydata/31_07_2007.P2"
[5] "d:/Mydata/31_07_2007.P3"
> tfichiers
[1] "d:/Mydata/31_07_07.P0"   "d:/Mydata/31_07_2007.P0"
[3] "d:/Mydata/31_07_2007.P1" "d:/Mydata/31_07_2007.P2"
[5] "d:/Mydata/31_07_2007.P3"

--
SECOND LAUNCH
with these lines commented out so that tfichiers.rda is not changed:
#tfichiers<-rfichiers
#save(tfichiers, file="tfichiers.rda")

> nfichiers
[1] "d:/Mydata/31_07_07.P0"   "d:/Mydata/31_07_2007.P0"
[3] "d:/Mydata/31_07_2007.P1" "d:/Mydata/31_07_2007.P2"
[5] "d:/Mydata/31_07_2007.P3"
> nfichiers!=0
[1] TRUE TRUE TRUE TRUE TRUE
> rfichiers
[1] "d:/Mydata/31_07_07.P0"   "d:/Mydata/31_07_2007.P0"
[3] "d:/Mydata/31_07_2007.P1" "d:/Mydata/31_07_2007.P2"
[5] "d:/Mydata/31_07_2007.P3"
> tfichiers
[1] "tfichiers"
[R] Column naming mystery
Hi,

I hope somebody can help me explain something that seems mysterious to me.

I use this line on a data frame 'ae':

summaryBy(total_inflated+total~gr1, data=ae, FUN=sum, na.rm=T)

It returns 3 columns as expected, and the columns gr1 and total_inflated.sum are correct, but the total.sum column consists only of zeros, which is not correct.

The same happens when I rename total_inflated to total.inflated or totalinflated, but not when I rename it to ttotal_inflated. In the latter case I get the correct result for the total.sum column as well.

Could anyone explain the rules for column naming to me?

Thank you very much in advance!
Werner
Re: [R] Problem with save or/and if (I think but maybe not ...)
Dear Prof Ripley,

You wrote:

> What did you intend there? It is not a test of no difference, but a test
> that each element of the difference is not 0, and furthermore if() expects
> a test of length one, not the length of nfichiers. I suspect you intended
> to test length(nfichiers) > 0.

And of course, you were right.
With the condition length(nfichiers) > 0, there is no more warning.
I also tested length(nfichiers) > 0 manually for different cases and it gave the results I expected.

But I still have a problem with saving and retrieving tfichiers.
I keep looking at the help files and manually testing alternative scripts ...

Hoping to read you again,
Ptit Bleu.
[R] [SOLVED] save/load - I finally found (to be honnest : jholtman found)
The post of jholtman gave me the solution:
http://www.nabble.com/problems-saving-and-loading-%28PLMset%29-objects-tf4179541.html#a11885136

Like Quin Wills, I was trying to assign the result of loading tfichiers.rda to tfichiers.
I now just write load("tfichiers.rda") instead of tfichiers<-load("tfichiers.rda").

And now it works ... for this part. (If the new files are all .P0 files, there is still a problem when the script tries to read the other .Px files, as there are none. But this should not be so difficult to solve, even for me. I think; well, I hope.)

Thanks to Prof Ripley and to everyone who helps people like me (maybe one day I will also be able to help people).

Have a nice week,
Ptit Bleu.
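The behaviour behind the fix above can be reproduced in a few lines: load() restores the saved objects under their original names and returns only a character vector of those names. The sketch below uses temporary files and made-up paths; the saveRDS()/readRDS() pair shown at the end is an alternative available in later R versions (2.13.0 onwards), not something used in the thread.

```r
# Hypothetical file lists, for illustration only
rfichiers <- c("d:/Mydata/a.P0", "d:/Mydata/a.P1", "d:/Mydata/b.P0")
tfichiers <- rfichiers[1:2]            # files already processed

f <- tempfile(fileext = ".rda")
save(tfichiers, file = f)
rm(tfichiers)

loaded <- load(f)    # recreates 'tfichiers'; the return value is just its name
loaded               # "tfichiers", which is what the original script stored

nfichiers <- setdiff(rfichiers, tfichiers)   # new files only

# Looping over the vector itself does nothing when it is empty,
# whereas 1:length(x) would run over c(1, 0)
pxfichiers <- nfichiers[!grepl("\\.P0$", nfichiers)]
n_read <- 0
for (fichier in pxfichiers) n_read <- n_read + 1

# Alternative: saveRDS()/readRDS() return the object itself, so assignment works
g <- tempfile(fileext = ".rds")
saveRDS(tfichiers, g)
done <- readRDS(g)
```

The loop at the end also addresses the remaining problem mentioned in the post: with only .P0 files among the new ones, pxfichiers is empty and the loop body is simply skipped.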
Re: [R] Monmonier algorithm
Hello,

Here is a late answer, but an answer nonetheless, to the question I asked almost one year ago on this list:
http://tolstoy.newcastle.edu.au/R/help/06/03/24318.html#24322qlink1

On Wed, 29 Mar 2006, Thibaut Jombart wrote:

> Hello list,
>
> does anyone know if Monmonier algorithm is available in R? I've checked
> several spatial libraries, but I didn't find anything related to it.
> However, there is a huge documentation and I may have missed it.
>
> Before coding it, I'd like to be sure it doesn't already exist.

Roger Bivand replied:

> Googling, I found:
>
> http://www-med-physik.vu-wien.ac.at/staff/rub/abstracts/ISCB_2005.pdf
>
> which is a poster, and refers to using R for boundary finding, and other
> software for data management and display. Perhaps the authors are able to
> help by making code available, the poster looks like a nice example of
> spatial data analysis.
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: [EMAIL PROTECTED]

Basically, the Monmonier algorithm aims at finding maximum-difference boundaries between geo-referenced objects. It requires a set of geo-referenced objects along with a matrix of distances among these objects.

The Monmonier algorithm is now implemented in the adegenet package (http://pbil.univ-lyon1.fr/software/adegenet/). The main functions are 'monmonier' and 'optimize.monmonier'. Although the package is devoted to genetic data analysis, these functions can handle other kinds of data as well.

The main difference I can see between this implementation and the original algorithm is that here the function uses objects connected on a neighbouring graph rather than polygons of a Voronoi tessellation. Thus, a Delaunay triangulation should be used to recover the original version of the algorithm, but other graphs are also possible (e.g. Gabriel's graph).

Regards,

Thibaut.

--
Thibaut JOMBART
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
[EMAIL PROTECTED]
http://lbbe.univ-lyon1.fr/-Jombart-Thibault-.html?lang=en
[R] Confidence intervals for ccf()
Hello,

This is not purely an R question, but perhaps someone can help me anyway.

I am trying to estimate the correlation between two time series (which are basically two different types of measurement of the same phenomenon), using both cor.test() (with method "pearson") and ccf().

Now, cor.test() gives a confidence interval for the Pearson correlation, while ccf() does not. I've tried to use bootstrap methods to get a confidence interval for the ccf, but with no luck. It is a bit tricky, since the time series are non-stationary, so I'm not sure how to go about generating the bootstrap samples.

Does anyone have any ideas on how to do this, i.e. how to get a confidence interval for the ccf at different time lags?

Many thanks in advance,

Gustaf

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE
skype: gustaf_rydevik
[R] [R-pkgs] proftools package now available from CRAN
PROFILE OUTPUT PROCESSING TOOLS FOR R
=====================================

This package provides some simple tools for examining Rprof output and, in particular, extracting and viewing call graph information. Call graph information, including which direct calls were observed and how much time was spent in these calls, can be very useful in identifying performance bottlenecks.

One important caution: because of lazy evaluation a nested call f(g(x)) will appear on the profile call stack as if g had been called by f or one of f's callees, because it is the point at which the value of g(x) is first needed that triggers the evaluation.

EXPORTED FUNCTIONS

The package exports five functions:

    readProfileData reads the data in the file produced by Rprof into a data structure used by the other functions in the package. The format of the data structure is subject to change.

    flatProfile is similar to summaryRprof. It returns either a matrix with output analogous to gprof's flat profile or a matrix like the by.total component returned by summaryRprof; which is returned depends on the value of an optional second argument.

    printProfileCallGraph produces a printed representation of the call graph. It is analogous to the call graph produced by gprof with a few minor changes. Reading the gprof manual section on the call graph should help in understanding this output. The output is similar enough to gprof output for the cgprof (http://mvertes.free.fr/) script to be able to produce a call graph via Graphviz.

    profileCallGraph2Dot prints out a Graphviz .dot file representing the profile graph. Times spent in calls can be mapped to node and edge colors. The resulting files can then be viewed with the Graphviz command line tools.

    plotProfileCallGraph uses the graph and Rgraphviz packages to produce call graph visualizations within R. You will need to install these packages to use this function.
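The caution about lazy evaluation can be seen without any profiling, by inspecting the call stack from inside the inner function. This is only a small base-R sketch with made-up functions f and g; it does not use proftools itself.

```r
# g reports the call stack that is active while it runs
g <- function() sys.calls()

# f does nothing but use its argument, which forces the promise g()
f <- function(x) x

calls <- f(g())
n <- length(calls)

# The argument g() is evaluated only when f needs x, so g sits on top of f
# on the stack, as if f had called g
outer <- deparse(calls[[n - 1]])   # "f(g())"
inner <- deparse(calls[[n]])       # "g()"
c(outer = outer, inner = inner)
```

A profiler sampling the stack during g would therefore attribute g's time to f, which is the effect the caution describes.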
A SIMPLE EXAMPLE

Collect profile information for the examples for glm:

    Rprof("glm.out")
    example(glm)
    Rprof()
    pd <- readProfileData("glm.out")

Obtain flat profile information:

    flatProfile(pd)
    flatProfile(pd, FALSE)

Obtain a printed call graph on the standard output:

    printProfileCallGraph(pd)

If you have the cgprof script and the Graphviz command line tools available on a UNIX-like system, then you can save the printed graph to a file,

    printProfileCallGraph(pd, "glm.graph")

and either use

    cgprof -TX glm.graph

to display the graph in the interactive graph viewer dotty, or use

    cgprof -Tps glm.graph > glm.ps
    gv glm.ps

to create a PostScript version of the call graph and display it with gv.

Instead of using the printed graph and cgprof you can create a Graphviz .dot file representation of the call graph with

    profileCallGraph2Dot(pd, filename = "glm.dot", score = "total")

and view the graph interactively with dotty using

    dotty glm.dot

or as a PostScript file with

    dot -Tps glm.dot > glm.ps
    gv glm.ps

Finally, if you have the graph package from CRAN and the Rgraphviz package from Bioconductor installed, then you can view the call graph within R using

    plotProfileCallGraph(pd, score = "total")

[The default settings for this version need some work.]

OPEN ISSUES

My intention was to handle cycles roughly the same way that gprof does. I am not completely sure that I have managed to do this; I am also not completely sure this is the best approach.

The graphs produced by cgprof and by plotProfileGraph and friends when mergeEdges is false differ a bit. I think this is because the heuristics of cgprof do not handle cycle entries ideally, and that the plotProfileGraph graphs are actually closer to what is wanted.

When mergeEdges is true the resulting graphs are DAGs, which simplifies interpretation, but at the cost of lumping all cycle members together.

gprof provides options for pruning graph printouts by omitting specified nodes. It may be useful to allow this here as well.
Probably more use should be made of the graph package.

IMPLEMENTATION NOTES

The implementation is extremely crude (a real mess would be more accurate) and will hopefully be improved over time; at this point it is more of an existence proof than a final product.

Performance is less than ideal, though using these tools it was possible to identify some problem points and speed up computing the profile data by a factor of two (in other words, it may be bad now but it used to be worse). More careful design of the data structures and memoizing calculations that are now repeated is likely to improve performance substantially.

--
Luke Tierney
Chair, Statistics and
Re: [R] FAQ 7.x when 7 does not exist. Useability question
--- Duncan Murdoch [EMAIL PROTECTED] wrote:
Deepayan Sarkar wrote:
On 8/23/07, Duncan Murdoch [EMAIL PROTECTED] wrote:
On 8/23/2007 11:28 AM, Prof Brian Ripley wrote:
On Thu, 23 Aug 2007, John Kane wrote:

The FAQ Section 7 is a very useful place for new users to find out any number of R idiosyncrasies. However there is no numbering on the FAQ Table of Contents or on the Sections' Tables of Contents.

Hmm, doc/FAQ does have a numbered table of contents and numbered sections, and doc/manual/R-FAQ.html does have numbered sections, and my browser's search finds 7.10 straight away.

I think the suggestion is to change the contents lists in HTML from <ul> lists to <ol> lists. Then one would see

    1. Introduction
    2. R Basics
    3. R and S
    4. R Web Interfaces
    5. R Add-On Packages
    6. R and Emacs
    7. R Miscellanea
    8. R Programming
    9. R Bugs
    10. Acknowledgments

instead of

    * Introduction
    * R Basics
    * R and S
    * R Web Interfaces
    * R Add-On Packages
    * R and Emacs
    * R Miscellanea
    * R Programming
    * R Bugs
    * Acknowledgments

in a browser, and I agree that would be preferable (assuming the numbering is consistent with what we get in the other formats). However, I don't see how to tell makeinfo --html to do this. Adding --number-sections isn't enough.

A simple CSS hack is to have

    ul { list-style-type: decimal; }

in the style. The result can be seen in http://dsarkar.fhcrc.org/R/RFAQ-1.png

A more sophisticated hack is to have something like

    body { counter-reset: chapter; counter-reset: section; }
    h2.chapter { counter-increment: chapter; counter-reset: section; }
    ul { list-style-type: none; }
    li:before { counter-increment: section; content: counter(chapter) "." counter(section) " "; }

which results in http://dsarkar.fhcrc.org/R/RFAQ-2.png

The only problem here is that there is no way to distinguish between the chapter listing and the section listings (both are <ul class="menu">). If that could be made to have a different class, the chapter listing could be improved.

I like the first, simple suggestion best; I'll put it into R-devel. (With the slight change to use ul.menu instead of just ul, because FAQ 2.7 includes a plain ul list.)

Duncan Murdoch

Thanks Deepayan and Duncan. It is not a make or break point in using R but it does seem to make the FAQ a bit more user-friendly.
Re: [R] Column naming mystery
Sorry that the problem description was not sufficient. Here is self-contained code replicating the problem:

    require(doBy)
    x <- as.data.frame(matrix(ncol=3, seq(1,12), dimnames=list(c(), c("hh","total","total.inf"))))
    summaryBy(total+total.inf~hh, x, FUN=sum)

What surprises me are the zeros in the resulting total.sum column. The problem remains if total.inf is renamed to totalinf or total_inf, but not if it is renamed to ttotal.inf. Can anyone explain to me what the rules for naming columns are so that I can avoid such mistakes in the future?

Thanks a lot!
[R] how to write nicely a condition on a loop for (that is, not like I did)
Hi again,

This is the follow-up to my post "Problem with save or/and if (I think but maybe not ...)". In that post, I wrote that I had solved my main problem, and that is true. I also wrote that there was still another problem, which I have since managed to solve. But I think there must be another way to solve it that takes advantage of the R language (which I don't master at all), that is, with fewer if tests.

To sum up:

nfichiers is a list of files (with .P0 or .Px (x>0) extension) I have to copy to a database. nfichiers can also be 0 if there is no file to copy.

p0fichiers is the list of files having the .P0 extension, if there are such files to copy. p0fichiers can also be 0 if there are only .Px files to copy.

So, before doing the for loop, I want to test whether p0fichiers really contains something.

Thanks for your comments and your advice to improve this script.

Ptit Bleu.

-

So here is my solution:

    p0fichiers <- 0  # initialization of p0fichiers
    if (length(nfichiers) > 0)  # if nfichiers contains file names
    {
      if (length(grep(".P0", nfichiers)) > 0) {p0fichiers <- nfichiers[grep(".P0", nfichiers)]}  # look if there is .P0
      if (p0fichiers[1] > 0)  # if .P0 has been updated with the test above
      {
        for (i in 1:length(p0fichiers))  # do the loop for
        {
          donnees <- read.table(p0fichiers[i], quote="\"", sep=";", dec=",", skip=18)
          jourheure <- paste(donnees$V1, donnees$V2, sep=" ")
          donnees[1] <- jourheure
          donnees <- donnees[,-2]
          rm(donnees, jourheure)
        }
      }
    }

-- View this message in context: http://www.nabble.com/how-to-write-nicely-a-condition-on-a-loop-%22for%22-%28that-is%2C-not-like-I-did%29-tf4335310.html#a12347016 Sent from the R help mailing list archive at Nabble.com.
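One way to drop the nested if tests is sketched below. It assumes nfichiers is a character vector of file names; the names used here are made up for illustration. grep() with value = TRUE returns the matching names directly, and a for loop over a zero-length vector does not execute at all, so no explicit emptiness check should be needed.

```r
# Hypothetical file list standing in for nfichiers from the post above.
nfichiers <- c("mesure1.P0", "mesure1.P1", "mesure2.P0")

# value = TRUE returns the matching names themselves; the dot is escaped and
# the pattern anchored so that only names ending in ".P0" match.
p0fichiers <- grep("\\.P0$", nfichiers, value = TRUE)

# Iterating over the vector itself is safe when it is empty: the body is
# skipped, unlike 1:length(p0fichiers), which would yield c(1, 0).
lus <- character(0)
for (fichier in p0fichiers) {
  # read.table(fichier, quote = "\"", sep = ";", dec = ",", skip = 18) would go here
  lus <- c(lus, fichier)
}
print(lus)
```

With this shape the initialization to 0 and both length tests become unnecessary; whether that fits the rest of the script (for instance the database copy step) is not something the post lets us check.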
[R] How can I interpret this test hypothesis test
Since no reply has been posted yet I will give it a shot. runs.test uses the normal approximation, and in your case it returned a z score of -1.8732. This z score has a cumulative probability of

    > pnorm(-1.8732,0,1)
    [1] 0.03052039

If you are concerned about both too many runs and too few runs you would select the two.sided option for runs.test, which gives a p-value of 0.0610 (0.0305 in each tail of the normal distribution). If you are concerned only with too few runs you would select the less option, which will give a p-value of 0.0305. Finally, if you are concerned only with too many runs you would select the greater option, which will give a p-value of 1-0.0305 = 0.9695.

If your significance level is 0.05, you would compare 0.05 to 0.0610 and not reject the null hypothesis in the two-sided case, and compare 0.05 to 0.0305 in the one-sided (less) case and reject the null hypothesis.

Note that the normal approximation is OK for large samples but may give unacceptable results for small samples. I am unaware of any packages in R that perform an exact runs test.

Tom

I have used runs.test (package tseries) to compute the runs test for randomness, but I get this result:

    Runs test -1.8732
    P-value = 0.0610
    Alternative Hypothesis : Two sided

How can I interpret this result?
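The three p-values discussed above can be reproduced directly from the reported z score. This is only a sketch of the normal-approximation arithmetic, using base R's pnorm(); it does not call runs.test itself, and the variable names are chosen here for illustration.

```r
z <- -1.8732                      # statistic reported by runs.test

p_less    <- pnorm(z)             # lower tail: too few runs
p_greater <- 1 - pnorm(z)         # upper tail: too many runs
p_two     <- 2 * pnorm(-abs(z))   # both tails together

round(c(less = p_less, greater = p_greater, two.sided = p_two), 4)
```

The one-sided values always sum to 1, and the two-sided value is twice the smaller of the two, which is why 0.0610 is exactly double 0.0305 up to rounding.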
Re: [R] FAQ 7.x when 7 does not exist. Useability question
On 8/27/2007 8:52 AM, John Kane wrote:
--- Duncan Murdoch [EMAIL PROTECTED] wrote:

I like the first, simple suggestion best; I'll put it into R-devel. (With the slight change to use ul.menu instead of just ul, because FAQ 2.7 includes a plain ul list.)

Duncan Murdoch

Thanks Deepayan and Duncan. It is not a make or break point in using R but it does seem to make the FAQ a bit more user-friendly.

I'm about to commit the change, but it's not perfect. I've applied the change to the css used in all the manuals, not just the FAQ, so the HTML versions of the manuals now end up with numbered contents listings too. However, appendices continue the chapter numbering, rather than switching to letters. I think this is preferable to no numbering at all, but if others object to it, we can make this change for the FAQ only.

Another way to do this is what's used in the texinfo manual http://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html but I find that ugly and inconsistent. The contents listing gets the numbering and lettering right (but not well formatted), but within each chapter the menus are unnumbered. The texinfo format is just a bit limited for this kind of thing.

Duncan Murdoch
Re: [R] subset using noncontiguous variables by name (not index)
Gabor,

That works great! I think this would be a very helpful addition to the main R distribution. Perhaps with a single colon representing numerical order (exactly as you have written it) and two colons representing the order of the variables as they appear in the data frame (your first example). That's analogous to SAS' x1-xN, which you know gets those N variables, and a--z, which selects an unknown number of variables a through z. How many that is depends upon their order in the data frame. That would not only be very useful in general, but it would also make transitioning to R from SAS or SPSS less confusing.

Is R still being extended in such basic ways, or does that muck up existing programs too much?

Thanks, Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 26, 2007 8:52 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

Try this:

    > "%:%" <- function(x, y) {
    +    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
    +    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
    +    stopifnot(prex == prey)
    +    paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
    + }
    > "x2" %:% "x4"
    [1] "x2" "x3" "x4"

On 8/26/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Thanks Bert & Gabor for two very interesting solutions!

It would be very handy in R if string1:stringN generated string1,string2...stringN; it would make selections like this much more obvious. I know it's easy to do with the colon operator and paste function, but that's quite a step up in complexity compared to SAS' x1 x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners face early in learning R.

While on the subject of the colon operator, why doesn't anscombe[[1:4]] select the x variables in list form as anscombe[,1:4] or anscombe[1:4] do in data frame form?
Thanks, Bob

Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html

-----Original Message-----
From: Bert Gunter [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 26, 2007 6:50 PM
To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] subset using noncontiguous variables by name (not index)

The problem is that x3:x5 does not mean what you think it means. The only reason it does the right thing in subset() is because a clever trick is used there (read the code -- it's not hard to understand) to ensure that it does. Gabor has essentially mimicked that trick in his solution.

However, it is not necessary to do this. You can construct the call directly as you tried to do. Using the anscombe example, here's how:

    chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
    do.call(subset, list(x = anscombe, select = parse(text = chooz)))

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

The business of the statistician is to catalyze the scientific learning process. - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Sunday, August 26, 2007 2:10 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

Using the builtin data frame anscombe, try this. First we set up a data frame anscombe.seq which has one row containing 1, 2, 3, ... . Then select out from that data frame and unlist it to get the desired index vector.
    > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
    > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
    > anscombe[idx]
       x1 x3 x4   y2
    1  10 10  8 9.14
    2   8  8  8 8.14
    3  13 13  8 8.74
    4   9  9  8 8.77
    5  11 11  8 9.26
    6  14 14  8 8.10
    7   6  6  8 6.13
    8   4  4 19 3.10
    9  12 12  8 9.13
    10  7  7  8 7.26
    11  5  5  8 4.74

On 8/26/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Hi All,

I'm using the subset function to select a list of variables, some of which are contiguous in the data frame, and others of which are not. It works fine when I use the form:

    subset(mydata, select=c(x1,x3:x5,x7))

In reality, my list is far more complex. So I would like to store it in a variable to substitute in for c(x1,x3:x5,x7) but cannot get
Re: [R] R 2.5.1 - Rscript through tee
On 26 August 2007 at 22:47, François Pinard wrote:
| I met a little problem for which someone might have a solution. Let's
| say I have an executable file (named pp.R) with this contents:
|
|    #!/usr/bin/Rscript
|    options(echo=TRUE)
|    a <- 1
|    Sys.sleep(3)
|    a <- 2
|
| If I execute ./pp.R at the shell prompt, the output shows the timely
| progress of the script as expected. If I use ./pp.R | tee OUT
| instead, the output seems buffered and I see it all at once at the end.
|
| The problem does not come from the tee program, as if I use this
| command:
|
|    (echo a; sleep 5; echo b) | tee OUT
|
| the output is timely, not batched.
|
| So, is there a way to tell R (or Rscript) that standard output should be
| unbuffered, even if it is not directly connected to a terminal?

Use explicit print statements, e.g. print(a <- 1)

Also, you still have littler as an alternate, at least on Unix [1]. Littler actually won't show anything unless you explicitly call cat() or print(), but then it does:

    qa-v40z1:~/svn/hancock/app/aggposview cat /tmp/fp2.r
    #!/usr/bin/env r
    options(echo=TRUE)
    cat(a <- 1, "\n")
    Sys.sleep(3)
    cat(a <- 2, "\n")
    foo:~ /tmp/fp2.r | tee /tmp/fp2.r.out
    1
    2
    foo:~

Littler is an 'all-in' binary and starts and runs demonstrably faster than Rscript.

Hth, Dirk

[1] And despite the rather petty refusal of Rscript's main author to at least give a reference to littler in Rscript's documentation, let alone credit as 'we were there first', the fact remains that littler became available in Sep 2006 whereas Rscript was not released until R 2.5.0 a good six months later. Oh well.
| In case useful, here is local R information: | | Version: | platform = x86_64-unknown-linux-gnu | arch = x86_64 | os = linux-gnu | system = x86_64, linux-gnu | status = | major = 2 | minor = 5.1 | year = 2007 | month = 06 | day = 27 | svn rev = 42083 | language = R | version.string = R version 2.5.1 (2007-06-27) | | Locale: | LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=fr_CA.UTF-8;LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=fr_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_CA.UTF-8;LC_IDENTIFICATION=C | | Search Path: | .GlobalEnv, package:stats, package:utils, package:datasets, fp.etc, package:graphics, package:grDevices, package:methods, Autoloads, package:base | | -- | François Pinard http://pinard.progiciels-bpi.ca | | __ | R-help@stat.math.ethz.ch mailing list | https://stat.ethz.ch/mailman/listinfo/r-help | PLEASE do read the posting guide http://www.R-project.org/posting-guide.html | and provide commented, minimal, self-contained, reproducible code. -- Three out of two people have difficulties with fractions. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] validate (package Design): error message subscript out of bounds
Dear R users I use Windows XP, R2.5.1 (I have read the posting guide, I have contacted the package maintainer first, it is not homework). In a research project on renal cell carcinoma we want to compute Harrell's c index, with optimism correction, for a multivariate Cox regression and also for some univariate Cox models. For some of these univariate models I have encountered an error message (and no result produced) from the function validate i Frank Harrell's Design package: Error in Xb(x[, xcol, drop = FALSE], coef, non.slopes, non.slopes.in.x, : subscript out of bounds The following is an artificial example wherein I have been able to reproduce this error message (actual data has been changed to preserve confidentiality): library(Design) # an example data frame: frame.bc - data.frame(time1 = c(9,24,28,43,58,62,66,107,116,118,123, 127,129,131,137,138,139,140,148,169,176,179,188,196,210,218, 1,1,1,2,2,3,4,8,23,32,33,34,43,44,48,51,52,54,59,59,60,60,62, 65,65,68,70,72,73,74,81,84,88,98,99,106,107,115,115,117,119, 120,122,122,122,122,126,128,130,135,136,136,138,149,151,154, 157,159,161,164,164,164,166,172,172,176,179,180,183,183,184, 187,190,197,201,201,203,203,203,209,210,214,219,227,233,4,18, 49,113,147,1,1,2,2,2,2,2,3,4,6,6,6,6,6,6,6,6,9,9,9,9,9,10,10, 10,11,12,12,12,13,14,14,17,18,18,19,19,20,20,21,21,21,21,22,23, 23,24,28,28,29,29,32,34,35,38,38,48,48,52,52,54,54,56,64,67,67, 69,70,70,72,84,88,90,114,115,140,142,154,171,195), status1 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1), bc1 = factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2), labels=c('bc.1','bc.2')), age = c(58,68,23,20,50,43,41,69,20,48,19,27,39,20,65,49,70,59,31,43,25, 61,60,45,34,59,32,58,30,62,26,44,52,29,40,57,33,18,50,50,55,51,38,34, 69,56,67,38,66,21,48,39,62,62,29,68,66,19,60,39,55,42,24,29,56,61,40, 52,19,40,33,67,66,51,48,63,60,58,68,60,53,20,45,62,37,38,61,63,43,67, 49,39,43,67,49,69,32,37,32,63,33,47,66,39,23,57,26,61,20,49,69,30,40, 29,38,66,60,69,69,44,65,25,41,53,18,55,45,59,49,27,51,29,67,26,24,26, 47,23,50,27,35,45,32,26,45,45,63,39,39,22,38,27,31,27,49,65,66,49,39, 21,51,49,55,63,19,26,50,21,24,34,65,33,55,33,36,53,48,25,54,58,60,34, 47,23,34,60,39,34,22,30,41,55,64,48,34,54)) frame.bc # preparing for a simple univariate Cox regression: dd.bc - datadist(frame.bc[, c('bc1','age')], adjto.cat='first') options(datadist = 'dd.bc') # a univariate Cox regression: cph.bc - cph(formula = Surv(time1,status1)~bc1, data = frame.bc, x=TRUE, y=TRUE, surv=TRUE) anova(cph.bc) cph.bc summary(cph.bc) # the validate command for the Cox model: val.cph.bc - validate(cph.bc, B=200, dxy=TRUE , pr=TRUE) -- Output from the validate command: training test Dxy -0.124360 -0.1423409 R2 1.00 1.000 Slope 1.00 0.7919584 D 0.016791 0.0147536 U -0.002395 0.0006448 Q 0.019186 0.0141088 training test Dxy -0.191875 -0.1423409 R2 1.00 1.000 Slope 1.00 0.8936724 D 0.022397 0.0147536 U -0.002339 0.0001367 Q 0.024736 0.0146169 training test Dxy -0.199514 -0.1423409 R2 1.00 1.000 Slope 1.00 0.8075246 D 0.025717 0.0147536 U -0.002447 0.0005348 Q 0.028163 0.0142188 Error in Xb(x[, xcol, drop = FALSE], coef, non.slopes, non.slopes.in.x, : subscript out of bounds Any help/suggestions will be highly appreciated. 
Sincerely, Tore Wentzel-Larsen statistician Centre for Clinical research Armauer Hansen house Haukeland University Hospital N-5021 Bergen tlf +47 55 97 55 39 (a) faks +47 55 97 60 88 (a) email [EMAIL PROTECTED]
Re: [R] subset using noncontiguous variables by name (not index)
On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

Gabor, That works great! I think this would be a very helpful addition to the main R distribution. Perhaps with a single colon representing numerical order (exactly as you have written it) and two colons representing the order of the variables as they appear in the data frame (your first example). That's analogous to SAS' x1-xN, which you know gets those N variables, and a--z, which selects an unknown number of variables a through z. How many that is depends upon their order in the data frame. That would not only be very useful in general, but it would also make transitioning to R from SAS or SPSS less confusing. Is R still being extended in such basic ways, or does that muck up existing programs too much?

In principle base R can be extended like that, but a strong case is needed for non-standard evaluation rules and for depleting the restricted supply of short binary operator names.

The reason for subset() and its behaviour is that 'variables as they appear in the data frame' is typically ambiguous -- which data frame? In SPSS you have only one and in SAS there is a default one, so there is no ambiguity in X1--Y2, but in R it needs another argument specifying the data frame, so it can't really be a binary operator.

The double colon :: and triple colon ::: are already used for namespaces, and a search of r-help reveals two previous, different, suggestions for %:%.

-thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
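The non-standard evaluation that subset() relies on can be sketched in a few lines. This is an illustration of the idea using the builtin anscombe data, not a copy of subset()'s source; the names nl, sel and idx are chosen here. Each column name is mapped to its position, and the selection expression is then evaluated in that list, so x3:x4 turns into 3:4.

```r
# Map each column name to its position in the data frame.
nl <- as.list(seq_along(anscombe))
names(nl) <- names(anscombe)

# The selection is stored unevaluated, so it can be built once and reused.
sel <- quote(c(x1, x3:x4, y2))

# Evaluated inside nl, x3:x4 becomes 3:4, a range of column positions.
idx <- eval(sel, nl)
idx
names(anscombe)[idx]
```

Because the lookup list is built from one particular data frame, the same expression can mean different columns for different data frames, which is the ambiguity described above.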
Re: [R] R-2.5.1 RedHat EL5 compilation failed
Original Message
Subject: [R] R-2.5.1 RedHat EL5 compilation failed
From: Wang Chengbin [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Date: 26.08.2007 15:22

I can't get R-2.5.1 compiled under RedHat EL5 with gcc 4.1.1. Configure failed at the following:

You don't need to compile; you could also install R from the Fedora Core 6 Extras repository (the current package is R-2.5.1-2.fc6.i386.rpm) and pull in the necessary rpm packages from there. (It is best to use the smart package manager, where you can easily activate channels, which are repositories.) As far as I understand, FC6 is the base of RHEL 5.

Stefan

-=-=- ... Time is an illusion, lunchtime doubly so. (Ford Prefect)
Re: [R] p-Value
On Mon, Aug 27, 2007 at 11:49:19AM +0500, amna khan wrote:

Hi Sir, when we use the Kendall package to obtain Kendall's tau statistic, we also get a two-sided p-value. What does two-sided p-value mean? The word two-sided is confusing to understand.

Two-sided is sometimes also called two-tailed... It refers to the probability, under the null hypothesis, of being farther away from 0 than the observed value *in either direction*

-- Daniel Lakeland [EMAIL PROTECTED] http://www.street-artists.org/~dlakelan
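To see the difference between one-sided and two-sided p-values in practice, here is a small sketch. It uses cor.test() from base R's stats package rather than the Kendall package mentioned in the question (a substitution made here to keep the example self-contained), and the data are made up.

```r
# Made-up data without ties, so cor.test() can use the exact test.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Same statistic, three different alternatives.
two     <- cor.test(x, y, method = "kendall", alternative = "two.sided")
greater <- cor.test(x, y, method = "kendall", alternative = "greater")
less    <- cor.test(x, y, method = "kendall", alternative = "less")

two$estimate   # Kendall's tau, identical in all three calls
c(two.sided = two$p.value, greater = greater$p.value, less = less$p.value)
```

The "greater" p-value only counts outcomes with tau at least as large as observed, "less" only those at least as small, and the two-sided p-value counts departures from 0 in either direction, so it is never smaller than the smaller one-sided value.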
[R] Robust Standard Errors in Zero-Truncated Poisson
Hi. I would like to know whether it is possible to estimate zero-truncated count models with robust standard errors in R. This is possible in Stata. I have already searched and made some attempts, without success. In R, I estimated the truncated Poisson model with the vglm function of the VGAM package.

-- View this message in context: http://www.nabble.com/Robust-Standar-Errors-in-Zero-Truncated-Poisson-tf4336437.html#a12351638 Sent from the R help mailing list archive at Nabble.com.
[R] Sequential Rank Test
Hi R-Masters,

I need to use a sequential approach on a series of cases, but my data are not normally distributed. If the data were normal, it would be easy to build the analysis with a likelihood ratio, as in Wald's test. In my case, however, I need a non-parametric test (Mann-Whitney). I tried RSiteSearch("sequential rank test"), but that did not solve my problem. Does anyone know of a routine or package that implements a sequential rank test in R?

Thanks in advance

-- Bernardo Rangel Tura, M.D,Ph.D National Institute of Cardiology Brazil
Re: [R] Sequential Rank Test
Hi Bernardo,

I think that ?wilcox.test will help you.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

On 27/08/07, Bernardo Rangel Tura [EMAIL PROTECTED] wrote:

Hi R-Masters,

I need to use a sequential approach on a series of cases, but my data are not normally distributed. If the data were normal, it would be easy to build the analysis with a likelihood ratio, as in Wald's test. In my case, however, I need a non-parametric test (Mann-Whitney). I tried RSiteSearch("sequential rank test"), but that did not solve my problem. Does anyone know of a routine or package that implements a sequential rank test in R?

Thanks in advance

-- Bernardo Rangel Tura, M.D,Ph.D National Institute of Cardiology Brazil
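For reference, a minimal call to wilcox.test() on made-up, skewed samples is sketched below. Note that this is the ordinary fixed-sample Mann-Whitney test, not a sequential procedure; re-running it after every new case without any adjustment would inflate the type I error, so it only partly addresses the question.

```r
set.seed(42)                       # made-up, skewed (non-normal) samples
a <- rexp(15, rate = 1)
b <- rexp(15, rate = 0.4)

# Two-sample Wilcoxon rank-sum (Mann-Whitney) test.
res <- wilcox.test(a, b, alternative = "two.sided")
res$statistic                      # W, the rank-based test statistic
res$p.value
```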
Re: [R] Sequential Rank Test
I looked for the same topic today and found ?wilcox.test in the stats package.

B

On 27.08.2007 at 17:33, Bernardo Rangel Tura wrote:

Hi R-Masters,

I need to use a sequential approach on a series of cases, but my data are not normally distributed. If the data were normal, it would be easy to build the analysis with a likelihood ratio, as in Wald's test. In my case, however, I need a non-parametric test (Mann-Whitney). I tried RSiteSearch("sequential rank test"), but that did not solve my problem. Does anyone know of a routine or package that implements a sequential rank test in R?

Thanks in advance

-- Bernardo Rangel Tura, M.D,Ph.D National Institute of Cardiology Brazil

Birgit Lemcke Institut für Systematische Botanik Zollikerstrasse 107 CH-8008 Zürich Switzerland Ph: +41 (0)44 634 8351 [EMAIL PROTECTED]
[R] FW: subset using noncontiguous variables by name (not index)
Thomas, that's a good point. I was thinking of anscombe[x1::y1] making it clear which one, but you would then want just x1::y1 to have unambiguous meaning on its own, which is impossible.

As for x1:xN, it's unambiguous on its own. I thought one of the great advantages of R was that it could use different methods so that a new operator would not be needed. The colon operator would just have a new method for when stringN appeared. One that would be very useful and have obvious meaning.

Thanks, Bob

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 10:25 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

Gabor, That works great! I think this would be a very helpful addition to the main R distribution. Perhaps with a single colon representing numerical order (exactly as you have written it) and two colons representing the order of the variables as they appear in the data frame (your first example). That's analogous to SAS' x1-xN, which you know gets those N variables, and a--z, which selects an unknown number of variables a through z. How many that is depends upon their order in the data frame. That would not only be very useful in general, but it would also make transitioning to R from SAS or SPSS less confusing. Is R still being extended in such basic ways, or does that muck up existing programs too much?

In principle base R can be extended like that, but a strong case is needed for non-standard evaluation rules and for depleting the restricted supply of short binary operator names.

The reason for subset() and its behaviour is that 'variables as they appear in the data frame' is typically ambiguous -- which data frame? In SPSS you have only one and in SAS there is a default one, so there is no ambiguity in X1--Y2, but in R it needs another argument specifying the data frame, so it can't really be a binary operator.

The double colon :: and triple colon ::: are already used for namespaces, and a search of r-help reveals two previous, different, suggestions for %:%.

-thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
Re: [R] Formatting Sweave in R-News
Thank you Paul for your response. Unfortunately that did not work. A figure environment frames it neatly, but it is still contained in only one column. I have tried various methods, but none of them seem to work, or if the solutions involve manually setting the size, the grey column separator still runs through the middle of the page.

I know a solution exists, because on page 21, Vol 1/1 of R-News, there is an image that spans both columns. Do you know where I could get the Rnw source files for R-news articles? That would at least allow me to trawl for a solution.

Best regards, Arjun

On 8/22/07, Paul Murrell [EMAIL PROTECTED] wrote:

Hi

Arjun Ravi Narayan wrote:

Hi, I am editing a document for submission to the R-news newsletter, and in my article my Sweave code inserts a dynamically generated PDF report that my R program generates. However, when I insert the PDF using the following Sweave code:

    \newpage
    \includegraphics[scale=1.0]{\Sexpr{print(location)}}
    \newpage

(in tex this looks like):

    \newpage
    \includegraphics[scale=1.0]{/home/arjun/sample.pdf}
    \newpage

Try putting your image in a figure* environment (should go full width of the page).

Paul

However, the r-news style package overrides everything that I can set (including using the minipage option) to make my included PDF small sized. Part of the problem is that the R-news style specifies a two-column formatting, and so the PDF is shrunk to fit in one column. How can I, for just one page, override the styles to include the PDF? Even if I hard-hack the graphics to be scaled up in size, that does not get rid of the vertical line in between the two columns, which thus breaks my image.

I realise that this is not an R problem, but more a latex problem, but I am hoping that somebody has faced similar problems with the Rnews styles and has an idea on how to do this.

Thank you, Yours sincerely,

-- Dr Paul Murrell Department of Statistics The University of Auckland Private Bag 92019 Auckland New Zealand 64 9 3737599 x85392 [EMAIL PROTECTED] http://www.stat.auckland.ac.nz/~paul/
Re: [R] Formatting Sweave in R-News
Dear Paul,

I stand corrected. Your solution was the right way. The following code now works (apparently I still need to specify the width, as my PDF is incorrectly sized by default):

    \begin{figure*}[b]
    \begin{center}
    \includegraphics[width=8in]{generatedPDF.pdf}
    \end{center}
    \end{figure*}

There is a full explanation in the template.tex file, which can be found in the RNews tutorial here: http://cran.r-project.org/doc/Rnews/template.tex

Thank you for your time.

Best regards,
Arjun

> Try putting your image in a figure* environment (should go full width of the page).
>
> Paul
> Dr Paul Murrell Department of Statistics The University of Auckland Private Bag 92019 Auckland New Zealand 64 9 3737599 x85392 [EMAIL PROTECTED] http://www.stat.auckland.ac.nz/~paul/
[R] Max vs summary inconsistency
Hello,

I'm having the following questionable behavior:

> summary(m)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1   13000   26280   25890   38550   50910
> max(m)
[1] 50912
> typeof(m)
[1] "integer"
> class(m)
[1] "integer"

...it seems to me like max() and summary(m)[6] ought to return the same number. Am I doing something wrong?

I'm running R 2.5.1 (2007-06-27), installed on MacOSX from the dmg file found on CRAN.

-- Adam D. I. Kramer Ph.D. Student, University of Oregon [EMAIL PROTECTED]
[R] subset question
I would like to code records in a dataset with a 1 if any of the columns 9-67 contain a particular code, and zero if they don't. I've been working with subset and it seems that something like subset(data, data[9:67]--12345) would work, but I have been unsuccessful so far. It seems like a simple problem - any help is appreciated!
Re: [R] Max vs summary inconsistency
On Mon, 27 Aug 2007, Adam D. I. Kramer wrote:

> Hello, I'm having the following questionable behavior:
>
> summary(m)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       1   13000   26280   25890   38550   50910
> max(m)
> [1] 50912
> typeof(m)
> [1] "integer"
> class(m)
> [1] "integer"
>
> ...it seems to me like max() and summary(m)[6] ought to return the same number. Am I doing something wrong?

They do return the same number; they just print it differently. summary() prints four significant digits by default.

-thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
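The point about significant digits can be checked with a small sketch. The vector `m` below is a made-up stand-in for the poster's data (only its minimum and maximum match the thread). Depending on the R version, the default rounding to four significant digits is applied either to the values summary() stores or only when they are printed, so the sketch asks for more digits explicitly rather than relying on the default.

```r
# Hypothetical data: only the minimum (1) and maximum (50912) follow the thread.
m <- c(1L, 13000L, 26280L, 38550L, 50912L)

# Default display: typically limited to about four significant digits.
print(summary(m))

# Requesting more digits keeps the maximum intact.
s <- summary(m, digits = 10)
print(s, digits = 10)

# max() never rounds.
print(max(m))
```

When an exact value matters, calling max(), min() or quantile() directly avoids any dependence on how summary() formats its result.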
Re: [R] subset using noncontiguous variables by name (not index)
Thanks for helping me see why R doesn't have the obvious!

-Bob

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 2:12 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] subset using noncontiguous variables by name (not index)

On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Thomas, that's a good point. I was thinking of anscombe[x1::y1] making it clear which one, but you would then want just x1::y1 to have unambiguous meaning on its own, which is impossible. As for x1:xN, it's unambiguous on its own.

It actually isn't. We already have a meaning. Consider

    x1 <- 4
    xN <- 6
    x1:xN

It also breaks R's argument passing rules by treating x1 as a string rather than a name.

What would be unambiguous at the moment is x1:x4, provided there was a sufficiently precise set of rules on what was allowed. Consider

    x1:x-1          (negative?)
    x1:x3.14        (non-integer?)
    x3.12:x3.14     (is the prefix x or x3.?)
    x1:X4           (the prefix changes)
    01:14           (is the prefix empty or 0?)
    x09:xA2         (is this illegal decimal or legal hexadecimal?)
    IL23R1:IL23R4   (what is the prefix?)
    x1a:x4a         (infix numbering?)

-thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
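A short sketch of the existing meaning of `x1:xN`, next to the one place where base R does accept a range of column names. It uses the built-in `anscombe` data frame mentioned in the thread; the variable names `rng` and `sel` are only for illustration.

```r
# At top level, x1:xN is an ordinary sequence between two numeric variables.
x1 <- 4
xN <- 6
rng <- x1:xN
print(rng)

# Inside subset(), the 'select' argument is evaluated with each column name
# standing for its position, so a name range works there because a data
# frame is supplied alongside it.
sel <- subset(anscombe, select = x1:y1)
print(names(sel))
```

This matches the argument in the message: the range only has a well-defined meaning when a data frame is given as well, which is why it cannot be a free-standing binary operator.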
Re: [R] Max vs summary inconsistency
[Adam D. I. Kramer]

> I'm having the following questionable behavior:
>
> summary(m)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       1   13000   26280   25890   38550   50910
> max(m)
> [1] 50912
>
> ...it seems to me like max() and summary(m)[6] ought to return the same number. Am I doing something wrong?

Some may say that you did not scrutinize the documentation enough, as summary artificially limits the number of significant digits.

However, this question recurs regularly on these mailing lists, so maybe something should at last be done about it, beyond documenting how it works. Overall, too many users have been misled for one to assert so bluntly that they are all wrong.

One possibility would be resorting to scientific notation whenever non-significant zero digits would otherwise have been printed. This would make it a bit clearer that the printing precision was artificially limited.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Max vs summary inconsistency
On Mon, 27 Aug 2007, François Pinard wrote:

>> summary(m)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>       1   13000   26280   25890   38550   50910
>> max(m)
>> [1] 50912
>>
>> ...it seems to me like max() and summary(m)[6] ought to return the same number. Am I doing something wrong?
>
> Some may say that you did not scrutinize the documentation enough, as summary artificially limits the number of significant digits.

Indeed, several have said so in private email as well as email to the list. Thanks to all, and apologies for my lack of scrutiny.

> However, this question reoccurs often and regularly in these mailing lists, so at last, maybe something should be done about it, beyond documenting how it works. Overall, too many users got misled, that one may not so bluntly assert they are all wrong.

I would agree, and not only because I was misled: several people are scrutinizing summary()'s output and noticing that it is incorrect. However, it is very VERY likely that many more are NOT scrutinizing it, and as such are forming false beliefs about their data sets, which may subsequently be published or used in further analyses. Taking a small step in the implementation of summary() to potentially prevent the publication of incorrect data seems worthwhile. Certainly, any researcher should check their output in many ways, but it makes no sense to me that summary() would round its output to 4 significant digits by default.

> For example, resorting to scientific notation whenever non significant zero digits would have otherwise been printed. This should clarify a bit that the printing precision got artificially limited.

I think this is a great solution, though I'm not sure whether scripts that use summary() would break if passed a number in scientific notation. That said, scripts that use summary() are probably assuming that the number reported is maximally precise, and thus are making the same mistake I did...and thus should indeed break!

-- Adam Kramer Ph.D. Student, Social Psychology University of Oregon [EMAIL PROTECTED]
Re: [R] how to include bar values in a barplot?
Donatas G. wrote:

> On Tuesday 07 August 2007 22:09:52 Donatas G. wrote:
>> How do I include bar values in a barplot (or other R graphics, where this could be applicable)? To make sure I am clear I am attaching a barplot created with OpenOffice.org which has barplot values written on top of each barplot.
>
> Here is the barplot mentioned above: http://dg.lapas.info/wp-content/barplot-with-values.jpg
>
> It appears that this list does not allow attachments...

That is a TERRIBLE graphic. Can't we finally leave this subject alone?

Frank Harrell
[R] use apply function with which
Dear R-users,

For a data frame (say in this example X) I want to look up the corresponding value in a 'look-up data frame' (in this example Y). The for-loop works but is very time-consuming because 'X' in reality is very big. Therefore I would like to have a solution with apply. However, I do not succeed. Any suggestions?

Thanks in advance,
Hanneke

    c1=c('a','a','b')
    c2=c('j','k','k')
    V1=c('a','a','a','a','b','b','b','b'))
    V2=c('i','j','k','l','i','j','k','l')
    V3=c(4,3,2,1,8,5,2,-1)
    X=NULL
    X$c1=c1
    X$c2=c2
    X=as.data.frame(X)
    Y=NULL
    Y$V1=V1
    Y$V2=V2
    Y$V3=V3
    Y=as.data.frame(Y)
    result=NULL
    for (i in 1:dim(X)[1]) {
      result=rbind(result, Y$V3[which(Y$V1==as.character(X[i,]$c1) & Y$V2==as.character(X[i,]$c2))])
    }

    ###
    which.search=function(X,Y,c1,c2,V1,V2,V3)
      Y$V3[which(Y$V1==as.character(X$c1) & Y$V2==as.character(X$c2))]
    apply(X,1,which.search,X=X,Y=Y,c1='c1',c2='c2',V1='V1',V2='V2',V3='V3')

    ###
    sessionInfo()
    R version 2.5.1 (2007-06-27)
    i386-pc-mingw32
    locale: LC_COLLATE=Dutch_Netherlands.1252;LC_CTYPE=Dutch_Netherlands.1252;LC_MONETARY=Dutch_Netherlands.1252;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.1252
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
[R] grouping scat1d/rug and plotting to 2 axes
Hi,

I'm wondering if anybody can offer a bit of guidance on how to add a couple of features to a plot. I'm using Frank Harrell's Design library to model some survival data in R (2.3.1, Windows platform). I'm fairly comfortable with the survival modeling in Design, but am still at a frustratingly low level of competence when it comes to creating anything beyond simple plots in R.

A simplified version of the model is:

    fit <- cph(Surv(survtime,deceased) ~ rcs(smw,4), data=survdata, x=T, y=T, surv=T)

And the basic plot is:

    plot(fit, smw=NA, fun=function(x) 1/(1+exp(-x)))

I know that if I add scat1d(smw) I get a nice jittered rug plot of all values of the predictor smw on the top axis. What I'd like to do, however, is to plot on the bottom axis the values of smw for only those participants who are alive, and then on the top axis, plot the values of smw for those who are deceased. I'd appreciate any tips as to how I might approach this.

Thanks,
Mike Babyak
Re: [R] use apply function with which
On Mon, 27 Aug 2007, [EMAIL PROTECTED] wrote:

> Dear R-users,
>
> For a data frame (say in this example X) I want to look up the corresponding value in a 'look-up data frame' (in this example Y). The for-loop works but is very time-consuming because 'X' in reality is very big. Therefore I would like to have a solution with apply. However, I do not succeed. Any suggestions?
>
> Thanks in advance, Hanneke
>
> c1=c('a','a','b')
> c2=c('j','k','k')
> V1=c('a','a','a','a','b','b','b','b'))

You have a syntax error in the previous line - '))'

> V2=c('i','j','k','l','i','j','k','l')
> V3=c(4,3,2,1,8,5,2,-1)
> X=NULL
> X$c1=c1
> X$c2=c2
> X=as.data.frame(X)
> Y=NULL
> Y$V1=V1
> Y$V2=V2
> Y$V3=V3
> Y=as.data.frame(Y)
> result=NULL
> for (i in 1:dim(X)[1]) {
>   result=rbind(result, Y$V3[which(Y$V1==as.character(X[i,]$c1) & Y$V2==as.character(X[i,]$c2))])
> }
>
> ###
> which.search=function(X,Y,c1,c2,V1,V2,V3)
>   Y$V3[which(Y$V1==as.character(X$c1) & Y$V2==as.character(X$c2))]
> apply(X,1,which.search,X=X,Y=Y,c1='c1',c2='c2',V1='V1',V2='V2',V3='V3')
                         ^^^^...

You use X twice in this expression. If you delete 'X=X,' and revise which.search to

    which.search <- function( X, Y, c1, c2, V1, V2, V3 )
        Y$V3[ which( Y$V1 == as.character( X[ c1 ] ) & Y$V2 == as.character( X[ c2 ] ) ) ]

to get rid of the $ operator, which is deprecated for atomic vectors (and fix the above syntax error), then this expression agrees with 'result'.

If you know that the matches are unique (only one row in Y will match any row of X), then

    match( paste( X$c1, X$c2 ), paste( Y$V1, Y$V2 ) )

will be fast.

If nrow(Y) is small,

    which( outer( Y$V1, as.character(X$c1), "==" ) &
           outer( Y$V2, as.character(X$c2), "==" ), arr.ind = TRUE )

will also be quick.

Otherwise something like

    unlist( lapply( paste( X$c1, X$c2 ), match, paste( Y$V1, Y$V2 ) ) )

may be a good bet.

Please learn to use the space key to format your code in a more readable fashion!

HTH,
Chuck

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
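The match()/paste() suggestion can be sketched end to end with the example data from the question (with the stray parenthesis removed). match() returns row positions in Y, so one more indexing step is needed to obtain the V3 values; the names `idx` and `result` are only for illustration, and the approach assumes each key pair occurs at most once in Y.

```r
# Example data from the thread, kept as character columns.
X <- data.frame(c1 = c("a", "a", "b"),
                c2 = c("j", "k", "k"),
                stringsAsFactors = FALSE)
Y <- data.frame(V1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                V2 = c("i", "j", "k", "l", "i", "j", "k", "l"),
                V3 = c(4, 3, 2, 1, 8, 5, 2, -1),
                stringsAsFactors = FALSE)

# Build one composite key per row and look each X key up in Y.
idx <- match(paste(X$c1, X$c2), paste(Y$V1, Y$V2))

# Rows of X without a match in Y would give NA here.
result <- Y$V3[idx]
print(result)
```

Because the lookup is vectorised, there is no per-row loop, which is the main reason it tends to be much faster than the rbind() loop for a large X.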
[R] Rmpi and x86
Dear R Gurus: Is there a problem with Rmpi on x86 with SUSE 10.1, please? I've tried everything and it still won't load. Has anyone else dealt with this please? Thanks, Edna Bell
Re: [R] grouping scat1d/rug and plotting to 2 axes
Mike wrote:

> Hi,
>
> I'm wondering if anybody can offer a bit of guidance on how to add a couple of features to a plot. I'm using Frank Harrell's Design library to model some survival data in R (2.3.1, Windows platform). I'm fairly comfortable with the survival modeling in Design, but am still at a frustratingly low level of competence when it comes to creating anything beyond simple plots in R.
>
> A simplified version of the model is:
>
> fit <- cph(Surv(survtime,deceased) ~ rcs(smw,4), data=survdata, x=T, y=T, surv=T)
>
> And the basic plot is:
>
> plot(fit, smw=NA, fun=function(x) 1/(1+exp(-x)))

or plot(fit, smw=NA, fun=plogis). But what does the logistic model have to do with the Cox model you fitted? You can instead do plot(fit, smw=NA, time=1) to plot the estimated 1-year survival probability.

> I know that if I add scat1d(smw) I get a nice jittered rug plot of all values of the predictor smw on the top axis. What I'd like to do, however, is to plot on the bottom axis the values of smw for only those participants who are alive, and then on the top axis, plot the values of smw for those who are deceased. I'd appreciate any tips as to how I might approach this.

That isn't so well defined because of variable follow-up time. I would not get very much out of such a plot.

Frank

> Thanks,
> Mike Babyak

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] oddity with method definition
Just wondered about this curious behaviour. I'm trying to learn about classes. Basically setMethod works the first time, but does not seem to work the second time.

Faheem.

    setClass("foo", representation(x="numeric"))

    bar <- function(object) { return(0) }
    bar.foo <- function(object) { print(object@x) }
    setMethod("bar", "foo", bar.foo)
    bar(f)
    # bar(f) gives 1.

    bar <- function(object) { return(0) }
    bar.foo <- function(object) { print(object@x) }
    setMethod("bar", "foo", bar.foo)
    f = new("foo", x= 1)
    bar(f)
    # bar(f) gives 0, not 1.
Re: [R] oddity with method definition
On 27/08/2007 5:47 PM, Faheem Mitha wrote:

> Just wondered about this curious behaviour. I'm trying to learn about classes. Basically setMethod works the first time, but does not seem to work the second time.
>
> Faheem.
>
> setClass("foo", representation(x="numeric"))
>
> bar <- function(object) { return(0) }
> bar.foo <- function(object) { print(object@x) }
> setMethod("bar", "foo", bar.foo)

This changes the definition of bar: now it becomes a generic function instead of a simple function.

> bar(f)
> # bar(f) gives 1.

(You forgot the f = new("foo", x= 1) line, but that's somewhat obvious.)

> bar <- function(object) { return(0) }

Now bar is a regular function again.

> bar.foo <- function(object) { print(object@x) }
> setMethod("bar", "foo", bar.foo)

Now the generic would call that method, but you've wiped out the generic.

> f = new("foo", x= 1)
> bar(f)
> # bar(f) gives 0, not 1.

The problem is that setting a method on a regular function automagically creates a generic for it, but redefining a function doesn't remove the generic. It's still there, somewhere in R's insides, and if you could find it to call it, your method would get called. But you're calling the plain old bar() instead.

This behaviour makes more sense if you think about generics in other packages. There's a generic called show in the methods package. But you can define your own function called show, and in your workspace, you'd want to call that, not the one from methods.

I'd recommend using setGeneric() to create a generic, rather than depending on the automatic creation, to avoid this kind of confusion.

Duncan Murdoch
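The recommendation to call setGeneric() explicitly can be sketched as follows. It reuses the class and function names from the thread, and assumes that the obfuscated slot access in the archived code was `object@x`. This is a minimal illustration, not the only way to organise S4 code.

```r
library(methods)

# Class with a single numeric slot, as in the thread.
setClass("foo", representation(x = "numeric"))

# Create the generic explicitly instead of letting setMethod() derive it
# from an ordinary function of the same name.
setGeneric("bar", function(object) standardGeneric("bar"))

# Method for class "foo": return the slot value.
setMethod("bar", "foo", function(object) object@x)

f <- new("foo", x = 1)
print(bar(f))
print(isGeneric("bar"))
```

With the generic created up front, it is clear which object `bar` refers to, and a later plain assignment such as `bar <- function(object) 0` stands out as replacing the generic rather than adding to it.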
Re: [R] oddity with method definition
On Mon, 27 Aug 2007, Faheem Mitha wrote:

> Just wondered about this curious behaviour. I'm trying to learn about classes. Basically setMethod works the first time, but does not seem to work the second time.
>
> Faheem.
>
> setClass("foo", representation(x="numeric"))
>
> bar <- function(object) { return(0) }
> bar.foo <- function(object) { print(object@x) }
> setMethod("bar", "foo", bar.foo)
> bar(f)
> # bar(f) gives 1.

Not for me. It gives

    > bar(f)
    Error: object "f" not found
    Error in bar(f) : error in evaluating the argument 'object' in selecting a method for function 'bar'

However, if I do f = new("foo", x= 1) first, it gives 1.

> bar <- function(object) { return(0) }

Here you have masked the generic bar() with a new function bar(). Redefining bar() is the problem, not the second setMethod().

> bar.foo <- function(object) { print(object@x) }
> setMethod("bar", "foo", bar.foo)

Because there was a generic bar(), even though it is overwritten by the new bar(), setMethod() doesn't automatically create another generic.

> f = new("foo", x= 1)
> bar(f)
> # bar(f) gives 0, not 1.

Because bar() isn't a generic function:

    > bar
    function(object) { return(0) }

If you had used setGeneric() before setMethod(), as recommended, your example would have done what you expected, but it would still have wiped out any previous methods for bar() -- e.g., try

    setMethod("bar", "baz", function(object) print("baz"))

before you redefine bar(), and notice that getMethod("bar", "baz") no longer finds it.

-thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
Re: [R] Calculating diameters of cirkels in a picture.
Hi Bart,

Let's assume that your situation was simpler - you have a BW (black and white) image containing circles (in white) and you need to find the diameter of each circle (and of course to know how many circles you have). This can be done with labeling of connected components.

You say that two pixels are neighbors if they have a common edge (4-connectivity) or at least a common vertex (8-connectivity). So now you can treat your image (white pixels) as a graph (with edges connecting any two neighbors). Then each connected component of that graph corresponds to a circle.

There exists a well-known algorithm to do this. It takes the original BW image (where every image pixel has the value of 1 and every background pixel the value of 0) and produces an image where every background pixel still has the value of 0, every pixel of the first connected component has the value of 1, every pixel of the second connected component has the value of 2, etc. So now you can process each connected component (circle in your case) separately. Basically this is all you need. You can either count the number of pixels having the value of k to find the area (and then the diameter) or just take (maximal x value) - (minimal x value) + 1.

In your case it can happen that after you convert your image into a BW image some circles will have holes inside, with some small objects inside these holes, and you do not want to consider these small objects as additional circles. So I thought of using morphological closing to get rid of small holes, but as I wrote in the following note you do not need this. When you get the BW image, take the complementary one (i.e. background pixels have the value of 1 and image pixels the value of 0). Label the connected components of the background. Only one of them is the real background - all others are inside circles. The real background touches the image boundaries. Now go to the original BW image and give all the pixels outside the real background the value of 1.
Now all your circles are full (no holes) and you can proceed as above. Best regards, Moshe. --- Bartjoosen [EMAIL PROTECTED] wrote: Hi All, I really like to thank you for the answers, while I was searching for some edge detection and clustering algorithms, Moshe came with a simple but effective solution: use the area to find the diameter! But I tried Moshe's solution, but I couldn't figure out what you mean with morphological closing and the labeling to split the images. Could you please clarify this a bit? Thanks for your support Bart Moshe Olshansky-2 wrote: Hi Bart, One more comment: You do not really need the morphological closing to close the holes inside the circles. Another possibility is to reverse the black-and-withe picture, i.e. make the holes and background be 1 and the circles 0, label the connected components and then only the component which touches the boundaries is the background while all other components are holes and you can make them white (1) in the original black-and-white image. --- Moshe Olshansky [EMAIL PROTECTED] wrote: Hi Bart, I have never used image processing software in R (I was doing this with Matlab), but here is what I would have done algorithmically: 1) convert the picture to gray-scale 2) find a threshold value which separates the circles from the background and convert your image to black and white 3) if the circles are far apart use morphological closing to fill in small holes inside the circles (may be do this several times) 4) use labeling to split the image into connected components 5) for each connected component get it's area (the number of pixels) and use the formula S = Pi*R^2 to find the approximate radii. Regards, Moshe. --- Julian Burgos [EMAIL PROTECTED] wrote: Hi Bart, If you only have 36 circles, the fastest way would be to use some image processing software and measure the circles by hand. 
One option is to use ImageJ, which you can download here http://rsb.info.nih.gov/ij/ Julian Bart Joosen wrote: Hi, Maybe this is more a programming questions than a specific R-project question, but maybe there is someone who can point me in the right direction. I have a picture of cirkels which I took with a digital camera. Now I want to use the diameter of the cirkels on the picture for analysis in R. I can use pixmap to import the picture, but how do I find the outside cirkels and calculate the diameter? I pointed out that I can use the edci package, but then I need to preprocess the data to reduce the points, otherwise it takes a long time, and my computer crashes. If you want to see such a picture, I cropped a larger one, and highlighted the cirkel which is of interest. In a real world, this is a plate with 36 cirkels, which all should be
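The labeling step described in this thread can be sketched in base R. The function name `label_components`, the tiny test image and the use of 4-connectivity are assumptions made for illustration; a real photograph would first need the gray-scale conversion and thresholding steps listed above, and the square blobs here merely stand in for circles.

```r
# Label 4-connected components of a 0/1 matrix with a simple flood fill.
# Returns an integer matrix: 0 for background, 1, 2, ... for components.
label_components <- function(bw) {
  nr <- nrow(bw)
  nc <- ncol(bw)
  labels <- matrix(0L, nr, nc)
  current <- 0L
  # Row and column offsets of the four edge neighbours.
  offsets <- rbind(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))
  for (j in seq_len(nc)) {
    for (i in seq_len(nr)) {
      if (bw[i, j] == 1 && labels[i, j] == 0L) {
        # Unlabelled foreground pixel: start a new component.
        current <- current + 1L
        labels[i, j] <- current
        stack <- list(c(i, j))
        while (length(stack) > 0) {
          p <- stack[[length(stack)]]
          stack[[length(stack)]] <- NULL
          for (k in seq_len(nrow(offsets))) {
            r <- p[1] + offsets[k, 1]
            cc <- p[2] + offsets[k, 2]
            if (r >= 1 && r <= nr && cc >= 1 && cc <= nc &&
                bw[r, cc] == 1 && labels[r, cc] == 0L) {
              labels[r, cc] <- current
              stack[[length(stack) + 1]] <- c(r, cc)
            }
          }
        }
      }
    }
  }
  labels
}

# Tiny made-up image with two separate blobs.
bw <- matrix(0L, 7, 9)
bw[2:4, 2:4] <- 1L   # 9 pixels
bw[5:6, 7:8] <- 1L   # 4 pixels

lab <- label_components(bw)
areas <- tabulate(lab[lab > 0])     # pixel count per component
diameters <- 2 * sqrt(areas / pi)   # from S = pi * R^2
print(areas)
print(diameters)
```

The same function applied to the inverted image (`1L - bw`) labels the background and any holes, which is the hole-filling variant suggested in the follow-up message.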
Re: [R] subset question
Here is one way of checking to see if a row contains a particular value and setting the contents of a new column:

    n <- 20
    # create test data
    x <- data.frame(sample(letters,n), sample(letters,n), sample(letters,n), sample(letters,n))
    # add a column indicating if the row contains 'a', 'b' or 'c'
    x$a <- apply(x[, 1:4], 1, function(.row) any(.row %in% c('a','b','c'))) + 0

On 8/27/07, Kirsten Beyer [EMAIL PROTECTED] wrote:

> I would like to code records in a dataset with a 1 if any of the columns 9-67 contain a particular code, and zero if they don't. I've been working with subset and it seems that something like subset(data, data[9:67]--12345) would work, but I have been unsuccessful so far. It seems like a simple problem - any help is appreciated!

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
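For a single numeric code, a vectorised alternative to apply() is to compare the block of columns with the code and count matches per row. The data frame `dat`, its column names and the code 12345 below are illustrative only; in the original question the block would be columns 9 to 67, i.e. `data[, 9:67]`.

```r
# Small made-up data set; columns 'a' and 'b' play the role of columns 9-67.
dat <- data.frame(id = 1:4,
                  a  = c(12345, 1, 2, 3),
                  b  = c(0, 0, 12345, 5))

# TRUE/FALSE matrix of matches, summed by row; NA entries count as no match.
hits <- rowSums(dat[, c("a", "b")] == 12345, na.rm = TRUE)

# 1 if the code appears anywhere in the row, 0 otherwise.
dat$flag <- as.integer(hits > 0)
print(dat)
```

Unlike subset(), which drops rows, this keeps every record and only adds the indicator column, which is what the question asks for.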
Re: [R] validate (package Design): error message subscript out of bounds
Wentzel-Larsen, Tore wrote: Dear R users I use Windows XP, R2.5.1 (I have read the posting guide, I have contacted the package maintainer first, it is not homework). In a research project on renal cell carcinoma we want to compute Harrell's c index, with optimism correction, for a multivariate Cox regression and also for some univariate Cox models. For some of these univariate models I have encountered an error message (and no result produced) from the function validate i Frank Harrell's Design package: Error in Xb(x[, xcol, drop = FALSE], coef, non.slopes, non.slopes.in.x, : subscript out of bounds The following is an artificial example wherein I have been able to reproduce this error message (actual data has been changed to preserve confidentiality): I could not reproduce the error on R 2.5.1 on linux using version 2.0-12 of Design (you did not provide this information). Your code involved a good deal of extra typing. Here is a streamlined version: bc - data.frame(time1 = c(9,24,28,43,58,62,66,107,116,118,123, 127,129,131,137,138,139,140,148,169,176,179,188,196,210,218, bc library(Design) dd - with(bc, datadist(bc1, age, adjto.cat='first')) options(datadist = 'dd') f - cph(Surv(time1,status1) ~ bc1, data = bc, x=TRUE, y=TRUE, surv=TRUE) anova(f) f summary(f) val - validate(f, B=200, dxy=TRUE) I don't get much value of putting the type of an object as part of the object's name, as information within objects defines the object type/class. There is little reason to validate a one degree of freedom model. 
Frank library(Design) # an example data frame: frame.bc - data.frame(time1 = c(9,24,28,43,58,62,66,107,116,118,123, 127,129,131,137,138,139,140,148,169,176,179,188,196,210,218, 1,1,1,2,2,3,4,8,23,32,33,34,43,44,48,51,52,54,59,59,60,60,62, 65,65,68,70,72,73,74,81,84,88,98,99,106,107,115,115,117,119, 120,122,122,122,122,126,128,130,135,136,136,138,149,151,154, 157,159,161,164,164,164,166,172,172,176,179,180,183,183,184, 187,190,197,201,201,203,203,203,209,210,214,219,227,233,4,18, 49,113,147,1,1,2,2,2,2,2,3,4,6,6,6,6,6,6,6,6,9,9,9,9,9,10,10, 10,11,12,12,12,13,14,14,17,18,18,19,19,20,20,21,21,21,21,22,23, 23,24,28,28,29,29,32,34,35,38,38,48,48,52,52,54,54,56,64,67,67, 69,70,70,72,84,88,90,114,115,140,142,154,171,195), status1 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1), bc1 = factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2), labels=c('bc.1','bc.2')), age = c(58,68,23,20,50,43,41,69,20,48,19,27,39,20,65,49,70,59,31,43,25, 61,60,45,34,59,32,58,30,62,26,44,52,29,40,57,33,18,50,50,55,51,38,34, 69,56,67,38,66,21,48,39,62,62,29,68,66,19,60,39,55,42,24,29,56,61,40, 52,19,40,33,67,66,51,48,63,60,58,68,60,53,20,45,62,37,38,61,63,43,67, 49,39,43,67,49,69,32,37,32,63,33,47,66,39,23,57,26,61,20,49,69,30,40, 29,38,66,60,69,69,44,65,25,41,53,18,55,45,59,49,27,51,29,67,26,24,26, 
47,23,50,27,35,45,32,26,45,45,63,39,39,22,38,27,31,27,49,65,66,49,39, 21,51,49,55,63,19,26,50,21,24,34,65,33,55,33,36,53,48,25,54,58,60,34, 47,23,34,60,39,34,22,30,41,55,64,48,34,54)) frame.bc # preparing for a simple univariate Cox regression: dd.bc - datadist(frame.bc[, c('bc1','age')], adjto.cat='first') options(datadist = 'dd.bc') # a univariate Cox regression: cph.bc - cph(formula = Surv(time1,status1)~bc1, data = frame.bc, x=TRUE, y=TRUE, surv=TRUE) anova(cph.bc) cph.bc summary(cph.bc) # the validate command for the Cox model: val.cph.bc - validate(cph.bc, B=200, dxy=TRUE , pr=TRUE) -- Output from the validate command: training test Dxy -0.124360 -0.1423409 R2 1.00 1.000 Slope 1.00 0.7919584 D 0.016791 0.0147536 U -0.002395 0.0006448 Q 0.019186 0.0141088 training test Dxy -0.191875 -0.1423409 R2 1.00 1.000 Slope 1.00 0.8936724 D 0.022397 0.0147536 U -0.002339 0.0001367 Q 0.024736 0.0146169 training test Dxy -0.199514 -0.1423409 R2 1.00
Re: [R] How to provide argument when opening RGui from an external application
Thanks everyone. I actually thought about ?Rscript.exe but, having used only Rgui, I thought it was an instruction specific to that interface. I will look into it.

Sebastien

Gabor Grothendieck wrote:
> There are also some batch files that can be used with Rscript on XP, and info in the README, here: http://batchfiles.googlecode.com
>
> On 8/26/07, Sébastien [EMAIL PROTECTED] wrote:
>> Thanks for your reply. When you say "look into Rscript.exe", do you have a specific document in mind? I tried to google it but could not find much. I forgot to mention in my first email that I am working under Windows XP.
>>
>> Prof Brian Ripley wrote:
>>> Look into Rscript.exe (on Windows), which is a flexible way to run scripts. Neither using a GUI nor using source() is recommended.
>>>
>>> On Fri, 24 Aug 2007, Sébastien wrote:
>>>> Dear R-users, I have written a small application (in Visual Basic) that automatically generates some R scripts. I would like to execute these scripts when my application is being closed. My problem is that I don't know how to pass the 'source("c:/.../myscript.r")' instruction when I programmatically start RGui. Tinn-R is capable of doing such things, so I guess there must be a way to pass arguments to RGui. Any advice or link to relevant references would be greatly appreciated.
>>>>
>>>> Sebastien
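For readers following the Rscript suggestion, here is a minimal sketch of a script that picks up arguments passed on the command line. The file name myscript.r, the argument layout (an input file followed by an optional count), and the helper parse_args() are all assumptions made for illustration; only Rscript and commandArgs() come from R itself.

```r
# Hypothetical script, saved as myscript.r and started from the calling
# application (or a command prompt) with something like:
#   Rscript myscript.r input.csv 200

# Turn the raw argument vector into a named list.
# The layout (<input-file> [n]) is an assumption for this sketch.
parse_args <- function(args) {
  list(
    input = if (length(args) >= 1) args[1] else NA_character_,
    n     = if (length(args) >= 2) as.integer(args[2]) else 100L
  )
}

# commandArgs(trailingOnly = TRUE) returns only the arguments that
# follow the script name on the Rscript command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  opts <- parse_args(args)
  cat("Input file:", opts$input, "\n")
  cat("n:", opts$n, "\n")
}
```

The calling application then only needs to launch the Rscript executable with the script path and its arguments, so no GUI and no source() call are involved.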
[R] Excel
A common process when data is obtained in an Excel spreadsheet is to save the spreadsheet as a .csv file then read it into R. Experienced users might have learned to be wary of dates (as I have) but possibly have not experienced what just happened to me. I thought I might just share it with r-help as a cautionary tale.

I received an Excel file giving patient details. Each patient had an ID code in the form of three letters followed by four digits. (Actually a New Zealand National Health Identification.) I saved the .xls file as .csv. Then I opened up the .csv (with Excel) to look at it. In the column of ID codes I saw: Aug-99. Clicking on that entry it showed 1/08/2699.

In a column of character data, Excel had interpreted AUG2699 as a date. The .csv did not actually have a date in that cell, but if I had saved the .csv file it would have.

David Scott
_
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 373 7599 ext 86830
Fax: +64 9 373 7000
Email: [EMAIL PROTECTED]
Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics
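A cheap safeguard on the R side is to check the ID column against the expected pattern right after reading the file. The sketch below is only an illustration: the CSV text, the column names id and age, and the third (mangled) row are made up, and the pattern simply encodes the "three letters followed by four digits" description above.

```r
# Made-up CSV content standing in for the exported file; the third row
# imitates an ID that a spreadsheet round trip turned into a date.
csv_text <- "id,age\nAUG2699,54\nAUG1838,61\nAug-99,47"

# Read the ID column as character so R does no type guessing on it.
patients <- read.csv(text = csv_text,
                     colClasses = c("character", "integer"))

# Three capital letters followed by four digits.
valid <- grepl("^[A-Z]{3}[0-9]{4}$", patients$id)

# Entries that no longer look like an ID code.
bad_ids <- patients$id[!valid]
print(bad_ids)
```

Any entry listed in bad_ids would be worth tracing back to the original .xls file before analysis.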
[R] Problem with lme using glht for multiple comparisons
Hi everyone,

I am new to R and have a question about unplanned post-hoc comparisons using the multcomp package after a mixed effects model. I couldn't find the answer in the archive or in any manual.

I have a dataset in which several plants have been treated in a particular way and a continuous response variable has been measured on several leaves per plant. I am now interested in the effect of the treatment depending on the age of the leaves examined. So the dataset (L1) consists of a continuous response variable (EFN), a fixed factor (Leafage), and a random factor (Plant). I have set up the following mixed effects model, which works fine:

LM <- lme(EFN ~ Leafage, L1, ~1|Plant)

Now all I want to do is a post-hoc analysis (multiple comparisons) for the fixed factor Leafage. I tried the following code. According to the documentation this should work:

Post <- glht(LM, linfct = mcp(Leafage = "Tukey"))

However, I get this error message and don't know what to do:

Error in mcp2matrix(model, linfct = linfct) : Factor(s) Leafage have been specified in ‘linfct’ but cannot be found in ‘model’!

The factor is specified, right? So what is the problem? If I do the same with a normal ANOVA (command: aov), it works. What is the problem with the lme command?

Thank you very much in advance for your help.

Cheers,
Christian
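No reply to this question appears in this part of the archive, so the cause is not established here. One thing worth ruling out is that Leafage is stored as something other than a factor, and that the model's data can be found again from the fitted object. The sketch below uses simulated data as a stand-in for L1 (the values and level names are invented), names the data and random arguments explicitly, and only calls glht() if multcomp is installed. It is a check to try under those assumptions, not a confirmed fix.

```r
library(nlme)  # lme(); nlme ships with R as a recommended package

set.seed(1)
# Simulated stand-in for the poster's data set L1 (values are made up).
plant_effect <- rnorm(6)
L1 <- data.frame(
  Plant   = factor(rep(1:6, each = 3)),
  Leafage = rep(c("young", "medium", "old"), times = 6),
  stringsAsFactors = FALSE
)
L1$EFN <- rep(c(5, 7, 9), times = 6) + plant_effect[L1$Plant] + rnorm(18)

# Make sure the grouping variable really is a factor before fitting.
L1$Leafage <- factor(L1$Leafage)

# Name the arguments so the data can be located from the fitted object.
LM <- lme(EFN ~ Leafage, data = L1, random = ~ 1 | Plant)

# All pairwise (Tukey) comparisons, if multcomp is available.
if (requireNamespace("multcomp", quietly = TRUE)) {
  Post <- multcomp::glht(LM, linfct = multcomp::mcp(Leafage = "Tukey"))
  print(summary(Post))
}
```

If the error persists with a genuine factor and named arguments, the installed versions of nlme and multcomp would be the next thing to report to the list.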
Re: [R] Excel
If you format the column as Text, you won't have this problem. By leaving the cells as General, you leave it up to Excel to guess at the correct interpretation.

You will note that the conversion to a date occurs immediately in Excel when you enter the value. There are many formats to enter dates. Either pre-format the column as Text, or prefix the individual entry with an ' to indicate text.

A similar problem occurs in R's read.table() function when a factor has levels that can be interpreted as numbers.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947
Vere scire est per causas scire
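The read.table() behaviour mentioned above can be seen in a small example. The codes and column names below are invented for illustration; colClasses is the standard read.table()/read.csv() argument for switching off type guessing per column.

```r
# Invented codes that happen to look like numbers.
csv_text <- "code,value\n0012,1.5\n0450,2.0\n1e3,3.5"

# Default behaviour: R guesses the column type, so the codes become
# numbers and the leading zeros and the "1e3" spelling are lost.
guessed <- read.csv(text = csv_text)
str(guessed$code)

# Declaring the column as character keeps the codes exactly as written.
kept <- read.csv(text = csv_text,
                 colClasses = c("character", "numeric"))
str(kept$code)
```

This is the R-side counterpart of pre-formatting a spreadsheet column as Text: the type is stated up front rather than inferred.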
Re: [R] Excel
On Tue, 28 Aug 2007, Robert A LaBudde wrote:
> If you format the column as Text, you won't have this problem. By leaving the cells as General, you leave it up to Excel to guess at the correct interpretation.

Not true actually. I had converted the column to Text because I saw the interpretation as a date in the .xls file. I saved the .csv file *after* the column had been converted to Text. Looking at the .csv file in a text editor, the entry is correct.

I have just rechecked this. On reopening the .csv using Excel, the entry AUG2699 had been interpreted as a date, and was showing as Aug-99. Most bizarre is that the NHI value of AUG1838 has *not* been interpreted as a date.

David Scott
_
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 373 7599 ext 86830
Fax: +64 9 373 7000
Email: [EMAIL PROTECTED]
Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics
Re: [R] Excel
As far as I understand, changing the format changes only the way Excel displays the data; it does not change the data itself. If Excel decided while reading the data that an entry was a date, the entry is converted to an integer (a day count starting from January 1, 1900, in a calendar that mistakenly treats 1900 as a leap year) and it is stored that way.

--- David Scott [EMAIL PROTECTED] wrote:
> Not true actually. I had converted the column to Text because I saw the interpretation as a date in the .xls file. I saved the .csv file *after* the column had been converted to Text. Looking at the .csv file in a text editor, the entry is correct. I have just rechecked this. On reopening the .csv using Excel, the entry AUG2699 had been interpreted as a date, and was showing as Aug-99. Most bizarre is that the NHI value of AUG1838 has *not* been interpreted as a date.
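If such a stored day count does end up in a file read into R, it can be turned back into a calendar date. The sketch below follows the conversion given in R's as.Date() help page for Excel on Windows, where the origin "1899-12-30" absorbs the 1900 leap-year quirk for dates after February 1900; the helper name excel_serial_to_date() is made up for this example.

```r
# Convert an Excel (Windows, 1900 date system) serial day number to a
# Date. Only intended for serials after February 1900, where the
# "1899-12-30" origin compensates for Excel counting 29 Feb 1900.
excel_serial_to_date <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

# The serial number used as an example in ?as.Date
print(excel_serial_to_date(35981))   # 1998-07-05

# Going the other way: the serial Excel would store for a given date.
serial_for <- function(date) {
  as.numeric(as.Date(date) - as.Date("1899-12-30"))
}
print(serial_for("1998-07-05"))      # 35981
```

Note that this recovers the date Excel stored, not the original text: once AUG2699 has been saved as a date, the ID itself has to be restored from the source file.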