Re: [R] Efficiency of C Compiler in R CMD SHLIB
Since the context is missing in this message: from the other posts in the thread, this is about 32-bit Windows.

On Thu, 15 Apr 2010, yehengxin wrote:

> Thanks for your response. I found the folder to modify the compiler for C source codes. C++ 6.0 is an old C programming environment (1994~1998) but it is efficient. When compiling C source codes in C programming environment, one needs to choose between debug or release modes.

That's true for just one family of compilers in my experience.

> release mode is much faster than debug mode. But in R's R CMD SHLIB, I did not see such an option.

Then I suspect you did not look in the obvious place (mentioned on the help page), for I see

  % R CMD SHLIB --help
  ...
  Windows only:
    -d, --debug    build a debug DLL

An optimized ('release') build is standard, and in any case gcc is capable of both optimizing and including debug information, unlike some other compilers. With gcc debug code is normally the same speed, just a larger compiled file.

> I want to try alternative compilers to see if I can reach that level of efficiency in R's DLL.

This is *your* DLL, not one in R, surely? Note that people have compiled R with Visual C++ 6.0 (to use the correct name) and it ran slower and less accurately than using gcc. So finding VC++ to produce faster code is not usual, and this seems to be something special about your C code.

The default level of optimization for gcc in R for Windows is -O3, and you could try raising it: also if you want to target only recent non-Atom chips set -mtune= appropriately.

x86 is a very widely used architecture with a competitive field of commercial compilers. On Linux (and AFAIK on Windows) gcc produces some of the best-performing code (see the comments in the 'R Administration and Installation Manual'). Most of the ways to produce faster code lose compliance with IEC60559 and accuracy (VC++ 6 never has those).

And the same code compiled with gcc runs on the same hardware only slightly slower on Windows than on Linux unless I/O is involved (where Windows is much slower).

> Later, I may try using OPENMP in my C codes to do parallel computing.

gcc 4.2.1 supports OpenMP, and later versions support it better (OpenMP 3).

> So I need to figure out how to change compiler to generate DLL for R. Could you give me some suggestions? Thanks a lot!

A DLL is a DLL: you can compile it any way you like (although cdecl calling conventions work best, and compilers do differ in their conventions for function return values -- but those are not used in the .C interface). There is a file README.packages in the R distribution with notes about using other compilers under Windows -- but the R developers have not used other than VC++ and Intel's ICC (not mentioned there) for several years.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression using R
Samuel Bravo wrote:

> I'm working on a very large project in which we do many calculations which include many types of regression such as Linear, Quadratic, Cubic, Exponential, Sinusoidal, and Logarithmic.

Students often look in the wrong place. It's not intuitive that quadratic and cubic fits can be found under lm, because these are often termed non-linear in basic university courses.

Dieter

-- View this message in context: http://n4.nabble.com/Regression-using-R-tp1934475p1951658.html Sent from the R help mailing list archive at Nabble.com.
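Dieter's point can be sketched in a few lines: a quadratic, cubic, or logarithmic fit is still linear in its coefficients, so lm() handles it. The data below are simulated purely for illustration and are not from the thread. Models that are nonlinear in their parameters (for example a sinusoid with unknown frequency) would need something like nls() instead, which is not shown here.

```r
# Simulated data (an assumption for illustration; not from the thread)
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.3 * x^2 + rnorm(50, sd = 0.2)

# Quadratic fit: I() protects x^2 from formula interpretation
fit_quad <- lm(y ~ x + I(x^2))

# Cubic fit: poly(..., raw = TRUE) expands to x, x^2, x^3
fit_cubic <- lm(y ~ poly(x, 3, raw = TRUE))

# Logarithmic fit: also linear in the coefficients
fit_log <- lm(y ~ log(x))

coef(fit_quad)
```

The quadratic coefficient returned by coef(fit_quad) should land close to the -0.3 used in the simulation.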
Re: [R] Replace / with - in date
Why don't you try something like:

  xd$x = as.Date(xd$x, format = "%Y/%m/%d")

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christian Raschke
Sent: Thursday, April 15, 2010 8:28 PM
To: r-help@r-project.org
Subject: Re: [R] Replace / with - in date

Is there anything that speaks against just applying gsub to the factor levels if one would like to keep everything as factors (and not consider true Date classes or character vectors)? I.e.:

  x <- c("2000/01/01", "2001/02/01")
  xd <- as.data.frame(x)
  levels(xd$x) <- gsub("/", "-", levels(xd$x))

Christian

On 04/15/2010 01:08 PM, David Winsemius wrote:
> On Apr 15, 2010, at 1:51 PM, prem_R wrote:
>> Hi, everyone. I have searched the solutions in the forum for replacing my date value, which is in a data frame, from 01/01/2000 to 01-01-2000 using the replace function, but got the following warning message:
>>
>> x <- "2000/01/01"
>> xd <- as.data.frame(x)
>> xd$x <- replace(xd$x, xd$x == "/", "-")
>
> The replace function does not work with factors, it works with (complete) vectors, not substrings. It's also a real hassle to do such operations on factors, so just use character vectors and try gsub instead:
>
> x <- "2000/01/01"
> xd <- as.data.frame(x, stringsAsFactors=FALSE)
> xd$x2 <- gsub("/", "-", xd$x)
> xd
>            x         x2
> 1 2000/01/01 2000-01-01
>
>> Warning message:
>> In `[<-.factor`(`*tmp*`, list, value = "-") : invalid factor level, NAs generated
>>
>> Is there any other method of doing it? Or am I missing something? Please let me know if you need any more information.
>> Thanks.
>> Prem
>> -- View this message in context: http://n4.nabble.com/Replace-with-in-date-tp1911391p1911391.html Sent from the R help mailing list archive at Nabble.com.
>
> David Winsemius, MD
> West Hartford, CT

--
Christian Raschke
Department of Economics and ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu
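The three approaches discussed in this thread (text substitution, parsing to Date, and editing factor levels) can be compared side by side. This is a minimal sketch on made-up example strings; the variable names are illustrative only.

```r
# Example dates as character strings (illustrative data)
x <- c("2000/01/01", "2001/02/01")

# Option 1: plain text substitution on a character vector
x_dash <- gsub("/", "-", x, fixed = TRUE)

# Option 2: parse to a real Date, then format it however is needed
# (%Y is the four-digit year; %y would expect two digits)
d <- as.Date(x, format = "%Y/%m/%d")
d_text <- format(d, "%Y-%m-%d")

# Option 3: keep a factor and rewrite only its levels
f <- factor(x)
levels(f) <- gsub("/", "-", levels(f), fixed = TRUE)
```

Option 2 is usually the more useful one when the values will later be sorted, differenced, or plotted as dates, since the result is a Date object rather than text.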
Re: [R] Does sink stand for anything?
On Fri, Apr 16, 2010 at 1:49 AM, Sharpie ch...@sharpsteen.net wrote:

> Sink captures R output and directs it elsewhere; common places are a file or a device such as /dev/null. Personally, I always connected it with the concept of a sink in a mathematical system, as something that removes constituents from the system.

Also, note that 'sink' has nothing to do with floating point numbers...

--
blog: http://geospaced.blogspot.com/
web: http://www.maths.lancs.ac.uk/~rowlings
web: http://www.rowlingson.com/
twitter: http://twitter.com/geospacedman
pics: http://www.flickr.com/photos/spacedman
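For readers who have not used it, a minimal sketch of what sink() does: output that would normally go to the console is diverted to a file until sink() is called again with no arguments. The temporary file name below is just a stand-in for whatever destination is wanted.

```r
out_file <- tempfile(fileext = ".txt")  # stand-in destination for the example

sink(out_file)                 # start diverting console output to the file
print(summary(c(1, 2, 3, 4)))
cat("done\n")
sink()                         # stop diverting; output returns to the console

captured <- readLines(out_file)
```

After this runs, captured holds the lines that were written to the file instead of being shown on screen.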
Re: [R] glmer with non integer weights
thanks thierry, i considered these transformations already, but the variance is not stabilized and normality is not achieved either. i guess i'll have to look at non-parametric methods?

best regards, kay

-- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p1965623.html Sent from the R help mailing list archive at Nabble.com.
[R] data frame manipulation
Dear group,

Here is my data.frame:

df <- structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708, 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700, 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708, 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000", "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")

I would like to summarize it into something like this:

op
     DESCRIPTION POSITION       DATE
1 PRIMARY NICKEL        0 2010-03-10
2 PRM HGH GD ALU        0 2010-04-09
3 SPCL HIGH GRAD        2 2010-04-09
4 STANDARD LEAD         0 2010-04-06

To obtain op, I wrote the following line:

op=ddply(df, c("DESCRIPTION"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))

So far, so good. But I need one more column, CLOSING.PRICE. If I write this line:

op1=ddply(df, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))

here is what I get:

op1
     DESCRIPTION CLOSING.PRICE POSITION       DATE
1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
7 STANDARD LEAD     2,355.9600       -1 2010-04-01
8 STANDARD LEAD     2,357.1200        1 2010-04-06

Not exactly what I want. Can anyone help?

TY
Re: [R] glmer with non integer weights
thank you thomas for the helpful hint!

yours, kay

-- View this message in context: http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p1965827.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Error in library(gplots) : there is no package called 'gplots'
Thanks for your suggestion, Tal. Unfortunately, still no luck on my side: whatever I try to install, I still get the usual error message:

  Error in library(gplots) : there is no package called 'gplots'

Why and how this happens is a mystery to me. I am really stuck with this problem.

Best, Valère

-- View this message in context: http://n4.nabble.com/Error-in-library-gplots-there-is-no-package-called-gplots-tp1690367p1968197.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Error in library(gplots) : there is no package called 'gplots'
Hi Vava,

What version of R are you using? I'm not sure, but I think that R will refuse to install a package in this way if the version of gplots is incompatible with the version of R you're using. You can check the Depends field of packages on CRAN.

Regards, James

Vava wrote:
> Thanks for your suggestion Tal. Unfortunately, still no luck with me ... still get the usual error message: Error in library(gplots) : there is no package called 'gplots' , whatever I try to install. This is a mystery to me with respect to why /how. I am really stuck with that problem. Best, Valère
[R] Return a variable name
Hello,

how can I return the name of a variable, say a$b, from a function?

fun <- function(x){ return(substitute(x)); }
a <- data.frame(b=1:10);
fun(a$b)

... returns a$b, but this is of type "language", thus I can't use it as a character string, can I? How?

Thanks for help, Sören
[R] merge
I have a problem with the merge function: I need to merge the data.frames that you will find as an attachment... I have tried all the possible combinations, but none seems to work properly. Does anyone know how to do it?

thanks
[R] error at R CMD check
Hi,

I generated an R package, but when running R CMD check I got the following error message for the first data file:

  *** installing help indices
  Building/Updating help pages for package 'jamda'
  Formats: text html latex example
  f1 text html latex example
  f2 text html latex example
  f3 text html latex example
  f4 text html latex example
  f5 text html latex example
  f6 text html latex example
  too many pairs of braces in file 'data1.Rd' at /usr/lib64/R/share/perl/R/Rdconv.pm line 295, $rdfile line 7076.
  ERROR: building help failed for package ‘my_package’

Should the data sets be in a specific format? Mine contain floating-point data separated by tabs, with column names and row names. There is no description in the DESCRIPTION file yet.

Thanks

Carol
[R] problem with the version of R
Dear users,

I am using R in UBUNTU, but the version is 9.1. How can I upgrade it to R 10.1?

--
Arindam Fadikar
M.Stat
Indian Statistical Institute.
New Delhi, India
[R] Bootstrapping a repeated measures ANOVA
Hello everyone,

I have a question regarding the sampling process in boot(). I am trying to bootstrap F-values for a repeated measures ANOVA to get a confidence interval of F-values. Unfortunately, while the aov works fine, it fails in the boot() function. I think the problem might be that the resampling process fails to select both lines of data representing the 2 measuring times for one subject, and I therefore get missing cases. The data is organised like this:

  subject ort mz PHQ
  1       1   1  x
  1       1   2  y
  2       1   1  z
  2       1   2  zz
  ...

Is there any way to specify that both lines need to be selected?

Thanks a lot!
Felix Fischer

P.S. If you need to have a look at my code:

F_values <- function(formula, data, indices) {
  d <- data[indices,]        # allows boot to select sample
  fit=aov(formula,data=d)    # fit model
  return(c(summary(fit)[1][[1]][[1]]$`F value`, summary(fit)[2][[1]][[1]]$`F value`))  # return F-values
}
results <- boot(data=anova.daten, statistic=F_values, R=10, formula=PHQ_Sum_score~mz*ort+Error(subject/mz))

Dipl. Psych. Felix Fischer
Medizinische Klinik mit Schwerpunkt Psychosomatik
Charité -- Universitätsmedizin Berlin
Luisenstr. 13a
10117 Berlin
Tel.: 030 - 450 553575
Email: felix.fisc...@charite.de
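One common way to keep both rows of a subject together is to resample subjects rather than rows. The following is a minimal base-R sketch of that idea, not the poster's code: the data values, the helper name resample_subjects, and the boot_id column are all made up for illustration. With boot() itself, the same idea would be to hand the vector of unique subject ids to boot() as its data and rebuild the long data frame inside the statistic function; that variant is an assumption and is not shown.

```r
# Illustrative long-format data: 4 subjects, 2 measurement times each
# (column names follow the post; the values are made up)
dat <- data.frame(
  subject = rep(1:4, each = 2),
  mz      = rep(1:2, times = 4),
  PHQ     = c(10, 8, 12, 11, 7, 5, 9, 9)
)

# Draw subjects with replacement, then keep every row of each drawn subject
resample_subjects <- function(data, id = "subject") {
  ids   <- unique(data[[id]])
  drawn <- sample(ids, length(ids), replace = TRUE)
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- data[data[[id]] == drawn[k], , drop = FALSE]
    rows$boot_id <- k  # fresh id, so a subject drawn twice counts as two units
    rows
  })
  do.call(rbind, pieces)
}

set.seed(42)
boot_sample <- resample_subjects(dat)
table(boot_sample$boot_id)
```

Each resampled unit then contributes both measurement times, and boot_id (rather than subject) would be the grouping factor to use in the Error() term when refitting the model on boot_sample.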
Re: [R] [SPAM] Re: Error in library(gplots) : there is no package called 'gplots'
Dear James,

I have tried to install the package RODBC_1.3-1 with R versions 2.9.0 and 2.10.1 (using openSUSE 11.1, 64-bit). I have tried install.packages both locally (package downloaded and stored on the computer) and directly from the Internet (using different mirrors!). Same result each time. The outcome is also the same if I try to install other packages, for instance e1071. So it appears this is not linked to the package itself but rather to R-base or R-devel (both are installed).

Regards, Valère

-----Original Message-----
From: james [mailto:ja...@ipec.co.uk]
Sent: Friday, 16 April 2010 10:47
To: Martin Valere
Cc: R Help List
Subject: [SPAM] Re: [R] Error in library(gplots) : there is no package called 'gplots'
Importance: Low

Hi Vava,

What version of R are you using? I'm not sure, but I think that R will refuse to install a package in this way if the version of gplots is incompatible with the version of R you're using. You can check the Depends field of packages on CRAN.

Regards, James

Vava wrote:
> Thanks for your suggestion Tal. Unfortunately, still no luck with me ... still get the usual error message: Error in library(gplots) : there is no package called 'gplots' , whatever I try to install. This is a mystery to me with respect to why /how. I am really stuck with that problem. Best, Valère

-----Original Message-----
From: Martin Valere
Sent: Thursday, 25 March 2010 10:58
To: 'r-help@R-project.org'
Subject: Error in library(gplots) : there is no package called 'gplots'

Dear all,

I have an issue trying to install new packages (I have tried the RODBC_1.3-1, gplots_2.6.1 and gtools_2.7.4 packages) and get the same error message:

  Error in library(gplots) : there is no package called 'gplots'

The only clue I have found so far on the Web is related to Perl (Perl modules are installed on my computer, but which one is related to gplots, if any?); there is no gplots in /usr/lib or /usr/lib64 at least. I am somewhat lost here, having no idea about Perl (if Perl is really the issue?). I am using openSUSE 11.1 (64-bit) and R version 2.9.0. Installation of the package is performed offline as root.

Valère, Switzerland
Re: [R] error at R CMD check
carol white wrote:
> Hi, I generated an R package but at running R CMD check, I got the following error message for the first data file:
>
>   *** installing help indices
>   Building/Updating help pages for package 'jamda'
>   Formats: text html latex example
>   f1 text html latex example
>   f2 text html latex example
>   f3 text html latex example
>   f4 text html latex example
>   f5 text html latex example
>   f6 text html latex example
>   too many pairs of braces in file 'data1.Rd' at /usr/lib64/R/share/perl/R/Rdconv.pm line 295, $rdfile line 7076.
>   ERROR: building help failed for package ‘my_package’
>
> Should the data sets be in a specific format? Mine contains data in float separated by tab with column names and row names. No description in DESCRIPTION file yet.

data1.Rd shouldn't be a dataset, it should be a help file describing a dataset. In a more recent version of R you might get a more informative error message, telling you where the error was in that file.

Duncan Murdoch
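To illustrate the distinction Duncan draws, here is a minimal sketch of generating a skeleton help file for a dataset with promptData() from the utils package. The object data1 and its contents are made up for the example, and the file is written to a temporary directory rather than a real package. In an actual package the data file itself would normally live under data/ and the edited .Rd file under man/.

```r
# A small stand-in dataset (values are made up for illustration)
data1 <- data.frame(a = c(1.5, 2.5), b = c(3.5, 4.5),
                    row.names = c("r1", "r2"))

# Write a skeleton Rd help file describing the dataset
rd_file <- file.path(tempdir(), "data1.Rd")
promptData(data1, filename = rd_file)

# Inspect the generated skeleton; it still needs to be edited by hand
rd_lines <- readLines(rd_file)
head(rd_lines)
```

The generated file is only a template: the title, description, and format sections are placeholders to be filled in before running R CMD check.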
Re: [R] R CMD REMOVE etc. query
Assuming it's the last package that you loaded, detach() without arguments will detach it.

On Thu, Apr 15, 2010 at 1:45 PM, Prof. John C Nash nas...@uottawa.ca wrote:
> Brian Ripley pointed out that the library() documentation (third screen, however) says that library() and require() check the current environment to see if a package is loaded, and only load it if it is not present. I may have oversimplified, and clarifications are welcome. But this is clearly NOT what I want, since I need the latest package version to test. A tentative solution is outlined below, but suggestions are welcome on the string cleanup issue mentioned.
>
> As I need to remove a package and its dependencies before reloading, I can use tools::pkgDepends to get a list. I found that a character string extracted from the dependency vector gives an 'invalid name' error in detach(). That is, I can create a variable myfoo="package:foo", but detach(myfoo) gives the error, while typing detach(package:foo) works fine. The workaround seems to be
>
>   slist<-search()
>   idx<-which(slist==myfoo)
>   detach(idx)
>
> There's still a nuisance issue of how to strip off the (>=0.7.11) descriptors in the dependency list. strsplit() will work, but I seem to need to loop through the list to use it when only some of the packages are restricted by qualifiers. If someone has already dealt with this type of issue, I'd be happy to know. For example, if there is a forceLoad() somewhere, it would save the effort above and could be useful for developers to ensure they are using the right version of a package.
>
> JN
>
> From: Prof. John C Nash nashjc_at_uottawa.ca
> Date: Thu, 15 Apr 2010 10:17:46 -0400
>
>> I've been working on a fairly complex package that is a wrapper for several optimization routines. In this work, I've attempted to do the following:
>>
>> * edit the package code foo.R
>> * in a root terminal at the right directory location
>>     R CMD REMOVE foo
>>     R CMD INSTALL foo
>>
>> However, I don't get the right code. In fact, if I just do the remove, library(foo) does not throw an error. If I stop my R session and restart it, I do. Is this expected behaviour?
>>
>> For information, I run scripted tests that start with
>>
>>   rm(list=ls())
>>   library(foo)
>>
>> to ensure I'm getting new code each time. If desired I can provide a minimal package to show this, but I expect that it is a known issue for which I've missed the documentation. Perhaps there is a command to reset the session. I did a brief search, but appropriate keywords pick up a lot of irrelevant material.
>>
>> JN
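Two of the nuisances John describes can be sketched in base R. First, sub() is vectorised, so version qualifiers can be stripped from a whole dependency vector without a loop. Second, detach() accepts a character string when character.only = TRUE is given. The package names below and the helper detach_if_attached are hypothetical, and whether unload = TRUE manages to unload the namespace depends on what else imports it, so this is a sketch under those assumptions rather than a guaranteed reload recipe.

```r
# Dependency strings as they might appear in a dependency list
# (package names and versions here are made up for illustration)
deps <- c("numDeriv", "BB (>= 0.7.11)", "ucminf")

# Drop an optional "(...)" version qualifier and surrounding whitespace;
# sub() works on the whole vector, so no loop is needed
dep_names <- trimws(sub("\\(.*\\)", "", deps))

# Detach a package given its name as a character string
detach_if_attached <- function(pkg) {
  entry <- paste0("package:", pkg)
  if (entry %in% search()) {
    # character.only = TRUE tells detach() to treat 'entry' as a string
    detach(entry, character.only = TRUE, unload = TRUE)
    TRUE
  } else {
    FALSE
  }
}
```

A typical use would be to call detach_if_attached("foo") right before library(foo) in a test script, so the freshly installed code is the one that gets attached.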
Re: [R] data frame manipulation
Hi,

I'm not sure I understand what you want exactly. My best guess is that you want something like

op=ddply(df, c("DESCRIPTION"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE = CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
op <- unique(op)

Does that do it?

-Ista

On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:
> Dear group,
>
> Here is my data.frame:
>
> df <- structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708, 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700, 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708, 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000", "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
>
> I would like to summarize it into something like this:
>
> op
>      DESCRIPTION POSITION       DATE
> 1 PRIMARY NICKEL        0 2010-03-10
> 2 PRM HGH GD ALU        0 2010-04-09
> 3 SPCL HIGH GRAD        2 2010-04-09
> 4 STANDARD LEAD         0 2010-04-06
>
> To obtain op, I wrote the following line:
>
> op=ddply(df, c("DESCRIPTION"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))
>
> So far, so good. But I need one more column, CLOSING.PRICE. If I write this line:
>
> op1=ddply(df, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))
>
> here is what I get:
>
> op1
>      DESCRIPTION CLOSING.PRICE POSITION       DATE
> 1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
> 2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
> 3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
> 4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
> 5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
> 6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
> 7 STANDARD LEAD     2,355.9600       -1 2010-04-01
> 8 STANDARD LEAD     2,357.1200        1 2010-04-06
>
> Not exactly what I want. Can anyone help?
>
> TY

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
[R] Counting number of values by row (text, not numbers)
Hi everyone!

I am very new to R and I am having some difficulties. My data set looks something like this:

A B C D E
cat monkey cat dog cat
cat
Re: [R] Return a variable name
Hi Sören,

Somehow this feels dirty, but you can do

fun <- function(x){
  result <- capture.output(print(substitute(x)))
  return(result);
}

-Ista

On Fri, Apr 16, 2010 at 5:26 AM, soeren.vo...@eawag.ch wrote:
> Hello,
>
> how can I return the name of a variable, say a$b, from a function?
>
> fun <- function(x){ return(substitute(x)); }
> a <- data.frame(b=1:10);
> fun(a$b)
>
> ... returns a$b, but this is of type "language", thus I can't use it as a character string, can I? How?
>
> Thanks for help, Sören

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Re: [R] problem with the version of R
Hi Arindam,

Follow the instructions at http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu/

-Ista

On Fri, Apr 16, 2010 at 5:54 AM, arindam fadikar arindam.fadi...@gmail.com wrote:
> Dear users,
>
> I am using R in UBUNTU, but the version is 9.1. How can I upgrade it to R 10.1?
>
> --
> Arindam Fadikar
> M.Stat
> Indian Statistical Institute.
> New Delhi, India

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
[R] Fwd: Counting number of values by row (text, not numbers)
Hi everyone!

I am very new to R and I am having some difficulties. My data set looks something like this:

subject A   B      C   D   E
1       cat monkey cat dog cat
2       cat cat    cat cat dog

I want to create three new variables that count the number of cat, monkey and dog entries per subject:

subject A   B      C   D   E   cat dog monkey
1       cat monkey cat dog cat 3   1   1
2       cat cat    cat cat dog 4   1   0

I have been looking at rowSums, rowsum, apply, grep, and doing some searches, but I can only find counts for numerical values or NA values.

Thanks in advance, L
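One possible approach, sketched on a reconstruction of the two example rows: comparing the text columns with a single value gives a logical matrix, and rowSums() then counts the TRUE entries per row. The data frame name pets and the column vector animal_cols are made up for this sketch.

```r
# Illustrative data matching the layout in the post
pets <- data.frame(
  subject = 1:2,
  A = c("cat", "cat"),
  B = c("monkey", "cat"),
  C = c("cat", "cat"),
  D = c("dog", "cat"),
  E = c("cat", "dog"),
  stringsAsFactors = FALSE
)

animal_cols <- c("A", "B", "C", "D", "E")
for (animal in c("cat", "dog", "monkey")) {
  # pets[animal_cols] == animal is a logical matrix; rowSums counts the TRUEs
  pets[[animal]] <- rowSums(pets[animal_cols] == animal)
}
pets
```

Restricting the comparison to animal_cols keeps the subject id and the newly added count columns out of the counts.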
Re: [R] data frame manipulation
When I run your command line, here is what I get:

op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),DATE=max(CREATED.DATE),SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
op
     DESCRIPTION POSITION       DATE SETTLEMENT
1 PRIMARY NICKEL        0 2010-03-10         NA
2 PRM HGH GD ALU        0 2010-04-09         NA
3 SPCL HIGH GRAD        2 2010-04-09         NA
4 STANDARD LEAD         0 2010-04-06         NA

That is exactly what I want, but not with the NA! The SETTLEMENT column should show the corresponding CLOSING.PRICE for the CREATED.DATE.

***
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***

From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: Friday, April 16, 2010 1:05 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation

> Hi,
>
> I'm not sure I understand what you want exactly. My best guess is that you want something like
>
> op=ddply(df, c("DESCRIPTION"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE = CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
> op <- unique(op)
>
> Does that do it?
>
> -Ista
>
> On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:
>> Dear group,
>>
>> Here is my data.frame:
>>
>> df <- structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708, 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700, 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708, 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000", "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
>>
>> I am looking at summarize it in something like this:
>>
>> op
>>      DESCRIPTION POSITION       DATE
>> 1 PRIMARY NICKEL        0 2010-03-10
>> 2 PRM HGH GD ALU        0 2010-04-09
>> 3 SPCL HIGH GRAD        2 2010-04-09
>> 4 STANDARD LEAD         0 2010-04-06
>>
>> To obtain op, I wrote this following line:
>>
>> op=ddply(df, c("DESCRIPTION"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))
>>
>> Until there, fine. But I need to have one more column, CLOSING.PRICE. If I write this line:
>>
>> op1=ddply(df, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION= sum(QUANITY),DATE=max(CREATED.DATE))
>>
>> Here is what I get:
>>
>> op1
>>      DESCRIPTION CLOSING.PRICE POSITION       DATE
>> 1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
>> 2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
>> 3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
>> 4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
>> 5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
>> 6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
>> 7 STANDARD LEAD     2,355.9600       -1 2010-04-01
>> 8 STANDARD LEAD     2,357.1200        1 2010-04-06
>>
>> Not exactly what I want. Can anyone help?
>>
>> TY
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
Re: [R] Return a variable name
On 16/04/2010 5:26 AM, soeren.vo...@eawag.ch wrote:
> Hello,
>
> how can I return the name of a variable, say "a$b", from a function?

Use deparse(substitute(x)), not just substitute(x).

By the way, to be picky, a$b is not the name of a variable. It is an expression that extracts the b element of a.

Duncan Murdoch

> fun <- function(x){
>   return(substitute(x));
> }
> a <- data.frame(b=1:10);
> fun(a$b)
>
> ... returns a$b, but this is of type "language", thus I can't use it as a character string, can I? How?
>
> Thanks for help,
> Sören
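To make the suggestion above concrete, here is a minimal sketch of the poster's function with deparse() added. The names fun and a come from the quoted message; expr_name is only an illustrative variable.

```r
# substitute(x) captures the unevaluated argument expression;
# deparse() turns that language object into a character string.
fun <- function(x) {
  deparse(substitute(x))
}

a <- data.frame(b = 1:10)
expr_name <- fun(a$b)   # the expression text, not the values of a$b
print(expr_name)        # "a$b"
```

The result is an ordinary character vector, so it can be pasted into titles, labels, or messages.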
Re: [R] Fwd: Counting number of values by row (text, not numbers)
Hi Laura,

Usually this kind of thing is easier if you put your data into a long format. I would use something like

Dat <- read.table(textConnection("subject A B C D E
1 cat monkey cat dog cat
2 cat cat cat cat dog"), header=TRUE)

library(reshape)
m.Dat <- melt(Dat, id="subject")
xtabs(~subject+value, m.Dat)
       value
subject cat monkey dog
      1   3      1   1
      2   4      0   1

but you could use the reshape function instead of reshape::melt.

-Ista

On Fri, Apr 16, 2010 at 7:18 AM, Laura Ferrero-Miliani laur...@gmail.com wrote:
> Hi everyone!
> I am very new to R and I am having some difficulties.
> My data set looks something like this:
>
> subject A   B      C   D   E
> 1       cat monkey cat dog cat
> 2       cat cat    cat cat dog
>
> I want to create three new variables that count the number of "cat", "monkey" and "dog" per subject:
>
> subject A   B      C   D   E   cat dog monkey
> 1       cat monkey cat dog cat 3   1   1
> 2       cat cat    cat cat dog 4   1   0
>
> I have been looking at rowSums, rowsum, apply, grep, and doing some searches, but I can only find counts for numerical values or NA values.
>
> Thanks in advance,
> L

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
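For comparison, the counts can also be attached directly to the wide data in base R, without reshaping. This is only a sketch: the data frame is retyped from the example in the question, and animals and count_row are illustrative names, not anything from the thread.

```r
# Data retyped from the question (two subjects, five observations each).
Dat <- data.frame(subject = 1:2,
                  A = c("cat", "cat"), B = c("monkey", "cat"),
                  C = c("cat", "cat"), D = c("dog", "cat"),
                  E = c("cat", "dog"),
                  stringsAsFactors = FALSE)

animals <- c("cat", "dog", "monkey")

# Count how often each animal occurs in one row of observations.
count_row <- function(r) vapply(animals, function(a) sum(r == a), integer(1))

# apply() returns one column per row of the input, hence the transpose.
counts <- t(apply(as.matrix(Dat[, c("A", "B", "C", "D", "E")]), 1, count_row))

result <- cbind(Dat, counts)
print(result)
```

The long-format approach scales better when the set of values is not known in advance; this version assumes the three levels are fixed.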
Re: [R] data frame manipulation
It works for me...

DF <-
structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
"PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
-1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
"25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
"2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
"2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
"2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
"2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
"QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")

library(plyr)
op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=sum(QUANITY), DATE=max(CREATED.DATE), SETTLEMENT = CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
op <- unique(op)
op
      DESCRIPTION POSITION       DATE  SETTLEMENT
1  PRIMARY NICKEL        0 2010-03-10 25,760.8600
3  PRM HGH GD ALU        0 2010-04-09  2,415.9000
5  SPCL HIGH GRAD        2 2010-04-09  2,421.0500
10 STANDARD LEAD         0 2010-04-06  2,357.1200

-Ista

On Fri, Apr 16, 2010 at 7:21 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:
> When I pass your command line, here is what I get:
>
> op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),DATE=max(CREATED.DATE),SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
> op
>      DESCRIPTION POSITION       DATE SETTLEMENT
> 1 PRIMARY NICKEL        0 2010-03-10         NA
> 2 PRM HGH GD ALU        0 2010-04-09         NA
> 3 SPCL HIGH GRAD        2 2010-04-09         NA
> 4 STANDARD LEAD         0 2010-04-06         NA
>
> That is exactly what I want, but not with the NA! The SETTLEMENT column should show the corresponding CLOSING.PRICE for the CREATED.DATE.
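One detail worth noticing when comparing the two commands in this thread: the version that returned NA indexes with CREATED.DATE=max(CREATED.DATE) (a single =), while the version that works uses == to build a logical index. That difference appears to explain the NA values. The sketch below shows the same per-group summary in base R, picking the row with the latest date explicitly so no unique() step is needed. The three-row data frame is a toy using values from the thread, and summarise_group is a hypothetical helper name.

```r
# Toy data: a few rows with the same columns as the thread's data frame.
df <- data.frame(
  DESCRIPTION   = c("SPCL HIGH GRAD", "SPCL HIGH GRAD", "PRIMARY NICKEL"),
  CREATED.DATE  = as.Date(c("2010-04-08", "2010-04-09", "2010-03-10")),
  QUANITY       = c(1, 1, -1),
  CLOSING.PRICE = c("2,420.7300", "2,421.0500", "25,760.8600"),
  stringsAsFactors = FALSE
)

# Summarise one group: net position, latest date, and the price on that date.
summarise_group <- function(g) {
  latest <- which.max(as.numeric(g$CREATED.DATE))  # first row with the max date
  data.frame(DESCRIPTION = g$DESCRIPTION[1],
             POSITION    = sum(g$QUANITY),
             DATE        = g$CREATED.DATE[latest],
             SETTLEMENT  = g$CLOSING.PRICE[latest],
             stringsAsFactors = FALSE)
}

op <- do.call(rbind, lapply(split(df, df$DESCRIPTION), summarise_group))
rownames(op) <- NULL
print(op)
```

Because which.max() returns a single index, each group yields exactly one row even when several rows share the latest date.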
Re: [R] data frame manipulation
Excellent!! You saved me hours and hours of turning around and around. TY so much.
[R] Help on rKward
Hi,

I'm testing rKward, and it has become a great GUI for R on Linux, mainly for new Linux users. I'm not a new Linux user and I use Emacs for my own R scripts, but I always try new GUIs or IDEs so that I can recommend them to my students.

The most difficult thing for new R and Linux users is: "I have installed R on my Linux machine but I can't find the R icon." This happens because R on Linux has only the R console in its basic installation. Once past this barrier, users can install JGR, Rcmdr, etc. Anyway, rKward is the best way for new Linux users to start using R.

Now to my specific problem: the use of rKward by a heavy R user who is new to Linux. Heavy R users don't use menus (they use rKward more like a script IDE than an R GUI) and normally don't need the rKward output, which is produced automatically by the rKward menus or manually with the rk.print() and rk.header() functions. They normally use Sweave or ASCII output with comments.

So my question is: is there any way to save my ASCII output from rKward to a file without having to copy and paste?

Thanks a lot.

Ronaldo

--
14th law - Generally, only when you can publish your results are they good enough to be part of your dissertation.
--Herman, I. P. 2007. Following the law. NATURE, Vol 445, p. 228.

Prof. Ronaldo Reis Júnior
UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional
Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
Fone: (38) 3229-8192 | ronaldo.r...@unimontes.br
http://www.ppgcb.unimontes.br/lecc | LinuxUser#: 205366
[R] merge
I have a problem with the merge command. I have to merge two data frames that look like the following example:

CODPROD N1 N3 N4
23       3 55  4
24       5 67 36
25       3 73 24

Second data frame:

CODPROD N1 N2
30      34 45
45       0 78
65       0 56

The result should be:

CODPROD N1 N2 N3 N4
23       3 NA 55  4
24       5 NA 67 36
25       3 NA 73 24
30      34 45 NA NA
45       0 78 NA NA
65       0 56 NA NA

Does anyone know how to do it?
[R] Weights in binomial glm
I have some questions about the use of weights in binomial glm as I am not getting the results I would expect. In my case the weights I have can be seen as 'replicate weights'; one respondent i in my dataset corresponds to w[i] persons in the population. From the documentation of the glm method, I understand that the weights can indeed be used for this: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." From Modern Applied Statistics with S-Plus, 3rd ed., I understand the same.

However, I am getting some strange results. I generated an example:

# Generate some data which is similar to my dataset
Z <- rbinom(1000, 1, 0.1)
W <- round(rnorm(1000, 100, 40))
W[W < 1] <- 1

# Probability of success can either be estimated using:
sum(Z*W)/sum(W)
[1] 0.09642109

# Or using glm:
model <- glm(Z ~ 1, weights=W, family=binomial())
Warning message:
In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
  fitted probabilities numerically 0 or 1 occurred
predict(model, type="response")[1]
           1
2.220446e-16

These two results are obviously not the same. The strange thing is that when I scale the weights, such that the total equals one, the probability is correctly estimated:

model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
predict(model, type="response")[1]
         1
0.09642109

However, scaling of the weights should, as far as I am aware, not have an effect on the estimated parameters. I also tried some other scalings. For example, scaling the weights by 20 also gives me the correct result:

model <- glm(Z ~ 1, weights=W/20, family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
predict(model, type="response")[1]
         1
0.09642109

Am I misinterpreting the weights? Could this be a numerical problem?

Regards,
Jan
[R] Odp: merge
Hi

r-help-boun...@r-project.org wrote on 16.04.2010 14:00:09:
> I have a problem with the merge command. I have to merge two data frames that look like the following example:
>
> CODPROD N1 N3 N4
> 23       3 55  4
> 24       5 67 36
> 25       3 73 24
>
> Second data frame:
>
> CODPROD N1 N2
> 30      34 45
> 45       0 78
> 65       0 56
>
> The result should be:
>
> CODPROD N1 N2 N3 N4
> 23       3 NA 55  4
> 24       5 NA 67 36
> 25       3 NA 73 24
> 30      34 45 NA NA
> 45       0 78 NA NA
> 65       0 56 NA NA

merge(data1, data2, by="CODPROD", all=T)

should work. So what does not work in your case?

Regards
Petr

> Does anyone know how to do it?
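A small runnable sketch of the merge, with the two data frames retyped from the question (the names data1 and data2 follow the reply; merged is an illustrative name). One caveat, as far as I can tell: because N1 occurs in both inputs, merging on CODPROD alone would give two columns N1.x and N1.y, so this sketch lets merge() use its default of all common columns (CODPROD and N1) to get the requested layout.

```r
data1 <- data.frame(CODPROD = c(23, 24, 25),
                    N1 = c(3, 5, 3),
                    N3 = c(55, 67, 73),
                    N4 = c(4, 36, 24))
data2 <- data.frame(CODPROD = c(30, 45, 65),
                    N1 = c(34, 0, 0),
                    N2 = c(45, 78, 56))

# all = TRUE keeps unmatched rows from both sides (a full outer join);
# 'by' defaults to the columns the two data frames share.
merged <- merge(data1, data2, all = TRUE)

# Put the columns in the order shown in the question.
merged <- merged[, c("CODPROD", "N1", "N2", "N3", "N4")]
print(merged)
```

Rows with no counterpart in the other data frame get NA in the columns that exist only there.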
Re: [R] merge
Try this:

library(plyr)
rbind.fill(DF1, DF2)

On Fri, Apr 16, 2010 at 8:00 AM, n.via...@libero.it n.via...@libero.it wrote:
> I have a problem with the merge command. I have to merge two data frames that look like the following example:
>
> CODPROD N1 N3 N4
> 23       3 55  4
> 24       5 67 36
> 25       3 73 24
>
> Second data frame:
>
> CODPROD N1 N2
> 30      34 45
> 45       0 78
> 65       0 56
>
> The result should be:
>
> CODPROD N1 N2 N3 N4
> 23       3 NA 55  4
> 24       5 NA 67 36
> 25       3 NA 73 24
> 30      34 45 NA NA
> 45       0 78 NA NA
> 65       0 56 NA NA
>
> Does anyone know how to do it?
Re: [R] hugene10stv1cdf
Hi Christoph,

Christoph Knapp wrote:
> Hi all,
> I just tried to start analysing some micro-array chips, and R was asking for this package. When I tried to install it, it said:
>
> Using R version 2.10.1, biocinstall version 2.5.10.
> Installing Bioconductor version 2.5 packages:
> [1] "hugene10stv1cdf"
> Please wait...
>
> Warning message:
> In getDependencies(pkgs, dependencies, available, lib) :
>   package ‘hugene10stv1cdf’ is not available
>
> What am I doing wrong, and where can I get this package from?

This is a Bioconductor package, so the correct list to ask this is the Bioconductor-help list, not R-help.

But what you want is

dat <- ReadAffy(cdfname="hugene10stv1.r3cdf")

and go from there. Affy has the unfortunate habit of naming related data with inconsistent names, and when I built this package last time I didn't notice the inconsistency.

Best,
Jim

> Thanks
> Christoph

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Re: [R] Exporting an rgl graph
l...@stat.uiowa.edu wrote:
> The current issue of JCGS (Vol 19 No 1, http://pubs.amstat.org/toc/jcgs/19/1) has an editorial on including animations, 3D visualizations, and movies in on-line PDF files supporting JCGS articles. The online supplements to the editorial include examples. The 3D examples related to the misc3d package are also available in http://www.stat.uiowa.edu/~luke/R/misc3d/misc3d-pdf/. At some point the code there will be added to misc3d. It should be possible to adapt these ideas to other objects rendered with rgl.
>
> luke

Luke,

Your misc3d-pdf example is very instructive and the .tex file shows how to embed in LaTeX. Thanks! (JCGS 19(1) is actually one of the nicest issues in a long time.)

Of the two approaches you describe, the Asymptote route seems easier and more capable than the MeshLab one. It would be particularly useful to have this capability available for rgl. Any plans for this?

One note: With Adobe Acrobat Pro 9.3.1, the U3D and PRC images display on screen, but do not print (they are replaced by the filename). Is this your experience too?

-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
[R] problem with FUN in Hmisc::summarize
Hi all,

I'd like to use the Hmisc::summarize function, but it uses a function (FUN) of a single vector argument to create the statistical summaries.

Consider an easy case: I'd like to compute the correlation between two variables in my data frame, grouped according to other variables in the same data frame.

For example, consider the following data frame D:

V1 V2 V3
A   1 -1
A   1  1
A  -1 -1
B   1  1
B   1  1

I'd like to use

Hmisc::summarize(X=D, by=llist(myvar=D$V1), FUN=corr.V2.V3)

where corr.V2.V3 is defined as follows:

corr.V2.V3 = function(x) {
  d = cbind(x$V2, x$V3)
  out = c(cor(d))
  names(out) = c("CORR")
  return(out)
}

I was not able to use Hmisc::summarize in this case because FUN should be a function of a matrix argument.

Any idea?

Thanks in advance,
Arnaud
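As a point of comparison only (this is not a Hmisc solution and not something proposed in the thread), the grouped correlation can be sketched in base R by splitting the data frame and applying cor() to each piece. D is retyped from the message; by_group and corr are illustrative names.

```r
D <- data.frame(V1 = c("A", "A", "A", "B", "B"),
                V2 = c(1, 1, -1, 1, 1),
                V3 = c(-1, 1, -1, 1, 1))

# One data frame per level of V1.
by_group <- split(D, D$V1)

# Correlation of V2 and V3 within each group. Group B is constant in both
# variables, so its correlation is undefined; cor() returns NA with a
# warning, which is suppressed here.
corr <- sapply(by_group, function(g) suppressWarnings(cor(g$V2, g$V3)))
print(corr)
```

Each group function receives the full sub-data-frame, so any statistic of several columns can be computed this way.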
Re: [R] Weights in binomial glm
Jan,

It looks like you did not understand the line "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." Weights must be a number of trials (hence integer), not a proportion of a population.

Here is an example that clarifies the use of weights.

library(boot)
library(reshape)
dataset <- data.frame(Person = c(rep("A", 20), rep("B", 10)), Success = c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75)))
Aggregated <- cast(Person ~ ., data = dataset, value = "Success", fun = list(mean, length))
m0 <- glm(Success ~ 1, data = dataset, family = binomial)
m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights = length)
inv.logit(coef(m0))
inv.logit(coef(m1))

Have a look at the survey package if you want to analyse stratified data.

Thierry

ir. Thierry Onkelinx
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Jan van der Laan
Sent: Friday, 16 April 2010 14:11
To: r-help@r-project.org
Subject: [R] Weights in binomial glm

> I have some questions about the use of weights in binomial glm as I am not getting the results I would expect. In my case the weights I have can be seen as 'replicate weights'; one respondent i in my dataset corresponds to w[i] persons in the population. From the documentation of the glm method, I understand that the weights can indeed be used for this: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." From Modern Applied Statistics with S-Plus, 3rd ed., I understand the same.
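The same comparison can be sketched with base R only, assuming one prefers not to depend on boot and reshape: tapply() does the aggregation and plogis() is the inverse logit. The seed and the names prop, n and aggregated are illustrative choices, not part of the reply above.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Individual-level data: one 0/1 outcome per trial.
dataset <- data.frame(Person  = rep(c("A", "B"), times = c(20, 10)),
                      Success = c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75)))

# Aggregate to one row per person: proportion of successes and number of trials.
prop <- tapply(dataset$Success, dataset$Person, mean)
n    <- tapply(dataset$Success, dataset$Person, length)
aggregated <- data.frame(Person = names(prop),
                         prop   = as.numeric(prop),
                         n      = as.numeric(n))

# Fit on the raw 0/1 data, and on the proportions weighted by trial counts.
m0 <- glm(Success ~ 1, data = dataset, family = binomial)
m1 <- glm(prop ~ 1, data = aggregated, family = binomial, weights = n)

# Both intercepts map back to the overall success proportion.
p0 <- unname(plogis(coef(m0)))
p1 <- unname(plogis(coef(m1)))
print(c(individual = p0, aggregated = p1, raw = mean(dataset$Success)))
```

Here the weights are integer trial counts, so prop * n is a whole number of successes and glm() has no reason to warn about non-integer counts.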
[R] how to use the neural networks package for time series prediction
Hello all,

Does anyone know how to use the neural networks package for time series prediction? Do you have a similar example in R?

Thanks in advance,
David
[R] data.frame and ddply
Dear group,

Here is my df:

futures <-
structure(list(CONTRAT = c("WHEAT May/10 ", "WHEAT May/10 ",
"WHEAT May/10 ", "WHEAT May/10 ", "COTTON NO.2 May/10 ", "COTTON NO.2 May/10 ",
"COTTON NO.2 May/10 ", "PLATINUM Jul/10 ", "SUGAR NO.11 May/10 ",
"SUGAR NO.11 May/10 ", "SUGAR NO.11 May/10 ", "SUGAR NO.11 May/10 ",
"SUGAR NO.11 May/10 ", "ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ",
"ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ",
"ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ",
"ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ", "ROBUSTA COFFEE (10) May/10 ",
"ROBUSTA COFFEE (10) May/10 "), QUANTITY = c(1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1), SETTLEMENT = c("467.7500",
"467.7500", "467.7500", "467.7500", "78.1300", "78.1300", "78.1300",
"1,739.4000", "16.5400", "16.5400", "16.5400", "16.5400", "16.5400",
"1,353.", "1,353.", "1,353.", "1,353.", "1,353.", "1,353.", "1,353.",
"1,353.", "1,353.", "1,353.", "1,353.", "1,353.")), .Names = c("CONTRAT",
"QUANTITY", "SETTLEMENT"), row.names = c(NA, 25L), class = "data.frame")

Here is my code:

opfut=ddply(futures, c("CONTRAT","SETTLEMENT"), summarise, POSITION=sum(QUANTITY))

Here is the output:

opfut
                      CONTRAT SETTLEMENT POSITION
1         SUGAR NO.11 May/10     16.5400        5
2         COTTON NO.2 May/10     78.1300        3
3            PLATINUM Jul/10  1,739.4000       -1
4 ROBUSTA COFFEE (10) May/10      1,353.       15
5               WHEAT May/10    467.7500        4

It is almost exactly what I want, except that I want the POSITION column before the SETTLEMENT column. How can I modify my code to obtain this?

TY

Arnaud Gaboury
Re: [R] data.frame and ddply
You can do something like this after the output from opfut:

opfut <- data.frame(opfut$CONTRAT, opfut$POSITION, opfut$SETTLEMENT)
names(opfut) <- c('CONTRAT','POSITION','SETTLEMENT')
opfut

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

----- Original Message -----
From: arnaud Gaboury arnaud.gabo...@gmail.com
To: r-help@r-project.org
Sent: Fri, April 16, 2010 6:28:37 AM
Subject: [R] data.frame and ddply

> It is almost exactly what I want, except that I want the POSITION column before the SETTLEMENT column. How can I modify my code to obtain this?
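An equivalent and slightly shorter way to reorder the columns is to index the data frame by column name, which keeps the names without a separate names() assignment. The two-row opfut below is a cut-down stand-in for the ddply result shown in the question.

```r
# Stand-in for the ddply() result, with columns in the unwanted order.
opfut <- data.frame(CONTRAT    = c("SUGAR NO.11 May/10", "WHEAT May/10"),
                    SETTLEMENT = c("16.5400", "467.7500"),
                    POSITION   = c(5, 4),
                    stringsAsFactors = FALSE)

# Select the columns in the desired order.
opfut <- opfut[, c("CONTRAT", "POSITION", "SETTLEMENT")]
print(opfut)
```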
[R] vector matching
Hello all, I have searched the archives for a similar problem to no avail. I could use your help. I have a bunch of vectors organized into two matrices, x and y. These vectors (as rows) consist of combinations of elements such that order does not matter. I want to create a third matrix from the first two, which is basically all the rows in x and all the rows in y, excluding the rows that they both have in common. %in% seems to match individual elements, not entire rows, so something else is needed. Any help is appreciated. Thanks, -Michael -- Michael A. Nestrud Cornell U. Sensory Science PhD Candidate m...@ataraxis.org
Re: [R] how can I plot the histogram like this using R?
Thanks for your reply. I just want to get a figure like y1.jpg using the data from y1.txt. From the figure I want to obtain the split point, as in y1.jpg, where 2.5 is taken as the split point. The figure was drawn by someone else; I want to draw it using R but have not managed to, so I hope someone can help me. Best wishes! kevin http://n4.nabble.com/file/n1965378/y1.jpg http://n4.nabble.com/file/n1965378/y1.txt
[R] R loop.
Hi everyone, I'm new to R and I can't figure out how to use a loop to do the following task; any help would be very kind. I have a file called table3.txt that contains over 1000 rows and over 40 columns. For example, the first row looks like this: Deafness, EYA4, DIAPH1, MYO7A, TECTA, COL11A2, POU4F3, MYH9, ACTG1, MYO6. I want the loop statement to go through each row, take the first cell (Deafness) and the second cell (EYA4) and put them at the bottom of the file, then take the first cell (Deafness again) and the third cell (DIAPH1) and put them at the bottom of the file, and so on, until I end up with two columns: one containing all the diseases and one containing all the genes.
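One way to read this request is as a wide-to-long reshape: every gene in a row is paired with the disease in the first cell of that row. The following is a minimal sketch under that assumption, with no explicit loop. The two input lines are stand-ins for table3.txt (the second one is invented for illustration); with the real file one would presumably start from readLines("table3.txt").

```r
# Stand-in for the contents of table3.txt; the second line is hypothetical
lines <- c("Deafness, EYA4, DIAPH1, MYO7A",
           "DiseaseB, GENE1, GENE2")

# Split each line on commas (with optional spaces after them)
fields <- strsplit(lines, ",[ ]*")

# For each row, pair the first cell (disease) with every non-empty later cell (gene)
pairs <- do.call(rbind, lapply(fields, function(f) {
  genes <- f[-1]
  genes <- genes[nzchar(genes)]   # drop empty cells from short rows
  data.frame(disease = f[1], gene = genes, stringsAsFactors = FALSE)
}))

print(pairs)
```

The result is a two-column data frame that could be written out with write.table() if a file is needed.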
[R] Image RGB calculation
Dear all, I need to read an image (mostly jpg) and split it into its colour channels for a calculation like this: sqrt(R²+G²+B²). Do you have an idea which package I need to use for it, and is it possible? Thanks a lot, Ole
Re: [R] Problems getting symbols() to show table data
Thanks. I don't think I would ever have worked that twist out. It is perfect. Guy
[R] generating a SpatialLinesDataFrame (rgdal)
Could somebody give me a pointer on how to generate a SpatialLinesDataFrame from a dataframe that contains lat, long coordinates as separate variables? At the moment the data looks like this: lat long [1] 53. 1. whereas the SpatialLinesDataFrame consists of Coordinates [1] (53.xxx, 1.xxx). This is probably a trivial issue, but I'm a relatively new user and searching the documentation hasn't yielded an obvious way to do it so far. Thanks, Simon
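A possible route, sketched here with the sp package classes (Line, Lines, SpatialLines, SpatialLinesDataFrame), is to build the line from a two-column coordinate matrix and then attach an attribute table. The coordinates, the ID "track1" and the attribute column are all invented for illustration, and the sketch assumes all points belong to one line.

```r
library(sp)

# Hypothetical points; in practice these would be the lat/long columns of the data frame
pts <- data.frame(lat = c(53.1, 53.2, 53.3), long = c(1.1, 1.2, 1.4))

# sp expects x (longitude) first, then y (latitude)
ln  <- Line(cbind(pts$long, pts$lat))
lns <- Lines(list(ln), ID = "track1")
sl  <- SpatialLines(list(lns))   # a coordinate reference system can be supplied via proj4string

# One attribute row per Lines object; row names must match the Lines IDs
attrs <- data.frame(name = "example track", row.names = "track1")
sldf  <- SpatialLinesDataFrame(sl, data = attrs)

summary(sldf)
```

If the data frame holds several separate lines, one Lines object per line (each with its own ID) would be needed.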
[R] Removing empty (or very underpopulated) sub-populations
Hi, I'm trying to develop a function that will simplify the most common analyses in my area of interest (social sciences) by computing all required statistics in one run (for example, in the case of a factor and a numeric variable: 1) normality test, then, if the variables are normal, 2) ANOVA, 3) with effect-size estimation and an appropriate graph). I test normality in each group with this code:

    are.normal <- c()
    group <- as.factor(group)
    for (i in 1:length(levels(factor(group)))) {
      are.normal[i] <- normality(response[group == levels(factor(group))[i]])
    }

where: 1) response is the response (numeric variable), 2) group is the grouping variable (factor), 3) normality is a function which takes one variable as its argument and tries to figure out whether it is normal (TRUE) or not (FALSE). My problem is that some combinations of response~group produce empty or very underpopulated groups (e.g. when you examine the relation between country of origin and age of respondents, and it turns out that you have only one person from some country). This causes my function to fail. Is there some way to exclude those underpopulated groups from the analysis? Best regards, Kamil Sijko
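A minimal sketch of one way to drop small groups before testing: count observations per level with table(), keep only the levels above a threshold, and remove the now-empty levels with droplevels(). The data, the threshold min_n and the use of shapiro.test() in place of the poster's own normality() function are all assumptions made for illustration.

```r
# Hypothetical data: one group ("FR") has a single observation
response <- c(5.1, 4.8, 5.5, 6.0, 5.2, 4.9, 7.3, 5.0)
group    <- factor(c("PL", "PL", "PL", "DE", "DE", "DE", "FR", "PL"))

min_n <- 3                                   # smallest group size to keep (assumed)
sizes <- table(group)                        # observations per level
keep  <- group %in% names(sizes)[sizes >= min_n]

response_ok <- response[keep]
group_ok    <- droplevels(group[keep])       # forget the levels that were removed

# Stand-in for normality(): TRUE when the Shapiro-Wilk test does not reject at 5%
are.normal <- tapply(response_ok, group_ok,
                     function(v) shapiro.test(v)$p.value > 0.05)
print(are.normal)
```

Using tapply() also avoids the explicit loop over levels; the same filtering would work in front of the original for loop.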
[R] format() method
Hello, I use the format() function to get the number of the week, like this: format(tmp,'%U'). Recently, I have spotted something bizarre. For example, I have this object:

    (index(tmp$x.delta['2009'][1:16]))
     [1] "2009-01-02 CET"  "2009-01-09 CET"  "2009-01-16 CET"  "2009-01-23 CET"
     [5] "2009-01-30 CET"  "2009-02-06 CET"  "2009-02-13 CET"  "2009-02-20 CET"
     [9] "2009-02-27 CET"  "2009-03-06 CET"  "2009-03-13 CET"  "2009-03-20 CET"
    [13] "2009-03-27 CET"  "2009-04-03 CEST" "2009-04-09 CEST" "2009-04-17 CEST"

    dput(index(tmp$x.delta['2009'][1:16]),'%U',file='as.date')
    structure(c(1230850800, 1231455600, 1232060400, 1232665200, 123327, 1233874800, 1234479600, 1235084400,
      1235689200, 1236294000, 1236898800, 1237503600, 1238108400, 1238709600, 1239228000, 1239919200),
      tzone = structure("", .Names = "TZ"), class = c("POSIXt", "POSIXct"))

To get the number of the week I run:

    format(index(tmp$x.delta['2009'][1:16]),'%U')

Here is the output; the weird thing is that the first week number is 00.

     [1] "00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "13" "14"
    [16] "15"

Is it a bug, my mistake, or is it supposed to be like that? Thank you, kafka
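The 00 looks like documented behaviour rather than a bug: %U counts weeks with Sunday as the first day of the week, and days before the first Sunday of the year fall in week 00. 2 January 2009 was a Friday and the first Sunday was 4 January. A small sketch, using plain Date objects rather than the poster's index, that also shows the ISO 8601 week (%V) as a possible alternative; support for %V on output is assumed here and may depend on the platform.

```r
d <- as.Date(c("2009-01-02", "2009-01-09"))

# Week of the year, Sunday as first day of week; days before the first Sunday are week 00
week_U <- format(d, "%U")
print(week_U)

# ISO 8601 week number, an alternative if weeks should start at 01
# (availability of %V on output is assumed, see ?strptime)
week_V <- format(d, "%V")
print(week_V)
```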
[R] xyplot ontop a contourplot (package: lattice)
Hello, I have a contourplot that shows the data I want. However, I would like to plot a certain number of points on top of this plot via xyplot(). Example:

    x <- seq(pi/4, 5 * pi, length.out = 100)
    y <- seq(pi/4, 5 * pi, length.out = 100)
    r <- as.vector(sqrt(outer(x^2, y^2, "+")))
    grid <- expand.grid(x=x, y=y)
    grid$z <- cos(r^2) * exp(-r/(pi^3))
    levelplot(z~x*y, grid, cuts = 50, panel.xyplot(x~y))

But the points do not show up. What is the correct way to achieve this?
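In lattice, extra layers are usually drawn from inside a custom panel function rather than by passing a second plot call as an argument. A minimal sketch along those lines: the panel function first draws the level plot and then adds points with panel.points(). The point coordinates in pts are invented for illustration.

```r
library(lattice)

x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
grid <- expand.grid(x = x, y = y)
r <- sqrt(grid$x^2 + grid$y^2)
grid$z <- cos(r^2) * exp(-r / (pi^3))

# Hypothetical points to overlay
pts <- data.frame(x = c(2, 8, 12), y = c(3, 10, 5))

p <- levelplot(z ~ x * y, grid, cuts = 50,
               panel = function(...) {
                 panel.levelplot(...)                               # the level plot itself
                 panel.points(pts$x, pts$y, pch = 19, col = "black") # points on top
               })
print(p)   # lattice objects need an explicit print() inside scripts and functions
```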
Re: [R] problem with FUN in Hmisc::summarize
Hi Arnaud, I'm not sure how to do this with Hmisc::summarize, but it's pretty easy with plyr::ddply:

    D <- read.table(textConnection("V1 V2 V3
    A 1 -1
    A 1 1
    A -1 -1
    B 1 1
    B 1 1"), header=TRUE)
    closeAllConnections()
    corr.V2.V3 = function(x) {
      out = cor(x$V2, x$V3)
      names(out) = "CORR"
      return(out)
    }
    library(plyr)
    ddply(D, .(V1), corr.V2.V3)

-Ista

On Fri, Apr 16, 2010 at 9:21 AM, arnaud chozo arnaud.ch...@gmail.com wrote:

Hi all, I'd like to use the Hmisc::summarize function, but it uses a function (FUN) of a single vector argument to create the statistical summaries. Consider an easy case: I'd like to compute the correlation between two variables in my dataframe, grouped according to other variables in the same dataframe. For example, consider the following dataframe D:

    V1 V2 V3
    A   1 -1
    A   1  1
    A  -1 -1
    B   1  1
    B   1  1

I'd like to use Hmisc::summarize(X=D, by=llist(myvar=D$V1), FUN=corr.V2.V3) where corr.V2.V3 is defined as follows:

    corr.V2.V3 = function(x) {
      d = cbind(x$V2, x$V3)
      out = c(cor(d))
      names(out) = c("CORR")
      return(out)
    }

I was not able to use Hmisc::summarize in this case because FUN should be a function of a matrix argument. Any idea? Thanks in advance, Arnaud

-- Ista Zahn, Graduate student, University of Rochester, Department of Clinical and Social Psychology, http://yourpsyche.org
Re: [R] Weights in binomial glm
Thierry,

Thank you for your answer. From the documentation it looks like it is valid to assume that the weights can be used for replicate weights. Continuing your example:

    dataset$Success2 <- dataset$Success
    Aggregated2 <- cast(Person + Success ~ ., data = dataset, value = "Success2", fun = list(mean, length))
    m2 <- glm(mean ~ 1, data = Aggregated2, family = binomial, weights = length)

In this case the weights can be seen as replicate weights. In my case the proportion of successes for each group is either 0 or 1. I am familiar with the survey package. However, in this case there should not be a difference between the two as far as the parameter estimates are concerned (the standard errors are incorrect for glm). The strange thing in this case is that the estimates seem to depend on the scaling of the weights, which should not be the case. Also in your example scaling the weights gives the same estimate:

    m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights = length/10)

Regards, Jan

On Fri, Apr 16, 2010 at 3:19 PM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:

Jan, It looks like you did not understand the line "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." Weights must be a number of trials (hence integer), not a proportion of a population. Here is an example that clarifies the use of weights.

    library(boot)
    library(reshape)
    dataset <- data.frame(Person = c(rep("A", 20), rep("B", 10)), Success = c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75)))
    Aggregated <- cast(Person ~ ., data = dataset, value = "Success", fun = list(mean, length))
    m0 <- glm(Success ~ 1, data = dataset, family = binomial)
    m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights = length)
    inv.logit(coef(m0))
    inv.logit(coef(m1))

Have a look at the survey package if you want to analyse stratified data.

Thierry

ir. Thierry Onkelinx, Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek), team Biometrics & Quality Assurance, Gaverstraat 4, 9500 Geraardsbergen, Belgium, tel. + 32 54/436 185, thierry.onkel...@inbo.be, www.inbo.be

----- Original message -----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Jan van der Laan
Sent: Friday 16 April 2010 14:11
To: r-help@r-project.org
Subject: [R] Weights in binomial glm

I have some questions about the use of weights in binomial glm as I am not getting the results I would expect. In my case the weights I have can be seen as 'replicate weights'; one respondent i in my dataset corresponds to w[i] persons in the population. From the documentation of the glm method, I understand that the weights can indeed be used for this: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." From Modern Applied Statistics with S-Plus, 3rd ed., I understand the same.
Re: [R] data.frame and ddply
I found a way using the subset command:

    opfut=subset(ddply(futures, c("CONTRAT","SETTLEMENT"), summarise, POSITION= sum(QUANTITY)), select=c(CONTRAT,POSITION,SETTLEMENT))
    opfut
                          CONTRAT POSITION SETTLEMENT
    1         SUGAR NO.11 May/10         5    16.5400
    2         COTTON NO.2 May/10         3    78.1300
    3            PLATINUM Jul/10        -1 1,739.4000
    4 ROBUSTA COFFEE (10) May/10        15     1,353.
    5               WHEAT May/10         4   467.7500

Arnaud Gaboury
[R] multiple variables pointing to single dataframe?
Hi, I need to have 2 variables point to the same dataframe (d1). I don't want to simply copy the dataframe (d2 <- d1), as my understanding is that this will create a second dataframe. Any suggestions on best practice here? Thank You, Alex Bryant, Software Developer, Integrated Clinical Systems, Inc., 908-996-7208
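For background: plain assignment in R has copy-on-modify semantics, so d2 <- d1 does not duplicate the data until one of the two is changed, but after a change the two names no longer agree. If genuinely shared, mutable state is wanted, one common approach is to keep the data frame inside an environment, since environments are passed by reference. A minimal sketch; the names store and alias are invented for illustration.

```r
# Plain assignment: the copy happens only when one of the objects is modified
d1 <- data.frame(a = 1:3)
d2 <- d1
d2$a[1] <- 99L          # d2 now differs; d1 is untouched
print(d1$a[1])          # still the original value

# Shared state: put the data frame in an environment and alias the environment
store <- new.env()
store$df <- data.frame(a = 1:3)
alias <- store          # both names refer to the same environment
alias$df$a[1] <- 99L    # modification through one name...
print(store$df$a[1])    # ...is visible through the other
```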
[R] Blocking and Nested ANOVA Design. Am I using the aov() function correctly?
Dear list members,

I am a new member and fairly new to the R world! I hope my question is not beyond the purpose of this list. I first searched for similar experimental designs without success. I want to perform an ANOVA analysis using the aov() function, and I am not 100% sure that I have it right. If anyone can help me, that will be greatly appreciated.

My design is not balanced for any of the factors. My main aim is to compare the 5 different AI (aridity index) groups and to identify a pattern in the response of the AI groups to the treatment that I applied, for the parameter that I measured. In total I have 24 different populations of a specific tree species. A population refers to the geographical area from which I chose to collect seeds, and for every population I know the annual rainfall and annual evapotranspiration. I started with an equal number of replicate plants per population per treatment, but some died and some were not healthy enough to include in the experiment. My design is as follows:

- Blocks (6 blocks; these are the different days on which I planted my plants. Every block at the beginning had at least one plant for every population for every treatment. At the end some had died or were not healthy enough, and that is why I have an unbalanced design.)
- Treatments (2 treatments that I selected, therefore fixed)
- AI (5 AI groups; this is an Aridity Index, the ratio of rainfall to evapotranspiration for each of my populations, and therefore each population goes to the appropriate AI group. When I selected my populations I did not select them in order to have a balanced design from the AI perspective.)
- Populations nested in AI; I am interested in the interactions as well.

So if OP is one of the parameters that I measured, I write the following function, and when I run it I get the ANOVA table shown:

    b <- aov(OP ~ Block + Treat*factor(AI)*(factor(AI)/factor(Pop)))
    summary(b)
                                  Df  Sum Sq Mean Sq  F value  Pr(>F)
    Block                          5   2.187   0.437   2.6350 0.02423 *
    Treat                          1 126.656 126.656 762.8590 < 2e-16 ***
    factor(AI)                     4   2.098   0.525   3.1598 0.01478 *
    Treat:factor(AI)               4   1.057   0.264   1.5912 0.17721
    factor(AI):factor(Pop)        19   2.990   0.157   0.9478 0.52430
    Treat:factor(AI):factor(Pop)  19   2.811   0.148   0.8912 0.59429
    Residuals                    245  40.677   0.166
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If this is right, I need to correct the F values. Since Pop is nested within AI I need to use different F ratios. The ratios that I am using are the ones shown in the following table. For simplicity I am using the numbers 1 to 7 and not MS.

      Source of variation   df  F-ratio
    1 Block                  5  1/7
    2 Treat                  1  2/5
    3 AI                     4  3/5
    4 Treat:AI               4  4/5
    5 AI:Pop                19  5/7
    6 Treat:AI:Pop          19  6/5
    7 Residuals            254

Have I used the aov() function correctly? Can anyone comment on that? That's the first thing that I need to confirm. The other thing is: if I exclude the factor(AI) that is outside of the parentheses, I get the following:

    b1 <- aov(OP ~ Block + Treat*(factor(AI)/factor(Pop)))
    summary(b1)
                                  Df  Sum Sq Mean Sq  F value  Pr(>F)
    Block                          5   2.187   0.437   2.6350 0.02423 *
    Treat                          1 126.656 126.656 762.8590 < 2e-16 ***
    factor(AI)                     4   2.098   0.525   3.1598 0.01478 *
    factor(AI):factor(Pop)        19   3.056   0.161   0.9689 0.49862
    Treat:factor(AI)               4   0.990   0.248   1.4909 0.20551
    Treat:factor(AI):factor(Pop)  19   2.811   0.148   0.8912 0.59429
    Residuals                    245  40.677   0.166
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The differences between the two tables are small, but I'm guessing that one is more correct than the other. Which one is preferable? I continue with TukeyHSD, but I'm not going to get into that now. I am not sure whether the raw data are necessary, but I have attached them.

Thanking you in advance, Eleftheria

    Pop AI  Block Treat OP
      2 0.2 A     C     1.13
     22 0.2 A     C     2.31
      3 0.2 A     C     1.56
      6 0.2
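One thing that can be checked without the data is whether the two formulas describe different models at all. The sketch below expands both with terms() and compares the term labels; it uses plain variable names instead of the factor() wrappers, which is an assumption made only to keep the labels short. If the sets of terms agree, the two fits would differ only in the order in which terms enter the sequential (Type I) sums of squares, which matters in an unbalanced design and would be consistent with the identical residual lines in the two tables above.

```r
# The two model formulas from the post, without the factor() wrappers
f_b  <- OP ~ Block + Treat * AI * (AI / Pop)
f_b1 <- OP ~ Block + Treat * (AI / Pop)

labs_b  <- attr(terms(f_b),  "term.labels")
labs_b1 <- attr(terms(f_b1), "term.labels")

print(labs_b)    # term order used for the sequential sums of squares in b
print(labs_b1)   # term order used in b1

# TRUE means both formulas contain the same terms, possibly in a different order
same_terms <- setequal(labs_b, labs_b1)
print(same_terms)
```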
Re: [R] Image RGB calculation
Hi Ole, ole_roessler wrote: I need to read an image (mostly jpg) and split the channel of this image to an colour channel calculation like this: sqrt(R²+G²+B²) Do you have an idea what package I need to use for it, and is it possible? For general image processing capabilities within R, I would recommend the EBImage package, which you can find in the Bioconductor repositories. Hope this helps, Tobias
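Once an image has been decoded into a numeric array, the calculation itself is plain array arithmetic. The sketch below assumes a height x width x 3 array with channel values in [0, 1], which is, as far as I know, the layout returned by jpeg::readJPEG(); a tiny hand-made array stands in for a real image so that the example does not depend on any file or package.

```r
# Stand-in for a decoded image: 2 x 2 pixels, 3 colour channels, values in [0, 1]
img <- array(0, dim = c(2, 2, 3))
img[1, 1, ] <- c(1, 0, 0)       # pure red
img[1, 2, ] <- c(0.6, 0.8, 0)   # red/green mix
img[2, 1, ] <- c(1, 1, 1)       # white

# Split the channels and compute sqrt(R^2 + G^2 + B^2) for every pixel
R <- img[, , 1]
G <- img[, , 2]
B <- img[, , 3]
magnitude <- sqrt(R^2 + G^2 + B^2)

print(magnitude)
```

With EBImage the indexing may differ (its Image objects are, I believe, stored width first), so the channel extraction would need to be adapted.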
[R] Yet Testing rKward
Hi, I am continuing to test RKWard. I don't know how to save the results of a script execution without using copy and paste or the RKWard output system. Now I am trying to understand how to adapt my script to use the RKWard output system. Example: I have this script:

    ## Load the richness and evenness table
    library(gdata)
    dadosriq <- read.xls("Panalise.xls", h=T, sheet=2)
    ## Data summary
    summary(dadosriq)

Using the RKWard output system I try:

    rk.header("Load the richness and evenness table")
    library(gdata)
    dadosriq <- read.xls("Panalise.xls", h=T, sheet=2)
    rk.header("Data summary")
    rk.print(summary(dadosriq))

OK. The problems: 1) my script becomes RKWard-specific, which is not a good idea. 2) I can't print the command in the output unless I repeat the command as a string: rk.header("dadosriq <- read.xls('Panalise.xls',h=T,sheet=2)"), which is also not a good idea. Does anyone know whether there is a global RKWard command to send everything (commands and results) to the output? That way, if I'm an RKWard user I use this global command, and if I'm not an RKWard user I comment it out and my script still works. Is this possible, or do I need to forget RKWard as a Linux R script IDE? Thanks, Ronaldo

-- Prof. Ronaldo Reis Júnior, UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional, Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia, CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil, Fone: (38) 3229-8192, ronaldo.r...@unimontes.br, http://www.ppgcb.unimontes.br/lecc
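Independently of RKWard, base R can record both the commands and their printed results by running the script through source() with echo = TRUE and capturing what is printed. This is a minimal IDE-neutral sketch, not an RKWard feature; the two-line script written to a temporary file is a stand-in for the poster's script.

```r
# Write a small stand-in script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)",
             "summary(x)"), script)

# Run it, echoing each command, and capture commands plus printed results
transcript <- capture.output(
  source(script, echo = TRUE, max.deparse.length = Inf)
)

writeLines(transcript)                      # show the transcript
# writeLines(transcript, "results.txt")     # or save it (file name is hypothetical)
```

The script itself stays free of any IDE-specific calls, which addresses the first problem raised above.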
Re: [R] Weights in binomial glm
Jan,

You misread the documentation of ?glm. Note that glm works with different kinds of families, so the first statement about weights is rather general: it holds for most of the families. It explicitly tells you that this is not the case with the binomial family. From the documentation: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." Nothing more, nothing less. Scaling the weights will change the results because you change the NUMBER OF TRIALS. More trials = more information = lower variances. So you only need to give the weights when the response is expressed as a ratio. If you have it as a binary variable or as cbind(NumberOfSuccesses, NumberOfFailures) then you don't need weights.

Thierry

ir. Thierry Onkelinx, Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek), team Biometrics & Quality Assurance, Gaverstraat 4, 9500 Geraardsbergen, Belgium, tel. + 32 54/436 185, thierry.onkel...@inbo.be, www.inbo.be
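The point about trials can be illustrated with a small deterministic data set (invented here: 5 successes out of 20 for person A, 7 out of 10 for person B). The sketch fits the same intercept-only model to the binary responses, to the proportions weighted by the number of trials, and to the proportions with the weights divided by 10. Under these assumptions the estimated proportion is the same in all three fits, while the standard error grows when the weights are scaled down.

```r
# Binary responses: one row per trial
binary <- data.frame(
  Person  = rep(c("A", "B"), times = c(20, 10)),
  Success = c(rep(c(1, 0), times = c(5, 15)),    # A: 5 successes, 15 failures
              rep(c(1, 0), times = c(7, 3)))     # B: 7 successes, 3 failures
)
# Aggregated form: proportion of successes and number of trials per person
agg <- data.frame(Person = c("A", "B"),
                  prop   = c(5 / 20, 7 / 10),
                  trials = c(20, 10))

m0 <- glm(Success ~ 1, data = binary, family = binomial)
m1 <- glm(prop ~ 1, data = agg, family = binomial, weights = trials)
# Scaled weights no longer give integer counts, hence the warning being suppressed
m2 <- suppressWarnings(
  glm(prop ~ 1, data = agg, family = binomial, weights = trials / 10)
)

est <- sapply(list(m0, m1, m2), function(m) unname(plogis(coef(m))))
se  <- sapply(list(m0, m1, m2), function(m) summary(m)$coefficients[, "Std. Error"])
print(est)   # estimated overall success probability from each fit
print(se)    # standard errors of the intercept on the logit scale
```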
Re: [R] vector matching
If I understand: unique(t(apply(rbind(x, y), 1, sort))) -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
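On my reading of the question, rows that occur in both matrices should be left out entirely, whereas unique() on the stacked, sorted rows keeps one copy of them. A minimal sketch of the exclusion variant: each row is turned into an order-independent key by sorting and pasting its elements, and %in% is then applied to the keys. The matrices and the helper name row_key are invented for illustration.

```r
# Hypothetical matrices; the first row of x and the first row of y are the same combination
x <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
y <- rbind(c(3, 2, 1), c(10, 11, 12))

# One key per row that ignores the order of the elements
row_key <- function(m) apply(m, 1, function(r) paste(sort(r), collapse = "_"))
kx <- row_key(x)
ky <- row_key(y)

# Keep rows of x not found in y, and rows of y not found in x
result <- rbind(x[!(kx %in% ky), , drop = FALSE],
                y[!(ky %in% kx), , drop = FALSE])
print(result)
```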
[R] score counts in an aggregate function
Dear R-Users,

I have a big data set mydata with repeated observations and some missing values. It looks like the format below:

    userid sex item score1 score2
         1   0    1      1      1
         1   0    2      0      1
         1   0    3     NA      1
         1   0    4      1      0
         2   1    1      0      1
         2   1    2     NA      1
         2   1    3      1     NA
         2   1    4     NA      0
         3   0    1      1      0
         3   0    2      1     NA
         3   0    3      1      0
         3   0    4      0      0

I would like to summarise the dataset such that I get something in the format of

    userid sumscore1 countscore1 meanscore1 sumscore2 countscore2 meanscore2
         1         2           3       0.67         3           4       0.75
         2         1           2       0.50         2           3       0.67
         3         3           4       0.75         0           3       0.00

I tried using:

    means <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=mean, na.rm=TRUE))

and

    sums <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=sum, na.rm=TRUE))

so that I could merge the two data.frames later. This works quite okay, but I still cannot find a function that gives me a data.frame for the counts! Something like this:

    counts <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=count, na.rm=TRUE))

Any advice? Trevor, Belgium
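Base R has no count function with an na.rm argument, but the number of non-missing values can be obtained by summing !is.na(). A minimal sketch using the example data from the post, with an anonymous function passed to aggregate(); the layout of the final summary data frame is one possible choice.

```r
# The example data from the post
mydata <- data.frame(
  userid = rep(1:3, each = 4),
  sex    = rep(c(0, 1, 0), each = 4),
  item   = rep(1:4, times = 3),
  score1 = c(1, 0, NA, 1,   0, NA, 1, NA,   1, 1, 1, 0),
  score2 = c(1, 1, 1, 0,    1, 1, NA, 0,    0, NA, 0, 0)
)

by_user <- list(userid = mydata$userid)

sums   <- aggregate(mydata[, 4:5], by = by_user, FUN = sum,  na.rm = TRUE)
means  <- aggregate(mydata[, 4:5], by = by_user, FUN = mean, na.rm = TRUE)
# Count of non-missing values per user and score
counts <- aggregate(mydata[, 4:5], by = by_user,
                    FUN = function(v) sum(!is.na(v)))

# Combine into one table; all three results share the same row order
result <- data.frame(userid      = sums$userid,
                     sumscore1   = sums$score1,
                     countscore1 = counts$score1,
                     meanscore1  = round(means$score1, 2),
                     sumscore2   = sums$score2,
                     countscore2 = counts$score2,
                     meanscore2  = round(means$score2, 2))
print(result)
```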
Re: [R] GAMM : how to use a smoother for some levels of a variable, and a linear effect for other levels?
Both versions of this should have worked, but you are right that the first version didn't when used with `gamm'. I've fixed this for mgcv 1.6-2 (`mgcv:gam' was ok). Thanks for this.

best,
Simon

On Wednesday 14 April 2010 09:03, JANSEN, Ivy wrote:
> Hi,
>
> I was reading the book "Mixed Effects Models and Extensions in Ecology with R" by Zuur et al. In Section 6.2, an example is discussed where a gamm model is fitted, with a smoother for time which differs for each value of ID (4 different bird species). In earlier versions of R, the following code was used:
>
>     BM2 <- gamm(Birds ~ Rain + ID +
>                 s(Time, by = as.numeric(ID == "Stilt.Oahu")) +
>                 s(Time, by = as.numeric(ID == "Stilt.Maui")) +
>                 s(Time, by = as.numeric(ID == "Coot.Oahu")) +
>                 s(Time, by = as.numeric(ID == "Coot.Maui")),
>                 correlation = corAR1(form = ~ Time | ID),
>                 weights = varIdent(form = ~ 1 | ID))
>
> However, in the current version of R, this does not work anymore, and should be changed into:
>
>     BM2 <- gamm(Birds ~ Rain + ID + s(Time, by = ID),
>                 correlation = corAR1(form = ~ Time | ID),
>                 weights = varIdent(form = ~ 1 | ID))
>
> It turns out that 2 of the 4 smoothers have estimated degrees of freedom of 1, so a linear effect would be sufficient. Now my question is how I need to change the code in order to have a time smoother for ID=Coot.Oahu and ID=Coot.Maui, and a linear time effect for ID=Stilt.Oahu and ID=Stilt.Maui. With the old R code this seems trivial, but I don't have any idea how to do it in the newest R version (interactions with a dummy variable do not work in gamm).
>
> Thanks,
> Ivy

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283
Re: [R] format() method
Were you expecting 01, and is that why you are puzzled?

See ?strftime and the explanation of the %U format. It depends on where the first Sunday of the year happens to fall.

-Don

At 5:28 AM -0800 4/16/10, kafkaz wrote:
> Hello,
>
> I use the format() function to get the number of the week, like this:
>
>     format(tmp,'%U')
>
> Recently, I have spotted something bizarre. For example, I have such an object:
>
>     (index(tmp$x.delta['2009'][1:16]))
>      [1] 2009-01-02 CET  2009-01-09 CET  2009-01-16 CET  2009-01-23 CET
>      [5] 2009-01-30 CET  2009-02-06 CET  2009-02-13 CET  2009-02-20 CET
>      [9] 2009-02-27 CET  2009-03-06 CET  2009-03-13 CET  2009-03-20 CET
>     [13] 2009-03-27 CET  2009-04-03 CEST 2009-04-09 CEST 2009-04-17 CEST
>
>     dput(index(tmp$x.delta['2009'][1:16]),'%U',file='as.date')
>     structure(c(1230850800, 1231455600, 1232060400, 1232665200, 123327,
>     1233874800, 1234479600, 1235084400, 1235689200, 1236294000, 1236898800,
>     1237503600, 1238108400, 1238709600, 1239228000, 1239919200),
>     tzone = structure(, .Names = TZ), class = c(POSIXt, POSIXct))
>
> To get the number of the week I run:
>
>     format(index(tmp$x.delta['2009'][1:16]),'%U')
>
> Here is the output. The weird thing is that the first number of the week is 00.
>
>      [1] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14
>     [16] 15
>
> Is it a bug, my mistake, or is it supposed to be like that?
>
> Thank you,
> kafka

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
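To see why the first value is 00, it may help to check where the first Sunday of 2009 falls. The following is a small sketch using three dates chosen for illustration (the first and third come from the series in the question; 2009-01-04 is added here because it is the first Sunday of that year).

```r
# Two Fridays from the question plus the first Sunday of 2009
d <- as.Date(c("2009-01-02", "2009-01-04", "2009-01-09"))

# Day of the week as a number: 0 = Sunday, ..., 5 = Friday
as.POSIXlt(d)$wday

# %U starts week 01 at the first Sunday of the year;
# any days before that Sunday belong to week 00
format(d, "%U")   # "00" "01" "01"
```

If week numbers that never start at 00 are needed, the ISO 8601 week (%V, described in ?strptime) may be an alternative on platforms that support it for output.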
Re: [R] problem with FUN in Hmisc::summarize
> corr.V2.V3 = function(x) {
>   out = cor(x$V2, x$V3)
>   names(out) = "CORR"
>   return(out)
> }

A little more concisely:

    corr.V2.V3 = function(x) {
      c(CORR = cor(x$V2, x$V3))
    }

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [R] how can I plot the histogram like this using R?
On Fri, Apr 16, 2010 at 10:13 AM, bbslover dlu...@yeah.net wrote:
> Thanks for your reply. I just want to get a figure like y1.jpg using the data from y1.txt. Through the figure I want to obtain the split point as in y1.jpg, and consider 2.5 as the split point. This figure was drawn by other people; I just want to draw it using R, but I cannot, so I hope friends can help me.
>
> Best wishes!
> kevin
>
> http://n4.nabble.com/file/n1965378/y1.jpg
> http://n4.nabble.com/file/n1965378/y1.txt

Hi,

Does this do what you want?

    temp <- read.table(url("http://n4.nabble.com/file/n1965378/y1.txt"))
    hist(temp$V1, breaks=seq(0, 5.1, by=0.1))
    abline(v=2.5, lty=2, lwd=2, col="red")

Regards,
Gustaf

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE
skype: gustaf_rydevik
[R] run R script from Excel VBA
I wrote an R script called computeCovarMatrix.R, and I want to call and run it from Excel Visual Basic. Does anyone know how to do that?

thanks,
KZ
Re: [R] score counts in an aggregate function
Hi Trevor,

You can do it like this:

    count <- function(x) {
      length(na.omit(x))
    }
    counts <- data.frame(aggregate(mydata[,4:5], by=list(mydata$userid), FUN=count))

-Ista

On Fri, Apr 16, 2010 at 10:35 AM, KDT dkaden...@gmail.com wrote:
> Dear R-Users,
>
> I have a big data set mydata with repeated observations and some missing values. It has the format below:
>
>     userid sex item score1 score2
>     1      0   1    1      1
>     1      0   2    0      1
>     1      0   3    NA     1
>     1      0   4    1      0
>     2      1   1    0      1
>     2      1   2    NA     1
>     2      1   3    1      NA
>     2      1   4    NA     0
>     3      0   1    1      0
>     3      0   2    1      NA
>     3      0   3    1      0
>     3      0   4    0      0
>
> I would like to summarise the dataset such that I get something in the format of
>
>     userid sumscore1 countscore1 meanscore1 sumscore2 countscore2 meanscore2
>     1      2         3           0.67       3         4           0.75
>     2      1         2           0.5        2         3           0.67
>     3      3         4           0.75       0         3           0.00
>
> I tried using:
>
>     means <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=mean, na.rm=TRUE))
>
> and
>
>     sums <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=sum, na.rm=TRUE))
>
> so that I could merge the two data.frames later. This works quite okay, but I still cannot find a function that gives me a data.frame for the counts. Something like this:
>
>     counts <- data.frame(aggregate(mydata[, 4:5], by=list(mydata$userid), FUN=count, na.rm=TRUE))
>
> Any advice?
>
> Trevor
> Belgium

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
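For completeness, here is a sketch of how the three summaries might be combined into the requested layout. The data frame is retyped from the example in the question (assuming the flattened sex and item columns split as shown), and the output column names are taken from the desired table; everything else is plain base R.

```r
# Example data retyped from the question (assumed column split)
mydata <- data.frame(
  userid = rep(1:3, each = 4),
  sex    = rep(c(0, 1, 0), each = 4),
  item   = rep(1:4, times = 3),
  score1 = c(1, 0, NA, 1,   0, NA, 1, NA,   1, 1, 1, 0),
  score2 = c(1, 1, 1, 0,    1, 1, NA, 0,    0, NA, 0, 0)
)

# Number of non-missing values; takes no na.rm argument
count <- function(x) sum(!is.na(x))

scores  <- mydata[, c("score1", "score2")]
by_user <- list(userid = mydata$userid)

sums   <- aggregate(scores, by = by_user, FUN = sum,  na.rm = TRUE)
counts <- aggregate(scores, by = by_user, FUN = count)
means  <- aggregate(scores, by = by_user, FUN = mean, na.rm = TRUE)

# All three results are ordered by userid, so the columns line up
result <- data.frame(
  userid      = sums$userid,
  sumscore1   = sums$score1,
  countscore1 = counts$score1,
  meanscore1  = round(means$score1, 2),
  sumscore2   = sums$score2,
  countscore2 = counts$score2,
  meanscore2  = round(means$score2, 2)
)
result
```

Note that count is called without na.rm = TRUE; passing that argument to a function that does not accept it would raise an "unused argument" error.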
[R] Outlier detection from trayectory data
Hi all,

I am trying to analyze data coming from trajectories of moving objects. It can be treated as a two-dimensional time series. The only method I've found is this:

http://figment.cse.usf.edu/~sfefilat/data/papers/TuAT10.41.pdf

Does anyone know if this method is already implemented in R, or if there is any other alternative implemented?

Thanks in advance.
Patricia
Re: [R] format() method
On Apr 16, 2010, at 9:28 AM, kafkaz wrote:
> Hello,
>
> I use the format() function to get the number of the week, like this:
>
>     format(tmp,'%U')
>
> Recently, I have spotted something bizarre. For example, I have such an object:
>
>     (index(tmp$x.delta['2009'][1:16]))
>      [1] 2009-01-02 CET  2009-01-09 CET  2009-01-16 CET  2009-01-23 CET
>      [5] 2009-01-30 CET  2009-02-06 CET  2009-02-13 CET  2009-02-20 CET
>      [9] 2009-02-27 CET  2009-03-06 CET  2009-03-13 CET  2009-03-20 CET
>     [13] 2009-03-27 CET  2009-04-03 CEST 2009-04-09 CEST 2009-04-17 CEST
>
>     dput(index(tmp$x.delta['2009'][1:16]),'%U',file='as.date')
>     structure(c(1230850800, 1231455600, 1232060400, 1232665200, 123327,
>     1233874800, 1234479600, 1235084400, 1235689200, 1236294000, 1236898800,
>     1237503600, 1238108400, 1238709600, 1239228000, 1239919200),
>     tzone = structure(, .Names = TZ), class = c(POSIXt, POSIXct))
>
> To get the number of the week I run:
>
>     format(index(tmp$x.delta['2009'][1:16]),'%U')
>
> Here is the output. The weird thing is that the first number of the week is 00.

Appears to behave as documented. From ?format.POSIXct (help page):

    %U  Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week...

>      [1] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14
>     [16] 15
>
> Is it a bug, my mistake, or is it supposed to be like that?
>
> Thank you,
> kafka

--
David Winsemius, MD
West Hartford, CT
[R] piecewise nls?
I am looking into fitting a so-called double von Bertalanffy function to fish length-at-age data. Attempting to simplify the situation, the model looks like this:

    Y ~ f(X; a,b,c)  if X <  Z
    Y ~ g(X; a,d,e)  if X >= Z

where

* f and g are non-linear functions (the traditional single von Bertalanffy growth function),
* Y (length) and X (age) are observed variables,
* a,b,c,d,e are parameters to be estimated, and
* Z is not a parameter but is a constant computed from b,c,d,e.

I usually fit the traditional single model with nls() but am unsure how to fit this model with the if statement. I tried searching the archives with "piecewise" and either "nls", "nonlinear", or "regression" but did not find anything that seemed to fit this situation.

One thought I had was to do something like this (mostly pseudo-code):

    nls(Y ~ ifelse(X < Z, 1, 0) * f(X; a,b,c) + ifelse(X >= Z, 1, 0) * g(X; a,d,e), ...)

but I am unsure if this makes sense.

If anyone can offer some help I would be very appreciative. Thank you in advance.
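One way the ifelse() idea might be made concrete is to put the switch inside an ordinary R function and hand that function to nls(). The sketch below is an assumption-laden illustration, not a recommended analysis: the parameter names (Linf, K1, t1, K2, t2, standing in for a, b, c, d, e), the form of the switch age tp (chosen so that the two curves meet), the simulated data, and the starting values are all made up here. Because the mean function is not smooth at the switch point, nls() may fail to converge on real data, so the call is wrapped in try().

```r
# Hypothetical double von Bertalanffy mean function.
# tp is the age at which the two curves intersect, computed from K1, t1, K2, t2.
dvb <- function(age, Linf, K1, t1, K2, t2) {
  tp <- (K2 * t2 - K1 * t1) / (K2 - K1)
  ifelse(age < tp,
         Linf * (1 - exp(-K1 * (age - t1))),   # younger fish: first curve
         Linf * (1 - exp(-K2 * (age - t2))))   # older fish: second curve
}

# Simulated length-at-age data (illustration only)
set.seed(1)
age <- rep(1:15, each = 5)
len <- dvb(age, Linf = 100, K1 = 0.4, t1 = 0, K2 = 0.15, t2 = -5) +
  rnorm(length(age), sd = 2)

# Fit with nls(); starting values are guesses near the simulated truth
fit <- try(
  nls(len ~ dvb(age, Linf, K1, t1, K2, t2),
      start = list(Linf = 95, K1 = 0.35, t1 = 0.1, K2 = 0.2, t2 = -4)),
  silent = TRUE
)
if (!inherits(fit, "try-error")) print(coef(fit))
```

Writing the model as a single function keeps the computation of the switch point in one place, so the same function can also be used for plotting or prediction.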
Re: [R] run R script from Excel VBA
Have a look at rcom.univie.ac.at. We have an Excel add-in which will allow you to do that.

Disclaimer: I am the author of the add-in.

On 4/16/2010 4:57 PM, KZ wrote:
> I wrote a R script say called computeCovarMatrix.R and i want to call and run this piece from Excel visual basic. does anyone know how to do that?
>
> thanks,
> KZ

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
Re: [R] Bootstrapping a repeated measures ANOVA
On Fri, 16 Apr 2010, Fischer, Felix wrote:
> Hello everyone,
>
> I have a question regarding the sampling process in boot().

PLEASE ... provide commented, minimal, self-contained, reproducible code.

Which means something a correspondent could actually run.

But before that, a careful reading of ?boot should get you started. Note these bits:

    Arguments:

    data: The data as a vector, ...

    statistic: A function which when applied to data returns a vector
          containing the statistic(s) of interest. When sim="parametric",
          [snip] In all other cases statistic must take at least two
          arguments. The first argument passed will always be the original
          data. The second will be a vector of indices, frequencies or
          weights which define the bootstrap sample. ...

HTH,
Chuck

> I try to bootstrap F-values for a repeated measures ANOVA to get a confidence interval of F-values. Unfortunately, while the aov works fine, it fails in the boot() function. I think the problem might be that the resampling process fails to select both lines of data representing the 2 measuring times for one subject, and I therefore get missing cases. The data is organised like this:
>
>     subject ort mz PHQ
>     1       1   1  x
>     1       1   2  y
>     2       1   1  z
>     2       1   2  zz
>     ...
>
> Is there any way to specify that both lines need to be selected?
>
> Thanks a lot!
> Felix Fischer
>
> P.S. If you need to have a look at my code:
>
>     F_values <- function(formula, data, indices) {
>       d <- data[indices,]            # allows boot to select sample
>       fit = aov(formula, data=d)     # fit model
>       return(c(summary(fit)[1][[1]][[1]]$`F value`,
>                summary(fit)[2][[1]][[1]]$`F value`))   # return F-values
>     }
>     results <- boot(data=anova.daten, statistic=F_values, R=10,
>                     formula=PHQ_Sum_score~mz*ort+Error(subject/mz))
>
> Dipl. Psych. Felix Fischer
> Medizinische Klinik mit Schwerpunkt Psychosomatik
> Charité - Universitätsmedizin Berlin
> Luisenstr. 13a
> 10117 Berlin
> Tel.: 030 - 450 553575
> Email: felix.fisc...@charite.de

Charles C. Berry                          (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
[R] return of a function
Dear R users,

I have a function which takes as arguments big arrays, say: w, x, y and z. My function changes these arrays and I want them as result/output.

I have tried to write return(w,x,y,z), and thus to replace the previous w, x, y and z. It does not seem to work.

What can I do?

Thank you very much,
Gustave
Re: [R] return of a function
You can return a single object from a function. If you want multiple values, use a list:

    f <- function(x, y, z) {
      return(list(x=x, y=y, z=z))
    }

    value <- f(x, y, z)
    # now copy the values
    x <- value$x
    y <- value$y
    z <- value$z

On Fri, Apr 16, 2010 at 12:02 PM, Gustave Lefou gustave5...@gmail.com wrote:
> Dear R users,
>
> I have a function which takes as arguments big arrays, say : w, x , y and z. My function changes these arrays and I want them as result/output.
>
> I have tried to write return(w,x,y,z), and thus to replace the previous w, x, y and z. It does not seem to work.
>
> What can I do ?
>
> Thank you very much,
> Gustave

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] return of a function
Below

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gustave Lefou
Sent: Friday, April 16, 2010 9:03 AM
To: r-help@r-project.org
Subject: [R] return of a function

Dear R users,

I have a function which takes as arguments big arrays, say : w, x , y and z. My function changes these arrays and I want them as result/output.

I have tried to write return(w,x,y,z), and thus to replace the previous w, x, y and z. It does not seem to work.

What can I do ?

-- 1. Read the Help file? -- which says:

       return(value)
       Arguments:
       value: An expression.

   -- and note that w,x,y,z is **not** a legal R expression

   2. Have you read the online documentation, including An Introduction to R? There you would find many examples.

   3. return(list(w,x,y,z)) ## is what you want
      ## or even
      list(w,x,y,z) ## without the return(), as the last R expression is by default what is returned.

Thank you very much,
Gustave
Re: [R] return of a function
On Apr 16, 2010, at 12:02 PM, Gustave Lefou wrote:
> Dear R users,
>
> I have a function which takes as arguments big arrays, say : w, x , y and z. My function changes these arrays and I want them as result/output.
>
> I have tried to write return(w,x,y,z), and thus to replace the previous w, x, y and z. It does not seem to work.

Right. Two misconceptions here. First, return() accepts one object, which could be a list of items. Second, just because you return it with a name that is the same as some object outside the function does not mean that the new values will be placed in the outside object. In fact, if you do not assign the returned value to something, it will be temporarily placed in .Last.value and then overwritten when the next evaluation operation occurs. You need to assign the result of a function to some object.

> What can I do ?

Read more about functions and do more examples with small objects to see the effects on test cases.

--
David Winsemius, MD
West Hartford, CT
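A small test case of the kind suggested above might look like the following sketch. The function name modify_all and the arithmetic inside it are made up for illustration; the point is only that arguments are not changed in place and that the returned list has to be assigned.

```r
# Hypothetical function that "changes" four arrays and returns them in a named list
modify_all <- function(w, x, y, z) {
  w <- w * 2
  x <- x + 1
  y <- y - 1
  z <- z / 2
  list(w = w, x = x, y = y, z = z)   # the last expression is the return value
}

w <- array(1:8, dim = c(2, 2, 2))
x <- array(0, dim = c(2, 2))
y <- array(10, dim = c(2, 2))
z <- array(4, dim = c(2, 2))

# Calling the function without assigning the result leaves w, x, y, z untouched
invisible(modify_all(w, x, y, z))
w_unchanged <- identical(w, array(1:8, dim = c(2, 2, 2)))

# Assign the result, then copy the pieces back explicitly
out <- modify_all(w, x, y, z)
w <- out$w
x <- out$x
y <- out$y
z <- out$z
```

Naming the list elements makes it harder to mix up the arrays when they are copied back.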
Re: [R] Exporting an rgl graph
The easiest approach may be to just install R onto a USB drive (flash/thumb/...); then when you go to your coworker's computer just run R from the USB drive and show the rgl plot.

I think there is also a tool to create an animation from rgl. It is not interactive, but you could e-mail a movie file that they could play to see the plot from many angles.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of cgeno...@u-paris10.fr
Sent: Thursday, April 15, 2010 6:02 AM
To: ted.hard...@manchester.ac.uk; Barry Rowlingson
Cc: r-help@r-project.org
Subject: Re: [R] Exporting an rgl graph

Thanks for your answer. Let me clarify my question. In fact, I do not want to capture a screen; I want to save an object that can be seen in 3D. With rgl, using my mouse, I can make the object move. This is what I want to export: a real 3D object that my collaborator will be able to view in 3D.

Christophe

> On 15-Apr-10 10:10:54, Barry Rowlingson wrote:
> > On Thu, Apr 15, 2010 at 10:24 AM, cgeno...@u-paris10.fr wrote:
> > > Hi the list,
> > > I use rgl to produce a 3D graph. I would like to show this graph to some collaborator. Is there a way to save it and send it to someone else?
> >
> > See ?rgl.postscript and ?rgl.snapshot
> >
> > Or use some kind of screen capture system - on Windows the 'Print Screen' key can copy the screen to the clipboard, paste into Photoshop or other graphics program. On Linux, I use 'scrot' from the command line - type 'scrot -s', click on a window, and it makes a PNG file of it.
>
> Again on Linux, since ImageMagick is installed, I use the 'import' programme from that suite. When you start that, it produces a +-shaped mouse cursor which you can use (selecting a top-left-hand corner to start with, and holding down the left mouse button) to drag out a bounding frame for the part of the screen you want to save. Then, when you release the button, an image of that portion of the screen is saved to a file of your choice, in any graphics format of your choice that is supported by ImageMagick (including PS and EPS, as well as all the common bitmap formats). See 'man import' for pointers to more information.
>
> I have this set up as an icon on my launch panel, so it is just a matter of clicking on that, and then doing the above. The command behind the icon is
>
>     /usr/local/bin/mkscreengrab
>
> and my script file 'mkscreengrab' contains:
>
>     #! /bin/bash
>     export ScrGrbTmp=`mktemp /home/ted/Screengrabs/screengrab`
>     import $ScrGrbTmp.jpg
>     rm $ScrGrbTmp
>
> so this makes JPEGs (I could have chosen something else, but that's the default I mostly want for that activity). This produces a file with a name like
>
>     screengrab4913.jpg
>
> which will be unique in that directory, and it can later be renamed to your taste. If I wanted a different file format, I would use 'import' from the command line, with the appropriate filename extension (e.g. .png, .ps, .eps, ...).
>
> I hadn't heard of scrot before, but now I've looked it up it seems that its output format is limited to PNG.
>
> I've now also located more info about various ways of taking screenshots in Linux:
>
> http://tips.webdesign10.com/how-to-take-a-screenshot-on-ubuntu-linux
>
> Ted.
>
> E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
> Fax-to-email: +44 (0)870 094 0861
> Date: 15-Apr-10 Time: 12:18:25
> -- XFMail --
Re: [R] Weights in binomial glm
Jan,

Thierry is correct in saying that you are misusing glm(), but there is also a numerical problem.

You are misusing glm() because your model specification claims to have Binomial(n,p) observations with w in the vicinity of 100, where there is a single common p but the observed binomial proportion is either 1 or 0, never anything in between. These data are a very poor fit to a binomial model.

The correct specification, if you have what you call replicate weights and I call frequency weights, is to produce a single data record for each covariate pattern that has both the 1 and 0 observations. This can either be two columns for successes and failures, or one column of proportions and one column of weights. As your quote from MASS says, weights are used to give the number of trials when the response is the proportion of successes. In your data the response is *not* the proportion of successes.

However, the MLE should still be equal to the weighted mean even with this misuse. The reason it is not is the starting values. R has to find some starting values for the iterative maximization of the likelihood, and for binomial data with y successes out of n it uses starting values for the fitted means of (y+0.5)/(n+1). Starting the iteration at the data in this way usually makes the Fisher scoring algorithm very reliable -- it is correctly scaled to the data, in some sense. Unfortunately, if you separate out the successes and failures, you have some points starting with values very close to 0. When I used your code the starting value for the point with the largest weight was 0.5/199. At iteration 2, the estimated mean ends up very small for all observations, and then the iteration diverges. However, if you provide a starting value then the fitting works, even if you start the iteration at, say, beta=1, corresponding to a fitted mean of over 70%.

So, the result is wrong in the sense that it is not the MLE, because of a failure of convergence, which happens because specifying the weights the way you did rather than the documented way leads to bad default starting values for the iteration. You need either to specify the data as recommended or supply starting values.

-thomas

On Fri, 16 Apr 2010, Jan van der Laan wrote:
> I have some questions about the use of weights in binomial glm, as I am not getting the results I would expect. In my case the weights I have can be seen as 'replicate weights'; one respondent i in my dataset corresponds to w[i] persons in the population. From the documentation of the glm method, I understand that the weights can indeed be used for this: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." From Modern Applied Statistics with S-Plus, 3rd ed., I understand the same.
>
> However, I am getting some strange results. I generated an example.
>
> Generate some data which is similar to my dataset:
>
>     Z <- rbinom(1000, 1, 0.1)
>     W <- round(rnorm(1000, 100, 40))
>     W[W < 1] <- 1
>
> Probability of success can either be estimated using:
>
>     sum(Z*W)/sum(W)
>     [1] 0.09642109
>
> Or using glm:
>
>     model <- glm(Z ~ 1, weights=W, family=binomial())
>     Warning message:
>     In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
>       fitted probabilities numerically 0 or 1 occurred
>     predict(model, type="response")[1]
>                1
>     2.220446e-16
>
> These two results are obviously not the same. The strange thing is that when I scale the weights such that the total equals one, the probability is correctly estimated:
>
>     model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
>     Warning message:
>     In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>     predict(model, type="response")[1]
>              1
>     0.09642109
>
> However, scaling of the weights should, as far as I am aware, not have an effect on the estimated parameters. I also tried some other scalings. For example, scaling the weights by 20 also gives me the correct result:
>
>     model <- glm(Z ~ 1, weights=W/20, family=binomial())
>     Warning message:
>     In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>     predict(model, type="response")[1]
>              1
>     0.09642109
>
> Am I misinterpreting the weights? Could this be a numerical problem?
>
> Regards,
> Jan

Thomas Lumley
Assoc. Professor, Biostatistics
tlum...@u.washington.edu
University of Washington, Seattle
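As an illustration of the recommended specification, the sketch below collapses the simulated data from the question into a single record for the intercept-only model and fits it both ways: with a two-column successes/failures response, and with a proportion plus weights. The seed and the names agg, fit_counts and fit_prop are assumptions added here so that the example is reproducible.

```r
# Simulated data as in the question (seed added for reproducibility)
set.seed(1)
Z <- rbinom(1000, 1, 0.1)
W <- round(rnorm(1000, 100, 40))
W[W < 1] <- 1

# Weighted proportion of successes, for comparison
p_hat <- sum(Z * W) / sum(W)

# One record per covariate pattern; with an intercept-only model
# there is a single pattern
agg <- data.frame(
  succ = sum(W[Z == 1]),   # weighted number of successes
  fail = sum(W[Z == 0])    # weighted number of failures
)
agg$n    <- agg$succ + agg$fail
agg$prop <- agg$succ / agg$n

# Form 1: two-column response of successes and failures
fit_counts <- glm(cbind(succ, fail) ~ 1, family = binomial(), data = agg)

# Form 2: proportion of successes as response, number of trials as weights
fit_prop <- glm(prop ~ 1, weights = n, family = binomial(), data = agg)

# Back-transform the intercepts to the probability scale
c(weighted_mean = p_hat,
  counts_form   = unname(plogis(coef(fit_counts))),
  prop_form     = unname(plogis(coef(fit_prop))))
```

With covariates in the model, the same aggregation would be done within each distinct covariate pattern rather than over the whole data set.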
Re: [R] Bootstrapping a repeated measures ANOVA
Thank you for your answer. Sorry for the missing example.

In fact, I think I solved the issue by some data manipulation in the function. I split the data (one set for each measuring time), selected the cases at random, and then combined the two measuring times again. Results look promising to me, but if someone is aware of problems, please let me know.

This code should run:

    library(boot)

    # generate data
    anova.daten = data.frame(subject=sort(rep(1:10,2)), mz=rep(1:2,10),
                             ort=sort(rep(1:2,10)), PHQ_Sum_score=rnorm(20,10,2))

    summary(aov(PHQ_Sum_score~mz*ort+Error(subject/mz), data=anova.daten))

    F_values <- function(formula, data1, indices) {
      data2 = subset(data1, data1$mz==2)   # subsetting data for each measuring time
      data3 = subset(data1, data1$mz==1)
      data4 <- data3[indices,]             # allows boot to select sample
      subjekte = na.omit(data4$subject)
      data5 = rbind(data3[subjekte,], data2[subjekte,])    # combine data
      data5$subject = factor(rep(1:length(subjekte),2))    # convert repeated subjects to unique subjects
      fit = aov(formula, data=data5)       # fit model
      return(c(summary(fit)[1][[1]][[1]]$`F value`,
               summary(fit)[2][[1]][[1]]$`F value`))       # return F-values
    }

    # bootstrap
    results <- boot(data=anova.daten, statistic=F_values, R=10,
                    formula=PHQ_Sum_score~mz*ort+Error(subject/mz))

Thanks a lot,
Felix Fischer
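An alternative that avoids splitting the data by measuring time might be to let boot() resample subject identifiers and then pull in all rows belonging to each drawn subject. The sketch below illustrates only that resampling step: the statistic (mean change between the two measuring times) is a deliberately simple stand-in for the F-values, and the names dat, change_stat and full are hypothetical. The aov() call would take the place of the last line of the statistic function.

```r
library(boot)

# Simulated long-format data: two measuring times (mz) per subject
set.seed(123)
dat <- data.frame(
  subject = rep(1:10, each = 2),
  mz      = rep(1:2, times = 10),
  ort     = rep(1:2, each = 10),
  PHQ     = rnorm(20, mean = 10, sd = 2)
)

# boot() resamples the subject ids; the full data set is passed via '...'
change_stat <- function(ids, indices, full) {
  picked <- ids[indices]
  # Collect both rows of every drawn subject; relabel the subject so that
  # a subject drawn twice counts as two distinct subjects
  boot_dat <- do.call(rbind, lapply(seq_along(picked), function(i) {
    rows <- full[full$subject == picked[i], ]
    rows$subject <- i
    rows
  }))
  # Stand-in statistic: mean change from time 1 to time 2
  mean(boot_dat$PHQ[boot_dat$mz == 2]) - mean(boot_dat$PHQ[boot_dat$mz == 1])
}

res <- boot(data = unique(dat$subject), statistic = change_stat,
            R = 200, full = dat)
res
```

Because whole subjects are drawn, every bootstrap sample keeps both measuring times for each subject, which is what the repeated measures design requires.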
Re: [R] Problems with EBImage
Hello,

EBImage is a Bioconductor package: please post on the Bioconductor mailing list.

EBImage requires the libraries ImageMagick and GTK+ to be installed. Did you follow the instructions in the installation manual?
http://www.bioconductor.org/packages/release/bioc/html/EBImage.html

It looks like EBImage cannot locate ImageMagick and GTK+.
- Are they working properly (try gtk-demo and convert on the command line)?
- Did you add the GTK path to the system path (most likely c:\gtk\bin)?
- Did you tick the "Install development headers and libraries" checkbox when installing ImageMagick?

Hope this helps,
Regards,
Greg
---
Gregoire Pau
EMBL Research Officer
http://www.embl.de/~gpau/

R Heberto Ghezzo, Dr wrote:
> Hello,
> Working with Windows 7 on an HP laptop with R-2.10.1, I downloaded and installed ImageMagick-6.3.7.7-Q16-Windows-dll.exe and GTK 2.12.9-win32-2, then downloaded and installed EBImage_3.2.0.zip from a local file, and I got:
>
>   library(EBImage)
>   Loading required package: abind
>   Error in inDL(x, as.logical(local), as.logical(now), ...) :
>     unable to load shared library 'C:/Programs/R/Cran/EBImage/libs/EBImage.dll':
>     LoadLibrary failure: The specified module could not be found.
>   In addition: Warning message:
>   package 'abind' was built under R version c(2, 5, 0) and help will not work correctly
>   Please re-install it
>   Error: package/namespace load failed for 'EBImage'
>
> The location C:\Programs\R\Cran\EBImage\libs\EBImage.dll exists.
> Can somebody tell me what is wrong?
> Thanks
> Heberto Ghezzo
> McGill University
> Canada
Re: [R] TeachingDemos install bumps out with 'Out of memory!'
I have no idea what is happening here (I am not an Ubuntu or Linux expert), but it seems unlikely that this particular package is the main problem; more likely it is just the package you happen to be installing when the problem manifests. TeachingDemos does not have any compiled code (it is all straight R code), does not run any initialization procedures, and is not a huge package, so it seems unlikely to be the culprit (but if it is, let me know and I will try to fix it).

Have you tried restarting the computer and installing packages in a fresh session?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Uwe Dippel [mailto:udip...@uniten.edu.my]
> Sent: Thursday, April 15, 2010 8:56 PM
> To: r-help@r-project.org; Greg Snow
> Subject: TeachingDemos install bumps out with 'Out of memory!'
>
> The same thing that happened to my 'maptools' (http://permalink.gmane.org/gmane.comp.lang.r.general/177404) also hits me here: it eats all memory until the system dies. Alas, in this case, there is no Ubuntu package. Since I have installed some tens of packages with the same method in the meantime, I guess something must be wrong with install.packages, at least on Ubuntu 9.10, amd64.
> (And I am not out of memory, really:
>   Mem:  3347584k total,  812872k used, 2534712k free,   12648k buffers
>   Swap: 4305380k total,  618464k used, 3686916k free,  272476k cached)
>
> Uwe
[R] PCA scores
Hi all,

I am having difficulty calculating PCA scores. The scores I calculated don't match the scores generated by R:

  mypca <- princomp(mymatrix, cor=T)
  myscore <- as.matrix(mymatrix) %*% as.matrix(mypca$loadings)

Does anybody know how mypca$scores is calculated? Is my formula incorrect?

Thanks a lot!
Phoebe
[R] VERY SIMPLE QUESTION
Dear R users,

I am looking for a more efficient way to compute the following:

  a <- matrix(c(1,1,1,1,2,2,2,2),4,2)
  b <- matrix(c(1,2,3,4),4,1)

Eventually, I want to get this matrix, `c`:

  c <- matrix(c(1/1,1/2,1/3,1/4,2/1,2/2,2/3,2/4),4,2)

In fact, the number of columns of `a` is very large. Is there a more efficient way to compute this than using apply or something similar, or is apply the only way?

Any suggestion will be greatly appreciated.

Regards,
Kathryn Lord
Re: [R] VERY SIMPLE QUESTION
Try this:

  sweep(a, 1, b, '/')

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] PCA scores
On Fri, 2010-04-16 at 10:23 -0700, phoebe kong wrote:
> Hi all,
> I have a difficulty to calculate the PCA scores. The PCA scores I calculated doesn't match with the scores generated by R,
>
>   mypca <- princomp(mymatrix, cor=T)
>   myscore <- as.matrix(mymatrix) %*% as.matrix(mypca$loadings)
>
> Does anybody know how the mypca$scores were calculated? Is my formula not correct?

You need to apply the centring and scaling that princomp performs because you set 'cor = TRUE' in your call. Here's an example using the built-in 'swiss' data set:

  data(swiss)
  pc <- princomp(swiss, cor = TRUE)
  my.scr <- with(pc, scale(swiss, center = center, scale = scale) %*% loadings(pc))
  all.equal(my.scr, pc$scores)

You can see all of this in the princomp code if you look closely:

  getAnywhere(princomp.default)

HTH

> Thanks a lot!
> Phoebe

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
Re: [R] VERY SIMPLE QUESTION
Since b is only one column, just make it a vector.

  a <- matrix(c(1,1,1,1,2,2,2,2),4,2)
  b <- c(1,2,3,4)

Then

  result <- a/b
  result
        [,1]  [,2]
  [1,] 1.000 2.000
  [2,] 0.500 1.000
  [3,] 0.333 0.667
  [4,] 0.250 0.500

should be what you want. It is also a bad idea to name the resulting matrix c, since c(...) is a primitive function.

Christian

--
Christian Raschke
Department of Economics and ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu
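As a small sketch of why both suggestions in this thread give the same answer: R stores matrices column by column, so dividing by a plain vector of length nrow(a) recycles that vector down every column, which is exactly what sweep() over margin 1 does. The object names by_sweep, by_recycling and expected are illustrative only and not from the original posts.

```r
# Data from the original question
a <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2), 4, 2)
b <- matrix(c(1, 2, 3, 4), 4, 1)

# Suggestion 1: sweep the row-wise statistic b out of a using "/"
by_sweep <- sweep(a, 1, b, "/")

# Suggestion 2: drop b to a plain vector and rely on recycling.
# Note that a / b with b still a 4 x 1 matrix fails (non-conformable arrays).
by_recycling <- a / as.vector(b)

# The matrix the poster wanted to obtain
expected <- matrix(c(1/1, 1/2, 1/3, 1/4, 2/1, 2/2, 2/3, 2/4), 4, 2)

print(by_sweep)
print(isTRUE(all.equal(by_sweep, by_recycling)))
```

Neither form loops in R code, so either should scale to a matrix with many columns; the recycling form only works this directly when the divisor runs along rows.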
Re: [R] VERY SIMPLE QUESTION
thanks a lot. good day.

Kathie
[R] read xml
Hi,

I am trying to read selected fields from an XML file with R using the XML package. So far I have learned the basics of this package by going through the manual, examples, tutorial, and so on (www.omegahat.org/RSXML). The problem is that I get stuck when it comes to more complex XML files. I am a novice in R and XML, and was wondering if someone could help me out here.

Here is my XML file. I am only interested in the protein_group nodes. Therefore, I have omitted most of the information from the two preceding nodes (protein_summary_header, proteinprophet_details).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://localhost/ISB/data/interact-LFA1_C18_PME5R1.prot.xsl"?>
<protein_summary xmlns="http://regis-web.systemsbiology.net/protXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/protXML/protXML_v6.xsd" summary_xml="interact-LFA1_C18_PME5R1.prot.xml">
<protein_summary_header reference_database="EColi_decoy_v3.0.fasta">
<program_details analysis="proteinprophet">
<proteinprophet_details occam_flag="Y" run_options="XML">
<protein_group group_number="1" probability="1.">
  <protein protein_name="sp|P4|CYC_HORSE" n_indistinguishable_proteins="1" probability="1." percent_coverage="46.7" unique_stripped_peptides="EDLIAYLK+EETLMEYLENPK+KTGQAPGFTYTDANK+TEREDLIAYLK+TGPNLHGLFGR+TGQAPGFTYTDANK" group_sibling_id="a" total_number_peptides="226" pct_spectrum_ids="2.54" confidence="1.00">
    <parameter name="prot_length" value="107"/>
    <annotation protein_description="Cytochrome c OS=Equus caballus GN=CYCS PE=1 SV=2"/>
    <peptide peptide_sequence="KTGQAPGFTYTDANK" charge="2" initial_probability="0.9989" nsp_adjusted_probability="0.9998" peptide_group_designator="a" weight="1.00" is_nondegenerate_evidence="Y" n_enzymatic_termini="2" n_sibling_peptides="8.50" n_sibling_peptides_bin="6" n_instances="10" exp_tot_instances="9.94" is_contributing_evidence="Y" calc_neutral_pep_mass="1597.7737">
    </peptide>
    <peptide peptide_sequence="TGQAPGFTYTDANK" charge="2" initial_probability="0.9989" nsp_adjusted_probability="0.9998" weight="1.00" is_nondegenerate_evidence="Y" n_enzymatic_termini="2" n_sibling_peptides="8.50" n_sibling_peptides_bin="6" n_instances="90" exp_tot_instances="89.82" is_contributing_evidence="Y" calc_neutral_pep_mass="1469.6786">
    </peptide>
    <peptide peptide_sequence="KTGQAPGFTYTDANK" charge="3" initial_probability="0.9990" nsp_adjusted_probability="0.9998" peptide_group_designator="a" weight="1.00" is_nondegenerate_evidence="Y" n_enzymatic_termini="2" n_sibling_peptides="8.50" n_sibling_peptides_bin="6" n_instances="10" exp_tot_instances="9.89" is_contributing_evidence="Y" calc_neutral_pep_mass="1597.7737">
    </peptide>
  </protein>
</protein_group>
<protein_group group_number="2" probability="1.">
  <protein protein_name="sp|P00350|6PGD_ECOLI" n_indistinguishable_proteins="1" probability="1." percent_coverage="32.1" unique_stripped_peptides="AGAGTDAAIDSLKPYLDK+EAYELVAPILTK+EFVESLETPR+EKTEEVIAENPGK+GDIIIDGGNTFFQDTIR+GPSIMPGGQK+GYTVSIFNR+IAAVAEDGEPCVTYIGADGAGHYVK+IVSYAQGFSQLR+QIADDYQQALR+TEEVIAENPGK+VLSGPQAQPAGDK" group_sibling_id="a" total_number_peptides="32" pct_spectrum_ids="0.36" confidence="1.00">
    <parameter name="prot_length" value="474"/>
    <annotation protein_description="6-phosphogluconate deh ...

I did the following:

  doc <- xmlRoot(xmlTreeParse("myfile.xml"))
  xmlApply(doc, names)
  $protein_summary_header
    program_details
  "program_details"

  $dataset_derivation
  list()

  $protein_group
    protein
  "protein"

  $protein_group
    protein
  "protein"

[IN FACT, THE $protein_group APPEARS A COUPLE HUNDRED TIMES]

So, I want to create a data frame comprising selected information from each $protein_group, as follows:

  group_number  protein_name          probability  peptide_sequence  initial_probability  n_instances
  1             sp|P4|CYC_HORSE       1.           KTGQAPGFTYTDANK   0.9989               10
  1             sp|P4|CYC_HORSE       1.           TGQAPGFTYTDANK    0.9989               90
  1             sp|P4|CYC_HORSE       1.           KTGQAPGFTYTDANK   0.9990               10
  2             sp|P00350|6PGD_ECOLI  1.           NAPGTYCMR         0.9349               8
  2             sp|P00350|6PGD_ECOLI  1.           TGAHPGPMK         0.9124               2

As I understand it, the variables in columns 4, 5 and 6 come from children of protein_group. For each $protein_group, I need to retrieve some of its children. I would greatly appreciate any help.

Thank you very much,
Alex
[R] How to get rid of extra areas on spatial data map
Hello All,

I am using the sp and maps libraries to draw a map of some fire data. The region covered is the state of Mississippi in the US. I would then like to add an ecoregion layer on top (the Omernik layer from nationalatlas.gov). The problem is that the ecoregion layer extends beyond the state boundary. How can I get rid of the ecoregion areas that fall outside the state?

The code is like this:

  # Both data.fire and data.ecoregion are of class SpatialPolygonsDataFrame.
  # Every fire is within the state boundary so it looks nice.
  # Some ecoregions go beyond the state boundary.
  library(sp); library(maps)
  win.graph(width=4, height=6)
  map('state', region = 'mississippi', col='red')
  plot(data.fire, add=T)
  plot(data.ecoregion, add=T)

Any hint is greatly appreciated.

Edwin
[R] call R script from Excel VBA/macro
i wrote a R script say called computeCovarMatrix.R and i want to call and run this piece from Excel visual basic. does anyone know how to do that? thanks, KZ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] run R script from Excel VBA
See RExcel, http://rcom.univie.ac.at/ and especially the video demo http://rcom.univie.ac.at/RExcelDemo/

Guy
Re: [R] Image RGB calculation
Thanks a lot, i will try this out! Ole -- View this message in context: http://n4.nabble.com/Image-RGB-calculation-tp1989864p2013203.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Efficiency of C Compiler in R CMD SHLIB
Thank you very much for your kind explanation. I did find that my DLL compiled with either VC++ 6.0 or the Intel compiler (the two are almost equally fast) is significantly faster than the one compiled with gcc, the default compiler in R (55 seconds vs. 78 seconds). I did not choose debug mode when using gcc, so I suppose it generates a release version of the DLL. I just wonder how to switch to ICC or the VC++ 6.0 compiler in R CMD SHLIB. Could you give me some advice? Thanks!
Re: [R] Efficiency of C Compiler in R CMD SHLIB
I wonder how to raise the optimization level of gcc any further. I thought -O3 was already the highest.
Re: [R] R loop.
I'm not sure I completely understand your question, but I think the solution to your problem is the reshape function in the reshape package. Here is a silly example of how it would work:

  V <- matrix(rbinom(15,4,.5),nrow=3)
  X <- data.frame(A=c("A","B","C"),V=V)
  X
    A V.1 V.2 V.3 V.4 V.5
  1 A   1   2   3   3   3
  2 B   4   3   0   2   2
  3 C   2   3   2   1   2

  reshape(X,direction="long",varying=c("V.1","V.2","V.3","V.4","V.5"))
      A time V id
  1.1 A    1 1  1
  2.1 B    1 4  2
  3.1 C    1 2  3
  1.2 A    2 2  1
  2.2 B    2 3  2
  3.2 C    2 3  3
  1.3 A    3 3  1
  2.3 B    3 0  2
  3.3 C    3 2  3
  1.4 A    4 3  1
  2.4 B    4 2  2
  3.4 C    4 1  3
  1.5 A    5 3  1
  2.5 B    5 2  2
  3.5 C    5 2  3

Your two columns of interest are A and V. The time column lets you know from which column the V came.

-tgs

On Fri, Apr 16, 2010 at 6:35 AM, mhalsham mhals...@bradford.ac.uk wrote:
> Hi everyone,
> I'm new to R and I can't figure out how to use a loop to do the following task; any help would be very kind.
> I have a file called table3.txt that contains over 1000 rows and over 40 columns. For example, the first row looks like this:
>
>   Deafness, EYA4, DIAPH1, MYO7A, TECTA, COL11A2, POU4F3, MYH9, ACTG1, MYO6
>
> I want the loop statements to go through each row, take the first cell (Deafness) and the second (EYA4) and put them at the bottom of the file, then take the first cell again (Deafness) and the third cell (DIAPH1) and put them at the bottom of the file, and so on, until I end up with two columns: one containing all the diseases and one containing all the genes.
[R] Is it ok to apply the z.test this way?
Dear R-users, I want to check if certain values are from random distribution, that includes values between 0-1. So, it is not really normal even though shapiro.test says it is highly normal... Can I do something like this and think that the values given are right. z.test is from package TeachingDemos. --- SelectedVals=c() for(i in seq(0,1,by=0.001)) { if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)=0.05) SelectedVals=c(SelectedVals,i) } --- I have marked the border values given by this script to the histogram of the original random distribution: http://www.ag.fimug.fi/~Atte/62Hist100410.pdf Atte Tenkanen University of Turku, Finland Department of Musicology +35823335278 http://users.utu.fi/attenka/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R loop.
On Apr 16, 2010, at 11:52 AM, Thomas Stewart wrote:
> I'm not sure I completely understand your question, but I think the solution to your problem is the reshape function in the reshape package.

Except there is no reshape function in the reshape package. Your code works because the reshape function is in the stats package, which is loaded by default.

David Winsemius, MD
West Hartford, CT
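To connect the reshape() suggestion to the disease/gene layout in the original question, here is a minimal sketch using stats::reshape (the function the correction above refers to). The data frame `wide`, its column names gene.1 and gene.2, and the second disease row are hypothetical stand-ins; the real table3.txt has about 40 columns and would first need to be read in, for example with read.csv.

```r
# Hypothetical wide table: one row per disease, one column per gene slot
wide <- data.frame(
  disease = c("Deafness", "Anemia"),
  gene.1  = c("EYA4", "HBB"),
  gene.2  = c("DIAPH1", "HBA1"),
  stringsAsFactors = FALSE
)

# stats::reshape splits names like "gene.1" at the "." into a value
# column ("gene") and a "time" index recording the source column
long <- reshape(wide,
                direction = "long",
                varying   = c("gene.1", "gene.2"),
                idvar     = "disease")

# Keep only the two columns the poster asked for
pairs <- long[, c("disease", "gene")]
rownames(pairs) <- NULL
print(pairs)
```

If rows have differing numbers of genes, the empty cells would come through as NA or empty strings and would need to be filtered out afterwards; that step is not shown here.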
Re: [R] Is it ok to apply the z.test this way?
On Apr 16, 2010, at 12:11 PM, Atte Tenkanen wrote: Dear R-users, I want to check if certain values are from random distribution, that includes values between 0-1. So, it is not really normal even though shapiro.test says it is highly normal... Can I do something like this and think that the values given are right. z.test is from package TeachingDemos. --- SelectedVals=c() for(i in seq(0,1,by=0.001)) { if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution)) $p.value)=0.05) SelectedVals=c(SelectedVals,i) } You are attempting to do statistics on a single number at a time. If you do not immediately appreciate the absurdity of this effort, then you should consult a real statistician without delay. There are many fine statisticians at your university. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with FUN in Hmisc::summarize
arnaud chozo wrote:
> Hi all,
> I'd like to use the Hmisc::summarize function, but it uses a function (FUN) of a single vector argument to create the statistical summaries. Consider an easy case: I'd like to compute the correlation between two variables in my data frame, grouped according to other variables in the same data frame. For example, consider the following data frame D:
>
>   V1 V2 V3
>   A   1 -1
>   A   1  1
>   A  -1 -1
>   B   1  1
>   B   1  1
>
> I'd like to use
>
>   Hmisc::summarize(X=D, by=llist(myvar=D$V1), FUN=corr.V2.V3)
>
> where corr.V2.V3 is defined as follows:
>
>   corr.V2.V3 = function(x) {
>     d = cbind(x$V2, x$V3)
>     out = c(cor(d))
>     names(out) = c("CORR")
>     return(out)
>   }
>
> I was not able to use Hmisc::summarize in this case because FUN would have to be a function of a matrix argument. Any idea?
> Thanks in advance,
> Arnaud

See the Hmisc mApply or summary.formula functions, or use tapply with a vector of possible subscripts (1:n) as the first argument; then you can use the selected subscripts to address multiple variables.

Frank

--
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine
Vanderbilt University
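The tapply suggestion above can be sketched as follows, using base R only. The data frame D here is a hypothetical variant of the poster's example: group B was changed so that neither variable is constant, because cor() returns NA when a variable has zero variance within a group. The names idx and corr_by_group are illustrative.

```r
# Hypothetical data: V1 is the grouping variable, V2 and V3 are correlated
D <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "B"),
  V2 = c(1, 1, -1, 1, 2, 3),
  V3 = c(-1, 1, -1, 2, 4, 7)
)

# Pass row subscripts to tapply; inside the function the subscripts
# selected for each group address as many columns of D as needed
idx <- seq_len(nrow(D))
corr_by_group <- tapply(idx, D$V1, function(i) cor(D$V2[i], D$V3[i]))
print(corr_by_group)
```

The trick is that tapply only ever splits a single vector, so splitting the row numbers rather than the data lets the summary function see several variables at once.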
Re: [R] Is it ok to apply the z.test this way?
Several points:

1. The Shapiro test does not tell you that something is normal or highly normal, only that you don't have enough evidence to disprove that the data came from a normal population (and it is powered for a certain type of deviation from normality).

2. The z.test function is intended as a stepping stone for students: a simple test with unrealistic assumptions to get the ideas across, after which you relax the assumptions and learn about t tests and others.

3. The z test is only used when the population standard deviation is known. You calculate the sd from the data; that is what t tests are for.

4. Calculating the hypothesized mean from the data is backwards.

5. Using a sample size of 1 is questionable; doing this 1,000 times without correction is even more questionable.

6. Your code is equivalent to:

  tmp <- seq(0,1, by=0.001)
  tmp2 <- tmp[ abs(tmp-mean(Distribution))/sd(Distribution) 1.96 ]

just slower and less memory efficient.

7. None of this establishes what is from an unknown distribution.

If you can tell us what your real question is, then maybe we can help with a real solution.

So, to answer your question of whether it is OK to use z.test in that way: legally, the license says you can use it any way you want; ethically, morally, aesthetically, or following the intent of the author, no!

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
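Point 6 of the reply above (the loop is equivalent to one vectorized comparison) can be checked with a short sketch. Several things here are assumptions rather than facts from the thread: the comparison operators were lost in the archive, so the code assumes the loop kept values with p.value >= 0.05; `Distribution` is a hypothetical stand-in sample; and the two-sided p-value is computed directly with pnorm() in place of TeachingDemos::z.test for a single observation. The exact cutoff is qnorm(0.975), which 1.96 approximates.

```r
set.seed(42)
Distribution <- rbeta(200, 5, 5)   # hypothetical values between 0 and 1

m <- mean(Distribution)
s <- sd(Distribution)
grid <- seq(0, 1, by = 0.001)

# Loop version, mirroring the structure of the original post
loop_sel <- c()
for (i in grid) {
  p <- 2 * pnorm(-abs((i - m) / s))   # two-sided z-test p-value, n = 1
  if (p >= 0.05) loop_sel <- c(loop_sel, i)
}

# Vectorized version: one comparison over the whole grid
vec_sel <- grid[abs(grid - m) / s <= qnorm(0.975)]

print(range(vec_sel))
print(identical(loop_sel, vec_sel))
```

Under these assumptions the selected values are simply the grid points within about 1.96 standard deviations of the sample mean, which illustrates why the loop adds no information beyond mean(Distribution) and sd(Distribution).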
Re: [R] Is it ok to apply the z.test this way?
So .. are you trying to figure out whether your data hasa substantial number of outliers that call into question the adequacy of the normal distro fro your data? If this is the case, note that you cannot individually check the values (as you are doing) without taking into account of the Bonferoni fallacy i.e. small p-values will be found with a respectable frequency as the size of the dataset grows (C Robert discusses this in a preprint in arxiv see http://arxiv.org/PS_cache/arxiv/pdf/1002/1002.2080v1.pdf ) So even though you could check each individual point for normality, testing the whole dataset requires that you apply a Bonferoni correction to your z.tests or use outlier.test from package car to reduce the amount of code you have to write. Regards, Christos Date: Fri, 16 Apr 2010 19:11:19 +0300 From: atte...@utu.fi To: r-help@r-project.org Subject: [R] Is it ok to apply the z.test this way? Dear R-users, I want to check if certain values are from random distribution, that includes values between 0-1. So, it is not really normal even though shapiro.test says it is highly normal... Can I do something like this and think that the values given are right. z.test is from package TeachingDemos. --- SelectedVals=c() for(i in seq(0,1,by=0.001)) { if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)=0.05) SelectedVals=c(SelectedVals,i) } --- I have marked the border values given by this script to the histogram of the original random distribution: http://www.ag.fimug.fi/~Atte/62Hist100410.pdf Atte Tenkanen University of Turku, Finland Department of Musicology +35823335278 http://users.utu.fi/attenka/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. _ Hotmail: Powerful Free email with security by Microsoft. 
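A minimal sketch of the Bonferroni correction suggested in the reply above, using only base R. The vector Distribution is simulated here as a stand-in for the poster's data (an assumption, since the original data were not posted), and the two-sided z statistic is computed by hand instead of calling z.test from TeachingDemos; for a single value tested against the sample mean and sd this should give the same p-value.

```r
# Hypothetical data standing in for the poster's 'Distribution' vector
set.seed(1)
Distribution <- rnorm(200, mean = 0.5, sd = 0.1)

# Two-sided z-test p-value for each observation, tested one at a time
# against the sample mean and standard deviation
z <- (Distribution - mean(Distribution)) / sd(Distribution)
p_raw <- 2 * pnorm(-abs(z))

# Bonferroni adjustment: each p-value is multiplied by the number of
# tests and capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Number of observations flagged at the 5% level before and after correction
n_raw <- sum(p_raw < 0.05)
n_bonf <- sum(p_bonf < 0.05)
```

Comparing n_raw with n_bonf shows how many of the individually "significant" values survive once the number of tests is taken into account.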
[R] [R-pkgs] formatR: farewell to ugly R code
This is an announcement of the release of an R package 'formatR', which can help us format our R code to make it more human-readable.

If you have ugly (I mean unformatted) R code like this:

# rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360)[i],cex=7*i/360)
# pause for a while
Sys.sleep(0.01)}

There are no spaces, no appropriate indent... The package 'formatR' provides a GUI (by gWidgets) to make messy R code clean and tidy, e.g.

# rotation of the word 'Animation'
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
    # redraw the plot again and again
    plot(1, ann = FALSE, type = "n", axes = FALSE)
    # rotate; use rainbow() colors
    text(1, 1, "Animation", srt = i, col = rainbow(360)[i], cex = 7 * i/360)
    # pause for a while
    Sys.sleep(0.01)
}

The usage is simple:

# formatR depends on RGtk+; will be installed automatically
# better use the latest version of R (>=2.10.1)
install.packages('formatR')
library(formatR)
# or formatR()

Screen-shots can be found here: http://yihui.name/en/2010/04/formatr-farewell-to-ugly-r-code/

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages