Re: [R] \ll and \gg in expression()
Prof Brian Ripley wrote:
On Sat, 9 Feb 2008, Michael Kubovy wrote:
On Feb 9, 2008, at 4:41 PM, Prof Brian Ripley wrote:
On Sat, 9 Feb 2008, Michael Kubovy wrote:

[Kubovy] How do I enter 'much greater than' and 'much less than' symbols in an expression?

[Ripley] Those are not in the Adobe Symbol encoding used for plotmath. Since you have not told us your platform and locale as requested in the posting guide...

[Kubovy] R version 2.7.0 Under development (unstable) (2008-02-05 r44340), i386-apple-darwin8.10.1; locale: C

[Ripley] I don't know if the following is relevant to you. If you have a suitable Unicode font and the means to use it (which most likely means a UTF-8 locale in R 2.7.0), they are the glyphs for \u226a and \u226b (see http://www.alanwood.net/unicode/mathematical_operators.html). A quick check suggests that not many fonts do.

[Kubovy] Thanks. The Mac character palette tells me that they correspond to Unicode 226A and B, or UTF-8 E2 89 AA and AB. My question now is, how do I tell expression() to use the glyphs for these?

[Ripley] \u226a and \u226b, as I said. But not in a C locale.

[Dalgaard] Otherwise, a quick approximation could be expression(x~~1). (We don't do negative thin space, do we? That could make it look better.)

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr. B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
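A minimal sketch of Ripley's suggestion (my own illustration, not code from the thread): in a UTF-8 locale, with a font that carries U+226A/U+226B, the glyph can be embedded directly in a plotmath label. The off-screen pdf(NULL) device is only there so the example runs headless.

```r
# Assumes a UTF-8 locale and a font containing the glyphs (as Ripley notes,
# many fonts lack them).
pdf(NULL)                                   # off-screen device, no file written
plot(1, type = "n", xlab = "", ylab = "",
     main = expression(x ~ "\u226a" ~ y))   # U+226A, MUCH LESS-THAN
invisible(dev.off())
```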
Re: [R] writing a function
mohamed nur anisah [EMAIL PROTECTED] [Fri, Feb 08, 2008 at 04:42:41PM CET]:
> Dear lists, I'm in the process of learning to write functions. I tried to write simple functions for a matrix and a vector. Here is the code:
>
> mm <- function(m, n) {  # matrix function
>   w <- matrix(nrow = m, ncol = n)
>   for (i in 1:m) {
>     for (j in 1:n) {
>       w[i, j] = i + j
>     }
>   }
>   return(w[i, j])
> }

In addition to the other comments, allow me to remark that R provides a lot of convenience functions on vectors that make explicit looping unnecessary. An error such as yours wouldn't have occurred to a more experienced expRt because indices wouldn't turn up in the code at all:

mm <- function(m, n) {
  a <- matrix(nrow = m, ncol = n)
  row(a) + col(a)
}

Greetings Johannes
-- Johannes Hüsing                  There is something fascinating about science. One gets such
mailto:[EMAIL PROTECTED]           wholesale returns of conjecture from such a trifling
http://derwisch.wikidot.com        investment of fact. (Mark Twain, Life on the Mississippi)
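The same idea can be written still more compactly with outer(), which builds the whole matrix w[i, j] = i + j in one call (my own illustration of the vectorized style Johannes describes; the name mm is kept from the thread):

```r
# Vectorized construction of w[i, j] = i + j, with no explicit loops:
mm <- function(m, n) outer(1:m, 1:n, `+`)

mm(2, 3)
#      [,1] [,2] [,3]
# [1,]    2    3    4
# [2,]    3    4    5
```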
[R] Testing for differences between groups: need help finding the right test in R
Dear all, I have a data set with four different groups; for each group I have several observations (the numbers of observations in the groups are unequal), and I want to test whether there are differences in the values between the groups. What would be the most appropriate way to test this in R? Regards, Kes
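A sketch of two standard answers, on hypothetical data (the data frame, its columns, and the group sizes are all my invention): a one-way ANOVA if the usual normality/equal-variance assumptions are plausible, and the rank-based Kruskal-Wallis test otherwise. Both handle unequal group sizes.

```r
# Hypothetical data: four groups with unequal numbers of observations
set.seed(1)
d <- data.frame(
  value = rnorm(3 + 5 + 4 + 6),
  group = factor(rep(c("A", "B", "C", "D"), times = c(3, 5, 4, 6)))
)

summary(aov(value ~ group, data = d))   # one-way ANOVA (assumes normality)
kruskal.test(value ~ group, data = d)   # rank-based, fewer assumptions
```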
[R] Using 'sapply' and 'by' in one function
Greetings, I'm having a problem with something that I think is very simple: I'd like to be able to use the 'sapply' and 'by' functions in one function, to be able (for example) to get regression coefficients from multiple models by a grouping variable. I think that I'm missing something that is probably obvious to experienced users. Here's a simple (trivial) example of what I'd like to do:

new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10), sex = rep(0:1, 5), Pred = rnorm(10))
fxa <- function(x, data) { lm(x ~ Pred, data = data)$coef }
sapply(new[, 1:2], fxa, new)  # this yields coefficients for the predictor in separate models
fxb <- function(x) { lm(Outcome.1 ~ Pred, data = x)$coef }
by(new, new$sex, fxb)  # yields the coefficient for Outcome.1 for each sex

## I'd like to combine 'sapply' and 'by' to get the regression coefficients for Outcome.1 and Outcome.2 for each sex, rather than running fxb a second time predicting 'Outcome.2', or subsetting the data by sex before I run the function, but the following doesn't work:

by(new, new$sex, FUN = function(x) sapply(x[, 1:2], fxa, new))
'Error in model.frame.default(formula = x ~ Pred, data = data, drop.unused.levels = TRUE) : variable lengths differ (found for 'Pred')'

## I understand the error message (the length of 'Pred' is 10 while the length of each sex group is 5), but I'm not sure how to correctly write the 'by' function to use 'sapply' inside it. Could someone please point me in the right direction?

Thanks very much in advance, David S Freedman, CDC (Atlanta USA) [definitely not the well-known statistician David A Freedman, in Berkeley]
Re: [R] Using 'sapply' and 'by' in one function
By passing new to fxa via the second argument of fxa, new is not being subsetted, hence the error. Try this:

by(new, new$sex, function(x) sapply(x[1:2], function(y) coef(lm(y ~ Pred, x))))

Actually, you can do the above without sapply, as lm can take a matrix for the dependent variable:

by(new, new$sex, function(x) coef(lm(as.matrix(x[1:2]) ~ Pred, x)))

On Feb 10, 2008 8:19 AM, David Natalia [EMAIL PROTECTED] wrote:
> [original question snipped]
Re: [R] Using 'sapply' and 'by' in one function
Actually, thinking about this, not only do you not need sapply but you don't even need by:

new2 <- transform(new, sex = factor(sex))
coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))

On Feb 10, 2008 8:43 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> [earlier suggestions snipped]
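A quick way to convince oneself that the nested formula reproduces the per-group coefficients (a sketch on simulated data with the thread's column names; the seed and check are my own):

```r
set.seed(42)
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))
new2 <- transform(new, sex = factor(sex))

# Per-sex fits via by():
bycoef <- by(new2, new2$sex, function(x) coef(lm(as.matrix(x[1:2]) ~ Pred, x)))

# Single nested fit:
allcoef <- coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))

# The Pred slope for sex == 0, Outcome.1 agrees between the two approaches:
stopifnot(isTRUE(all.equal(unname(bycoef[[1]]["Pred", "Outcome.1"]),
                           unname(allcoef["sex0:Pred", "Outcome.1"]))))
```

The coefficient values coincide; as noted in the thread, the single fit pools the error variance across groups, so the standard errors differ.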
Re: [R] Which package should I use if I estimate a recursive model?
Dear Yongfu He, If you mean a recursive structural-equation model, then, if you're willing to assume normally distributed errors, equation-by-equation OLS regression, using lm(), will give you the full-information maximum-likelihood estimates of the structural coefficients. You could also use the sem() function in the sem package, but, aside from getting a test of over-identifying restrictions (assuming that the model is overidentified), there's not much reason to do so; you'll get the same estimates. I hope this helps, John

John Fox, Professor, Department of Sociology, McMaster University, Hamilton, Ontario, Canada L8S 4M4, 905-525-9140 x23604, http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yongfu He
Sent: February-09-08 9:16 PM
To: r-help@r-project.org
Subject: [R] Which package should I use if I estimate a recursive model?

Dear All: I want to estimate a simple recursive model in R. Which package should I use? Thank you very much in advance. Yongfu He
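A sketch of the equation-by-equation approach John describes, on a hypothetical two-equation recursive system (variable names and coefficients are my invention, purely for illustration):

```r
# Hypothetical recursive system:  y1 depends on x;  y2 depends on y1 and x
set.seed(7)
n  <- 200
x  <- rnorm(n)
y1 <- 0.5 * x + rnorm(n)
y2 <- 0.8 * y1 + 0.3 * x + rnorm(n)

# For a recursive model with normal errors, equation-by-equation OLS
# gives the FIML estimates of the structural coefficients:
eq1 <- lm(y1 ~ x)
eq2 <- lm(y2 ~ y1 + x)
coef(eq1)
coef(eq2)
```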
Re: [R] R on Mac Pro: does anyone have experience with R on such a platform?
On Feb 10, 2008 2:29 AM, Maura E Monville [EMAIL PROTECTED] wrote:
> I saw there exists an R version for Mac OS. I'd like to hear from someone who is running R on a Mac OS before venturing on getting the following computer system. I am in the process of choosing a powerful laptop: a 17-inch MacBook Pro, 2.6 GHz (dual-core), 4 GB RAM. Thank you so much, Maura E.M

You can see the R Mac OS X FAQ, http://cran.es.r-project.org/bin/macosx/RMacOSX-FAQ.html. Also, you can post in the Mac R list (R-sig-mac). Rod.
Re: [R] Do I need to use dropterm()??
Hi Dani, it would be better to start with a question you are trying to ask of your data rather than trying to figure out what a particular function does. With your variables and model, even if the component terms were not significant, they must be in the model, or the product of sunlight and aspect will NOT represent the interaction. Also note that the tests of your components are probably not what you think they are. In general, tests of components of interactions test the simple effect of that variable when the other variable is 0. Hence, your 'significant' result for aspect pertains to when log sunlight is 0, which probably isn't what you want to be testing. What the significant effect for sunlight means depends on how aspect was coded; you should check to see what coding was used to know what zero means. Gary McClelland, Colorado

On Sun, Feb 10, 2008 at 6:40 AM, DaniWells [EMAIL PROTECTED] wrote:
> Hello, I'm having some difficulty understanding the usage of the dropterm() function in the MASS library. What exactly does it do? I'm very new to R, so any pointers would be very helpful. I've read many definitions of what dropterm() does, but none seem to stick in my mind or click with me. I've coded everything fine for an interaction that runs as follows: two sets of data (one for North aspect, one for South aspect), with a log scale on the x axis and survival on the y. After calculating my ANOVA results I have all significant results (i.e. aspect = sig, log scale of sunlight = sig, and aspect:llight = sig). When I have all significant results in my ANOVA table, do I need dropterm(), or is that just to remove insignificant terms? Many thanks, Dani
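For what dropterm() actually reports, here is a sketch on hypothetical data shaped like Dani's problem (two aspects, a log-sunlight covariate, a binary survival outcome; all names and coefficients are my invention). Note that dropterm() respects marginality: with the interaction present, only the interaction is offered for dropping, never its component main effects.

```r
library(MASS)  # for dropterm()

# Hypothetical survival-style data
set.seed(3)
d <- data.frame(aspect = factor(rep(c("N", "S"), each = 20)),
                llight = rnorm(40))
d$y <- rbinom(40, 1, plogis(0.5 + (d$aspect == "S") + 0.7 * d$llight))

fit <- glm(y ~ aspect * llight, family = binomial, data = d)
dt  <- dropterm(fit, test = "Chisq")  # only aspect:llight is eligible to drop
dt
```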
[R] Error in optim while using fitdistr() function
I get the digest, so I apologize if this is a little late. For your situation (based on the description and what I think your code is doing, more on that below), it looks like you are modeling a Poisson flow where the number of hits per unit time is a random integer with some mean value. If I understand your code correctly, you are trying to put your data into k bins of width f <- (max(V1) - min(V1))/k. In that case I would think something like this would work more efficiently:

m <- min(V1)
k <- floor(1 + log2(length(V1)))
f <- (max(V1) - min(V1))/k
binCount <- NULL
for (i in seq(length = k)) {
  binIndex <- which(m + (i - 1) * f < V1 & V1 <= m + i * f)
  binCount[i] <- sum(V2[binIndex])
}

where i becomes the index of time intervals. Hope it helps. Sincerely, Jason Q. McClintic
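The explicit loop can also be replaced by cut() plus tapply(), which bins and sums in two lines (a sketch; V1 and V2 here are hypothetical stand-ins for the poster's times and request counts):

```r
# Hypothetical stand-ins for the poster's V1 (times) and V2 (counts):
set.seed(11)
V1 <- runif(421, 0, 420)
V2 <- rpois(421, 5)

k        <- floor(1 + log2(length(V1)))        # Sturges-style bin count
bins     <- cut(V1, breaks = k)                # k equal-width intervals
binCount <- as.vector(tapply(V2, bins, sum))   # summed requests per bin
binCount[is.na(binCount)] <- 0                 # empty bins, if any

stopifnot(sum(binCount) == sum(V2))            # nothing lost in binning
```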
Re: [R] Using 'sapply' and 'by' in one function
On Feb 10, 2008 8:25 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> Actually, thinking about this, not only do you not need sapply but you don't even need by:
> new2 <- transform(new, sex = factor(sex))
> coef(lm(as.matrix(new2[1:2]) ~ sex/Pred - 1, new2))

Although that's a very slightly different model, as it assumes that both sexes have the same error variance.

Hadley
-- http://had.co.nz/
Re: [R] Error while using fitdistr() function or goodfit() function
Hello, Thanks, that helped for the Poisson. When I changed the method to ML it worked for the Poisson, but when I used it for the negative binomial I got errors. Why is this happening?

gf <- goodfit(binCount, type = "poisson")
summary(gf)
Goodness-of-fit test for poisson distribution:
                      X^2  df  P(> X^2)
Likelihood Ratio  2730.24  30

gf <- goodfit(binCount, type = "nbinomial")
Warning messages:
1: NaNs produced in: dnbinom(x, size, prob, log)
2: NaNs produced in: dnbinom(x, size, prob, log)
summary(gf)
Goodness-of-fit test for nbinomial distribution:
                      X^2  df      P(> X^2)
Likelihood Ratio 64.53056   2  9.713306e-15

But how can I interpret the above results? When I was using goodfit with method MinChisq I was getting a P value. The higher the P value among the goodness-of-fit tests for the different distributions (poisson, binomial, nbinomial), the better the fit. Am I correct? If I am wrong, correct me. But now, with the ML method, how can I decide which distribution fits best? Thank you. Aswad

On 2/10/08, Jason Q. McClintic [EMAIL PROTECTED] wrote: Try changing your method to ML and try again. I tried to run the first example from the documentation and it failed with the same error. Changing the estimation method to ML worked. @List: Can anyone else verify the error I got? I literally ran the following two lines interactively from the example for goodfit:

dummy <- rnbinom(200, size = 1.5, prob = 0.8)
gf <- goodfit(dummy, type = "nbinomial", method = "MinChisq")

and got back:
Warning messages:
1: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
2: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced

Again, I hope this helps. Sincerely, Jason Q. McClintic

Aswad Gurjar wrote: Hello, Thanks for the help, but I am facing a different problem.
I have 421 readings of time and the number of requests coming in at each particular time. Basically I have data at one-minute intervals and the corresponding number of requests; it is discrete in nature. I am collecting data from 9 AM to 4 PM, but some of the readings are 0. When I plotted a histogram of the data I could not get the shape of any standard distribution. Now my aim is to find the distribution which best fits my data among the standard ones. There was a huge amount of data, so I tried to collect the data into a number of bins. That was working properly, and the code you have given works properly too, and is more efficient. The problem comes at the next stage: when I apply fitdistr() for continuous data or goodfit() for discrete data I get the following errors, which I am not able to remove. Please help me if you can. The errors are as follows:

library(vcd)
gf <- goodfit(binCount, type = "nbinomial", method = "MinChisq")
Warning messages:
1: NaNs produced in: pnbinom(q, size, prob, lower.tail, log.p)
  [warning repeated 5 times]
summary(gf)
Goodness-of-fit test for nbinomial distribution:
             X^2  df     P(> X^2)
Pearson 9.811273   2  0.007404729
Warning message:
Chi-squared approximation may be incorrect in: summary.goodfit(gf)

For another distribution:

gf <- goodfit(binCount, type = "poisson", method = "MinChisq")
Warning messages:
1: NA/Inf replaced by maximum positive value in: optimize(chi2, range(count))
  [warning repeated 8 times]
summary(gf)
Goodness-of-fit test for poisson distribution:
                  X^2  df  P(> X^2)
Pearson 1.660931e+115  30
Warning message:
Chi-squared approximation may be incorrect in: summary.goodfit(gf)

Aswad

On 2/10/08, Jason Q. McClintic [EMAIL PROTECTED] wrote:
> [earlier binning suggestion snipped]
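On the "which distribution fits best under ML?" question: with maximum-likelihood fits, candidate distributions can be compared by log-likelihood or AIC rather than by chi-square P values. A sketch using MASS::fitdistr on hypothetical counts (the data here are simulated, standing in for binCount; this is my suggestion, not from the thread):

```r
library(MASS)  # fitdistr(), with a logLik method so AIC() works

# Hypothetical over-dispersed counts standing in for binCount:
set.seed(5)
counts <- rnbinom(200, size = 1.5, mu = 4)

fp <- fitdistr(counts, "Poisson")
fn <- fitdistr(counts, "negative binomial")

# Lower AIC = better trade-off of fit vs. number of parameters:
AIC(fp)
AIC(fn)
```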
Re: [R] Using 'sapply' and 'by' in one function
> Although that's a very slightly different model, as it assumes that both sexes have the same error variance.

But the output is the coefficients, and they are identical. For the sake of an example, I'm sure that David simply omitted the part of his analysis where he looked at the standard errors as well ;)

Hadley
-- http://had.co.nz/
Re: [R] building packages for Linux vs. Windows
On 10/02/2008 1:07 PM, Erin Hodgess wrote:
> Hi R People: I'm sure that this is a really easy question, but here goes: I'm trying to build a package that will run on both Linux and Windows. However, there are several commands in a section that will be different in Linux than they are in Windows. Would I be better off just to build two separate packages, please? If just one is needed, how could I determine which system is running in order to use the correct command, please?

You will find it much easier to build just one package. You can use .Platform or (for more detail) Sys.info() to find out what kind of system you're running on. Remember that R doesn't just run on Linux and Windows: there's also Mac OS X, and other Unix and Unix-like systems (Solaris, etc.).

Duncan Murdoch
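A minimal sketch of Duncan's suggestion (the dispatch function and commands are my own illustration):

```r
# Coarse split: "windows" vs "unix" (the latter covers Linux, macOS, Solaris, ...)
os_type <- .Platform$OS.type

# Finer detail when the coarse split is not enough:
sysname <- Sys.info()[["sysname"]]   # e.g. "Linux", "Windows", "Darwin"

# Pick a platform-specific command once, at load time:
run_cmd <- switch(os_type,
                  windows = function() shell("dir"),
                  unix    = function() system("ls"))
stopifnot(os_type %in% c("unix", "windows"), is.function(run_cmd))
```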
[R] building packages for Linux vs. Windows
Hi R People: I'm sure that this is a really easy question, but here goes: I'm trying to build a package that will run on both Linux and Windows. However, there are several commands in a section that will be different in Linux than they are in Windows. Would I be better off just to build two separate packages, please? If just one is needed, how could I determine which system is running in order to use the correct command, please? Thanks in advance, Erin

-- Erin Hodgess, Associate Professor, Department of Computer and Mathematical Sciences, University of Houston - Downtown, mailto: [EMAIL PROTECTED]
[R] prcomp vs. princomp vs fast.prcomp
Hi R People: When performing PCA, should I use prcomp, princomp, or fast.prcomp, please? Thanks, Erin

-- Erin Hodgess, Associate Professor, Department of Computer and Mathematical Sciences, University of Houston - Downtown, mailto: [EMAIL PROTECTED]
Re: [R] building packages for Linux vs. Windows
From my Windows XP system running R 2.6.1:

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)

John Sorkin M.D., Ph.D., Chief, Biostatistics and Informatics, University of Maryland School of Medicine, Division of Gerontology, Baltimore VA Medical Center, 10 North Greene Street, GRECC (BT/18/GR), Baltimore, MD 21201-1524. (Phone) 410-605-7119, (Fax) 410-605-7913 (please call the phone number above prior to faxing).

Ted Harding [EMAIL PROTECTED] 2/10/2008 1:39 PM wrote:
> [quoted message snipped]
Re: [R] building packages for Linux vs. Windows
On 10-Feb-08 18:07:56, Erin Hodgess wrote:
> I'm trying to build a package that will run on both Linux and Windows. However, there are several commands in a section that will be different in Linux than they are in Windows. [...] How could I determine which system is running in order to use the correct command?

There is the version (a list) variable:

version
# platform i486-pc-linux-gnu
# arch     i486
# os       linux-gnu
# system   i486, linux-gnu
# status   Patched
# major    2
# minor    4.0
# year     2006
# month    11
# day      25
# svn rev  39997
# language R

from which you can extract the os component:

version$os
# [1] "linux-gnu"

I don't know what this says on a Windows system, but it surely won't mention Linux! So testing this will enable you to set a flag, e.g.

Linux <- ifelse(length(grep("linux", version$os)) > 0, TRUE, FALSE)
if (Linux) { window <- function(...) X11(...) } else { window <- function(...) windows(...) }

Hoping this helps, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 10-Feb-08 Time: 18:39:29
------------------------------ XFMail ------------------------------
Re: [R] [R-sig-Geo] Comparing spatial point patterns - Syrjala test
Hi, I went ahead and implemented something. However:
- I cannot guarantee it gives correct results since, unfortunately, the data used in Syrjala 1996 is not published along with the paper. To avoid mistakes, I started by coding things in a fast and simple way and then tried to optimize the code. At least all versions give the same results.
- As expected, the test is still quite slow since it relies on permutations to compute the p.value. The successive optimizations allowed me to go from 73 to 13 seconds on my machine, but 13 seconds is still a long time. Furthermore, I don't know how the different versions would scale with the number of points (I only tested with one dataset).

I'm not very good at thinking in vectors, so if someone could look at this and further improve it, I would welcome patches. Maybe the only real solution would be to go the Fortran way and link some code to R, but I did not want to wander in such scary places ;)

The code and test data are here: http://cbetm.univ-perp.fr/irisson/svn/distribution_data/tetiaroa/trunk/data/lib_spatial.R

Warning: it probably uses non-canonical S syntax, sorry for those with sensitive eyes.

On 2008-February-10, at 17:02, Jan Theodore Galkowski wrote: I'm also interested here in comparing spatial point patterns. So, if anyone finds any further R-based or S-plus-based work on the matter, or any more recent references, might you please include me in the distribution list? Thanks much!

Begin forwarded message: From: jiho [EMAIL PROTECTED] Subject: Comparing spatial point patterns - Syrjala test

Dear Lists, At several stations distributed regularly in space [1], we sampled repeatedly (4 times) the abundance of organisms and measured environmental parameters. I now want to compare the spatial distributions of various species (and test whether they differ or not), or to compare the distribution of a particular organism with the distribution of some environmental variable.
Syrjala's test [2] seems appropriate for such comparisons. The Hamming distance is also used (but it is not associated with a test). However, as far as I understand it, Syrjala's test only compares the distributions gathered during one sampling event, while I have four successive repeats, and:
- I am interested in testing whether, on average, the distributions are the same;
- I would prefer to keep the information regarding the variability of the abundances in time, rather than just comparing the means, since the abundances are quite variable.

Therefore I have two questions for all the knowledgeable R users on these lists:
- Is there a package in which Syrjala's test is implemented for R?
- Is there another way (a better way) to test for such differences?

Thank you very much in advance for your help.

[1] http://jo.irisson.free.fr/work/research_tetiaroa.html
[2] http://findarticles.com/p/articles/mi_m2120/is_n1_v77/ai_18066337/pg_7

JiHO --- http://jo.irisson.free.fr/
Re: [R] grep etc.
sub("-", "--", v, fixed=TRUE) See ?sub. Gabor On Sun, Feb 10, 2008 at 02:14:48PM -0500, Michael Kubovy wrote: Dear R-helpers, How do I transform v <- c('insd-otsd', 'sppr-unsp') into c('insd--otsd', 'sppr--unsp') ? _ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O. Box 400400, Charlottesville, VA 22904-4400 Parcels: Room 102, Gilmer Hall, McCormick Road, Charlottesville, VA 22903 Office: B011, +1-434-982-4729 Lab: B019, +1-434-982-4751 Fax: +1-434-982-4766 WWW: http://www.people.virginia.edu/~mk9y/ -- Csardi Gabor [EMAIL PROTECTED] UNIL DGM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
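[Archive note] For readers finding this in the archives: sub() replaces only the first match in each element, while gsub() replaces all matches, and fixed=TRUE treats the pattern as a literal string rather than a regular expression. A minimal sketch:

```r
v <- c('insd-otsd', 'sppr-unsp')
sub("-", "--", v, fixed = TRUE)    # "insd--otsd" "sppr--unsp"
# Each element here has a single hyphen, so sub() suffices;
# with several hyphens per element, use gsub() instead:
gsub("-", "--", c('a-b-c'), fixed = TRUE)   # "a--b--c"
```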
[R] grep etc.
Dear R-helpers, How do I transform v <- c('insd-otsd', 'sppr-unsp') into c('insd--otsd', 'sppr--unsp') ? _ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O. Box 400400, Charlottesville, VA 22904-4400 Parcels: Room 102, Gilmer Hall, McCormick Road, Charlottesville, VA 22903 Office: B011, +1-434-982-4729 Lab: B019, +1-434-982-4751 Fax: +1-434-982-4766 WWW: http://www.people.virginia.edu/~mk9y/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Vector Size
You just have too large a vector for your memory. There is not much you can do with an object of 500 MB. You have over 137 million numbers in it. What are you trying to do with this vector? --- Oscar A [EMAIL PROTECTED] wrote: Hello everybody!! I'm from Colombia (South America) and I'm new to R. I've been trying to generate all of the possible 6-number combinations drawn from the numbers 1 to 53. I've used the following commands: datos <- c(1:53) M <- matrix(data=combn(datos, 6, FUN=NULL, simplify=TRUE), nrow=22957480, ncol=6, byrow=TRUE) Once the commands are executed, the program shows the following: Error: cannot allocate vector of size 525.5 Mb How can I fix this problem? -- View this message in context: http://www.nabble.com/Vector-Size-tp15366901p15366901.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
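[Archive note] The 525.5 Mb figure is consistent with the size of the combn() result itself: choose(53, 6) combinations of 6 values, stored as a 4-byte-per-element integer matrix (datos is an integer vector). A quick sanity check of that arithmetic:

```r
choose(53, 6)              # 22957480 combinations
22957480 * 6 * 4 / 2^20    # bytes -> MB: about 525.5 MB for an integer matrix
```

Whether R can actually allocate an object of that size depends on the platform and available memory, which is why the original poster hit the allocation error.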
[R] [OT] good reference for mixed models and EM algorithm
Dear R People: Sorry for the off-topic post. Could someone recommend a good reference for using the EM algorithm on mixed models, please? I've been looking, and there are so many of them. Perhaps someone here can narrow things down a bit. Thanks in advance, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] data frame question
Hello, I have 2 data frames, df1 and df2. I would like to create a new data frame, new_df, which contains only the rows common to both, matched on the first 2 columns (chrN and start). The score column in the new data frame should be replaced with a column containing the average score (average_score) from df1 and df2.

df1 = data.frame(chrN= c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"), start= c(23, 82, 95, 108, 95, 108, 121), end= c(33, 92, 105, 118, 105, 118, 131), score= c(3, 6, 2, 4, 9, 2, 7))
df2 = data.frame(chrN= c("chr1", "chr2", "chr2", "chr2", "chr2"), start= c(23, 50, 95, 20, 121), end= c(33, 60, 105, 30, 131), score= c(9, 3, 7, 7, 3))
new_df = data.frame(chrN= c("chr1", "chr2", "chr2"), start= c(23, 95, 121), end= c(33, 105, 131), average_score= c(6, 8, 5))

Thank you for your help Joseph [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT] good reference for mixed models and EM algorithm
Hi, Erin: Have you looked at Pinheiro and Bates (2000), Mixed-Effects Models in S and S-PLUS (Springer)? As far as I know, Doug Bates has been the leading innovator in this area for the past 20 years. Pinheiro was one of his graduate students. The 'nlme' package was developed by him or under his supervision, and 'lme4' is his current development platform. The ~R\library\nlme\scripts subdirectory contains ch01.R, ch02.R, etc. = script files to work the examples in the book (where ~R = your R installation directory). There are other good books, but I recommend you start with Pinheiro and Bates. Spencer Graves Erin Hodgess wrote: Dear R People: Sorry for the off-topic post. Could someone recommend a good reference for using the EM algorithm on mixed models, please? I've been looking, and there are so many of them. Perhaps someone here can narrow things down a bit. Thanks in advance, Sincerely, Erin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
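[Archive note] For archive readers, a minimal 'nlme' fit in the style of Pinheiro and Bates, using the Orthodont data shipped with the package (illustrative only; the book's scripts cover the details):

```r
library(nlme)

# Random intercept and slope for age, grouped by Subject
fm <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)
summary(fm)   # fixed effects, variance components, and fit criteria
```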
Re: [R] data frame question
On 10/02/2008, joseph [EMAIL PROTECTED] wrote: Hello, I have 2 data frames, df1 and df2. I would like to create a new data frame, new_df, which contains only the rows common to both, matched on the first 2 columns (chrN and start). The score column in the new data frame should be replaced with a column containing the average score (average_score) from df1 and df2. Try this (avoiding underscores):

new.df <- merge(df1, df2, by=c('chrN','start'))
new.df$average.score <- apply(new.df[,c('score.x','score.y')], 1, mean, na.rm=TRUE)

As always, interested to see whether it can be done in one line... -- Dr. Mark Wardle Specialist registrar, Neurology Cardiff, UK __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data frame question
joseph [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: I have 2 data frames, df1 and df2. I would like to create a new data frame, new_df, which contains only the rows common to both, matched on the first 2 columns (chrN and start). The score column in the new data frame should be replaced with a column containing the average score (average_score) from df1 and df2. df1 = data.frame(chrN= c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"), start= c(23, 82, 95, 108, 95, 108, 121), end= c(33, 92, 105, 118, 105, 118, 131), score= c(3, 6, 2, 4, 9, 2, 7)) df2 = data.frame(chrN= c("chr1", "chr2", "chr2", "chr2", "chr2"), start= c(23, 50, 95, 20, 121), end= c(33, 60, 105, 30, 131), score= c(9, 3, 7, 7, 3)) Clunky to be sure, but this worked for me:

df3 <- merge(df1, df2, by=c("chrN","start"))  # non-matching variables get auto-relabeled
df3$avg.scr <- with(df3, (score.x + score.y)/2)  # or mean( )
df3 <- df3[, c("chrN","start","avg.scr")]  # drops the variables not of interest
df3
  chrN start avg.scr
1 chr1    23       6
2 chr2   121       5
3 chr2    95       8

-- David Winsemius __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
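[Archive note] Combining the two replies, a compact version of the same idea, assuming the df1/df2 definitions from the original post, uses rowMeans() instead of apply() or explicit arithmetic (note merge() may order the rows differently from the poster's desired new_df):

```r
df1 <- data.frame(chrN = c("chr1","chr1","chr1","chr1","chr2","chr2","chr2"),
                  start = c(23, 82, 95, 108, 95, 108, 121),
                  end = c(33, 92, 105, 118, 105, 118, 131),
                  score = c(3, 6, 2, 4, 9, 2, 7))
df2 <- data.frame(chrN = c("chr1","chr2","chr2","chr2","chr2"),
                  start = c(23, 50, 95, 20, 121),
                  end = c(33, 60, 105, 30, 131),
                  score = c(9, 3, 7, 7, 3))

m <- merge(df1, df2, by = c("chrN", "start"))      # keeps common rows only
m$average_score <- rowMeans(m[, c("score.x", "score.y")])
new_df <- m[, c("chrN", "start", "end.x", "average_score")]
```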
Re: [R] prcomp vs. princomp vs fast.prcomp
On 2/10/08, Erin Hodgess [EMAIL PROTECTED] wrote: When performing PCA, should I use prcomp, princomp or fast.prcomp, please? You can take a look here [1] and here [2] for some short references. From the first page: Principal Components Analysis (PCA) is available in prcomp() (preferred) and princomp() in standard package stats. There are also - at least - FactoMineR, psych and ade4 that provide PCA functions. I imagine that it would depend much on what you want to do. Liviu [1] http://cran.miscellaneousmirror.org/src/contrib/Views/Environmetrics.html [2] http://cran.r-project.org/src/contrib/Views/Psychometrics.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
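[Archive note] For the archives, a minimal prcomp() call on a built-in dataset; scale. = TRUE is usually advisable when the variables are on different scales:

```r
p <- prcomp(USArrests, scale. = TRUE)   # PCA on the correlation scale
summary(p)   # standard deviations and proportion of variance per component
head(p$x)    # the principal component scores for the first few states
```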
Re: [R] Applying lm to data with combn
I think that what you want to do is stepwise regression; see the step function. On 09/02/2008, AliR [EMAIL PROTECTED] wrote: Thank you, can you suggest what is the shortest way to store the combination with minimum residual error term? AliR wrote: http://www.nabble.com/file/p15359204/test.data.csv test.data.csv Hi, I have used apply to get certain combinations, but when I try to use these combinations I get the error [Error in eval(expr, envir, enclos) : object 'X.GDAXI' not found]. Being a novice, I do not understand why, after applying combinations to the data, I can't access it and use lm on these combinations. Either the data frame is no longer a matrix, or... how can I access the data and make it work for lm!! Any help please!

fruit = read.csv(file="test.data.csv", head=TRUE, sep=",")  # read it in matrix format
#fruit = read.file(row.names=1)$data
mD = head(fruit[, 1:5])  # only first five used in combinations
#X.SSMII = head(fruit[, 6])  # keep it for reference
nmax = NULL
n = ncol(mD)  # don't take the last column, for reference purposes
if(is.null(nmax)) nmax = n
mDD = apply(combn(5, 1), 1, FUN= function(y) mD[, y])
fg = lm(X.SSMII ~ X.GDAXI + X.FTSE + X.FCHI + X.IBEX, data = mDD)  # regress on combos
s = cbind(s, Residuals = residuals(fg))  # take residuals
print(mD)

-- View this message in context: http://www.nabble.com/Applying-lm-to-data-with-combn-tp15359204p15391159.html Sent from the R help mailing list archive at Nabble.com. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reshape
Dear colleagues, I'd like to reshape a data frame from long format to wide format, but I do not quite get what I want. Here is an example of the data I have (dat):

sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)

and below is what I'd like to obtain. That is, I'd like the tr variable in different columns (as a timevar) with their value (val).

sp code tr.A tr.B tr.C
 a   a1   31   NA   NA
 a   a2   NA   32   NA
 a   a2   NA   33   NA  **
 a   a3   NA   NA   34
 b   a3   35   36   NA
 b   a4   NA   NA   37
 c   a4   38   NA   NA
 d   a4   39   NA   NA
 d   a5   NA   40   41
 d   a6   NA   NA   42

Using reshape: reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp")) I'm getting very close. The only difference is in the 3rd row (**): when sp and code are the same I only get one record. Is there a way to get all records? Any idea? Thank you very much for any help Juli Pausas -- http://www.ceam.es/pausas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape
reshape(dat, direction="wide", timevar="tr", idvar=c("id", "code", "sp"))[,2:6] But I don't understand why you use reshape. On 10/02/2008, juli pausas [EMAIL PROTECTED] wrote: Dear colleagues, I'd like to reshape a data frame from long format to wide format, but I do not quite get what I want. Here is an example of the data I have (dat):

sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)

and below is what I'd like to obtain. That is, I'd like the tr variable in different columns (as a timevar) with their value (val).

sp code tr.A tr.B tr.C
 a   a1   31   NA   NA
 a   a2   NA   32   NA
 a   a2   NA   33   NA  **
 a   a3   NA   NA   34
 b   a3   35   36   NA
 b   a4   NA   NA   37
 c   a4   38   NA   NA
 d   a4   39   NA   NA
 d   a5   NA   40   41
 d   a6   NA   NA   42

Using reshape: reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp")) I'm getting very close. The only difference is in the 3rd row (**): when sp and code are the same I only get one record. Is there a way to get all records? Any idea? Thank you very much for any help Juli Pausas -- http://www.ceam.es/pausas -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape
This isn't really well defined. Suppose we have two rows that both have a, a2 and a value for B. Now suppose we have another row with a, a2 but with a value for C. Does the third row go with the first one? The second one? A new row? Both the first and the second? Here is one possibility, but without a good definition of the problem we don't know whether it's answering the question that is intended. In the code below we assume that all dat rows that have the same sp value and the same code value are adjacent, and that if a tr occurs among those dat rows that is equal to or less than the prior row's tr in factor level order, then the new dat row must start a new output row, else not. Thus within an sp/code group we assign each row a 1 until we get a tr that is less than or equal to the prior row's tr, and then we start assigning 2, and so on. This is the new column seq below. We then use seq as part of our idvar in reshape. For the particular example in your post this does give the same answer.

f <- function(x) cumsum(c(1, diff(x) <= 0))
dat$seq <- ave(as.numeric(dat$tr), dat$sp, dat$code, FUN = f)
reshape(dat[-1], direction="wide", timevar="tr", idvar=c("code","sp","seq"))[-3]

On Feb 10, 2008 4:58 PM, juli pausas [EMAIL PROTECTED] wrote: Dear colleagues, I'd like to reshape a data frame from long format to wide format, but I do not quite get what I want. Here is an example of the data I have (dat):

sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)

and below is what I'd like to obtain. That is, I'd like the tr variable in different columns (as a timevar) with their value (val).

sp code tr.A tr.B tr.C
 a   a1   31   NA   NA
 a   a2   NA   32   NA
 a   a2   NA   33   NA  **
 a   a3   NA   NA   34
 b   a3   35   36   NA
 b   a4   NA   NA   37
 c   a4   38   NA   NA
 d   a4   39   NA   NA
 d   a5   NA   40   41
 d   a6   NA   NA   42

Using reshape: reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp")) I'm getting very close.
The only difference is in the 3rd row (**): when sp and code are the same I only get one record. Is there a way to get all records? Any idea? Thank you very much for any help Juli Pausas -- http://www.ceam.es/pausas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Do I need to use dropterm()??
dropterm() is a tool for model building, not primarily for significance testing. As the name suggests, it tells you what the effect would be were you to drop each *accessible* term in the model as it currently stands. By default it displays the effect on AIC of dropping each term, in turn, from the model. If you request them, though, it can also give you test statistics and significance probabilities. If there is an A:B interaction in the model, the main effects A or B, if present, are not considered until a decision has been made on including A:B. The meaning of A:B in a model is not absolute: it is conditional on which main-effect terms you have there as well. This is one reason why the process is ordered in this way, but the main reason is the so-called 'marginality' issue. If you do ask for test statistics and significance probabilities, you get almost a SAS-style Type III anova table, with the important restriction noted above: you will not get main-effect terms shown along with interactions. If you want the full SAS, uh, version, there are at least two possibilities. 1. Use SAS. 2. Use John Fox's Anova() function from the 'car' package, along with his excellent book, which should explain how to avoid shooting yourself in the foot over this. (This difference of opinion on what should sensibly be done, by the way, predates R by a long shot. My first exposure to it was the very acrimonious dispute between Nelder and Kempthorne in the mid 70's. It has remained a cross-Atlantic dispute pretty well ever since, with the latest shot being the paper by Lee and Nelder in 2004. Curiously, the origin of a piece of software can almost be determined by the view taken on this issue, with Genstat going one way and SAS, SPSS, ... the other. S-PLUS was a latecomer... but I digress!) Bill Venables.
Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of DaniWells Sent: Sunday, 10 February 2008 11:40 PM To: r-help@r-project.org Subject: [R] Do I need to use dropterm()?? Hello, I'm having some difficulty understanding the usage of the dropterm() function in the MASS library. What exactly does it do? I'm very new to R, so any pointers would be very helpful. I've read many definitions of what dropterm() does, but none seem to stick in my mind or click with me. I've coded everything fine for an interaction that runs as follows: two sets of data (one for North aspect, one for South aspect) with a log scale on the x axis and survival on the y. After calculating my anova results I have all significant results (i.e. aspect = sig, log scale of sunlight = sig, and aspect:light = sig). When I have all significant results in my ANOVA table, do I need dropterm(), or is that just to remove insignificant terms? Many thanks, Dani -- View this message in context: http://www.nabble.com/Do-I-need-to-use-dropterm%28%29---tp15396151p15396151.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
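[Archive note] To make the marginality point concrete, an illustrative dropterm() call on a built-in dataset (not the poster's data): with the interaction present, only the interaction term is "accessible".

```r
library(MASS)

# Model with main effects and an interaction
fit <- lm(mpg ~ wt * factor(cyl), data = mtcars)

# Only the wt:factor(cyl) interaction is listed for dropping;
# the main effects wt and factor(cyl) are not considered while
# the interaction remains in the model (the marginality restriction).
dropterm(fit, test = "F")
```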
[R] j and jcross queries
Hi: I have a query related to the J and Jcross functions in the SpatStat package. I use J to find indications of clustering in my data, and Jcross to look for dependence between point patterns. I use the envelope function to do Monte Carlo tests of significance. So far so good. My question is how I can test whether two such results are significantly different. For example, if I find J of pattern X and J of pattern Y, how could I determine the likelihood that those results come from different processes? Similarly, if I find J of marks X and Y, and X and Z, how could I determine the likelihood that Y and Z come from different processes? I would appreciate advice. Cheers Robert Biddle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
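[Archive note] For context, the kind of spatstat workflow being described looks roughly like this (a sketch with simulated data; it illustrates the envelope test, not an answer to the comparison question):

```r
library(spatstat)

set.seed(42)
X <- rpoispp(100)                   # simulated Poisson point pattern (unit square)
E <- envelope(X, Jest, nsim = 39)   # pointwise Monte Carlo envelope for the J-function
plot(E)  # an empirical J outside the envelope suggests clustering or inhibition
```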
[R] Questions about histograms
Hello, I'm doing some experiments with the various histogram functions and I have two questions, about the prob option and about binning. First, here's a simple plot of my data using the default hist() function: hist(data[,1], prob = TRUE, xlim = c(0, 35)) http://go.sneakymustard.com/tmp/hist.jpg My first question is regarding the resulting plots from hist.scott() and hist.FD() from the MASS package. I'm setting prob to TRUE in these functions, but as can be seen in the images below, the value for the first bar of the histogram is well above 1.0. Shouldn't the total area be 1.0 in the case of prob = TRUE? hist.scott(data[,1], prob = TRUE, xlim=c(0, 35)) http://go.sneakymustard.com/tmp/scott.jpg hist.FD(data[,1], prob = TRUE, xlim=c(0, 35)) http://go.sneakymustard.com/tmp/FD.jpg Is there anything I can do to fix these plots? My second question is related to binning. Is there a function or package that allows one to use logarithmic binning in R, that is, create bins such that the length of a bin is a multiple of the length of the one before it? Pointers to the appropriate docs are welcome; I've been searching for this and couldn't find any info. Best regards, Andre __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Questions about histograms
On 10/02/2008 8:14 PM, Andre Nathan wrote: Hello I'm doing some experiments with the various histogram functions and I have a two questions about the prob option and binning. First, here's a simple plot of my data using the default hist() function: hist(data[,1], prob = TRUE, xlim = c(0, 35)) http://go.sneakymustard.com/tmp/hist.jpg My first question is regarding the resulting plot from hist.scott() and hist.FD(), from the MASS package. I'm setting prob to TRUE in these functions, but as it can be seen in the images below, the value for the first bar of the histogram is well above 1.0. Shouldn't the total area be 1.0 in the case of prob = TRUE? hist.scott(data[,1], prob = TRUE, xlim=c(0, 35)) It looks to me as though the area is one. The first bar is about 3.6 units high, and about 0.2 units wide: area is 0.72. There are no gaps between bars in an R histogram, so the gaps you see in this jpg are bars with zero height. Duncan Murdoch http://go.sneakymustard.com/tmp/scott.jpg hist.FD(data[,1], prob = TRUE, xlim=c(0, 35)) http://go.sneakymustard.com/tmp/FD.jpg Is there anything I can do to fix these plots? My second question is related to binning. Is there a function or package that allows one to use logarithmic binning in R, that is, create bins such that the length of a bin is a multiple of the length of the one before it? Pointers to the appropriate docs are welcome, I've been searching for this and couldn't find any info. Best regards, Andre __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Questions about histograms
Andre, Regarding your first question, it is by no means clear there is anything to fix; in fact I'm sure there is nothing to fix. The fact that the height of any bar is greater than one is irrelevant - the width of the bar is much less than one, as is the product of height by width. Area is height x width, not just height. Regarding the second question - logarithmic breaks - I'm not aware of anything currently available to do this, but the tools are there for you to do it yourself. The 'breaks' argument to hist allows you to specify your breaks explicitly (among other things), so it's just a matter of setting up the logarithmic (or, more precisely, 'geometric progression') bins yourself and passing them on to hist. Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andre Nathan Sent: Monday, 11 February 2008 11:14 AM To: r-help@r-project.org Subject: [R] Questions about histograms Hello I'm doing some experiments with the various histogram functions and I have a two questions about the prob option and binning. First, here's a simple plot of my data using the default hist() function: hist(data[,1], prob = TRUE, xlim = c(0, 35)) http://go.sneakymustard.com/tmp/hist.jpg My first question is regarding the resulting plot from hist.scott() and hist.FD(), from the MASS package. I'm setting prob to TRUE in these functions, but as it can be seen in the images below, the value for the first bar of the histogram is well above 1.0. Shouldn't the total area be 1.0 in the case of prob = TRUE?
hist.scott(data[,1], prob = TRUE, xlim=c(0, 35)) http://go.sneakymustard.com/tmp/scott.jpg hist.FD(data[,1], prob = TRUE, xlim=c(0, 35)) http://go.sneakymustard.com/tmp/FD.jpg Is there anything I can do to fix these plots? My second question is related to binning. Is there a function or package that allows one to use logarithmic binning in R, that is, create bins such that the length of a bin is a multiple of the length of the one before it? Pointers to the appropriate docs are welcome, I've been searching for this and couldn't find any info. Best regards, Andre __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
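[Archive note] Following Bill's suggestion, a sketch of geometric-progression ("logarithmic") breaks built by hand and passed to hist(); this assumes strictly positive data:

```r
set.seed(1)
x <- rexp(1000)                              # positive-valued sample
r <- 1.5                                     # each bin r times as wide as the previous
k <- ceiling(log(max(x) / min(x)) / log(r))  # number of bins needed to cover the data
breaks <- min(x) * r^(0:k)                   # geometric progression of break points
h <- hist(x, breaks = breaks, prob = TRUE)
sum(h$density * diff(h$breaks))              # total area of the density bars is 1
```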
[R] Using R in a university course: dealing with proposal comments
Hi All, I am scheduled to teach a graduate course on research methods in health sciences at a university. While drafting the course proposal, I decided to include a brief introduction to R, primarily with the objective of enabling the students to do data analysis using R. It is expected that enrolled students of this course have all had at least a formal first-level introduction to quantitative methods in health sciences, and following completion of the course they are all expected to either evaluate, interpret, or conduct primary research studies in health. The course would be delivered over 5 months, and R was proposed to be taught as several laboratory-based hands-on sessions along with required readings within the coursework. The course proposal went to a few colleagues in the university for review. I received review feedback from them; two of them commented on the inclusion of R in the proposal. In quoting parts of these mails, I have masked the names/identities of the referees and have included just the relevant text with their comments. Here are the comments: Comment 1: In my quick glance, I did not see that statistics would be taught, but I did see that R would be taught. Of course, R is a statistics programme. I worry that teaching R could overwhelm the class. Or teaching R would be worthless, because the students do not understand statistics. (Prof LR) Comment 2: Finally, on a minor point, why is R the statistical software being used? SPSS is probably more widely available in the workplace – certainly in areas of social policy etc. (Prof NB) I am interested to know if any of you have faced similar questions from colleagues about the inclusion of R in non-statistics-based university graduate courses. If you did and were required to address these concerns, how would you respond?
TIA, Arin Basu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using R in a university course: dealing with proposal comments
Comment 1 raises a real issue. R is just a tool. Too often people do confuse the tool with the real skill that the people who use it should have. There are plenty of questions on R-help that demonstrate this confusion. It's well worth keeping in mind and acting upon if you can see a problem emerging, but I would not take it quite at face value and abandon R on those grounds. Comment 2 is one of those comments that belongs to a very particular period of time, one that passes as we look on. It reminds me of the time I tried to introduce some new software into my courses, (back in the days when I was a teacher, long, long ago...). The students took to it like ducks to water, but my colleagues on the staff were very slow to adapt, and some never did. Also, R wins every time on price! Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arin Basu Sent: Monday, 11 February 2008 1:41 PM To: r-help@r-project.org Subject: [R] Using R in a university course: dealing with proposal comments Hi All, I am scheduled to teach a graduate course on research methods in health sciences at a university. While drafting the course proposal, I decided to include a brief introduction to R, primarily with an objective to enable the students to do data analysis using R. It is expected that enrolled students of this course have all at least a formal first level introduction to quantitative methods in health sciences and following completion of the course, they are all expected to either evaluate, interpret, or conduct primary research studies in health. 
The course would be delivered over 5 months, and R was proposed to be taught as several laboratory based hands-on sessions along with required readings within the coursework. The course proposal went to a few colleagues in the university for review. I received review feedbacks from them; two of them commented about inclusion of R in the proposal. In quoting parts these mails, I have masked the names/identities of the referees, and have included just part of the relevant text with their comments. Here are the comments: Comment 1: In my quick glance, I did not see that statistics would be taught, but I did see that R would be taught. Of course, R is a statistics programme. I worry that teaching R could overwhelm the class. Or teaching R would be worthless, because the students do not understand statistics. (Prof LR) Comment 2: Finally, on a minor point, why is R the statistical software being used? SPSS is probably more widely available in the workplace - certainly in areas of social policy etc. (Prof NB) I am interested to know if any of you have faced similar questions from colleagues about inclusion of R in non-statistics based university graduate courses. If you did and were required to address these concerns, how you would respond? TIA, Arin Basu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using R in a university course: dealing with proposal comments
Hello Arin, If your future students do not know statistics, you might consider buffering their introduction to R with the help of a GUI package, such as Rcmdr (if functionality is missing, you could add it yourself via the plugin infrastructure). Another way to help students would be to direct them to easy-to-use, straightforward resources, like this [1], this [2] or this [3]. On the "why not SPSS?" point, I would imagine the answer is quality and price, and all the corollary arguments (say, that students can use it at home or over the weekend, etc). No more than my two cents. Liviu [1] http://oit.utk.edu/scc/RforSASSPSSusers.pdf [2] http://www.statmethods.net/index.html [3] http://zoonek2.free.fr/UNIX/48_R/all.html On 2/11/08, Arin Basu [EMAIL PROTECTED] wrote: Comment 1: In my quick glance, I did not see that statistics would be taught, but I did see that R would be taught. Of course, R is a statistics programme. I worry that teaching R could overwhelm the class. Or teaching R would be worthless, because the students do not understand statistics. (Prof LR) Comment 2: Finally, on a minor point, why is R the statistical software being used? SPSS is probably more widely available in the workplace – certainly in areas of social policy etc. (Prof NB)
[R] Help with write.csv
Dear all, I am new to R. I am using the impute package with data contained in a csv file. I have followed the example in the impute package as follows: mydata <- read.csv("sample_impute.csv", header = TRUE) mydata.expr <- mydata[-1, -(1:2)] mydata.imputed <- impute.knn(as.matrix(mydata.expr)) The imputation is successful. Then I try to write the imputation results (mydata.imputed) to a csv file as follows: write.csv(mydata.imputed, file = "sample_imputed.csv") Error in data.frame(data = c(-0.07, -1.22, -0.09, -0.6, 0.65, -0.36, 0.25, : arguments imply differing number of rows: 18, 1, 0 I need help understanding the error message and overcoming the write.csv problem. TQVM!
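A likely explanation for the error above: write.csv() expects a matrix or data frame, but impute.knn() (from the Bioconductor impute package) returns a list whose imputed matrix is stored in the data component; the extra list components are what trigger the "differing number of rows" message. The sketch below uses a stand-in list in place of a real impute.knn() result, so it runs without the impute package installed:

```r
## Stand-in for an impute.knn() result: a list whose $data component holds
## the imputed matrix (the real result also carries RNG state components).
mydata.imputed <- list(data = matrix(1:6, nrow = 2), rng.seed = 362436069)

## write.csv(mydata.imputed, ...) would fail, because the list components
## have differing lengths. Write the imputed matrix in $data instead:
write.csv(mydata.imputed$data, file = "sample_imputed.csv", row.names = FALSE)
```

If impute.knn() follows its documented return value, writing mydata.imputed$data rather than mydata.imputed should resolve the error in the original code.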
Re: [R] Using R in a university course: dealing with proposal comments
R is just a tool, but so is English. R is the platform of choice for an increasing portion of people involved in new statistical algorithm development. R is not yet the de facto standard for nearly all serious research internationally, to the extent that English is. However, I believe that is only a matter of time. There will always be a place for software with a nicer graphical user interface, etc., than R. For an undergraduate course, it may be wise to stick with SPSS, SAS, Minitab, etc. Are you teaching graduate students to solve yesterday's problems or tomorrow's? Much of my work in 2007 was in Matlab, because I am working with colleagues who use only Matlab. Matlab has better debugging tools. However, R now has well over 1,000 contributed packages, and r-help and r-sig-x provide better support and extensibility than you will likely get from commercial software. Twice in the past year, an executive said I should get some Matlab toolbox. In the first case, after thinking about it for a few days, I finally requested and received official permission from a Vice President. From that point, it took roughly a week to get a quote from MathWorks, then close to two weeks to get approval from our Chief Financial Officer, then a few more days to actually get the software. With R, that month-long process is reduced to seconds: I download the package and try it. This has allowed me to do things today that I only dreamed of doing a few years ago. Moreover, R makes it much easier for me to learn new statistical techniques. When I'm not sure I understand the math, I can trace through a worked example in R, and the uncertainties almost always disappear. For that, 'debug(fun)' helps a lot. If I want to try something different, I don't have to start from scratch to develop code to perform an existing analysis.
I now look for companion R code before I decide to buy a book, or when I prioritize how much time I will spend with different books or articles: if something has companion R code, I know I can learn much more quickly how to use, modify, and extend the statistical tools discussed. Spencer Graves
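The 'debug(fun)' workflow mentioned above can be sketched as follows; f is a made-up example function, not one from the thread:

```r
## Tracing a worked example line by line with base R's debugger.
f <- function(x) {
  s <- sum(x)       # step 1: accumulate
  s / length(x)     # step 2: divide by n (i.e., the mean)
}

debug(f)            # the next call to f() drops into the browser
# f(1:10)           # interactive: 'n' steps through, variables can be inspected, 'Q' quits
undebug(f)          # switch tracing off again

f(1:10)             # runs normally and returns 5.5
```

Stepping through each line while inspecting intermediate values is what makes a worked example so effective for checking one's understanding of the math.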
[R] tree() producing NA's
Hi, hoping someone can help me (a newbie). I am trying to construct a tree using tree() in package tree. One of the fields is a factor field (owner), with many levels. In the resulting tree, I see many NA's (see below), yet in the actual data there are none. rr200.tr <- tree(backprof ~ ., rr200) rr200.tr 1) root 200 1826.00 -0.2332 ... [snip] ... 5) owner: Cliveden Stud,NA,NA,NA,NA,NA,NA,NA,NA 10 14.25 1.5870 * 3) owner: B E T Partnership,Flaming Sambuca Syndicate,NA,NA,NA,NA,NA,NA,NA,NA 11 384.40 10.5900 6) decodds < 12 5 74.80 6.3000 * 7) decodds > 12 6 140.80 14.1700 * Can anyone tell me why this happens and what I can do about it? Regards Amnon
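One common cause of NA entries in printed factor splits (an assumption here; the thread itself does not confirm the diagnosis) is a factor carrying levels that never occur in the data, e.g. after subsetting to 200 rows. A minimal base-R sketch of checking for and dropping empty levels, using a made-up 'owner' factor:

```r
## A factor can carry levels that no longer occur in the data; such empty
## levels can surface as NA in model output. "Unused Stable" is invented
## for illustration.
owner <- factor(c("Cliveden Stud", "B E T Partnership"),
                levels = c("Cliveden Stud", "B E T Partnership", "Unused Stable"))
nlevels(owner)        # 3, including the empty level

owner <- factor(owner)  # re-creating the factor keeps only levels present
nlevels(owner)          # 2
```

If this is the cause, re-creating the factor column (rr200$owner <- factor(rr200$owner)) before calling tree() should make the NA's disappear.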