Re: [R] Cross Spectrum Analysis
On Sep 2, 2010, at 1:00 AM, aftar wrote: Hi. Does anyone know how I could find the cross spectrum? The spectrum function only gives the spectrum for each individual series.

I read the manual very differently. It is telling me that you _do_ get cross-spectrum results in the coh matrix:

Value: coh: NULL for univariate series. For multivariate time series, a matrix containing the squared coherency between different series. Column i + (j - 1) * (j - 2)/2 of coh contains the squared coherency between columns i and j of x, where i < j.

And there is a link on that help page to a help page that includes the plot.spec.coherency function. This response from Prof Ripley (very easily found with an RSiteSearch search for "cross spectrum") has a further link to a supplement to VR4:

http://finzi.psych.upenn.edu/Rhelp10/2009-November/219526.html
http://www.stats.ox.ac.uk/pub/MASS4/VR4stat.pdf

... which discusses coherency and illustrates processing of the coh object. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to access some elements of a S4 object?
David, thanks a lot! It works, and that's what I need. Good night. Sean

From: David Winsemius [mailto:dwinsem...@comcast.net] Sent: Wed 9/1/2010 9:50 PM To: Xiang Li Cc: r-help@r-project.org Subject: Re: [R] How to access some elements of a S4 object?

On Sep 1, 2010, at 10:04 PM, Xiang Li wrote: Hi, could you please tell me how to access the elements of an S4 object?

Slot alpha.name: [1] Cutoff
p...@alpha.values
[[1]]
[1] Inf 0.991096434 0.984667270 0.984599159 0.983494405 0.970641299 0.959417835 0.945548941 0.943432189

If that is the print() result of p...@alpha.values, then it's probably an ordinary list with a single vector element, so just try p...@alpha.values[[1]][2]. (The str function also works on S4 objects.) -- David Winsemius, MD West Hartford, CT
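The pattern above can be illustrated on a small made-up S4 class (the class and slot names below are hypothetical, standing in for the poster's object):

```r
## Minimal sketch: an S4 object whose slot holds a list with one vector.
setClass("perf", representation(alpha.name = "character",
                                alpha.values = "list"))
p <- new("perf", alpha.name = "Cutoff",
         alpha.values = list(c(Inf, 0.9911, 0.9847, 0.9846)))

slotNames(p)             # lists the available slots
str(p)                   # str() works on S4 objects too
p@alpha.values[[1]][2]   # "@" extracts the slot; then index the list/vector
```

The key point is that `@` gets you the slot, after which ordinary `[[` and `[` indexing applies to whatever the slot contains.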
[R] How to generate integers from uniform distribution with fixed mean
Hi, folks, runif(n, min, max) is the typical code for generating random variates from a uniform distribution. But what if we need to fix the mean at 20, and we want the values to be integers only? Thanks
[R] extract month data automatically
Hi, I would like to extract monthly data automatically using code. My sample data are given below. I know how to do it one month at a time:

jan_data1 <- Pooraka_data[Pooraka_data$Month==1, 4]
feb_data1 <- Pooraka_data[Pooraka_data$Month==2, 4]
mar_data1 <- Pooraka_data[Pooraka_data$Month==3, 4]
apr_data1 <- Pooraka_data[Pooraka_data$Month==4, 4]
...

then I try:

p0_mn <- function(dt) {
  p0 <- sum(dt==0)/length(dt)
  mn <- mean(dt)
  c(p0=p0, mn=mn)
}
rbind(p0_mn(jan_data1), p0_mn(feb_data1), p0_mn(mar_data1), p0_mn(apr_data1),
      p0_mn(may_data1), p0_mn(jun_data1), p0_mn(jul_data1), p0_mn(aug_data1),
      p0_mn(sep_data1), p0_mn(oct_data1), p0_mn(nov_data1), p0_mn(dec_data1))

head(Pooraka_data,20); tail(Pooraka_data,20)
   Year Month Day Amount
1  1901 1  1 0.0
2  1901 1  2 3.0
3  1901 1  3 0.0
4  1901 1  4 0.5
5  1901 1  5 0.0
6  1901 1  6 0.0
7  1901 1  7 0.3
8  1901 1  8 0.0
9  1901 1  9 0.0
10 1901 1 10 0.0
11 1901 1 11 0.5
12 1901 1 12 1.8
13 1901 1 13 0.0
14 1901 1 14 0.0
15 1901 1 15 2.5
16 1901 1 16 0.0
17 1901 1 17 0.0
18 1901 1 18 0.0
19 1901 1 19 0.0
20 1901 1 20 0.0
      Year Month Day Amount
32858 1990 12 17 0.0
32859 1990 12 18 0.0
32860 1990 12 19 0.8
32861 1990 12 20 0.0
32862 1990 12 21 0.0
32863 1990 12 22 8.0
32864 1990 12 23 0.0
32865 1990 12 24 0.0
32866 1990 12 25 0.0
32867 1990 12 26 0.0
32868 1990 12 27 0.4
32869 1990 12 28 0.0
32870 1990 12 29 0.0
32871 1990 12 30 0.0
32872 1990 12 31 0.0
32873 1991 1  1 0.0
32874 1991 1  2 0.0
32875 1991 1  3 0.0
32876 1991 1  4 0.0
32877 1991 1  5 5.4

thank you.
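One way to avoid the twelve separate objects entirely is split()/sapply(): group the Amount column by Month and apply the summary function to each group in one call. A sketch, assuming a data frame shaped like the Pooraka_data excerpt above (the data here are simulated):

```r
## Simulated stand-in for Pooraka_data: Year/Month/Day/Amount columns.
set.seed(1)
Pooraka_data <- data.frame(Year   = 1901,
                           Month  = rep(1:12, each = 10),
                           Day    = rep(1:10, 12),
                           Amount = round(pmax(0, runif(120, -1, 3)), 1))

p0_mn <- function(dt) c(p0 = mean(dt == 0), mn = mean(dt))

## split() groups Amount by Month; sapply() applies p0_mn to each group.
## t() turns the result into one row per month, as in the rbind() above.
t(sapply(split(Pooraka_data$Amount, Pooraka_data$Month), p0_mn))
```

The same idea extends to Year/Month combinations via `split(x, list(Year, Month))` or `aggregate()`.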
[R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :
Hi, I've built my own package on Windows, and when I run R CMD check Package-Name I get:

* install options are ' --no-html'
* installing *source* package 'AceTest' ...
** libs making DLL ... g++ ...etc. installing to PATH ... done
** R
** preparing package for lazy loading
** help
Warning: ./man/AceTest-package.Rd:34: All text must be in a section
Warning: ./man/AceTest-package.Rd:35: All text must be in a section
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest/libs/AceTest.dll':
  LoadLibrary failure: The specified module could not be found.
ERROR: loading failed
* removing 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest'

Can someone point out what went wrong?
[R] using R's svd from outside R
Hi, I have to compute the singular value decomposition of rather large matrices. My test matrix is 10558 by 4255, and it takes about three minutes in R to decompose on a 64-bit quad-core Linux machine. (R is running svd in parallel; all four cores are at their maximum load while doing this.) I tried several BLAS and LAPACK libraries as well as the GNU Scientific Library in my C++ program. Apart from being unable to have them do the SVD in parallel (although I thought I did everything to make them run in parallel), execution time always exceeds 25 minutes, which is still far more than the expected 12 minutes for non-parallel R code. I am now going to call R from within my program, but this is not very elegant. So my questions are: Does R use a special SVD routine, and is it possible to use it directly by linking in the relevant libraries? (Sorry, but I couldn't figure that out by looking at the source code.) If that is possible, can I have the code run in parallel mode? Thanks, Ralf
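For context: R's svd() is a thin wrapper around La.svd(), which typically calls LAPACK's divide-and-conquer routine dgesdd; the parallelism the poster observes comes from the threaded BLAS that the R binary is linked against, not from a special R-only routine. A small sketch of the interface:

```r
## svd() wraps La.svd(), which calls LAPACK (typically dgesdd).
## Whether it runs in parallel depends on the linked BLAS
## (e.g. threaded ATLAS / OpenBLAS / MKL), not on R itself.
set.seed(1)
m <- matrix(rnorm(200 * 80), 200, 80)
s <- svd(m)                                  # list with d, u, v
max(abs(m - s$u %*% diag(s$d) %*% t(s$v)))   # reconstruction error, near 0
```

So linking the C++ program against the same threaded BLAS/LAPACK that the R build uses, and calling dgesdd rather than dgesvd, should in principle reproduce R's timing; this is an inference from how svd() is implemented, not something stated in the original post.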
Re: [R] pairs with same xlim and ylim scale
Hi Dejian, you're right on this! Do you know how to pass those two arguments into lower.panel? Thanks! ...Tao

From: Dejian Zhao zha...@ioz.ac.cn To: r-help@r-project.org Sent: Tue, August 31, 2010 6:10:16 PM Subject: Re: [R] pairs with same xlim and ylim scale

I think you have successfully passed the xlim and ylim into the function pairs1. Compare the two graphs produced by the code you provided: you can see that the xlim and ylim in the second graph have been reset to the assigned values. It seems that the program halted while producing the second plot after adding xlim and ylim. According to the error message, the two added parameters were not used in lower.panel, i.e. in the customized function f.xy.

On 2010-9-1 2:26, Shi, Tao wrote: Hi list, I have a function which basically is a wrapper of pairs with some useful panel functions. However, I'm having trouble passing xlim and ylim into the function so that the x and y axes are on the same scale and the 45-degree lines are exactly diagonal. I've looked at some old posts, but they didn't help much. [[elided Yahoo spam]] Thanks! ...Tao

pairs1 <- function(x, ...) {
  f.xy <- function(x, y, ...) {
    points(x, y, ...)
    abline(0, 1, col = 2)
  }
  panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor) {
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y, method = "pearson", use = "pairwise.complete.obs"))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste(prefix, txt, sep = "")
    if (missing(cex.cor)) cex <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex * r)
  }
  panel.hist <- function(x, ...) {
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5))
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
  }
  pairs(x, lower.panel = f.xy, upper.panel = panel.cor,
        diag.panel = panel.hist, ...)
}
x <- rnorm(100, sd = 0.2)
x <- cbind(x = x - 0.1, y = x + 0.1)
pairs1(x)
pairs1(x, xlim = c(-1, 1), ylim = c(-1, 1))
Error in lower.panel(...)
: unused argument(s) (xlim = c(-1, 1), ylim = c(-1, 1))
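One workaround (a sketch, not from the thread): pairs() forwards its "..." both to the per-panel plot frames and to every panel function, so the error goes away if each panel explicitly accepts xlim and ylim and simply ignores them; the plot frames still pick the limits up, giving the common scale:

```r
## pairs2 is a hypothetical simplified variant of the poster's pairs1.
pairs2 <- function(x, ...) {
  ## xlim/ylim are declared only so the panel absorbs them instead of
  ## passing them on to points(), which would error; they are unused here.
  f.xy <- function(x, y, xlim = NULL, ylim = NULL, ...) {
    points(x, y, ...)
    abline(0, 1, col = 2)   # 45-degree reference line
  }
  pairs(x, lower.panel = f.xy, upper.panel = f.xy, ...)
}

z <- rnorm(100, sd = 0.2)
z <- cbind(x = z - 0.1, y = z + 0.1)
pairs2(z, xlim = c(-1, 1), ylim = c(-1, 1))  # axes now share one scale
```

The same two absorbed arguments would need to be added to panel.cor and panel.hist in the original pairs1.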
Re: [R] How to generate integers from uniform distribution with fixed mean
On Thu, Sep 2, 2010 at 7:17 AM, Yi liuyi.fe...@gmail.com wrote: Hi, folks, runif(n, min, max) is the typical code for generating random variates from a uniform distribution. But what if we need to fix the mean at 20, and we want the values to be integers only?

It's not clear what you want. Uniformly random integers with expected mean 20 - but what range? Any range centred on 20 will work; for example you could use sample() with replacement. To see the distribution, use

table(sample(17:23, 10000, TRUE))

which gives a uniform distribution of integers from 17 to 23; the mean was 20.0057 for 10000 samples. Is that what you want? Barry
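Barry's suggestion can be checked directly: any symmetric integer range centred on 20 has expected value exactly 20, and the sample mean converges to it. A short sketch:

```r
## Uniform integers with expected value 20: sample any symmetric range
## centred on 20 with replacement, e.g. 17:23 or 15:25.
set.seed(42)
x <- sample(17:23, 10000, replace = TRUE)
table(x)   # roughly equal counts in each of the seven cells
mean(x)    # close to 20 (exactly 20 in expectation)
```

If instead the *realized* mean must equal 20 exactly, that is a constrained sampling problem and simple independent draws will not do it.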
Re: [R] Cross Spectrum Analysis
Hi, I'm sorry, but I don't think that coherence is the same as the cross spectrum. People use coherence since it is much easier to deal with. I know how to use R to plot and calculate the coherence and phase, but what I don't know is how to calculate the cross spectrum using R. Regards, Aftar
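For what it's worth, the smoothed cross-spectrum can be reconstructed from exactly the quantities spec.pgram() already returns, since squared coherency is |s_xy|^2/(s_xx * s_yy) and the phase spectrum is arg(s_xy). A sketch (the series here are simulated, and scaling conventions vary between texts):

```r
## Recover the complex cross-spectrum s_xy(f) from spec.pgram() output:
## |s_xy| = sqrt(coh * s_xx * s_yy), arg(s_xy) = phase.
set.seed(1)
x <- arima.sim(list(ar = 0.6), n = 512)
y <- 0.7 * x + rnorm(512, sd = 0.5)

## Smoothing (spans) is essential: raw-periodogram coherency is always 1.
sp <- spec.pgram(cbind(x, y), spans = c(5, 7), taper = 0.1, plot = FALSE)

cross.amp  <- sqrt(sp$coh[, 1] * sp$spec[, 1] * sp$spec[, 2])  # |s_xy(f)|
cross.spec <- cross.amp * exp(1i * sp$phase[, 1])              # complex s_xy
head(cross.spec)
```

Equivalently one could compute the raw cross-periodogram directly as fft(x) * Conj(fft(y)) (up to a normalization constant) and smooth it by hand.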
[R] R code changes to crap
Hi, I am using vim to edit my files. I do not know what I pressed, but my R code has been converted to this:

%PDF-1.4 %81â81ã81Ï81Ó\r 1 0 obj /CreationDate (D:20100902122215) /ModDate (D:20100902122215) /Title (R Graphics Output) /Producer (R 2.11.1) /Creator (R)

How can I convert it back to my actual code? I do not have a backup for it. Thanks in advance, Khushwant
[R] reshape to wide format takes extremely long
Hello, I have a data.frame with the following format:

head(clin2)
  Study   Subject Type  Obs Cycle Day       Date  Time
1 A001101   10108  ALB 44.0    98   1 2004-03-11 14:26
2 A001101   10108  ALP 95.0    98   1 2004-03-11 14:26
3 A001101   10108  ALT 61.0    98   1 2004-03-11 14:26
5 A001101   10108  AST 33.0    98   1 2004-03-11 14:26

I want to transform this data.frame so that I have Obs columns for each Type. The full dataset is 45000 rows long. For a short subset of 100 rows, reshaping takes 0.2 seconds and produces what I want. All columns are in either numeric or character format (incl. date/time).

reshape(clin2, v.names="Obs", timevar="Type", direction="wide",
        idvar=c("Study","Subject","Cycle","Day","Date","Time"))
   Study   Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
1  A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
11 A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
21 A001101   10108     1   8 2004-03-22 10:34      40      90      70      34
30 A001101   10108     1  15 2004-03-29 09:56      45      97      66      48
[...]

However, when using the same reshape command for the full data.frame of 45000 rows, it still wasn't finished when run overnight (8 GB RAM + 8 GB swap in use). The time to process this data.frame increases from 0.2 sec for a 100-row subset to 60 sec for a 1000-row subset. I would greatly appreciate any advice on why the time for reshaping increases exponentially with the number of rows, and on how I can do this in an elegant way. Thanks! Coen.
Re: [R] R code changes to crap
On Sep 2, 2010, at 09:16, khush wrote: Hi, I am using vim to edit my files. I do not know what I pressed, but my R code has been converted to this: %PDF-1.4 %81â81ã81Ï81Ó\r 1 0 obj /CreationDate (D:20100902122215) /ModDate (D:20100902122215) /Title (R Graphics Output) /Producer (R 2.11.1) /Creator (R) How can I convert it back to my actual code? I do not have a backup for it.

Well, that's a PDF file with some R graphics inside, so presumably you did pdf(file="foo.R") or copied the graphics output as PDF to foo.R using the user-friendly interface to shooting yourself in the foot. Depending on your (unstated) platform and configuration, vim may be leaving .bak files around, which you can use, or you may have the commands echoed into an output script file, from which you can extract them. Otherwise, I suspect that you are out of luck.

-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] general question on binomial test / sign test
I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with one trial and get a p-value of 1. Actually I want to prove the alternative hypothesis that the estimate is different from 0.5, which certainly cannot be proven here. But in reverse, the p-value of 1 says that I can be 100% sure that the estimate of 0.5 is true (??) - that's the point that astonishes me. Thanks if anybody can clarify this for me, Kay

Quoting Greg Snow greg.s...@imail.org: Try thinking this one through from first principles. You are essentially saying that your null hypothesis is that you are flipping a fair coin and you want to do a 2-tailed test. You then flip the coin exactly once: what do you expect to happen? The p-value of 1 just means that what you saw was perfectly consistent with what is predicted to happen when flipping a single time. Does that help? If not, please explain what you mean a little better. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kay Cichini Sent: Wednesday, September 01, 2010 3:06 PM To: r-help@r-project.org Subject: [R] general question on binomial test / sign test

hello, i did several binomial tests and noticed for one sparse dataset that binom.test(1,1,0.5) gives a p-value of 1 for the null, which i can't quite grasp. that would say that a prob of 1/2 has p-value of 0?? - i must be wrong but can't figure out the right interpretation. best, kay - Kay Cichini Postgraduate student Institute of Botany Univ. of Innsbruck
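The point under discussion is easy to see numerically: with one flip, both possible outcomes are perfectly consistent with p = 0.5, so the two-sided p-value is 1, and a large p-value is only a lack of evidence against the null, never proof that it is true. A sketch:

```r
## binom.test(successes, trials, hypothesized p), two-sided by default.
binom.test(1, 1, 0.5)$p.value     # 1: one flip can never contradict p = 0.5
binom.test(8, 10, 0.5)$p.value    # ~0.109: suggestive but not conclusive
binom.test(80, 100, 0.5)$p.value  # very small: strong evidence against p = 0.5
```

Note how the same observed proportion (0.8) yields very different p-values as the sample size grows: evidence accumulates with trials, which is exactly what a single trial cannot provide.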
Re: [R] R code changes to crap
On 02-Sep-10 07:16:54, khush wrote: Hi, I am using vim to edit my files. I do not know what I pressed, but my R code has been converted to this: %PDF-1.4 %81â81ã81Ï81Ó\r 1 0 obj /CreationDate (D:20100902122215) /ModDate (D:20100902122215) /Title (R Graphics Output) /Producer (R 2.11.1) /Creator (R) How can I convert it back to my actual code? I do not have a backup for it. Thanks in advance, Khushwant

The content you have listed above is part of a header from a PDF file, apparently generated by R from a plot() command ("R Graphics Output"), presumably encapsulated between pdf(...) and dev.off() commands. It is most unlikely that such a file would contain any R code (i.e. a list of R commands). Are you sure you are editing the correct file? What are you trying to do? In any case, there are almost no circumstances in which it is meaningful to apply a text editor (like vim) to a PDF file!

Possibly, if you are using a Windows platform, you may have mixed up a text file with R commands (e.g. project1.R) with a PDF file generated by R using the same name (e.g. project1.pdf), because Windows (by default) does not show you the extensions (respectively .R and .pdf). Please clarify! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Sep-10 Time: 09:01:35 -- XFMail --
Re: [R] ggplot2 multiple group barchart
Dear Greg, first convert your data.frame to long format using melt(). Then use ddply() to calculate the averages. Once you get to that point it should be rather straightforward.

library(ggplot2)
v1 <- c(1,2,3,3,4)
v2 <- c(4,3,1,1,9)
v3 <- c(3,5,7,2,9)
gender <- c("m","f","m","f","f")
d.data <- data.frame(v1, v2, v3, gender)
molten <- melt(d.data)
Average <- ddply(molten, c("gender", "variable"),
                 function(z) c(Mean = mean(z$value)))
ggplot(data = Average, aes(x = variable, y = Mean, fill = gender)) +
  geom_bar(position = position_dodge()) + coord_flip() +
  geom_text(position = position_dodge(0.5), aes(label = round(Mean, 1)),
            vjust = 0.5, hjust = 4, colour = "white", size = 7)
ggplot(data = Average, aes(x = variable, y = Mean)) + geom_bar() +
  coord_flip() +
  geom_text(aes(label = round(Mean, 1)), vjust = 0.5, hjust = 4,
            colour = "white", size = 7) +
  facet_wrap(~ gender)

HTH, Thierry

ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest, team Biometrie & Kwaliteitszorg / Biometrics & Quality Assurance, Gaverstraat 4, 9500 Geraardsbergen, Belgium tel. +32 54/436 185 thierry.onkel...@inbo.be www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Waller Gregor (wall) Sent: Wednesday, September 1, 2010 17:16 To: r-help@r-project.org Subject: [R] ggplot2 multiple group barchart

Hi there, I've got a problem with ggplot2.
Here is my example:

library(ggplot2)
v1 <- c(1,2,3,3,4)
v2 <- c(4,3,1,1,9)
v3 <- c(3,5,7,2,9)
gender <- c("m","f","m","f","f")
d.data <- data.frame(v1, v2, v3, gender)
d.data
x <- names(d.data[1:3])
y <- mean(d.data[1:3])
pl <- ggplot(data = d.data, aes(x = x, y = y))
pl <- pl + geom_bar()
pl <- pl + coord_flip()
pl <- pl + geom_text(aes(label = round(y, 1)), vjust = 0.5, hjust = 4,
                     colour = "white", size = 7)
pl

This gives me a nice barchart to compare the means of my variables v1, v2 and v3. My question: how do I proceed if I want this barchart split by the variable gender, so that I get two small bars for v1 (one for female and one for male), two bars for v2, etc.? I need them all in one chart. fill=gender and position="dodge" do not work. Any ideas? Thanks a lot, Greg
[R] Using library and lib.loc
Hi, I didn't find any post on this subject, so I'll ask you for some advice. Let's say that I have two library trees. Number 1 is the default R library tree on path1. Number 2 is another library tree on a server, with all packages, on path2.

When I call library(aaMI, lib.loc=path2), it loads the package even though it is not in the default R library. When I call library(fOptions, lib.loc=path2), it doesn't load, because timeSeries is not in the default R library (timeSeries is a required package for fOptions):

library(fOptions, lib.loc=.lib.loc)
Loading required package: timeDate
Loading required package: timeSeries
Error: package 'timeSeries' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called 'timeSeries'

(The error messages above are translated from French. By the way, how can I set the error-message language to English? Setting the language to English in the R installer is not sufficient!)

How can I set lib.loc for every package that I will load? Or is there any global way of doing this? Thanks, Olivier
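A global alternative to passing lib.loc on every call is .libPaths(), which prepends extra trees to the library search path for the whole session, so dependencies are also resolved there. A sketch ("path2" stands for the poster's server tree; fOptions is the package from the post):

```r
## Prepend the server library tree to the search path once per session
## (e.g. in .Rprofile); library() then searches it for every package,
## including dependencies such as timeSeries.
.libPaths(c("path2", .libPaths()))
.libPaths()          # "path2" is now searched first
library(fOptions)    # dependencies found across both trees
```

Note that .libPaths() silently drops directories that do not exist, so the path must be valid on the machine in question.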
Re: [R] general question on binomial test / sign test
You state: "in reverse the p-value of 1 says that i can [be] 100% sure that the estimate of 0.5 is true". This is where your logic about significance tests goes wrong.

The general logic of a significance test is that a test statistic (say T) is chosen such that large values represent a discrepancy between possible data and the hypothesis under test. When you have the data, T evaluates to a value (say t0). The null hypothesis (NH) implies a distribution for the statistic T if the NH is true. Then the value of Prob(T >= t0 | NH) can be calculated. If this is small, then the probability of obtaining data at least as discrepant as the data you did obtain is small; if sufficiently small, then the conjunction of NH and your data (as assessed by the statistic T) is so unlikely that you can decide not to believe that it is possible. If you so decide, then you reject the NH because the data are so discrepant that you can't believe it!

This is along the same lines as the reductio ad absurdum in classical logic: Hypothesis A implies that an outcome B must occur. But I have observed that B did not occur. Therefore A cannot be true. But it does not follow that, if you observe that B did occur (which is *consistent* with A), then A must be true. A could be false, yet B still occur; the only basis on which occurrence of B could *prove* that A must be true is when you have the prior information that B will occur *if and only if* A is true. In the reductio ad absurdum, and in the parallel logic of significance testing, all you have is "B will occur *if* A is true". The "only if" part is not there. So you cannot deduce that A is true from the observation that B occurred, since what you have to start with allows B to occur even if A is false (i.e. "B will occur *if* A is true" says nothing about what may or may not happen if A is false).

So, in your single toss of a coin, it is true that "I will observe either 'succ' or 'fail' if the coin is fair."
But (as in the above) you cannot deduce that the coin is fair just because you observe either 'succ' or 'fail', since it is possible (indeed necessary) that you obtain such an observation even if the coin is not fair (even if the coin is the same, either 'succ' or 'fail', on both sides, and therefore completely unfair). This is an attempt to expand Greg Snow's reply!

Your 2-sided test takes the form T=1 if either outcome='succ' or outcome='fail'. And that is the only possible value for T, since no other outcome is possible. Hence Prob(T==1) = 1 whether the coin is fair or not. It is not possible for such data to discriminate between a fair and an unfair coin. And, as explained above, a P-value of 1 cannot prove that the null hypothesis is true. All that is possible with a significance test is that a small P-value can be taken as evidence that the NH is false. Hoping this helps! Ted.
[R] Is there any package or function perform stepwise variable selection under GEE method?
Hi, I use library(gee), library(geepack) and library(yags) to perform GEE data analysis, but none of them can do variable selection. Both step and stepAIC can do variable selection based on the AIC criterion under linear regression and glm, but they do not work when the model is based on GEE. I want to ask whether any variable-selection function or package for GEE models is available now. Thanks! Best,
Re: [R] reshape to wide format takes extremely long
Hi: I did the following test using function ddply() in the plyr package on a toy data frame with 50000 observations: five studies, 20 subjects per study, 25 cycles per subject, five days per cycle and four observations by type per day. No date-time variable was included.

# Test data frame
big <- data.frame(study   = factor(rep(1:5, each = 10000)),
                  subject = factor(rep(1:100, each = 500)),
                  cycle   = rep(rep(1:25, each = 20), 100),
                  day     = rep(rep(1:5, each = 4), 2500),
                  type    = rep(c('ALB', 'ALP', 'ALT', 'AST'), 12500),
                  obs     = rpois(50000, 70))
dim(big)
[1] 50000     6

# 64-bit R on a Windows 7 box with 8 GB RAM and a 2.93 GHz Core Duo chip.
system.time(bigtab <- ddply(big, .(study, subject, cycle, day),
                            function(x) xtabs(obs ~ type, data = x)))
   user  system elapsed
  30.22    0.02   30.60
dim(bigtab)
[1] 12500     8
head(bigtab)
  study subject cycle day ALB ALP ALT AST
1     1       1     1   1  77  80  67  70
2     1       1     1   2  60  54  70  70
3     1       1     1   3  71  77  69  65
4     1       1     1   4  62  71  73  68
5     1       1     1   5  78  67  69  78
6     1       1     2   1  71  69  74  69
tail(bigtab)
      study subject cycle day ALB ALP ALT AST
12495     5     100    24   5  75  83  72  70
12496     5     100    25   1  85  52  62  70
12497     5     100    25   2  79  64  84  68
12498     5     100    25   3  67  65  74  81
12499     5     100    25   4  62  86  66  80
12500     5     100    25   5  58  76  85  84

There may be an easier/more efficient way to do this with melt() and cast() in the reshape package, but I moved on when I couldn't figure it out within ten minutes (probably because I was thinking "xtabs of obs by type for study/subject/cycle/day combinations - that's the ticket!" :) Packages sqldf and data.table are other viable options for this sort of task, and now that there is a test data set to play with, it would be interesting to see what else can be done. I'd be surprised if this couldn't be done within a few seconds, because the data frame is not that large.
Anyway, HTH, Dennis

On Thu, Sep 2, 2010 at 12:24 AM, Coen van Hasselt coenvanhass...@gmail.com wrote: Hello, I have a data.frame with the following format:

head(clin2)
  Study   Subject Type  Obs Cycle Day       Date  Time
1 A001101   10108  ALB 44.0    98   1 2004-03-11 14:26
2 A001101   10108  ALP 95.0    98   1 2004-03-11 14:26
3 A001101   10108  ALT 61.0    98   1 2004-03-11 14:26
5 A001101   10108  AST 33.0    98   1 2004-03-11 14:26

I want to transform this data.frame so that I have Obs columns for each Type. The full dataset is 45000 rows long. For a short subset of 100 rows, reshaping takes 0.2 seconds and produces what I want. All columns are in either numeric or character format (incl. date/time).

reshape(clin2, v.names="Obs", timevar="Type", direction="wide",
        idvar=c("Study","Subject","Cycle","Day","Date","Time"))
   Study   Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
1  A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
11 A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
21 A001101   10108     1   8 2004-03-22 10:34      40      90      70      34
30 A001101   10108     1  15 2004-03-29 09:56      45      97      66      48
[...]

However, when using the same reshape command for the full data.frame of 45000 rows, it still wasn't finished when run overnight (8 GB RAM + 8 GB swap in use). The time to process this data.frame increases from 0.2 sec for a 100-row subset to 60 sec for a 1000-row subset. I would greatly appreciate any advice on why the time for reshaping increases exponentially with the number of rows, and on how I can do this in an elegant way. Thanks! Coen.
Re: [R] how to represent error bar in a plot legend?
On 09/02/2010 04:56 AM, josef.kar...@phila.gov wrote: I have a simple barplot of 4 mean values, each mean value has an associated 95% confidence interval drawn on the plot as an error bar. I want to make a legend on the plot that uses the error bar symbol, and explains 95% C.I. How do I show the error bar symbol in the legend? I could not find any pch values that are appropriate

Hi Josef, There is a way to do this, but it is not too easy. First, use the my.symbols function in the TeachingDemos package to define the error bar symbol. Then, you can draw a legend without one or more symbol elements and place your error bar symbol where the symbol would have been. You can see how to do this in the code of the legendg function in the plotrix package by getting the necessary coordinates from the legend function. Jim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
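A base-graphics sketch of the coordinate approach Jim describes, without TeachingDemos or plotrix. The means and interval half-widths are made up; the key point is that legend() invisibly returns the text coordinates, where a small error bar can then be drawn by hand:

```r
means <- c(5, 7, 6, 8)       # invented group means
ci <- c(1.1, 0.9, 1.3, 0.8)  # invented 95% CI half-widths
mids <- barplot(means, ylim = c(0, 10))
arrows(mids, means - ci, mids, means + ci,
       angle = 90, code = 3, length = 0.05)  # the error bars themselves

# Reserve a legend slot with an invisible symbol (pch = NA), then draw a
# small vertical error bar at the coordinates legend() returns.
lg <- legend("topleft", legend = "95% C.I.", pch = NA)
xs <- lg$text$x - 0.6 * strwidth("m")
arrows(xs, lg$text$y - 0.25, xs, lg$text$y + 0.25,
       angle = 90, code = 3, length = 0.03)
```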
Re: [R] reshape to wide format takes extremely long
Dear Dennis, cast() is in this case much faster.

system.time(bigtab <- ddply(big, .(study, subject, cycle, day),
                            function(x) xtabs(obs ~ type, data = x)))
   user  system elapsed
  35.36    0.12   35.53
system.time(bigtab2 <- cast(data = big, study + subject + cycle + day ~ type,
                            value = "obs", fun = mean))
   user  system elapsed
   4.09    0.00    4.09

I have the feeling that ddply() has a lot of overhead when the number of levels is large. HTH, Thierry

ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest, team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance, Gaverstraat 4, 9500 Geraardsbergen, Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy Sent: Thursday, 2 September 2010 11:34 To: Coen van Hasselt CC: r-help@r-project.org Subject: Re: [R] reshape to wide format takes extremely long
Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :
On 02/09/2010 2:29 AM, raje...@cse.iitm.ac.in wrote: Hi, I've built my own package in Windows and when I run R CMD check Package-Name I get:

* install options are ' --no-html'
* installing *source* package 'AceTest' ...
** libs making DLL ... g++ ...etc. installing to PATH ... done
** R
** preparing package for lazy loading
** help
Warning: ./man/AceTest-package.Rd:34: All text must be in a section
Warning: ./man/AceTest-package.Rd:35: All text must be in a section
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest/libs/AceTest.dll':
  LoadLibrary failure: The specified module could not be found.
ERROR: loading failed

The message "The specified module could not be found" comes from Windows. It probably means that your dll depends on some other dll and the other one is unavailable, but it might mean that AceTest.dll itself can't be loaded (permission problems, defective dll, etc.). In some cases Windows will pop up a dialog box with more information than that skimpy message, e.g. if you install the package and try to run library(AceTest) from within Rgui. Duncan Murdoch

* removing 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest'

can someone point out what went wrong?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape to wide format takes extremely long
picking up on Thierry's example, I don't think you need any function because you are just reshaping (not aggregating). Therefore:

bigtab2 <- cast(data = big, study + subject + cycle + day ~ type, value = "obs")
head(bigtab2)
  study subject cycle day ALB ALP ALT AST
1     1       1     1   1  66  71  83  76
2     1       1     1   2  66  87  78  58
3     1       1     1   3  72  84  78  61
4     1       1     1   4  72  63  68  69
5     1       1     1   5  64  68  89  89
6     1       1     2   1  78  65  65  76
system.time(bigtab2 <- cast(data = big, study + subject + cycle + day ~ type,
                            value = "obs"))
   user  system elapsed
  0.760   0.000   0.782

david -- View this message in context: http://r.789695.n4.nabble.com/reshape-to-wide-format-takes-extremely-long-tp2487153p2506575.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R graphics: Add labels to stacked bar chart
Hi, I am looking for a way to add labels, i.e. absolute values, into a stacked bar chart using the basic plot functions of R. The labels should be inside the stacked bars. For example,

### I have this dataset
height = cbind(x = c(465, 91) / 465 * 100,
               y = c(840, 200) / 840 * 100,
               z = c(37, 17) / 37 * 100)
### and use this plot
barplot(height, beside = FALSE, horiz = TRUE, col = c(2, 3))

how can I put the values from height within the respective stacked bars? Thank you!

-- + Dipl.Biol. Jens Oldeland, Biodiversity of Plants, Biocentre Klein Flottbek and Botanical Garden, University of Hamburg, Ohnhorststr. 18, 22609 Hamburg, Germany Tel: 0049-(0)40-42816-407 Fax: 0049-(0)40-42816-543 Mail: oldel...@botanik.uni-hamburg.de oldel...@gmx.de (for attachments < 2mb!!) Skype: jens.oldeland http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm http://jensoldeland.wordpress.com + __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :
On 02/09/2010 6:46 AM, raje...@cse.iitm.ac.in wrote: It is dependent on another dll but it did not give compilation errors. It seemed to link fine at that point. Why does it have a problem at this stage?

Windows needs to be able to find the other DLL at load time. It will find it if it's in the same directory or on the PATH. Duncan Murdoch

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] draw a graph of the semi-partial R-squared /CAH
Dear r-help, I am using CAH (hierarchical cluster analysis) and would like to cut my dendrogram. What is the command in R that draws a graph of the semi-partial R-squared? Best Regards __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
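There is no single built-in command for this, but one way to sketch such a curve with base R (using the built-in USArrests data; "semi-partial R-squared" is taken here, as in SAS PROC CLUSTER, to be the drop in R-squared when merging from k to k-1 clusters):

```r
x <- scale(USArrests)
hc <- hclust(dist(x), method = "ward.D2")  # called "ward" in older R versions

tss <- sum(x^2)  # total sum of squares (columns centered by scale())
r2 <- sapply(1:10, function(k) {
  cl <- cutree(hc, k)
  # within-cluster sum of squares for this k-cluster partition
  wss <- sum(sapply(split(as.data.frame(x), cl),
                    function(g) sum(scale(g, scale = FALSE)^2)))
  1 - wss / tss
})
plot(1:10, r2, type = "b",
     xlab = "number of clusters k", ylab = "R-squared")
round(-diff(r2), 3)  # R-squared lost at each merge, largest k first
```

A sharp elbow in the plotted curve is the usual cue for where to cut the dendrogram.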
Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :
It is dependent on another dll but it did not give compilation errors. It seemed to link fine at that point. Why does it have a problem at this stage?

From: Duncan Murdoch murdoch.dun...@gmail.com To: raje...@cse.iitm.ac.in Cc: r-help r-help@r-project.org Sent: Thursday, September 2, 2010 4:05:14 PM Subject: Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error: could not find function ad.test
Hi, I'm trying to run an Anderson-Darling test for normality on a given variable 'Y':

ad.test(Y)

I think I need the 'nortest' package, but since it does not appear in any of the Ubuntu repositories for 2.10.1, I am wondering if it goes by the name of something else now? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Error-could-not-find-function-ad-test-tp2501293p2501293.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R graphics: Add labels to stacked bar chart
On 09/02/2010 08:50 PM, Jens Oldeland wrote: ... I am looking for a way to add labels, i.e. absolute values, into a stacked bar chart using the basic plot functions of R. The labels should be inside the stacked bars.

barpos <- barplot(height, beside = FALSE, horiz = TRUE, col = c(2, 3))
library(plotrix)
boxed.labels(c(height[1, ] / 2, height[1, ] + height[2, ] / 2),
             rep(barpos, 2), c(height[1, ], round(height[2, ], 1)))

Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
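A base-only variant of the same idea (no plotrix), using text() at the midpoint of each stacked segment, with Jens's original data:

```r
height <- cbind(x = c(465, 91) / 465 * 100,
                y = c(840, 200) / 840 * 100,
                z = c(37, 17) / 37 * 100)
barpos <- barplot(height, beside = FALSE, horiz = TRUE, col = c(2, 3))

# Label x-positions: midpoint of the first segment, then midpoint of the
# second segment stacked on top of the first (one column per bar).
xpos <- rbind(height[1, ] / 2, height[1, ] + height[2, ] / 2)
text(xpos, rep(barpos, each = 2), labels = round(height, 1))
```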
[R] nlme formula from model specification
Dear R-community, I'm analysing some noise using the nlme package. I'm writing in order to get my usage of lme verified. In practice, a number of samples have been processed by a machine measuring the same signal at four different channels. I want to model the noise. I have taken the noise (the signal is from position 1 to 3500; after that there is only noise). My data looks like this:

channel.matrix:
      pos channel0 channel1 channel2 channel3 samplenumber
1    3501        8        3       12        1            1
2    3502        3        7        0       14            1
3    3503        9        1       13        3            1
4    3504        3        7        3       14            1
5    3505        6        5        4        5            1
6    3506        7        0       16        0            1
...
495  3995        5        2        9        9            1
496  3996        2        4        6       10            1
497  3997        3        2        7        7            1
498  3998        2        4        3        9            1
499  3999        3        1        6       11            1
500  4000        0        3        6        7            1
2301 3501        1        4        3        9            2
2302 3502        3        3        4       13            2
2303 3503        4        1        8        5            2
2304 3504        3        1       10        2            2
2305 3505        2        3        5        8            2
2306 3506        0        5        8        2            2
...

The model is

channel0 ~ alpha_i + eps_{i, j} + channel1 + channel2 + channel3

where i is sample number, j is position, and:
alpha_i: fixed effect for each samplenumber
eps_{i, j}: random effect, here with correlation structure as AR(1)
channel1, ..., channel3: fixed effects for each channel, not depending on samplenumber nor position

(And then afterwards I would model channel1 ~ ... + channel2 + channel3, etc.) I then use this function call:

channel.matrix.grouped <- groupedData(channel0 ~ pos | samplenumber,
                                      data = channel.matrix)
fit <- lme(channel0 ~ pos + samplenumber + channel1 + channel2 + channel3,
           random = ~ pos | samplenumber,
           correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
           data = channel.matrix.grouped)

Is that the right way to express the model in (n)lme notation? Cheers, Mikkel. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape to wide format takes extremely long
Hi: Thanks, David and Thierry. I knew what I did was inefficient, but I'm not very adept with cast() yet. Thanks for the lesson! The time is less than half without the fun = mean statement, too. On my system, the timing of David's call was 2.06 s elapsed; with Thierry's, it was 4.88 s. Both are big improvements over my band-aid attempt. Regards, Dennis

On Thu, Sep 2, 2010 at 3:36 AM, carslaw david.cars...@kcl.ac.uk wrote:

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error: could not find function ad.test
Hi: One option to search for functions in R is to download the sos package:

library(sos)
findFn('Anderson-Darling')

On my system (Win 7), it found 51 matches. The two likeliest packages are nortest and ADGofTest. To answer your question, nortest still exists on CRAN. I can't comment on Ubuntu, but it's possible you may have to install it manually from the gzip sources. I'm sure that others with hands-on Ubuntu experience can provide a more complete answer. HTH, Dennis

On Thu, Sep 2, 2010 at 2:42 AM, DrCJones matthias.godd...@gmail.com wrote:

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
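Once nortest is installed (it is still a regular CRAN package, not renamed; on Ubuntu, install.packages("nortest") inside R works even when no distribution package exists), usage looks like this sketch - guarded so it degrades gracefully when the package is absent:

```r
# ad.test() lives in the nortest package; run it only if installed.
if (requireNamespace("nortest", quietly = TRUE)) {
  set.seed(42)
  Y <- rnorm(100)             # simulated normal data
  print(nortest::ad.test(Y))  # Anderson-Darling normality test
} else {
  message("Run install.packages('nortest') first.")
}
```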
Re: [R] nlme formula from model specification
Dear Mikkel, You need to do some reading on terminology. In your model the fixed effects are channel1, channel2 and channel3; samplenumber is a random effect, and eps_{i, j} is the error term. The model you described has the notation below. You do not need to create the grouped data structure.

lme(channel0 ~ pos + samplenumber + channel1 + channel2 + channel3,
    random = ~ 1 | samplenumber,
    correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
    data = channel.matrix)

HTH, Thierry

PS There is a dedicated mailing list for mixed models: R-sig-mixed-models

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mikkel Meyer Andersen Sent: Thursday, 2 September 2010 13:30 To: r-help@r-project.org Subject: [R] nlme formula from model specification
[R] how to cluster vectors of factors
Hello all, I wonder what I can use to cluster vectors composed of several factors. Say around 30 different factors compose a vector; if a factor is present it is encoded as 1, and if it is not present it is encoded as 0. I was thinking of using hierarchical clustering, but as far as I know the distance between two vectors is calculated with the Euclidean distance function, and I don't think this distance would separate the data correctly: two vectors with different composition could end up having a similar distance to another vector. Hope someone can give me a clue! -- View this message in context: http://r.789695.n4.nabble.com/how-to-cluster-vectors-of-factors-tp2514654p2514654.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
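A sketch of the standard remedy for 0/1 presence/absence data: use a binary (Jaccard-type) distance rather than Euclidean, then cluster as usual. The data here are simulated, since no example data were posted:

```r
set.seed(1)
# 10 vectors, each recording presence/absence of 30 factors
m <- matrix(rbinom(10 * 30, size = 1, prob = 0.3), nrow = 10)

# dist(..., method = "binary"): for each pair of rows, the proportion of
# factors present in only one vector, among factors present in at least one.
d <- dist(m, method = "binary")
hc <- hclust(d, method = "average")
plot(hc, main = "Binary-distance clustering of presence/absence vectors")
```

Unlike Euclidean distance, this ignores shared absences, so two vectors are close only when they share the factors that are actually present.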
Re: [R] Is there any package or function perform stepwise variable selection under GEE method?
The absence of stepwise methods works to your advantage, as these yield invalid statistical inference and inflated regression coefficients. Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, School of Medicine, Vanderbilt University

On Thu, 2 Sep 2010, 黃敏慈 wrote: Hi, I use library(gee), library(geepack) and library(yags) to perform GEE data analysis, but none of them can do variable selection! Both step and stepAIC can do variable selection based on the AIC criterion under linear regression and glm, but they do not work when the model is based on GEE. I want to ask whether any variable selection function or package under the GEE model is available now? Thanks! Best,

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nlme formula from model specification
Dear Thierry, Thanks for the quick answer. I'm moving this to r-sig-mixed-models (but also posting on r-help to notify). I reserved Mixed-effects models in S and S-PLUS by Pinheiro and Bates (New York: Springer, 2000). Do you know any other good references? Cheers, Mikkel.

2010/9/2 ONKELINX, Thierry thierry.onkel...@inbo.be:

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] general question on binomial test / sign test
thanks a lot for the elaborations. Your explanations made clear to me that either binom.test(1, 1, 0.5, alternative = "two.sided") or binom.test(0, 1, 0.5) giving a p-value of 1 simply indicates I have absolutely no assurance to reject H0. Considering binom.test(0, 1, 0.5, alternative = "greater") and binom.test(1, 1, 0.5, alternative = "less"), where I get a p-value of 1 and 0.5, respectively - am I right in stating that for the first estimate (0/1) I have no assurance at all for rejecting H0, and for the second estimate (1/1) I have the same chance of being wrong whether I reject or keep H0? many thanks, kay

Quoting ted.hard...@manchester.ac.uk:

You state: "in reverse the p-value of 1 says that I can be 100% sure that the estimate of 0.5 is true." This is where your logic about significance tests goes wrong. The general logic of a significance test is that a test statistic (say T) is chosen such that large values represent a discrepancy between possible data and the hypothesis under test. When you have the data, T evaluates to a value (say t0). The null hypothesis (NH) implies a distribution for the statistic T if the NH is true. Then the value of Prob(T >= t0 | NH) can be calculated. If this is small, then the probability of obtaining data at least as discrepant as the data you did obtain is small; if sufficiently small, then the conjunction of NH and your data (as assessed by the statistic T) is so unlikely that you can decide not to believe that it is possible. If you so decide, then you reject the NH because the data are so discrepant that you can't believe it! This is on the same lines as the reductio ad absurdum in classical logic: A hypothesis A implies that an outcome B must occur. But I have observed that B did not occur. Therefore A cannot be true. But it does not follow that, if you observe that B did occur (which is *consistent* with A), then A must be true.
A could be false, yet B still occur -- the only basis on which occurrence of B could *prove* that A must be true is when you have the prior information that B will occur *if and only if* A is true. In the reductio ad absurdum, and in the parallel logic of significance testing, all you have is B will occur *if* A is true. The only if part is not there. So you cannot deduce that A is true from the observation that B occurred, since what you have to start with allows B to occur if A is false (i.e. B will occur *if* A is true says nothing about what may or may not happen if A is false). So, in your single toss of a coin, it is true that I will observe either 'succ' or 'fail' if the coin is fair. But (as in the above) you cannot deduce that the coin is fair if you observe either 'succ' or 'fail', since it is possible (indeed necessary) that you obtain such an observation if the coin is not fair (even if the coin is the same, either 'succ' or 'fail', on both sides, therefore completely unfair). This is an attempt to expand Greg Snow's reply! Your 2-sided test takes the form T=1 if either outcome='succ' or outcome='fail'. And that is the only possible value for T since no other outcome is possible. Hence Prob(T==1) = 1 whether the coin is fair or not. It is not possible for such data to discriminate between a fair and an unfair coin. And, as explained above, a P-value of 1 cannot prove that the null hypothesis is true. All that is possible with a significance test is that a small P-value can be taken as evidence that the NH is false. Hoping this helps! Ted. On 02-Sep-10 07:41:17, Kay Cecil Cichini wrote: i test the null that the coin is fair (p(succ) = p(fail) = 0.5) with one trail and get a p-value of 1. actually i want to proof the alternative H that the estimate is different from 0.5, what certainly can not be aproven here. but in reverse the p-value of 1 says that i can 100% sure that the estimate of 0.5 is true (??) - that's the point that astonishes me. 
thanks if anybody could clarify this for me, kay Zitat von Greg Snow greg.s...@imail.org: Try thinking this one through from first principles, you are essentially saying that your null hypothesis is that you are flipping a fair coin and you want to do a 2-tailed test. You then flip the coin exactly once, what do you expect to happen? The p-value of 1 just means that what you saw was perfectly consistent with what is predicted to happen flipping a single time. Does that help? If not, please explain what you mean a little better. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Kay Cichini Sent: Wednesday, September 01, 2010 3:06 PM To: r-help@r-project.org Subject: [R] general question on binomial test / sign test hello, i did several binomial tests and noticed for one sparse dataset that binom.test(1,1,0.5) gives a p-value of 1 for the null, what i can't quite grasp. that would say that the a
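The single-flip behaviour discussed in this thread is easy to verify directly; a minimal sketch in base R:

```r
# Two-sided test on one flip: both outcomes are equally extreme under H0,
# so the p-value is 1 no matter what was observed
binom.test(1, 1, p = 0.5, alternative = "two.sided")$p.value  # 1

# One-sided tests on one flip: even the outcome pointing toward the
# alternative only reaches a p-value of 0.5
binom.test(1, 1, p = 0.5, alternative = "greater")$p.value    # P(X >= 1) = 0.5
binom.test(0, 1, p = 0.5, alternative = "greater")$p.value    # P(X >= 0) = 1
```

With n = 1 the smallest attainable one-sided p-value is 0.5, so no single flip can ever reach a conventional significance level.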
Re: [R] how to represent error bar in a plot legend?
I think I found a simple solution, although it requires some tinkering to find the x, y coordinates of the plot region to place the text...

barplot( )
text(x = 2.9, y = 0.43, srt = 90, labels = "H", cex = 1.5, col = "blue") # srt rotates the "H" character, so that it resembles an error bar
text(x = 3.5, y = 0.432, labels = "95% C.I.", cex = 1.1)
rect(2.62, 0.41, 3.9, 0.45) # draws a box around the text

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lower triangle of the correlation matrix with xtable
Dear all,

mydata <- data.frame(x1 = c(1,4,6), x2 = c(3,1,2), x3 = c(2,1,3))
cor(mydata)
           x1         x2        x3
x1  1.0000000 -0.5960396 0.3973597
x2 -0.5960396  1.0000000 0.5000000
x3  0.3973597  0.5000000 1.0000000

I wonder if it is possible to fill only the lower triangle of this correlation matrix? Using 'dist' doesn't seem to be useful, as it doesn't allow converting this table with xtable:

print(xtable(cor(mydata), digits = 3))

Any ideas? Cheers, Olga __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Kolmogorov Smirnov p-values
I was just testing out ks.test():

y <- runif(1, min = 0, max = 1)
ks.test(y, runif, min = 0, max = 1, alternative = "greater")

One-sample Kolmogorov-Smirnov test
data: y
D^+ = 0.9761, p-value < 2.2e-16
alternative hypothesis: the CDF of x lies above the null hypothesis

It seems that every time I run it, I get a highly significant p-value (for two-sided or one-sided, exact=TRUE or FALSE). Can anybody tell me what is going on? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lower triangle of the correlation matrix with xtable
Like this?

mydata <- data.frame(x1 = c(1,4,6), x2 = c(3,1,2), x3 = c(2,1,3))
as.dist(cor(mydata))
           x1        x2
x2 -0.5960396
x3  0.3973597 0.5000000

Sarah On Thu, Sep 2, 2010 at 9:51 AM, Olga Lyashevska o...@herenstraat.nl wrote: Dear all, mydata <- data.frame(x1 = c(1,4,6), x2 = c(3,1,2), x3 = c(2,1,3)) cor(mydata) gives x1 1.0000000 -0.5960396 0.3973597 / x2 -0.5960396 1.0000000 0.5000000 / x3 0.3973597 0.5000000 1.0000000. I wonder if it is possible to fill only the lower triangle of this correlation matrix? Using 'dist' doesn't seem to be useful, as it doesn't allow converting this table with xtable. print(xtable(cor(mydata), digits = 3)) Any ideas? Cheers, Olga -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov Smirnov p-values
Hi, Are you sure you don't want to do ks.test(y, punif, min = 0, max = 1, alternative = "greater") instead of what you tried? Alain On 02-Sep-10 15:52, Samsiddhi Bhattacharjee wrote: ks.test(y, runif, min = 0, max = 1, alternative = "greater") -- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lower triangle of the correlation matrix with xtable
try lower.tri and see ??lower.tri

Steve Friedman Ph. D. Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147

From: Olga Lyashevska o...@herenstraat.nl
Sent by: r-help-boun...@r-project.org
To: r-help@r-project.org
Date: 09/02/2010 09:51 AM
Subject: [R] lower triangle of the correlation matrix with xtable

Dear all,

mydata <- data.frame(x1 = c(1,4,6), x2 = c(3,1,2), x3 = c(2,1,3))
cor(mydata)
           x1         x2        x3
x1  1.0000000 -0.5960396 0.3973597
x2 -0.5960396  1.0000000 0.5000000
x3  0.3973597  0.5000000 1.0000000

I wonder if it is possible to fill only the lower triangle of this correlation matrix? Using 'dist' doesn't seem to be useful, as it doesn't allow converting this table with xtable. print(xtable(cor(mydata), digits = 3)) Any ideas? Cheers, Olga __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kolmogorov Smirnov p-values
oops, sorry... really careless. thanks! On Thu, Sep 2, 2010 at 10:03 AM, Alain Guillet alain.guil...@uclouvain.be wrote: Hi, Are you sure you don't want to do ks.test(y, punif, min = 0, max = 1, alternative = "greater") instead of what you tried? Alain On 02-Sep-10 15:52, Samsiddhi Bhattacharjee wrote: ks.test(y, runif, min = 0, max = 1, alternative = "greater") -- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
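The correction in this thread is entirely in the second argument: ks.test expects a cumulative distribution function such as punif, whereas runif is the random-number generator, so passing runif compares the sample against noise. A quick sketch:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
y <- runif(100, min = 0, max = 1)

# Correct: test the sample against the uniform CDF
ks.test(y, punif, min = 0, max = 1, alternative = "greater")
```

With punif, draws that really are uniform produce unremarkable p-values instead of the spurious "< 2.2e-16" seen in the original post.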
Re: [R] lower triangle of the correlation matrix with xtable
If I try as.dist I get the following error: On Thu, 2010-09-02 at 09:57 -0400, Sarah Goslee wrote:

mydata <- data.frame(x1 = c(1,4,6), x2 = c(3,1,2), x3 = c(2,1,3))
as.dist(cor(mydata))
           x1        x2
x2 -0.5960396
x3  0.3973597 0.5000000

print(xtable(as.dist(cor(mydata)), digits = 3))
Error in UseMethod("xtable") : no applicable method for 'xtable' applied to an object of class "dist"

Jorge's solution with upper.tri works fine! Thanks everyone for your prompt response. Cheers Olga __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
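Jorge's upper.tri solution is referenced but not quoted in the thread; a sketch of that approach (an assumption about what was posted, not a verbatim copy) blanks the upper triangle of the plain matrix before handing it to xtable:

```r
library(xtable)

mydata <- data.frame(x1 = c(1, 4, 6), x2 = c(3, 1, 2), x3 = c(2, 1, 3))
m <- cor(mydata)
m[upper.tri(m)] <- NA           # keep only the lower triangle and the diagonal
print(xtable(m, digits = 3))    # print.xtable renders NA cells as empty by default
```

Because m stays an ordinary matrix, the xtable method dispatch problem with "dist" objects never arises.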
Re: [R] general question on binomial test / sign test
..I'd like to add that I actually wanted to test the location of differences of paired samples coming from a non-normal, asymmetric distribution. the alternative hypothesis was that negative differences occur in more than 0.5 of all cases. thus I tested binom.test(x = nr.diff.under.0, n = all.diffs, p = 0.5, alternative = "greater"). then this one of many tests for a sparse dataset came up where x = 0 and n = 1. there I thought the H0 was that x is less than 0.5, and I then had my trouble interpreting the p-value of 1. best, kay

Kay Cichini wrote: thanks a lot for the elaborations. your explanations made it clear to me that either binom.test(1, 1, 0.5, alternative = "two.sided") or binom.test(0, 1, 0.5) giving a p-value of 1 simply indicates I have absolutely no assurance to reject H0. considering binom.test(0, 1, 0.5, alternative = "greater") and binom.test(1, 1, 0.5, alternative = "less"), where I get a p-value of 1 and 0.5 respectively - am I right in stating that for the first estimate (0/1) I have no assurance at all for rejecting H0, and for the second estimate (1/1) I have the same chance of being wrong in either rejecting or keeping H0? many thanks, kay

Zitat von ted.hard...@manchester.ac.uk: You state: "in reverse the p-value of 1 says that I can be 100% sure that the estimate of 0.5 is true". This is where your logic about significance tests goes wrong. The general logic of a significance test is that a test statistic (say T) is chosen such that large values represent a discrepancy between possible data and the hypothesis under test. When you have the data, T evaluates to a value (say t0). The null hypothesis (NH) implies a distribution for the statistic T if the NH is true. Then the value of Prob(T >= t0 | NH) can be calculated. If this is small, then the probability of obtaining data at least as discrepant as the data you did obtain is small; if sufficiently small, then the conjunction of NH and your data (as assessed by the statistic T) is so unlikely that you can decide not to believe that it is possible.
If you so decide, then you reject the NH because the data are so discrepant that you can't believe it! This is on the same lines as the reductio ad absurdum in classical logic: An hypothesis A implies that an outcome B must occur. But I have observed that B did not occur. Therefore A cannot be true. But it does not follow that, if you observe that B did occur (which is *consistent* with A), then A must be true. A could be false, yet B still occur -- the only basis on which occurrence of B could *prove* that A must be true is when you have the prior information that B will occur *if and only if* A is true. In the reductio ad absurdum, and in the parallel logic of significance testing, all you have is "B will occur *if* A is true". The "only if" part is not there. So you cannot deduce that A is true from the observation that B occurred, since what you have to start with allows B to occur if A is false (i.e. "B will occur *if* A is true" says nothing about what may or may not happen if A is false). So, in your single toss of a coin, it is true that "I will observe either 'succ' or 'fail'" if the coin is fair. But (as in the above) you cannot deduce that the coin is fair if you observe either 'succ' or 'fail', since it is possible (indeed necessary) that you obtain such an observation even if the coin is not fair (even if the coin is the same, either 'succ' or 'fail', on both sides, and therefore completely unfair). This is an attempt to expand Greg Snow's reply! Your 2-sided test takes the form T=1 if either outcome='succ' or outcome='fail'. And that is the only possible value for T, since no other outcome is possible. Hence Prob(T==1) = 1 whether the coin is fair or not. It is not possible for such data to discriminate between a fair and an unfair coin. And, as explained above, a P-value of 1 cannot prove that the null hypothesis is true. All that is possible with a significance test is that a small P-value can be taken as evidence that the NH is false. Hoping this helps! Ted.
On 02-Sep-10 07:41:17, Kay Cecil Cichini wrote: I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with one trial and get a p-value of 1. actually I want to prove the alternative H that the estimate is different from 0.5, which certainly cannot be proven here. but in reverse the p-value of 1 says that I can be 100% sure that the estimate of 0.5 is true (??) - that's the point that astonishes me. thanks if anybody could clarify this for me, kay Zitat von Greg Snow greg.s...@imail.org: Try thinking this one through from first principles: you are essentially saying that your null hypothesis is that you are flipping a fair coin and you want to do a 2-tailed test. You then flip the coin exactly once; what do you expect to happen? The p-value of 1 just means that what you saw was perfectly consistent with what is predicted to happen when flipping a single time. Does that help? If not,
[R] date
Hello all, I've 2 strings representing the start and end values of a date and time. For example,

time1 <- c("21/04/2005", "23/05/2005", "11/04/2005")
time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
as.difftime(time1, time2)
Time differences in secs
[1] NA NA NA
attr(,"tzone")
[1] ""

How can I calculate the difference between these 2 strings? Regards, Dunia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] star models
Hi, Have you gotten help on this? Lexi -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ma...@unizar.es Sent: Saturday, August 28, 2010 5:08 PM To: r-help@r-project.org Subject: [R] star models Hi, I am trying to implement an STAR model, but I have some problems. I am following the instructions of the model, which are at: http://bm2.genes.nig.ac.jp/RGM2/R_current/library/tsDyn/man/star.html and come from: http://bm2.genes.nig.ac.jp/RGM2/pkg.php?p=tsDyn The model is: star(x, m=2, noRegimes, d = 1, steps = d, series, rob = FALSE, mTh, thDelay, thVar, sig=0.05, trace=TRUE, control=list(), ...) I have implemented the example:

mod.star <- star(log10(lynx), m = 2, mTh = c(0, 1), control = list(maxit = 3000))
mod.star

However, when I put in variables from my data.frame, R does not find my variables, so for this reason I have tried to create a matrix with my data variables and to fit the model with the matrix, but I cannot create the matrix either. Please, could someone help me to implement this model? Thanks. Mercedes. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
sorry to bump in late, but I am doing similar things now and was browsing. IMHO anova is not appropriate here: it applies when the richer model has p more variables than the simpler model, and that is not the case here, since the competing models use different variables. You are left with information criteria (IC). By transforming a continuous variable into a categorical one you are smoothing, which is the idea of a GAM. If you look at what is offered in GAMs you may find better approximations f(age), as well as tools for testing among different f(age) transformations. regards. S. -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-design-matrix-tp891987p2524289.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] date
Hi Dunia, You need to convert the character strings to Dates.

time1 <- as.Date(c("21/04/2005", "23/05/2005", "11/04/2005"), "%d/%m/%Y")
time2 <- as.Date(c("15/07/2009", "03/06/2008", "15/10/2005"), "%d/%m/%Y")
time2 - time1

Best, Ista On Thu, Sep 2, 2010 at 10:32 AM, Dunia Scheid dunia.sch...@gmail.com wrote: Hello all, I've 2 strings representing the start and end values of a date and time. For example, time1 <- c("21/04/2005", "23/05/2005", "11/04/2005") and time2 <- c("15/07/2009", "03/06/2008", "15/10/2005"); as.difftime(time1, time2) gives Time differences in secs [1] NA NA NA attr(,"tzone") [1] "". How can I calculate the difference between these 2 strings? Regards, Dunia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
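as.difftime() returned NAs in the original post because it was handed bare character strings. Once the strings are parsed with as.Date, the units of the difference can also be chosen explicitly; a small sketch:

```r
time1 <- as.Date(c("21/04/2005", "23/05/2005", "11/04/2005"), "%d/%m/%Y")
time2 <- as.Date(c("15/07/2009", "03/06/2008", "15/10/2005"), "%d/%m/%Y")

difftime(time2, time1, units = "days")              # difftime object in days
as.numeric(difftime(time2, time1, units = "days"))  # plain numeric vector
```

If the strings also carried a time of day, as.POSIXct with a matching format string would play the same role as as.Date here.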
[R] How using the weights argument in nls2?
Good morning gentlemen! How do I use a weighted model in nls2? The values fitted with nls look sensible, while those from nls2 do not. I believe this discrepancy is because I did not include the weights argument in nls2. Here's an example:

MOISTURE <- c(28.41640, 28.47340, 29.05821, 28.52201, 30.92055, 31.07901, 31.35840, 31.69617,
              32.07168, 31.87296, 31.35525, 32.66118, 33.23385, 32.72256, 32.57929, 32.12674,
              52.35225, 52.77275, 64.90770, 64.90770, 85.23800, 84.43300, 68.96560, 68.41395,
              70.82880, 71.18400, 96.13240, 96.07920, 95.35160, 94.71660, 87.59190, 88.63250,
              89.78760, 90.17820, 88.46160, 87.10860, 94.86660, 94.51830, 75.79000, 76.98780,
              144.70950, 143.88950, 111.58620, 112.71510, 120.85300, 121.43100, 116.34840, 114.87420,
              195.35040, 191.36040, 265.35220, 267.25450, 227.13700, 228.78000, 238.37120, 242.70700,
              299.54890, 291.04110, 220.09920, 219.82650, 236.79150, 243.70710, 208.79880, 208.12260,
              417.21420, 429.59480, 360.91080, 371.66400, 357.72520, 360.53640, 383.82600, 383.82600,
              434.02700, 432.57500, 440.56260, 438.32340, 468.69600, 469.82140, 497.93680, 497.17010)
YEARS <- rep(c(86, 109, 132, 158, 184, 220, 254, 276, 310, 337), each = 8)
VARIANCE <- rep(c(2.0879048, 0.3442724, 151.9481710, 115.3208995, 51.9965027,
                  180.0496045, 791.3223240, 1280.0179973, 728.9582154, 752.4591144), each = 8)

test <- data.frame(YEARS, MOISTURE, VARIANCE)
mod.nls <- nls(MOISTURE ~ A/(1 + B*exp(-k*YEARS)), data = test,
               weights = 1/VARIANCE,
               start = list(A = 1500, B = 200, k = 0.03345),
               control = list(maxiter = 500), trace = TRUE)
summary(mod.nls)

Following the example in the pdf:

st1 <- expand.grid(A = seq(0, 2000, len = 10), B = seq(0, 500, len = 10), k = seq(-1, 10, len = 10))
st1
mod.nls2 <- nls2(MOISTURE ~ A/(1 + B*exp(-k*YEARS)), data = test, start = st1, algorithm = "brute-force")
mod.nls2

I appreciate everyone's attention. -- View this message in context: http://r.789695.n4.nabble.com/How-using-the-weights-argument-in-nls2-tp2524328p2524328.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
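nls2 passes extra arguments on to nls, so a weights argument should be accepted the same way as in the nls call above; treat this as an assumption to check against ?nls2 rather than a tested answer:

```r
library(nls2)

# Assumption: 'weights' is forwarded to nls, so the brute-force grid
# search evaluates candidate starts with the same weighted criterion
mod.nls2 <- nls2(MOISTURE ~ A/(1 + B*exp(-k*YEARS)), data = test,
                 weights = 1/VARIANCE,   # same weights as the nls fit
                 start = st1, algorithm = "brute-force")
```

Without weights, the grid search minimizes the unweighted residual sum of squares, which can pick a very different corner of the start grid than the weighted nls fit.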
[R] Help on glm and optim
Dear all, I'm trying to use the optim function to replicate the results from glm, using an example from the help page of glm, but I could not get the optim function to work. Would you please point out where I went wrong? Thanks a lot. The following is the code:

# Step 1: fit the glm
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fit1 <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Step 2: use optim
# define the log-likelihood function to be maximized
# theta is a vector of three parameters: intercept, coefficient for log(u), and dispersion parameter
loglik <- function(theta, data){
  E <- 1/(theta[1] + theta[2]*log(data$u))
  V <- theta[3]*E^2
  loglik <- sum(dgamma(data$lot1, shape = 1/theta[3], rate = 1/(E*theta[3]), log = TRUE))
  return(loglik)
}

# use the glm result as initial values
theta <- c(as.vector(coef(fit1)), 0.002446059)
fit2 <- optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1))

# Then I got the following error message:
Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1)) :
  non-finite finite-difference value [3]

Wayne (Yanwei) Zhang Statistical Research CNA Phone: 312-822-6296 Email: yanwei.zh...@cna.com NOTICE: This e-mail message, including any attachments and appended messages, is for the sole use of the intended recipients and may contain confidential and legally privileged information. If you are not the intended recipient, any review, dissemination, distribution, copying, storage or other use of all or any portion of this message is strictly prohibited. If you received this message in error, please immediately notify the sender by reply e-mail and delete this message in its entirety.
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Linear models (lme4) - basic question
Hi, Sorry for a basic question on linear models. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects.

library(lme4)
sugar <- c(1:10, sample(1:100, 20))
weight <- 1:30
height <- rep(sample(1:15, 15, replace = FALSE), 2)
lm.adj <- lmer(sugar ~ weight + (1|height))
adjusted.sugar <- fitted(lm.adj)

I'm not sure if I'm doing this right...? thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help on glm and optim
On Thu, 2 Sep 2010, Zhang,Yanwei wrote: Dear all, I'm trying to use the optim function to replicate the results from glm, using an example from the help page of glm, but I could not get the optim function to work. Would you please point out where I went wrong? Thanks a lot. The following is the code:

# Step 1: fit the glm
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fit1 <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Step 2: use optim
# define the log-likelihood function to be maximized
# theta is a vector of three parameters: intercept, coefficient for log(u), and dispersion parameter
loglik <- function(theta, data){
  E <- 1/(theta[1] + theta[2]*log(data$u))
  V <- theta[3]*E^2
  loglik <- sum(dgamma(data$lot1, shape = 1/theta[3], rate = 1/(E*theta[3]), log = TRUE))
  return(loglik)
}

# use the glm result as initial values
theta <- c(as.vector(coef(fit1)), 0.002446059)
fit2 <- optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1))

# Then I got the following error message:
Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1)) :
  non-finite finite-difference value [3]

If you use trace(loglik, tracer = quote(print(theta))) to trace the inputs to loglik() you will find that it is being called with negative values of theta[3] to get finite differences. One fix is to reparametrize and use the log scale rather than the scale as a parameter. -thomas Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help on glm and optim
Thomas, Thanks a lot. This solves my problem. Wayne (Yanwei) Zhang Statistical Research CNA -Original Message- From: Thomas Lumley [mailto:tlum...@u.washington.edu] Sent: Thursday, September 02, 2010 11:24 AM To: Zhang,Yanwei Cc: r-help@r-project.org Subject: Re: [R] Help on glm and optim On Thu, 2 Sep 2010, Zhang,Yanwei wrote: Dear all, I'm trying to use the optim function to replicate the results from glm, using an example from the help page of glm, but I could not get the optim function to work. Would you please point out where I went wrong? Thanks a lot. The following is the code: # Step 1: fit the glm clotting <- data.frame( u = c(5,10,15,20,30,40,60,80,100), lot1 = c(118,58,42,35,27,25,21,19,18), lot2 = c(69,35,26,21,18,16,13,12,12)) fit1 <- glm(lot1 ~ log(u), data = clotting, family = Gamma) # Step 2: use optim # define the log-likelihood function to be maximized # theta is a vector of three parameters: intercept, coefficient for log(u), and dispersion parameter loglik <- function(theta, data){ E <- 1/(theta[1] + theta[2]*log(data$u)); V <- theta[3]*E^2; loglik <- sum(dgamma(data$lot1, shape = 1/theta[3], rate = 1/(E*theta[3]), log = TRUE)); return(loglik) } # use the glm result as initial values theta <- c(as.vector(coef(fit1)), 0.002446059) fit2 <- optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1)) # Then I got the following error message: Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = list(fnscale = -1)) : non-finite finite-difference value [3] If you use trace(loglik, tracer = quote(print(theta))) to trace the inputs to loglik() you will find that it is being called with negative values of theta[3] to get finite differences. One fix is to reparametrize and use the log scale rather than the scale as a parameter.
-thomas Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
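Thomas's suggestion can be sketched as follows: treat theta[3] as log(dispersion), so optim may propose any real value while the dispersion itself stays positive. Same data and notation as the original post; a sketch, not a verified fit:

```r
# theta[3] is now log(dispersion); exp() keeps the dispersion positive
loglik2 <- function(theta, data) {
  phi <- exp(theta[3])                        # dispersion > 0 by construction
  E   <- 1/(theta[1] + theta[2]*log(data$u))  # inverse link, as in the Gamma glm
  sum(dgamma(data$lot1, shape = 1/phi, rate = 1/(E*phi), log = TRUE))
}

theta0 <- c(as.vector(coef(fit1)), log(0.002446059))
fit2 <- optim(theta0, loglik2, data = clotting, hessian = TRUE,
              control = list(fnscale = -1))
```

Note the data frame is passed as the named argument data = clotting so that optim routes it through "..." to loglik2 instead of matching it positionally to gr.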
Re: [R] Linear models (lme4) - basic question
Why should height be a random effect? I suspect you may need a tutorial on what a random effect in a mixed model is. I see no obvious reason why one would cluster on height. Perhaps if you clarify, it may become obvious either what your concerns are (and that your model is correct) or that you misunderstand and need a tutorial on random effects in mixed models. Some kind soul on this list may provide that (not me, though) -- or you can talk with your local statistician. Cheers, Bert On Thu, Sep 2, 2010 at 9:22 AM, James Nead james_n...@yahoo.com wrote: Hi, Sorry for a basic questions on linear models. I am trying to adjust raw data for both fixed and mixed effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. library(lme4) sugar - c(1:10,sample(1:100,20)) weight - 1:30 height - rep(sample(1:15,15,replace=F),2) lm.adj - lmer(sugar ~ weight + (1|height)) adjusted.sugar - fitted(lm.adj) = I'm not sure if I'm doing this right...? thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear models (lme4) - basic question
Hi Bert, Thanks for the reply. Height was just an example; perhaps I should have said 'x' instead of height. Essentially, what I want to do is adjust the data for known effects. After I've done this, I can conduct further analysis on the data (for example, whether another variable 'z' has an effect on sugar). So, if I have to adjust the data for a fixed effect (weight) and a random effect (height/x), am I using the model correctly? Thanks again for the reply! From: Bert Gunter gunter.ber...@gene.com Cc: r-help@r-project.org Sent: Thu, September 2, 2010 12:46:13 PM Subject: Re: [R] Linear models (lme4) - basic question Why should height be a random effect? I suspect you may need a tutorial on what a random effect in a mixed model is. I see no obvious reason why one would cluster on height. Perhaps if you clarify, it may become obvious either what your concerns are (and that your model is correct) or that you misunderstand and need a tutorial on random effects in mixed models. Some kind soul on this list may provide that (not me, though) -- or you can talk with your local statistician. Cheers, Bert Hi, Sorry for a basic question on linear models. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. library(lme4) sugar <- c(1:10, sample(1:100, 20)) weight <- 1:30 height <- rep(sample(1:15, 15, replace = FALSE), 2) lm.adj <- lmer(sugar ~ weight + (1|height)) adjusted.sugar <- fitted(lm.adj) I'm not sure if I'm doing this right...? thanks!
Re: [R] Linear models (lme4) - basic question
Sorry, forgot to mention that the processed data will be used as input for a classification algorithm. So, I need to adjust for known effects before I can use the data. thanks From: Bert Gunter gunter.ber...@gene.com Cc: r-help@r-project.org Sent: Thu, September 2, 2010 12:46:13 PM Subject: Re: [R] Linear models (lme4) - basic question Why should height be a random effect? I suspect you may need a tutorial on what a random effect in a mixed model is. I see no obvious reason why one would cluster on height. Perhaps if you clarify, it may become obvious either what your concerns are (and that your model is correct) or that you misunderstand and need a tutorial on random effects in mixed models. Some kind soul on this list may provide that (not me, though) -- or you can talk with your local statistician. Cheers, Bert Hi, Sorry for a basic question on linear models. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. library(lme4) sugar <- c(1:10, sample(1:100, 20)) weight <- 1:30 height <- rep(sample(1:15, 15, replace = FALSE), 2) lm.adj <- lmer(sugar ~ weight + (1|height)) adjusted.sugar <- fitted(lm.adj) I'm not sure if I'm doing this right...? thanks!
Re: [R] general question on binomial test / sign test
Just to add to Ted's addition to my response. I think you are moving towards better understanding (and your misunderstandings are common), but to further clarify: First, make sure that you understand that the probability of A given B, p(A|B), is different from the probability of B given A, p(B|A). A simple example from probability class: I have 3 coins in my pocket; one has 2 heads, one has 2 tails, and the other is fair (p(H)=p(T)=0.5). I take one coin out at random (each has probability 1/3 of being chosen) and flip it, and I observe Heads. What is the probability that it is the 2-headed coin? Here the probability of Heads given that the coin is 2-headed is 1.0, but that does not mean that the probability that I chose the 2-headed coin given that I saw heads is 1 (it is 2/3rds). The same applies with testing: the p-value is the probability of the data given that the null hypothesis is TRUE. Many people try to interpret it as the probability that the null hypothesis is true given the data, but that is not the case (not even for Bayesians, unless they use a really weird prior). I think that you are starting to understand this part: high p-values mean that the data are consistent with the null, we cannot reject the null, but they do not prove the null. A great article for help in understanding p-values better is: Murdoch, D. J., Tsai, Y.-L., and Adcock, J. (2008) _P-Values are Random Variables_. The American Statistician, 62(3), 242-245. It talks about p-values as random variables. There are a couple of functions in the TeachingDemos package that implement some of the simulations discussed in the article; I would suggest playing with run.Pvalue.norm.sim and run.Pvalue.binom.sim. Notice that when the null hypothesis is true (and you have a big enough sample size for the binomial case) the p-values follow a uniform distribution: the p-values 1.0, 0.5, 0.1, and 0.001 are all equally likely.
As the difference between the null hypothesis and the truth increases you get more and more p-values close to 0. The real tricky bit about hypothesis testing is that we compute a single p-value, a single observation from a distribution, and based on that try to decide if the process that produced that observation is a uniform distribution or something else (that may be close to a uniform or very different). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kay Cecil Cichini Sent: Thursday, September 02, 2010 6:40 AM To: ted.hard...@manchester.ac.uk Cc: r-help@r-project.org Subject: Re: [R] general question on binomial test / sign test thanks a lot for the elaborations. your explanations clearly brought to me that either binom.test(1,1,0.5,"two.sided") or binom.test(0,1,0.5) giving a p-value of 1 simply indicates i have absolutely no assurance to reject H0. considering binom.test(0,1,0.5,alternative="greater") and binom.test(1,1,0.5,alternative="less") where i get a p-value of 1 and 0.5, respectively - am i right in stating that for the first estimate 0/1 i have no assurance at all for rejection of H0, and for the second estimate 1/1 i have the same chance of being wrong in either rejecting or keeping H0? many thanks, kay Zitat von ted.hard...@manchester.ac.uk: You state: in reverse the p-value of 1 says that i can be 100% sure that the estimate of 0.5 is true. This is where your logic about significance tests goes wrong. The general logic of a significance test is that a test statistic (say T) is chosen such that large values represent a discrepancy between possible data and the hypothesis under test. When you have the data, T evaluates to a value (say t0). The null hypothesis (NH) implies a distribution for the statistic T if the NH is true.
Then the value of Prob(T >= t0 | NH) can be calculated. If this is small, then the probability of obtaining data at least as discrepant as the data you did obtain is small; if sufficiently small, then the conjunction of NH and your data (as assessed by the statistic T) is so unlikely that you can decide to not believe that it is possible. If you so decide, then you reject the NH because the data are so discrepant that you can't believe it! This is on the same lines as the reductio ad absurdum in classical logic: An hypothesis A implies that an outcome B must occur. But I have observed that B did not occur. Therefore A cannot be true. But it does not follow that, if you observe that B did occur (which is *consistent* with A), then A must be true. A could be false, yet B still occur -- the only basis on which occurrence of B could *prove* that A must be true is when you have the prior information that B will occur *if and only if* A is true. In the reductio ad absurdum, and
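Greg's description of the null distribution of p-values is easy to check by simulation. A minimal sketch in base R (the sample size, replication count, and seed are arbitrary choices for illustration):

```r
# Simulate two-sided binomial-test p-values when H0 (p = 0.5) is TRUE.
set.seed(1)
pvals <- replicate(2000, {
  x <- rbinom(1, size = 1000, prob = 0.5)  # data generated under H0
  binom.test(x, n = 1000, p = 0.5)$p.value
})
# Under H0 the p-values are approximately uniform on [0, 1]:
mean(pvals < 0.05)  # close to 0.05
mean(pvals)         # close to 0.5
```

Increasing the true probability away from 0.5 in `rbinom()` shifts the simulated p-values toward 0, as described above.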
Re: [R] Linear models (lme4) - basic question
James Nead james_nead at yahoo.com writes: Sorry, forgot to mention that the processed data will be used as input for a classification algorithm. So, I need to adjust for known effects before I can use the data. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. What's not clear to me is what you mean by 'adjusted for'. fitted(lm.adj) will give predicted values based on the height and weight. I don't really know what the justification for/meaning of the adjustment is, so I don't know whether you want to predict on the basis of the heights, or whether you want to get a 'population-level' prediction, i.e. one with height effects set to zero. Maybe you want residuals(lm.adj) ...? I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org
Re: [R] general question on binomial test / sign test
On Sep 2, 2010, at 2:01 PM, Greg Snow wrote: snipped much good material The real tricky bit about hypothesis testing is that we compute a single p-value, a single observation from a distribution, and based on that try to decide if the process that produced that observation is a uniform distribution or something else (that may be close to a uniform or very different). My friendly addition would be to point the OP in the direction of using qqplot() for the examination of distributional properties rather than doing any sort of hypothesis testing. There is a learning curve for using that tool, but it will pay off in the end. -- David. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kay Cecil Cichini Sent: Thursday, September 02, 2010 6:40 AM To: ted.hard...@manchester.ac.uk Cc: r-help@r-project.org Subject: Re: [R] general question on binomial test / sign test thanks a lot for the elaborations. your explanations clearly brought to me that either binom.test(1,1,0.5,"two.sided") or binom.test(0,1,0.5) giving a p-value of 1 simply indicates i have absolutely no assurance to reject H0. considering binom.test(0,1,0.5,alternative="greater") and binom.test(1,1,0.5,alternative="less") where i get a p-value of 1 and 0.5, respectively - am i right in stating that for the first estimate 0/1 i have no assurance at all for rejection of H0, and for the second estimate 1/1 i have the same chance of being wrong in either rejecting or keeping H0? many thanks, kay David Winsemius, MD West Hartford, CT
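The qqplot() suggestion applied to simulated p-values might look like the following sketch (all numbers are made-up illustration values): sorted p-values plotted against uniform quantiles hug the identity line when H0 is true and pile up near 0 otherwise.

```r
# qqplot() of simulated t-test p-values against the uniform distribution.
set.seed(2)
p.null <- replicate(1000, t.test(rnorm(20))$p.value)              # H0 true
p.alt  <- replicate(1000, t.test(rnorm(20, mean = 0.8))$p.value)  # H0 false
op <- par(mfrow = c(1, 2))
qqplot(qunif(ppoints(1000)), p.null, main = "H0 true"); abline(0, 1)
qqplot(qunif(ppoints(1000)), p.alt,  main = "H0 false"); abline(0, 1)
par(op)
```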
Re: [R] Linear models (lme4) - basic question
My apologies - I have made this more confusing than it needs to be. I had microarray gene expression data which I want to use for classification algorithms. However, I want to 'adjust' the data for all confounding factors (such as age, experiment number etc.) before I can use the data as input for the classification algorithms. Since the phenotype is known to be affected by age, I thought that this would be a fixed effect whereas something like 'beadchip' would be a random effect. Should I be looking at something else for this? From: Ben Bolker bbol...@gmail.com To: r-h...@stat.math.ethz.ch Sent: Thu, September 2, 2010 2:06:47 PM Subject: Re: [R] Linear models (lme4) - basic question James Nead james_nead at yahoo.com writes: Sorry, forgot to mention that the processed data will be used as input for a classification algorithm. So, I need to adjust for known effects before I can use the data. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. What's not clear to me is what you mean by 'adjusted for'. fitted(lm.adj) will give predicted values based on the height and weight. I don't really know what the justification for/meaning of the adjustment is, so I don't know whether you want to predict on the basis of the heights, or whether you want to get a 'population-level' prediction, i.e. one with height effects set to zero. Maybe you want residuals(lm.adj) ...?
I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org
Re: [R] Linear models (lme4) - basic question
On 10-09-02 02:26 PM, James Nead wrote: My apologies - I have made this more confusing than it needs to be. I had microarray gene expression data which I want to use for classification algorithms. However, I want to 'adjust' the data for all confounding factors (such as age, experiment number etc.) before I can use the data as input for the classification algorithms. Since the phenotype is known to be affected by age, I thought that this would be a fixed effect whereas something like 'beadchip' would be a random effect. Should I be looking at something else for this? Sounds to me as though you should use residuals() rather than fitted() if you want to adjust for confounding factors. But since you've made up a nice small example, I think you should look at the results of fitted() and residuals() for your example and see if it's doing what you want. From: Ben Bolker bbol...@gmail.com To: r-h...@stat.math.ethz.ch Sent: Thu, September 2, 2010 2:06:47 PM Subject: Re: [R] Linear models (lme4) - basic question James Nead james_nead at yahoo.com writes: Sorry, forgot to mention that the processed data will be used as input for a classification algorithm. So, I need to adjust for known effects before I can use the data. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. What's not clear to me is what you mean by 'adjusted for'. fitted(lm.adj) will give predicted values based on the height and weight.
I don't really know what the justification for/meaning of the adjustment is, so I don't know whether you want to predict on the basis of the heights, or whether you want to get a 'population-level' prediction, i.e. one with height effects set to zero. Maybe you want residuals(lm.adj) ...? I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org
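To make the fitted()/residuals() distinction concrete, here is a small sketch using nlme (which ships with R; lme4's lmer() is analogous). The data are invented purely for illustration:

```r
library(nlme)  # recommended package, shipped with R

set.seed(1)
dat <- data.frame(weight = 1:30,
                  height = factor(rep(1:15, 2)))  # 15 'height' clusters of 2
dat$sugar <- 50 + 0.5 * dat$weight + rnorm(30, sd = 5)

fit <- lme(sugar ~ weight, random = ~ 1 | height, data = dat)

head(fitted(fit))            # predictions from the weight and height effects
adjusted <- residuals(fit)   # sugar with those effects removed ("adjusted")
```

Whether the residuals are the right notion of "adjusted" depends on the downstream analysis, which is exactly the point being made in this thread.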
Re: [R] general question on binomial test / sign test
David, The original poster was not looking at distributions and testing distributions; I referred to the distribution of the p-value to help them understand (in reference to the paper mentioned). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: David Winsemius [mailto:dwinsem...@comcast.net] Sent: Thursday, September 02, 2010 12:12 PM To: Greg Snow Cc: Kay Cecil Cichini; ted.hard...@manchester.ac.uk; r-h...@r-project.org Subject: Re: [R] general question on binomial test / sign test On Sep 2, 2010, at 2:01 PM, Greg Snow wrote: snipped much good material The real tricky bit about hypothesis testing is that we compute a single p-value, a single observation from a distribution, and based on that try to decide if the process that produced that observation is a uniform distribution or something else (that may be close to a uniform or very different). My friendly addition would be to point the OP in the direction of using qqplot() for the examination of distributional properties rather than doing any sort of hypothesis testing. There is a learning curve for using that tool, but it will pay off in the end. -- David. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kay Cecil Cichini Sent: Thursday, September 02, 2010 6:40 AM To: ted.hard...@manchester.ac.uk Cc: r-help@r-project.org Subject: Re: [R] general question on binomial test / sign test thanks a lot for the elaborations. your explanations clearly brought to me that either binom.test(1,1,0.5,"two.sided") or binom.test(0,1,0.5) giving a p-value of 1 simply indicates i have absolutely no assurance to reject H0.
considering binom.test(0,1,0.5,alternative="greater") and binom.test(1,1,0.5,alternative="less") where i get a p-value of 1 and 0.5, respectively - am i right in stating that for the first estimate 0/1 i have no assurance at all for rejection of H0, and for the second estimate 1/1 i have the same chance of being wrong in either rejecting or keeping H0? many thanks, kay David Winsemius, MD West Hartford, CT
Re: [R] Linear models (lme4) - basic question
Perhaps even more to the point, covariate adjustment and classification should not be separate. One should fit the appropriate model that does both. -- Bert On Thu, Sep 2, 2010 at 11:34 AM, Ben Bolker bbol...@gmail.com wrote: On 10-09-02 02:26 PM, James Nead wrote: My apologies - I have made this more confusing than it needs to be. I had microarray gene expression data which I want to use for classification algorithms. However, I want to 'adjust' the data for all confounding factors (such as age, experiment number etc.) before I can use the data as input for the classification algorithms. Since the phenotype is known to be affected by age, I thought that this would be a fixed effect whereas something like 'beadchip' would be a random effect. Should I be looking at something else for this? Sounds to me as though you should use residuals() rather than fitted() if you want to adjust for confounding factors. But since you've made up a nice small example, I think you should look at the results of fitted() and residuals() for your example and see if it's doing what you want. From: Ben Bolker bbol...@gmail.com To: r-h...@stat.math.ethz.ch Sent: Thu, September 2, 2010 2:06:47 PM Subject: Re: [R] Linear models (lme4) - basic question James Nead james_nead at yahoo.com writes: Sorry, forgot to mention that the processed data will be used as input for a classification algorithm. So, I need to adjust for known effects before I can use the data. I am trying to adjust raw data for both fixed and random effects. The data that I output should account for these effects, so that I can use the adjusted data for further analysis. For example, if I have the blood sugar levels for 30 patients, and I know that 'weight' is a fixed effect and that 'height' is a random effect, what I'd want as output is blood sugar levels that have been adjusted for these effects. What's not clear to me is what you mean by 'adjusted for'.
fitted(lm.adj) will give predicted values based on the height and weight. I don't really know what the justification for/meaning of the adjustment is, so I don't know whether you want to predict on the basis of the heights, or whether you want to get a 'population-level' prediction, i.e. one with height effects set to zero. Maybe you want residuals(lm.adj) ...? I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org -- Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://devo.gene.com/groups/devo/depts/ncb/home.shtml
Re: [R] Why is vector assignment in R recreates the entire vector ?
Tal wrote: A friend recently brought to my attention that vector assignment actually recreates the entire vector on which the assignment is performed. ... I brought this up in r-devel a few months ago. You can read my posting, and the various replies, at http://www.mail-archive.com/r-de...@r-project.org/msg20089.html Some of the replies not only explain the process, but list lines in the source code where this takes place, enabling a closer look at how/when duplication occurs. Norm Matloff
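The copy-on-modify behaviour being discussed is easy to observe directly with tracemem(), which is available in standard CRAN builds of R:

```r
# R's copy-on-modify semantics: modifying a shared vector forces a duplicate,
# so the original binding is left untouched.
x <- c(1, 2, 3)
y <- x           # no copy yet: x and y refer to the same memory
tracemem(y)      # report when y gets duplicated
y[1] <- 0        # this assignment triggers the copy (tracemem prints a line)
untracemem(y)
x                # still 1 2 3
```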
[R] Simultaneous equations
Dear all, I am relatively new to R and have had some difficulty in understanding the user manual for a package that I have downloaded to evaluate non-linear simultaneous equations. The package is called systemfit. Does anyone have any experience of this package? What I am trying to do is solve a set of non-linear simultaneous equations... Could anyone give me an example (please) of the code that would be needed to return solutions for the following system using systemfit (or any other better package): y = 1/x y = sin(x) within a range of say x = -10 to 10 (x in radians). Thanks, I really appreciate your help in advance. Ben
Re: [R] Simultaneous equations
benhartley903 wrote: Dear all, I am relatively new to R and have had some difficulty in understanding the user manual for a package that I have downloaded to evaluate non-linear simultaneous equations. The package is called systemfit. Does anyone have any experience of this package? What I am trying to do is solve a set of non-linear simultaneous equations... Could anyone give me an example (please) of the code that would be needed to return solutions for the following system using systemfit (or any other better package): y = 1/x y = sin(x) within a range of say x = -10 to 10 (x in radians). Thanks, I really appreciate your help in advance. Ben systemfit is intended for estimating a system of simultaneous equations. You seem to want to solve a system. I assume you want to solve this system for x and y. You could use package nleqslv as follows: library(nleqslv) fun <- function(x) { f <- numeric(length(x)); f[1] <- x[2] - 1/x[1]; f[2] <- x[2] - sin(x[1]); f } x.start <- c(1,1) nleqslv(x.start, fun) # will give 1.1141571 0.8975395 x.start <- c(2,2) nleqslv(x.start, fun) # will give 2.7726047 0.3606717 Berend
Re: [R] Simultaneous equations
On 09/02/2010 09:25 PM, benhartley903 wrote: Dear all, I am relatively new to R and have had some difficulty in understanding the user manual for a package that I have downloaded to evaluate non-linear simultaneous equations. The package is called systemfit. Does anyone have any experience of this package? What I am trying to do is solve a set of non-linear simultaneous equations... Could anyone give me an example (please) of the code that would be needed to return solutions for the following system using systemfit (or any other better package): y = 1/x y = sin(x) within a range of say x = -10 to 10 (x in radians). Thanks, I really appreciate your help in advance. Ben Systemfit is not even in the same ballpark...! I'd just rewrite the above as x * sin(x) - 1 = 0, make a graph to bracket the roots, and then use uniroot. f <- function(x) x*sin(x) - 1 curve(f, -10, 10) f(c(0,2,5,7,10)) [1] -1.0000000 0.8185949 -5.7946214 3.5989062 -6.4402111 So the roots are uniroot(f, interval=c(7,10))$root [1] 9.317241 uniroot(f, interval=c(5,7))$root [1] 6.43914 uniroot(f, interval=c(2,5))$root [1] 2.772631 uniroot(f, interval=c(0,2))$root [1] 1.114161 and 4 more symmetrically below zero. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
On Thu, 2 Sep 2010, stephenb wrote: sorry to bump in late, but I am doing similar things now and was browsing. IMHO anova is not appropriate here. it applies when the richer model has p more variables than the simpler model. this is not the case here. the competing models use different variables. A simple approach is to have the factor variable in the model and to formally test for added information given by the continuous variable (linear, quadratic, spline, etc). AIC could also be used. you are left with IC. by transforming a continuous variable into categorical you are smoothing, which is the idea of GAM. if you look at what is offered in GAMs you may find better approximations f(age) as well as tools for testing among different f(age) transformations. I don't follow that comment. Smoothing uses the full continuous variable to begin with. A restricted cubic spline function in age is a simple approach. E.g.: require(rms) dd <- datadist(mydata); options(datadist='dd') f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata) plot(Predict(f, age)) Note that you can always expect the categorized version of age not to fit the data except sometimes when behavior is dictated by law (driving, drinking, military service, medicare). Frank regards. S.
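To illustrate the comparison on made-up data (the survival package ships with R; every number below is an arbitrary assumption, not from the original thread): since the continuous-age and three-level-factor models are not nested, they can be compared by AIC rather than anova().

```r
library(survival)  # recommended package, shipped with R

set.seed(1)
n      <- 200
age    <- runif(n, 30, 80)
time   <- rexp(n, rate = exp(0.03 * (age - 55)) / 50)  # log-hazard linear in age
status <- rbinom(n, 1, 0.8)

fit.cont <- coxph(Surv(time, status) ~ age)          # continuous age
fit.cat  <- coxph(Surv(time, status) ~ cut(age, 3))  # three-level factor

AIC(fit.cont, fit.cat)  # non-nested models: compare information criteria
```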
[R] Power analysis
I am aware this is fairly simple, but it is currently driving me mad! Could someone help me out with conducting a post-hoc power analysis on a Wilcoxon test? I am being driven slightly mad by some conflicting advice! Thanks in advance, Lewis
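In the absence of a closed-form routine in base R, one common approach is to estimate the power of a Wilcoxon test by simulation. A sketch; the group size (25), normal shift alternative (0.5 SD), and alpha (0.05) are placeholder assumptions to replace with your own values:

```r
# Monte Carlo power estimate for a two-sample Wilcoxon (Mann-Whitney) test.
set.seed(42)
power.sim <- mean(replicate(2000, {
  x <- rnorm(25)              # group 1 under the assumed model
  y <- rnorm(25, mean = 0.5)  # group 2, shifted by 0.5 SD
  wilcox.test(x, y)$p.value < 0.05
}))
power.sim
```

Note that "post-hoc" power computed from the observed effect size is widely criticized; a prospective simulation like the one above, over a range of plausible effect sizes, is usually more informative.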
Re: [R] Simultaneous equations
Thanks Peter. Actually, I should have specified: these are not the functions I ultimately want to solve, and those can't be rearranged explicitly and substituted. I do need a way to solve simultaneously. Ben
[R] Ordering data by variable
Hi listers, I could order data like this: x <- c(2,6,8,8,1) y <- c(1,6,3,5,4) o <- order(x) frame <- rbind(x,y)[,o] But I would like to know if there is a way to order my data without setting up a data frame. I would like to keep independent vectors x and y. Any suggestions? Thanks in advance, Marcio
Re: [R] Ordering data by variable
Hi Marcio, Is this what you want? x <- c(2,6,8,8,1) y <- c(1,6,3,5,4) o <- order(x) # If you want each vector ordered by x x[o] y[o] You can also use sort(), but then each vector would be sorted by itself, not both by x. HTH, Josh On Thu, Sep 2, 2010 at 1:48 PM, Mestat mes...@pop.com.br wrote: Hi listers, I could order data like this: x <- c(2,6,8,8,1) y <- c(1,6,3,5,4) o <- order(x) frame <- rbind(x,y)[,o] But I would like to know if there is a way to order my data without setting up a data frame. I would like to keep independent vectors x and y. Any suggestions? Thanks in advance, Marcio -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Re: [R] How using the weights argument in nls2?
On Thu, Sep 2, 2010 at 11:09 AM, Ivan Allaman ivanala...@yahoo.com.br wrote: Good morning gentlemen! How do I use a weighted model in nls2? Values with nls are logical, whereas values with nls2 are not. I believe that this discrepancy is because I did not include the weights argument in nls2. Just to follow up, nls2 was ignoring the weights argument. This is now fixed in the development version of nls2. The weighted and unweighted versions below give different residual sums of squares, showing that weights is no longer ignored: library(nls2) # grab the development version source("http://nls2.googlecode.com/svn/trunk/R/nls2.R") BOD2 <- cbind(BOD, w = 1:6) nls2(demand ~ a + b*Time, data = BOD2, start = c(a = 1, b = 1), weights = w, alg = "brute") # compare against the unweighted fit nls2(demand ~ a + b*Time, data = BOD2, start = c(a = 1, b = 1), alg = "brute") -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
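For comparison, base nls() accepts a weights argument the same way; a quick check on the built-in BOD data (the asymptotic-exponential model and start values below are a standard illustration, not from the thread):

```r
# Weighted vs. unweighted fits with base nls() on the built-in BOD data.
w <- 1:6  # same weights as the BOD2 example above
fit.uw <- nls(demand ~ a * (1 - exp(-b * Time)), data = BOD,
              start = c(a = 20, b = 0.5))
fit.w  <- nls(demand ~ a * (1 - exp(-b * Time)), data = BOD,
              start = c(a = 20, b = 0.5), weights = w)
coef(fit.uw)
coef(fit.w)  # the weights pull the fit toward the later observations
```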
Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
Totally agreed. I made a mistake in calling the categorization a GAM. If we apply a step function to the continuous age we get a limited-range ordinal variable. Categorizing is creating several binary variables from the continuous one (with treatment contrasts). Stephen B -Original Message- From: Frank Harrell [mailto:harre...@gmail.com] On Behalf Of Frank Harrell Sent: Thursday, September 02, 2010 4:23 PM To: Bond, Stephen Cc: r-help@r-project.org Subject: Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor On Thu, 2 Sep 2010, stephenb wrote: sorry to bump in late, but I am doing similar things now and was browsing. IMHO anova is not appropriate here. it applies when the richer model has p more variables than the simpler model. this is not the case here. the competing models use different variables. A simple approach is to have the factor variable in the model and to formally test for added information given by the continuous variable (linear, quadratic, spline, etc). AIC could also be used. you are left with IC. by transforming a continuous variable into categorical you are smoothing, which is the idea of GAM. if you look at what is offered in GAMs you may find better approximations of f(age) as well as tools for testing among different f(age) transformations. I don't follow that comment. Smoothing uses the full continuous variable to begin with. A restricted cubic spline function in age is a simple approach. E.g.: require(rms) dd <- datadist(mydata); options(datadist='dd') f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata) plot(Predict(f, age)) Note that you can always expect the categorized version of age not to fit the data except sometimes when behavior is dictated by law (driving, drinking, military service, medicare). Frank regards. S. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis
Be happy, don't do post-hoc power analyses. The standard post-hoc power analysis is actually counterproductive. It is much better to just create confidence intervals. Or give a better description/justification showing that your case is not the standard/worse than useless version. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Lewis G. Dean Sent: Thursday, September 02, 2010 9:45 AM To: r-help@r-project.org Subject: [R] Power analysis I am aware this is fairly simple, but is currently driving me mad! Could someone help me out with conducting a post-hoc power analysis on a Wilcoxon test. I am being driven slightly mad by some conflicting advice! Thanks in advance, Lewis [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
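Greg's alternative can be sketched in a few lines: wilcox.test() will report a confidence interval for the location shift (the Hodges-Lehmann estimate) when asked, which is generally more informative than a post-hoc power figure. The data below are invented for illustration.

```r
# Report a confidence interval instead of a post-hoc power analysis
set.seed(1)
x <- rnorm(20)              # invented sample 1
y <- rnorm(20, mean = 0.5)  # invented sample 2
wt <- wilcox.test(x, y, conf.int = TRUE)
wt$conf.int                 # 95% CI for the location shift
```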
Re: [R] Ordering data by variable
Suggestion: use the power of R. If x and y are independent then sorting y based on x is meaningless. If sorting y based on x is meaningful, then they are not independent. Trying to force non-independent things to pretend that they are independent just causes future headaches. Part of the great power of R is the ability to group things together that should be grouped. The wise learn this and use it (in some cases (mine) that wisdom comes at the expense of not having properly grouped in the past). Learn the power of with/within, data= arguments, and apply-style functions, then you will be eager to combine things into data frames (or lists or ...) when appropriate. descend from soapbox -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Mestat Sent: Thursday, September 02, 2010 2:49 PM To: r-help@r-project.org Subject: [R] Ordering data by variable Hi listers, I could order data like this: x <- c(2,6,8,8,1) y <- c(1,6,3,5,4) o <- order(x) frame <- rbind(x,y)[,o] But, I would like to know if there is a way to order my data without setting up a data frame. I would like to keep independent vectors x and y. Any suggestions? Thanks in advance, Marcio -- View this message in context: http://r.789695.n4.nabble.com/Ordering- data-by-variable-tp2524754p2524754.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
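Greg's point about grouping can be sketched with the poster's own numbers: once x and y sit in one data frame, ordering and subsetting keep the pairs aligned automatically.

```r
# Group the paired vectors so they stay aligned
d <- data.frame(x = c(2, 6, 8, 8, 1), y = c(1, 6, 3, 5, 4))
d.sorted <- d[order(d$x), ]   # both columns reordered together
with(d, y[x > 5])             # 'with' keeps the x/y pairing explicit
```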
Re: [R] general question on binomial test / sign test
On 02-Sep-10 18:01:55, Greg Snow wrote: Just to add to Ted's addition to my response. I think you are moving towards better understanding (and your misunderstandings are common), but to further clarify: [Wise words about P(A|B), P(B|A), P-values, etc., snipped] The real tricky bit about hypothesis testing is that we compute a single p-value, a single observation from a distribution, and based on that try to decide if the process that produced that observation is a uniform distribution or something else (that may be close to a uniform or very different). Indeed. And this is precisely why I began my original reply as follows: Quoting ted.hard...@manchester.ac.uk: [...] The general logic of a significance test is that a test statistic (say T) is chosen such that large values represent a discrepancy between possible data and the hypothesis under test. When you have the data, T evaluates to a value (say t0). The null hypothesis (NH) implies a distribution for the statistic T if the NH is true. Then the value of Prob(T >= t0 | NH) can be calculated. If this is small, then the probability of obtaining data at least as discrepant as the data you did obtain is small; if sufficiently small, then the conjunction of NH and your data (as assessed by the statistic T) is so unlikely that you can decide not to believe that it is possible. If you so decide, then you reject the NH because the data are so discrepant that you can't believe it! The point is that the test statistic T represents *discrepancy* between data and NH in some sense. In what sense? That depends on what you are interested in finding out; and, whatever it is, there will be some T that represents it. It might be whether two samples come from distributions with equal means, or not. Then you might use T = mean(Ysample) - mean(Xsample). Large values of |T| represent discrepancy (in either direction) between data and an NH that the true means are equal. 
Large values of T represent discrepancy in the positive direction, large values of -T discrepancy in the negative direction. Or it might be whether or not the two samples are drawn from populations with equal variances, when you might use T = var(Ysample)/var(Xsample). Or it might be whether the distribution from which X was sampled is symmetric, in which case you might use skewness(Xsample). Or you might be interested in whether the numbers falling into disjoint classes are consistent with hypothetical probabilities p1,...,pk of falling into these classes -- in which case you might use the chi-squared statistic T = sum(((ni - N*pi)^2)/(N*pi)). And so on. Once you have decided on what discrepant means, and chosen a statistic T to represent discrepancy, then the NH implies a distribution for T and you can calculate P-value = Prob(T >= t0 | NH) where t0 is the value of T calculated from the data. *THEN* a small P-value is in direct correspondence with large T, i.e. small P is equivalent to large discrepancy. And it is also the direct measure of how likely you were to get so large a discrepancy if the NH really was true. Thus the P-values, calculated from the distribution of (T | NH), are ordered, not just numerically from small P to large, but also equivalently by discrepancy (from large discrepancy to small). Thus the uniform distribution of P under the NH does not just mean that any value of P is as likely as any other (so what? why prefer one P-value to another?). We also have that different parts of the [0,1] P-scale have different *meanings* -- the parts near 0 are highly discrepant from NH, the parts near 1 are highly consistent with NH, *with respect to the meaning of discrepancy implied by the choice of test statistic T*. So it helps to understand hypothesis testing if you keep in mind what the test statistic T *represents* in real terms. 
Greg's point about try to decide if the process that produced that observation is a uniform distribution or something else (that may be close to a uniform or very different) is not in the first instance relevant to the direct interpretation of small P-value as large discrepancy -- that involves only the Null Hypothesis NH, under which the P-values have a uniform distribution. Where it comes into its own is that an Alternative Hypothesis AH would correspond to some degree of discrepancy of a certain kind, and if T is well chosen then its distribution under AH will give large values of T greater probability than they would get under NH. Thus the AHs that are implied by a large value of a certain test statistic T are those AHs that give such values of T greater probability than they would get under NH. Thus we are now getting into the domain of the Power of the test to detect discrepancy. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Sep-10 Time: 22:59:23
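The uniform-under-the-null point is easy to verify by simulation; a quick sketch (a t-test on null data here, but any exact test of a true null behaves the same way):

```r
# Simulate p-values when the null hypothesis is true
set.seed(42)
pvals <- replicate(2000, t.test(rnorm(20), rnorm(20))$p.value)
hist(pvals, main = "P-values under a true NH")  # roughly flat
mean(pvals < 0.05)                              # close to the nominal 0.05
```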
[R] specify the covariance matrix for random effect
Hi, I'm doing a generalized linear mixed model, and I currently use an R function called glmm. However, in this function they use a standard normal distribution for the random effect, which doesn't satisfy my case, i.e. my random effect also follows a normal distribution, but observations over time are somehow correlated, so the covariance matrix would be different from the identity matrix. Besides, it's also different from the commonly used AR(1), AR(2) structures, so actually I need a way to write down the covariance matrix for the random effect in GLMM by myself. Does anyone know how to do this in R? Thank you in advance for any help. -- Best, Vicky [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NLS equation self starting non linear
These data are kilojoules of energy that are consumed in starving fish over a time period (Days). The KJ reach a lower asymptote and level off, and I would like to use a non-linear plot to show this leveling off. The data are noisy and the sample sizes not the largest. I have tried self-starting weibull curves and tried the following; both end with errors. Days <- c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38, 39, 39, 39, 39, 39, 39, 39, 1, 1, 1, 1) Joules <- c(8.782010, 8.540524, 8.507526, 11.296904, 4.232690, 13.026588, 10.213342, 4.771482, 4.560359, 6.146684, 9.651727, 8.064329, 9.419335, 7.129264, 6.079012, 7.095888, 17.996794, 7.028287, 8.028352, 5.497564, 11.607090, 9.987215, 11.065307, 21.433094, 20.366385, 22.475717) X11() par(cex=1.6) plot(Joules~Days, xlab="Days after fasting was initiated", ylab="Mean energy per fish (KJ)") model <- nls(joules~a+b*exp(-c*Days), start=list(a=8,b=9,c=-.229), control=list(minFactor=1e-12), trace=TRUE) summary(model) init <- nls(Joules ~ SSweibull(Days,Asym,Drop,lrc,pwr), control=list(minFactor=1e-12), data=.GlobalEnv) init Data here is reproducible. As always, thank you...keith -- M. Keith Cox, Ph.D. Alaska NOAA Fisheries, National Marine Fisheries Service Auke Bay Laboratories 17109 Pt. Lena Loop Rd. Juneau, AK 99801 keith@noaa.gov marlink...@gmail.com U.S. (907) 789-6603 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] specify the covariance matrix for random effect
Qiu, Weiyu weiyu at ualberta.ca writes: Hi, I'm doing a generalized linear mixed model, and I currently use an R function called glmm. However, in this function they use a standard normal distribution for the random effect, which doesn't satisfy my case, i.e. my random effect also follows a normal distribution, but observations over time are somehow correlated, so the covariance matrix would be different the identity matrix. Besides, it's also different from the commonly used AR(1), AR(2) structures, so actually I need a way to write down the covariance matrix for the random effect in GLMM by myself. If you could get by with an AR or ARMA structure then you could use lme() with the 'correlation' argument from the nlme package. If you have enough data/are willing to fit a completely unstructured correlation matrix, you could use corSymm. See ?corStruct in the nlme package documentation: it is *in principle* possible to write functions to implement your own correlation structures, e.g. see corSymm nlme:::corSymm.corMatrix nlme:::coef.corNatural However, this will not be easy unless you are already a good R programmer and familiar with the nlme package. If you really need to do this I definitely recommend that you buy and study Pinheiro and Bates's 2000 book. It might also be worth taking a look at the cor* objects in the ape package, which are coded slightly more transparently. Followups should probably go to r-sig-mixed-mod...@r-project.org good luck Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
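A hedged sketch of the lme() route Ben mentions, fitting a completely unstructured within-subject correlation with corSymm. The simulated data frame and its column names are invented for illustration, not from the original post.

```r
library(nlme)

# Invented longitudinal data: 30 subjects, 4 time points each
set.seed(1)
subj <- factor(rep(1:30, each = 4))
time <- rep(1:4, 30)
b <- rnorm(30, sd = 1)   # subject-level random intercepts
y <- 1 + 0.5 * time + b[as.integer(subj)] + rnorm(120, sd = 0.5)
mydata <- data.frame(y, time, subject = subj)

# Random intercept plus an unstructured serial correlation via corSymm
fm <- lme(y ~ time, random = ~ 1 | subject,
          correlation = corSymm(form = ~ 1 | subject),
          data = mydata)
summary(fm)
```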
Re: [R] testing for emptyenv
Is there a complete list of these very handy and powerful functions in base R? -- View this message in context: http://r.789695.n4.nabble.com/testing-for-emptyenv-tp2432922p2525031.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pairs with same xlim and ylim scale
When pairs draws plots, lower.panel invokes f.xy. Maybe there is something in f.xy incompatible with pairs. You can read the code of pairs to see what happens. pairs has two methods, as you can see in the help page (?pairs). According to your code, pairs is supposed to invoke the default S3 method. methods(pairs) [1] pairs.default pairs.formula* Non-visible functions are asterisked Therefore, you should check the code of the function pairs.default to see how the error occurs. Just type pairs.default at the R command prompt and press enter; you will get the source code of pairs.default. On 2010-9-2 15:15, Shi, Tao wrote: Hi Dejian, You're right on this! Do you know how to pass those two arguments into lower.panel? Thanks! ...Tao From: Dejian Zhaozha...@ioz.ac.cn To:r-help@r-project.org Sent: Tue, August 31, 2010 6:10:16 PM Subject: Re: [R] pairs with same xlim and ylim scale I think you have successfully passed the xlim and ylim into the function pairs1. Compare the two graphs produced by the code you provided: you can find the xlim and ylim in the second graph have been reset to the assigned values. It seems that the program halted in producing the second plot after adding xlim and ylim. According to the error message, the two added parameters were not used in lower.panel, or the customized function f.xy. On 2010-9-1 2:26, Shi, Tao wrote: Hi list, I have a function which basically is a wrapper of pairs with some useful panel functions. However, I'm having trouble passing the xlim and ylim into the function so the x and y axes are on the same scale and 45 degree lines are exactly diagonal. I've looked at some old posts, they didn't help much. I [[elided Yahoo spam]] Thanks! ...Tao pairs1 <- function(x, ...) { f.xy <- function(x, y, ...) { points(x, y, ...) 
abline(0, 1, col = 2) } panel.cor <- function(x, y, digits=2, prefix="", cex.cor) { usr <- par("usr"); on.exit(par(usr)) par(usr = c(0, 1, 0, 1)) r <- abs(cor(x, y, method="p", use="pairwise.complete.obs")) txt <- format(c(r, 0.123456789), digits=digits)[1] txt <- paste(prefix, txt, sep="") if(missing(cex.cor)) cex <- 0.8/strwidth(txt) text(0.5, 0.5, txt, cex = cex * r) } panel.hist <- function(x, ...) { usr <- par("usr"); on.exit(par(usr)) par(usr = c(usr[1:2], 0, 1.5) ) h <- hist(x, plot = FALSE) breaks <- h$breaks; nB <- length(breaks) y <- h$counts; y <- y/max(y) rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...) } pairs(x, lower.panel=f.xy, upper.panel=panel.cor, diag.panel=panel.hist, ...) } x <- rnorm(100, sd=0.2) x <- cbind(x=x-0.1, y=x+0.1) pairs1(x) pairs1(x, xlim=c(-1,1), ylim=c(-1,1)) Error in lower.panel(...) : unused argument(s) (xlim = c(-1, 1), ylim = c(-1, 1)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
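The "unused argument(s)" error arises because pairs() forwards its extra arguments (here xlim and ylim) to every panel function, so any panel without a ... formal rejects them. A minimal standalone sketch (simplified from the thread's pairs1; the correlation and histogram panels are omitted for brevity):

```r
# Panels must tolerate the extra arguments that pairs() forwards to them
pairs1 <- function(x, ...) {
  f.xy <- function(x, y, ...) {
    points(x, y, ...)   # xlim/ylim in ... pass through harmlessly
    abline(0, 1, col = 2)
  }
  # use the tolerant panel for both triangles
  pairs(x, lower.panel = f.xy, upper.panel = f.xy, ...)
}

set.seed(1)
x <- rnorm(100, sd = 0.2)
x <- cbind(x = x - 0.1, y = x + 0.1)
pairs1(x, xlim = c(-1, 1), ylim = c(-1, 1))
```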
Re: [R] Making plots in big scatterplot matrix large enough to see
William, Thanks. I adapted your example by doing: library(psych) pdf(file="myfile.pdf", width=30, height=30) pairs.panels(data, gap=0) dev.off() The R part worked. I could see it doing so when I replaced the call to 'pdf' by windows(width=30, height=30). The remaining problem was that Adobe Reader only displayed ten rows of my file, and then hung. That doesn't surprise me, because it's an unreliable piece of software, often hanging on PDFs I find on Google. Annoying, though. Jocelyn Paine http://www.j-paine.org http://www.spreadsheet-parts.org +44 (0)7768 534 091 On Tue, 31 Aug 2010, William Revelle wrote: Jocelyn, In a partial answer to your question, try setting gap=0 in the calls to pairs. This will make the plots closer together. (You might also find pairs.panels in the psych package useful -- it implements one of the help examples for pairs to report the histogram on the diagonal and reports the correlations in the upper off-diagonal). On a Mac, I just tried setting quartz(width=30, height=30) # make a big graphics window # then library(psych) my.data <- sim.item(24) # create 500 cases of 24 variables pairs.panels(my.data, gap=0) # gap=0 makes the plots right next to each other # And then save the graphics window as a pdf. I can open this in a pdf and scroll around pretty easily. Bill At 5:21 AM +0100 8/31/10, Jocelyn Paine wrote: I've got a data frame with 23 columns, and wanted to plot a scatterplot matrix of it. I called pairs( df ) where 'df' is my data frame. This did generate the matrix, but the plotting window did not expand to make the individual plots large enough to see. Each one was only about 10 pixels high and wide. 
I tried sending the plot to a file, with a high and wide image, by doing png("plot.png", width = 4000, height = 4000) but I got these errors: Error in png("plot.png", width = 4000, height = 4000) : unable to start device In addition: Warning messages: 1: In png("plot.png", width = 4000, height = 4000) : Unable to allocate bitmap 2: In png("plot.png", width = 4000, height = 4000) : opening device failed The messages aren't helpful, because they don't tell you _why_ R can't start the device, allocate it, or open it. The documentation for png says: Windows imposes limits on the size of bitmaps: these are not documented in the SDK and may depend on the version of Windows. It seems that width and height are each limited to 2^15-1. However, 2^15-1 is 32767, so that isn't the problem here. I tried various values for height and width. 2400 was OK, but 2500 wasn't. So it seems R can't produce plots that are more than about 2400 pixels square. This is with R 2.10.1. Why is png failing on big images? Also, what's the recommended way to make a file containing a scatterplot matrix when you have lots of variables? 'pairs' is a very useful function, but obviously one does need to be careful when doing this, and I don't know what experts would recommend. Do you loop round the variables plotting each pair to a different file? I was hoping that I could put them all into one very big image and view parts of it at a time. Thanks, Jocelyn Paine http://www.j-paine.org http://www.spreadsheet-parts.org +44 (0)7768 534 091 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- William Revelle http://personality-project.org/revelle.html Professor http://personality-project.org Department of Psychology http://www.wcas.northwestern.edu/psych/ Northwestern University http://www.northwestern.edu/ Use R for psychology http://personality-project.org/r It is 6 minutes to midnight http://www.thebulletin.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making plots in big scatterplot matrix large enough to see
Greg, thanks for the suggestion. That's useful to know for future work. It's not so good in this case, because I'm making the plots for a colleague who doesn't know R, and it would be a bother for me to have to send him several files and him reassemble them. What I did was to use pairs.panels, as suggested by William Revelle on this thread. I'd like to ask a general question about the interface though. There's a size below which individual scatterplots are not legible. It makes no sense at all for a scatterplot routine to plot them at that size or smaller. So why didn't the author(s) of 'pairs' program it so that it always makes them at least legible size, and expands the image window until it can fit them all in? Regards, Jocelyn Paine http://www.j-paine.org +44 (0)7768 534 091 Jocelyn's Cartoons: http://www.j-paine.org/blog/jocelyns_cartoons/ On Tue, 31 Aug 2010, Greg Snow wrote: Look at the pairs2 function in the TeachingDemos package, this lets you produce smaller portions of the total scatterplot matrix at a time (with bigger plots), you could print the smaller portions then assemble the full matrix on a large wall, or just use it to look at potentially interesting parts. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Jocelyn Paine Sent: Monday, August 30, 2010 10:21 PM To: r-help@r-project.org Subject: [R] Making plots in big scatterplot matrix large enough to see I've got a data frame with 23 columns, and wanted to plot a scatterplot matrix of it. I called pairs( df ) where 'df' is my data frame. This did generate the matrix, but the plotting window did not expand to make the individual plots large enough to see. Each one was only about 10 pixels high and wide. 
I tried sending the plot to a file, with a high and wide image, by doing png("plot.png", width = 4000, height = 4000) but I got these errors: Error in png("plot.png", width = 4000, height = 4000) : unable to start device In addition: Warning messages: 1: In png("plot.png", width = 4000, height = 4000) : Unable to allocate bitmap 2: In png("plot.png", width = 4000, height = 4000) : opening device failed The messages aren't helpful, because they don't tell you _why_ R can't start the device, allocate it, or open it. The documentation for png says: Windows imposes limits on the size of bitmaps: these are not documented in the SDK and may depend on the version of Windows. It seems that width and height are each limited to 2^15-1. However, 2^15-1 is 32767, so that isn't the problem here. I tried various values for height and width. 2400 was OK, but 2500 wasn't. So it seems R can't produce plots that are more than about 2400 pixels square. This is with R 2.10.1. Why is png failing on big images? Also, what's the recommended way to make a file containing a scatterplot matrix when you have lots of variables? 'pairs' is a very useful function, but obviously one does need to be careful when doing this, and I don't know what experts would recommend. Do you loop round the variables plotting each pair to a different file? I was hoping that I could put them all into one very big image and view parts of it at a time. Thanks, Jocelyn Paine http://www.j-paine.org http://www.spreadsheet-parts.org +44 (0)7768 534 091 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making plots in big scatterplot matrix large enough to see
On 2010-08-31 13:49, Greg Snow wrote: Look at the pairs2 function in the TeachingDemos package, this lets you produce smaller portions of the total scatterplot matrix at a time (with bigger plots), you could print the smaller portions then assemble the full matrix on a large wall, or just use it to look at potentially interesting parts. If I remember correctly, there's also the xysplom() function in pkg:HH. -Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] calculate monthly mean
Hello group, I'm trying to plot 3d with the scatterplot3d package; I got an error saying length(color) must be equal length(x) or 1. My data has dimensions (lon,lat,lev,time), the time in months. I want to calculate the monthly mean for the time. How can I do that? Is there any function doing that? Thanks a lot ## load packages library(rgl) library(fields) library(ncdf) library(scatterplot3d) ## open NetCDF file to read nc <- open.ncdf("/srv/ccrc/data05/z3236814/mm5/co2/2000/q.21.mon.nc") v1 <- nc$var[[1]] v2 <- nc$var[[2]] v3 <- nc$var[[3]] data1 <- get.var.ncdf(nc,v1) data2 <- get.var.ncdf(nc,v2) data3 <- get.var.ncdf(nc,v3) coldat = data1[1:111,1:101,23,1:60] ## create colour hcol = cumsum(coldat) coldat = hcol hcol = hcol/max(hcol,na.rm=TRUE) col <- hsv(h=hcol,s=1,v=1) X <- scatterplot3d(data3[1:111,1:101],data2[1:111,1:101],data1[1:111,1:101,23,1:60],color=col) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
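On the monthly-mean part of the question: if the time dimension holds, say, 60 monthly steps covering 5 years, tapply() inside apply() averages each grid cell by calendar month. The small array below is a stand-in for the NetCDF variable, since the original file isn't available.

```r
# Stand-in for a (lon, lat, time) slice of the NetCDF data
dat <- array(rnorm(10 * 8 * 60), dim = c(10, 8, 60))
month <- rep(1:12, times = 5)   # calendar month of each of the 60 time steps
clim <- apply(dat, c(1, 2), function(ts) tapply(ts, month, mean))
dim(clim)   # 12 x 10 x 8: month index first, then lon and lat
```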
Re: [R] Power analysis
Agree with Greg's point. In fact it does not make logical sense in many cases. Similar to the use of the statistically unreliable reliability measure Cronbach's alpha in some non-statistical fields. -- View this message in context: http://r.789695.n4.nabble.com/Power-analysis-tp2524729p2524907.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] date
try to use difftime() instead of as.difftime(). On Thu, Sep 2, 2010 at 10:32 PM, Dunia Scheid dunia.sch...@gmail.com wrote: Hello all, I've 2 strings representing the start and end values of a date and time. For example, time1 <- c("21/04/2005","23/05/2005","11/04/2005") time2 <- c("15/07/2009", "03/06/2008", "15/10/2005") as.difftime(time1,time2) Time differences in secs [1] NA NA NA attr(,"tzone") [1] "" How can I calculate the difference between these 2 strings? Regards, Dunia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
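The suggestion above can be sketched as follows: convert the strings to Date objects first (as.difftime() cannot parse "dd/mm/yyyy" strings, which is why it returned NA), then difference them. This uses the poster's own values.

```r
# Parse "dd/mm/yyyy" strings as Dates before differencing
time1 <- c("21/04/2005", "23/05/2005", "11/04/2005")
time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
d1 <- as.Date(time1, format = "%d/%m/%Y")
d2 <- as.Date(time2, format = "%d/%m/%Y")
difftime(d2, d1, units = "days")   # elapsed days for each pair
```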
Re: [R] NLS equation self starting non linear
On Thu, Sep 2, 2010 at 7:39 PM, Marlin Keith Cox marlink...@gmail.com wrote: These data are kilojoules of energy that are consumed in starving fish over a time period (Days). The KJ reach a lower asymptote and level off, and I would like to use a non-linear plot to show this leveling off. The data are noisy and the sample sizes not the largest. I have tried self-starting weibull curves and tried the following; both end with errors. Days <- c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38, 39, 39, 39, 39, 39, 39, 39, 1, 1, 1, 1) Joules <- c(8.782010, 8.540524, 8.507526, 11.296904, 4.232690, 13.026588, 10.213342, 4.771482, 4.560359, 6.146684, 9.651727, 8.064329, 9.419335, 7.129264, 6.079012, 7.095888, 17.996794, 7.028287, 8.028352, 5.497564, 11.607090, 9.987215, 11.065307, 21.433094, 20.366385, 22.475717) X11() par(cex=1.6) plot(Joules~Days, xlab="Days after fasting was initiated", ylab="Mean energy per fish (KJ)") model <- nls(joules~a+b*exp(-c*Days), start=list(a=8,b=9,c=-.229), control=list(minFactor=1e-12), trace=TRUE) summary(model) Note that Joules is defined above but joules is used as well. We have assumed they are the same. Also note that the formula is linear in log(joules) if a=0, so let's assume a=0 and use lm to solve. Also switch to the plinear algorithm which only requires starting values for the nonlinear parameters -- in this case only c (which we get from the lm). joules <- Joules coef(lm(log(joules) ~ Days)) (Intercept) Days 2.61442015 -0.01614845 # so use 0.016 as starting value of c fm <- nls(joules~cbind(1, exp(-c*Days)), start = list(c = 0.016), alg = "plinear"); fm Nonlinear regression model model: joules ~ cbind(1, exp(-c * Days)) data: parent.frame() c .lin1 .lin2 0.2314 8.3630 13.2039 residual sum-of-squares: 290.3 -- Statistics Software Consulting GKX Group, GKX Associates Inc. 
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
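Since the subject line asks about self-starting models: a minimal sketch (not from the thread) of fitting the same asymptotic shape with the built-in SSasymp self-start, which computes its own starting values so no start= list is needed. It assumes the Days/Joules vectors posted above.

```r
Days   <- c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38,
            39, 39, 39, 39, 39, 39, 39, 1, 1, 1, 1)
Joules <- c(8.782010, 8.540524, 8.507526, 11.296904, 4.232690, 13.026588,
            10.213342, 4.771482, 4.560359, 6.146684, 9.651727, 8.064329,
            9.419335, 7.129264, 6.079012, 7.095888, 17.996794, 7.028287,
            8.028352, 5.497564, 11.607090, 9.987215, 11.065307, 21.433094,
            20.366385, 22.475717)
# SSasymp fits y = Asym + (R0 - Asym) * exp(-exp(lrc) * x); Asym is the
# lower asymptote, R0 the value at x = 0, lrc the log rate constant.
fit <- nls(Joules ~ SSasymp(Days, Asym, R0, lrc))
coef(fit)
```

With this parameterization, Asym should come out near the 8.36 asymptote of the plinear fit above, and exp(lrc) near its rate constant 0.2314.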
Re: [R] Making plots in big scatterplot matrix large enough to see
On 2010-09-02 22:16, Jocelyn Paine wrote:

Greg, thanks for the suggestion. That's useful to know for future work. It's not so good in this case, because I'm making the plots for a colleague who doesn't know R, and it would be a bother for me to have to send him several files and have him reassemble them. What I did was to use pairs.panels, as suggested by William Revelle on this thread.

I'd like to ask a general question about the interface, though. There's a size below which individual scatterplots are not legible. It makes no sense at all for a scatterplot routine to plot them at that size or smaller. So why didn't the author(s) of 'pairs' program it so that it always makes them at least legible size, and expands the image window until it can fit them all in? Regards, Jocelyn Paine http://www.j-paine.org +44 (0)7768 534 091

Hmm, I had no trouble creating and viewing William's pdf file. I was also able to view the pairs plot fairly well on my screen, and that's on my laptop. Perhaps my display has better resolution than yours. Your suggestion to expand the image window until it can fit them all in would, at the very least, involve determining the resolution of the display device. But even then, there's bound to be someone who'll want a pairs plot for 1000 variables. As usual with R, improvements are always welcome. You could submit appropriate code and, if it is deemed useful, I'm fairly sure that a better pairs() function will become part of Rx.xx.x.

-Peter Ehlers

Jocelyn's Cartoons: http://www.j-paine.org/blog/jocelyns_cartoons/

On Tue, 31 Aug 2010, Greg Snow wrote: Look at the pairs2 function in the TeachingDemos package; this lets you produce smaller portions of the total scatterplot matrix at a time (with bigger plots). You could print the smaller portions and then assemble the full matrix on a large wall, or just use it to look at potentially interesting parts.

-- Gregory (Greg) L. Snow Ph.D. 
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jocelyn Paine Sent: Monday, August 30, 2010 10:21 PM To: r-help@r-project.org Subject: [R] Making plots in big scatterplot matrix large enough to see

I've got a data frame with 23 columns, and wanted to plot a scatterplot matrix of it. I called pairs(df), where 'df' is my data frame. This did generate the matrix, but the plotting window did not expand to make the individual plots large enough to see. Each one was only about 10 pixels high and wide. I tried sending the plot to a file, with a high and wide image, by doing

png("plot.png", width = 4000, height = 4000)

but I got these errors:

Error in png("plot.png", width = 4000, height = 4000) :
  unable to start device
In addition: Warning messages:
1: In png("plot.png", width = 4000, height = 4000) :
  Unable to allocate bitmap
2: In png("plot.png", width = 4000, height = 4000) :
  opening device failed

The messages aren't helpful, because they don't tell you _why_ R can't start the device, allocate it, or open it. The documentation for png says: "Windows imposes limits on the size of bitmaps: these are not documented in the SDK and may depend on the version of Windows. It seems that width and height are each limited to 2^15-1." However, 2^15-1 is 32767, so that isn't the problem here. I tried various values for height and width: 2400 was OK, but 2500 wasn't. So it seems R can't produce plots that are more than about 2400 pixels square. This is with R 2.10.1. Why is png failing on big images?

Also, what's the recommended way to make a file containing a scatterplot matrix when you have lots of variables? 'pairs' is a very useful function, but obviously one does need to be careful when doing this, and I don't know what experts would recommend. Do you loop round the variables, plotting each pair to a different file? 
I was hoping that I could put them all into one very big image and view parts of it at a time. Thanks, Jocelyn Paine http://www.j-paine.org http://www.spreadsheet-parts.org +44 (0)7768 534 091 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
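To make the "portions at a time" idea concrete, here is a hedged sketch (not from the thread) of writing sub-blocks of a large scatterplot matrix to separate PNG files, each kept under the roughly 2400-pixel cap reported above. The stand-in data frame and file names are assumptions; in practice df would be the poster's 23-column data frame.

```r
# Stand-in for the poster's 23-column data frame.
df <- as.data.frame(matrix(rnorm(100 * 23), ncol = 23))

vars   <- names(df)
groups <- split(vars, ceiling(seq_along(vars) / 6))  # blocks of up to 6 variables

# One PNG per pair of blocks; each stays within the device size limit.
for (i in seq_along(groups)) {
  for (j in seq(i, length(groups))) {
    png(sprintf("pairs_block_%d_%d.png", i, j), width = 2000, height = 2000)
    pairs(df[, unique(c(groups[[i]], groups[[j]]))])
    dev.off()
  }
}
```

The recipient then views each block at full size instead of one illegible 23-by-23 matrix.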
Re: [R] NLS equation self starting non linear
On 2010-09-02 22:32, Gabor Grothendieck wrote: On Thu, Sep 2, 2010 at 7:39 PM, Marlin Keith Cox marlink...@gmail.com wrote: These data are kilojoules of energy consumed by starving fish over a time period (Days). The KJ reach a lower asymptote and level off, and I would like to use a non-linear plot to show this leveling off. The data are noisy and the sample sizes are not the largest. I have tried self-starting Weibull curves and tried the following; both end with errors.

Days <- c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38, 39, 39, 39, 39, 39, 39, 39, 1, 1, 1, 1)
Joules <- c(8.782010, 8.540524, 8.507526, 11.296904, 4.232690, 13.026588, 10.213342, 4.771482, 4.560359, 6.146684, 9.651727, 8.064329, 9.419335, 7.129264, 6.079012, 7.095888, 17.996794, 7.028287, 8.028352, 5.497564, 11.607090, 9.987215, 11.065307, 21.433094, 20.366385, 22.475717)
X11()
par(cex=1.6)
plot(Joules~Days, xlab="Days after fasting was initiated", ylab="Mean energy per fish (KJ)")
model <- nls(joules~a+b*exp(-c*Days), start=list(a=8, b=9, c=-.229), control=list(minFactor=1e-12), trace=TRUE)
summary(model)

Note that Joules is defined above but joules is used as well. We have assumed they are the same. Also note that the formula is linear in log(joules) if a=0, so let's assume a=0 and use lm to solve. Also switch to the "plinear" algorithm, which only requires starting values for the nonlinear parameters -- in this case only c (which we get from the lm).

joules <- Joules
coef(lm(log(joules) ~ Days))
(Intercept)        Days
 2.61442015 -0.01614845

# so use 0.016 as starting value of c
fm <- nls(joules~cbind(1, exp(-c*Days)), start = list(c = 0.016), alg = "plinear"); fm
Nonlinear regression model
  model: joules ~ cbind(1, exp(-c * Days))
   data: parent.frame()
     c  .lin1  .lin2
0.2314 8.3630 13.2039
 residual sum-of-squares: 290.3

Keith, You would get Gabor's solution from your nls() call if you did:
1. replace 'Joules' with 'joules'
2. replace 'c=-.229' with 'c=.229' in your start vector. 
You already have the 'minus' in your formula. I find it useful to add the curve you get with your start values to the scatterplot to see how reasonable the values are:

plot(Joules~Days)
curve(8 + 9 * exp(-.229 * x), add=TRUE)

After you get a convergent solution, you can add a curve with the model values.

-Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
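A small self-contained sketch of that last step: refit with the corrected names and sign, then overlay the converged curve on the scatterplot. The coefficient names follow the a + b*exp(-c*Days) formula from the thread.

```r
Days   <- c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38,
            39, 39, 39, 39, 39, 39, 39, 1, 1, 1, 1)
Joules <- c(8.782010, 8.540524, 8.507526, 11.296904, 4.232690, 13.026588,
            10.213342, 4.771482, 4.560359, 6.146684, 9.651727, 8.064329,
            9.419335, 7.129264, 6.079012, 7.095888, 17.996794, 7.028287,
            8.028352, 5.497564, 11.607090, 9.987215, 11.065307, 21.433094,
            20.366385, 22.475717)
# Note c = +.229 (the minus is already in the formula), per the fix above.
model <- nls(Joules ~ a + b * exp(-c * Days),
             start = list(a = 8, b = 9, c = .229))
co <- coef(model)
plot(Joules ~ Days)
curve(co[["a"]] + co[["b"]] * exp(-co[["c"]] * x), add = TRUE, lty = 2)
```

The dashed curve should level off near the fitted asymptote a, around 8.36 for these data.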
[R] running an exe in the background
Hi, I'd like to be able to run a .exe in the background whenever library(package-x) is called. Is this possible?

~Aks

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
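One possible approach (an assumption, not an answer from the thread): launch the program from the package's .onLoad hook with a non-blocking system() call, so it starts whenever the package is attached or loaded. The "bin" subdirectory and "helper.exe" name are hypothetical.

```r
# In the package's R source; runs automatically when library(packagex)
# loads the namespace.
.onLoad <- function(libname, pkgname) {
  # Hypothetical layout: the .exe shipped in the package's inst/bin/,
  # so it installs to bin/ under the package directory.
  exe <- system.file("bin", "helper.exe", package = pkgname)
  if (nzchar(exe)) {
    system(exe, wait = FALSE)  # wait = FALSE returns control to R immediately
  }
}
```

On Windows, system2() or shell.exec() are alternatives; remember to stop or clean up the process, e.g. from .onUnload, if it should not outlive the R session.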