Re: [R] Total effect of X on Y under presence of interaction effects
This is what I believe is referred to as suppression in regression, where the correlation between the independent and the dependent variable turns out to be of one sign whereas the regression coefficient turns out to be of the opposite sign. Read here about suppression: http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/MultipleRegression/multreg3.html HTH -- View this message in context: http://r.789695.n4.nabble.com/Total-effect-of-X-on-Y-under-presence-of-interaction-effects-tp3514137p3516446.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lm and anova
Hi! We have run a linear regression model with 3 explanatory variables and get the output below. Does anyone know what type of test the anova() call below performs, and why we get such different results in terms of significant variables in the two tables? Thanks! /Sara

summary(model)
Call:
lm(formula = log(HOBU) ~ Vole1 + Volelag + Year)
Residuals:
      Min        1Q    Median        3Q       Max
-0.757284 -0.166681  0.009478  0.181304  0.692916
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 80.041737  12.018726   6.660 1.40e-07 ***
Vole1        0.005521   0.041626   0.133   0.8953
Volelag      0.033966   0.018392   1.847   0.0738 .
Year        -0.035927   0.006027  -5.961 1.08e-06 ***

anova(model)
Analysis of Variance Table
Response: log(HOBU)
          Df Sum Sq Mean Sq F value    Pr(>F)
Vole1      1 1.7877  1.7877 13.1772 0.0009486 ***
Volelag    1 0.5817  0.5817  4.2878 0.0462831 *
Year       1 4.8205  4.8205 35.5323 1.082e-06 ***
Residuals 33 4.4769  0.1357
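[Editor's note: a minimal sketch of what is going on here. anova() on an lm fit performs sequential (Type I) tests, so each term is tested only against the terms listed before it and the result depends on term order when predictors are correlated, whereas summary() t-tests condition on all other terms. The data below are simulated for illustration.]

```r
set.seed(1)
x1 <- rnorm(40)
x2 <- x1 + rnorm(40, sd = 0.3)   # deliberately correlated with x1
y  <- x1 + rnorm(40)

# Same model, two term orders: summary() coefficients are identical,
# but the sequential sums of squares from anova() differ.
a1 <- anova(lm(y ~ x1 + x2))
a2 <- anova(lm(y ~ x2 + x1))
a1["x1", "Sum Sq"]   # x1 entered first: captures the shared variance
a2["x1", "Sum Sq"]   # x1 entered after x2: only its unique contribution
```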
[R] How to extract information from the following dataset?
Hi all, I have never worked with this kind of data before, so please help me out with it. I have the following data set, in a csv file; it looks like the following:

Jan 27, 2010 16:01:24,000 125 - - -
Jan 27, 2010 16:06:24,000 125 - - -
Jan 27, 2010 16:11:24,000 176 - - -
Jan 27, 2010 16:16:25,000 159 - - -
Jan 27, 2010 16:21:25,000 142 - - -
Jan 27, 2010 16:26:24,000 142 - - -
Jan 27, 2010 16:31:24,000 125 - - -
Jan 27, 2010 16:36:24,000 125 - - -
Jan 27, 2010 16:41:24,000 125 - - -
Jan 27, 2010 16:46:24,000 125 - - -
Jan 27, 2010 16:51:24,000 125 - - -
Jan 27, 2010 16:56:24,000 125 - - -
Jan 27, 2010 17:01:24,000 157 - - -
Jan 27, 2010 17:06:24,000 172 - - -
Jan 27, 2010 17:11:25,000 142 - - -
Jan 27, 2010 17:16:24,000 125 - - -
Jan 27, 2010 17:21:24,000 125 - - -
Jan 27, 2010 17:26:24,000 125 - - -
Jan 27, 2010 17:31:24,000 125 - - -
Jan 27, 2010 17:36:24,000 125 - - -
Jan 27, 2010 17:41:24,000 125 - - -
Jan 27, 2010 17:46:24,000 125 - - -
Jan 27, 2010 17:51:24,000 125 - - -
..

The first few columns are month, day, year, time with OS3 accuracy. And the last number is the measurement I need to extract. I wonder if there is an easy way to take out only the measurements from a specific day and hour, i.e. if I want measurements from Jan 27 2010 16:--:-- then I get 125,125,176,159,142,142,125,125,125,125,125,125. Many thanks!! -- Xin Zhang Ph.D. Candidate Department of Statistics University of California, Riverside
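[Editor's note: a hedged base-R sketch of one way to do this, assuming the whitespace-separated layout shown above (month, day-with-comma, year, time with a ",000" millisecond suffix, then the value) and an English locale for month-name parsing. In practice you would replace the inline `lines` with `readLines("filein.txt")`; that filename is illustrative.]

```r
# Two sample rows in the format from the post
lines <- c("Jan 27, 2010 16:01:24,000 125 - - -",
           "Jan 27, 2010 17:06:24,000 172 - - -")
dat <- read.table(text = lines, stringsAsFactors = FALSE)

# V2 is "27," (strip the comma); V4 is "16:01:24,000" (drop the ",000")
ts <- as.POSIXct(paste(dat$V1, sub(",", "", dat$V2), dat$V3,
                       sub(",000", "", dat$V4)),
                 format = "%b %d %Y %H:%M:%S")

# Keep only measurements from Jan 27 2010, hour 16
vals <- dat$V5[format(ts, "%Y-%m-%d %H") == "2010-01-27 16"]
vals
```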
Re: [R] How to extract information from the following dataset?
Xin Zhang wrote: Hi all, I have never worked with this kind of data before, so please help me out with it. I have the following data set, in a csv file; it looks like the following: Jan 27, 2010 16:01:24,000 125 - - - Jan 27, 2010 16:06:24,000 125 - - - [...] The first few columns are month, day, year, time with OS3 accuracy. And the last number is the measurement I need to extract. I wonder if there is an easy way to take out only the measurements from a specific day and hour, i.e. if I want measurements from Jan 27 2010 16:--:-- then I get 125,125,176,159,142,142,125,125,125,125,125,125. Many thanks!!

The easiest is in the shell, if you're using some flavour of unix:

grep 'Jan 27, 2010 16' filein.txt | awk '{print $5}' > fileout.txt

and use fileout.txt, which will contain only the column of data you want. --
[R] Binomial
Hi, I need to create a function which generates a Binomial random number without using the rbinom function. Do I need to use the choose function, or am I better off just using sample? Thanks.
Re: [R] Binomial
Am 12.05.2011 10:46, schrieb blutack: Hi, I need to create a function which generates a Binomial random number without using the rbinom function. Do I need to use the choose function, or am I better off just using sample? Thanks.

I think I remember other software that generates binomial data with e.g. pi = 0.7 by

pi <- 0.7
x <- runif(100) > pi
summary(x)

-- Alex
Re: [R] Snow/Snowfall hangs on windows 7
On 28.04.2011 09:57, Truc wrote: Dear Anna! I have the same problem with Windows 7 64-bit. If I use R 2.12.2 with snow package 0.3-3, it works well. But with R 2.13.0 and the same snow package, it just hangs. I start R (Run as administrator), turn off the firewall... But it seems the socket connection code in the R 2.13.0 version for Windows has been changed??? That's my experience so far.

I was just on a Windows 7 64-bit machine and tried to verify some older reports. For this one: the example in ?parApply

library(snow)
cl <- makeSOCKcluster(c("localhost", "localhost"))
parSapply(cl, 1:20, get("+"), 3)

works fine with R-2.13.0 in 32-bit and 64-bit and snow 0.3-3. Since you have not given a single line of code, it is hard to help. Uwe Ligges -- View this message in context: http://r.789695.n4.nabble.com/Snow-Snowfall-hangs-on-windows-7-tp3436724p3480368.html
Re: [R] Binomial
On 12-May-11 09:02:45, Alexander Engelhardt wrote: Am 12.05.2011 10:46, schrieb blutack: Hi, I need to create a function which generates a Binomial random number without using the rbinom function. Do I need to use the choose function, or am I better off just using sample? Thanks. I think I remember other software that generates binomial data with e.g. pi = 0.7 by pi <- 0.7 ; x <- runif(100) > pi ; summary(x) -- Alex

That needs to be the other way round (and perhaps also convert it to 0/1):

x <- 1*(runif(100) < pi)

since Prob(runif > pi) = (1 - pi). Comparison:

pi <- 0.7
x <- runif(100) > pi
x[1:10]
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
sum(x)/100
# [1] 0.36
x <- 1*(runif(100) < pi)
x[1:10]
# [1] 0 0 1 1 1 1 1 0 1 0
sum(x)/100
# [1] 0.62

Ted
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 12-May-11 Time: 10:21:26
Re: [R] How to extract information from the following dataset?
Date: Thu, 12 May 2011 10:43:59 +0200 From: jose-marcio.mart...@mines-paristech.fr To: xzhan...@ucr.edu CC: r-help@r-project.org Subject: Re: [R] How to extract information from the following dataset?

Xin Zhang wrote: Hi all, I have never worked with this kind of data before [...] I wonder if there is an easy way to take out only the measurements from a specific day and hour, i.e. if I want measurements from Jan 27 2010 16:--:-- then I get 125,125,176,159,142,142,125,125,125,125,125,125. Many thanks!! The easiest is in the shell, if you're using some flavour of unix: grep 'Jan 27, 2010 16' filein.txt | awk '{print $5}' > fileout.txt and use fileout.txt, which will contain only the column of data you want.

Normally that is what I do, but the R POSIXct features work pretty easily. I guess I'd use bash text-processing commands to put the data into a form you like, perhaps y-mo-day time, and then read it in as a data frame. Usually I convert everything to time since the epoch because I like integers, but there are some facilities here, like round(), that work well with date-times.

dx <- as.POSIXct("2011-04-03 13:14:15")
dx
# [1] "2011-04-03 13:14:15 CDT"
round(dx, "hours")
# [1] "2011-04-03 13:00:00 CDT"
as.integer(dx)
# [1] 1301854455
[R] R won't start keeps crashing
Hi there, I am a relatively new user; I only downloaded R about a week ago. I was getting along fine, but last night I tried to select "save workspace". Since then R will not work, and I really really need it. There are two error messages. The first, in a pop-up box, is: Fatal error: unable to restore saved data in .RData. The second is a message in the GUI: Error in loadNamespace(name): there is no package called 'vars'. When I click to dismiss the message box the whole thing just shuts down!! I have tried reinstalling, but it has made no difference. Please help -- View this message in context: http://r.789695.n4.nabble.com/R-won-t-start-keeps-crashing-tp3516829p3516829.html
Re: [R] How to extract information from the following dataset?
I have the following data set, in a csv file; it looks like the following: Jan 27, 2010 16:01:24,000 125 - - - Jan 27, 2010 16:06:24,000 125 - - - .. The first few columns are month, day, year, time with OS3 accuracy. And the last number is the measurement I need to extract. I wonder if there is an easy way to take out only the measurements from a specific day and hour -- Xin Zhang Ph.D. Candidate Department of Statistics University of California, Riverside
---
I use strptime to configure the date format in my time-series dataset. First check to see how the dates are read. For example:

# check the structure
str(your_file)
'data.frame': ...etc

This tells me that my original date is a factor, not in POSIXlt format.

# check your column dates
head(your_file)
[1] 1984-01-26 1984-02-09 1984-03-01 1984-03-15 1984-03-29 1984-04-12

These are discrete column dates.

# convert your date format
your_file$date <- strptime(your_file$date, "%m/%d/%Y")

Call ?strptime for options. Example: for a specific day or hour, strptime would use

strptime(your_file$date, "%d/%I")

for day and hour. Once you extract the type of date format you want, run str(your_file) again to confirm the format change. Does this answer your question? Best, Heather A. Wright, PhD candidate, Ecology and Evolution of Plankton, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy -- View this message in context: http://r.789695.n4.nabble.com/How-to-extract-information-from-the-following-dataset-tp3516752p3516952.html
Re: [R] Total effect of X on Y under presence of interaction effects
I second David's first reply regarding the non-utility of individual coefficients, especially for low-order terms. Also, nonlinearity can be quite important. Properly modeling main effects through the use of flexible nonlinear functions can sometimes do away with the need for interaction terms. Back to the original question, it is easy to get total effects for each predictor. The anova function in the rms package does this, by combining lower- and higher-order effects (main effects + interactions). Frank

David Winsemius wrote: On May 11, 2011, at 6:26 PM, Matthew Keller wrote: Not to rehash an old statistical argument, but I think David's reply here is too strong ("In the presence of interactions there is little point in attempting to assign meaning to individual coefficients."). As David notes, the simple effect of your coefficients (e.g., a) has an interpretation: it is the predicted effect of a when b, c, and d are zero. If the zero level of b, c, and d is meaningful (e.g., if you have centered all your variables such that the mean of each one is zero), then the coefficient of a is the predicted slope of a at the mean level of all other predictors... And there is internal evidence that such a procedure was not performed in this instance. I think my advice applies here. -- David. Matt

On Wed, May 11, 2011 at 2:40 PM, Greg Snow <greg.s...@imail.org> wrote: Just to add to what David already said, you might want to look at the Predict.Plot and TkPredict functions in the TeachingDemos package for a simple interface for visualizing predicted values in regression models. These plots are much more informative than a single number trying to capture "total effect". -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of David Winsemius Sent: Wednesday, May 11, 2011 7:48 AM To: Michael Haenlein Cc: r-help@r-project.org Subject: Re: [R] Total effect of X on Y under presence of interaction effects

On May 11, 2011, at 4:26 AM, Michael Haenlein wrote: Dear all, this is probably more a statistics question than an R question, but perhaps somebody can help me nevertheless. I'm running a regression with four predictors (a, b, c, d) and all their interaction effects using lm. Based on theory I assume that a influences y positively. In my output (see below) I see, however, a negative regression coefficient for a. But several of the interaction effects of a with b, c and d have positive signs. I don't really understand this. Do I have to add up the coefficient for the main effect and the ones of all interaction effects to get a total effect of a on y? Or am I doing something wrong here?

In the presence of interactions there is little point in attempting to assign meaning to individual coefficients. You need to use predict() (possibly with graphical or tabular displays) and produce estimates for one or two variables at relevant levels of the other variables. The other aspect about which your model is not informative is the possibility that some of these predictors have non-linear associations with `y`. (The coefficient for `a` examined in isolation might apply to a group of subjects (or other units of analysis) in which the values of `b`, `c`, and `d` were all held at zero. Is that even a situation that would occur in your domain of investigation?) -- David.
Thanks very much for your answer in advance, Regards, Michael. Michael Haenlein, Associate Professor of Marketing, ESCP Europe, Paris, France

Call:
lm(formula = y ~ a * b * c * d)
Residuals:
    Min      1Q  Median      3Q     Max
-44.919  -5.184   0.294   5.232 115.984
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  27.3067     0.8181  33.379  < 2e-16 ***
a           -11.0524     2.0602  -5.365 8.25e-08 ***
b            -2.5950     0.4287  -6.053 1.47e-09 ***
c           -22.0025     2.8833  -7.631 2.50e-14 ***
d            20.5037     0.3189  64.292  < 2e-16 ***
a:b          15.1411     1.1862  12.764  < 2e-16 ***
a:c          26.8415     7.2484   3.703 0.000214 ***
b:c           8.3127     1.5080   5.512 3.61e-08 ***
a:d           6.6221     0.8061   8.215 2.33e-16 ***
b:d          -2.0449     0.1629 -12.550  < 2e-16 ***
c:d          10.0454     1.1506   8.731  < 2e-16 ***
a:b:c         1.4137     4.1579   0.340 0.733862
a:b:d        -6.1547     0.4572 -13.463  < 2e-16 ***
a:c:d       -20.6848     2.8832  -7.174 7.69e-13 ***
b:c:d        -3.4864     0.6041  -5.772 8.05e-09 ***
a:b:c:d       5.6184     1.6539   3.397 0.000683 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.913 on 12272 degrees of freedom
Multiple R-squared: 0.8845, Adjusted
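[Editor's note: David's predict() advice can be sketched with simulated data. With an a:b interaction, the slope of `a` is not a single number; it depends on where `b` is held. The data and coefficients below are illustrative, not from the original post.]

```r
set.seed(1)
d <- data.frame(a = rnorm(200), b = rnorm(200))
d$y <- 2 * d$a + d$a * d$b + rnorm(200)   # true slope of a is 2 + 1*b
fit <- lm(y ~ a * b, data = d)

# Effect of a one-unit change in `a`, at two different levels of `b`
slope_lo <- diff(predict(fit, data.frame(a = c(0, 1), b = -1)))
slope_hi <- diff(predict(fit, data.frame(a = c(0, 1), b =  1)))
slope_lo  # near 2 + 1*(-1) = 1
slope_hi  # near 2 + 1*( 1) = 3
```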
[R] strength of seasonal component
Hi All, a) Is it possible to estimate the strength of seasonality in time-series data? Say I have monthly mean prices of ten different assets. I decompose the data using stl() and obtain the seasonal component for each month. Is it possible to order the assets based on the strength of seasonality? b) Which gives a better estimate of seasonality, stl() or a robust linear model like MASS::rlm(mean price ~ month), considering the fact that the variable analysed is a price series? Many thanks for the insight and help. Regards, Krishna
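[Editor's note: one common heuristic, not from the thread, is to score seasonal strength as the share of detrended variance explained by the stl() seasonal component; assets can then be ranked by this score. The helper name below is illustrative.]

```r
# Seasonal strength in [0, 1]: 1 - Var(remainder) / Var(seasonal + remainder)
seasonal_strength <- function(x) {
  fit <- stl(x, s.window = "periodic")
  s <- fit$time.series[, "seasonal"]
  r <- fit$time.series[, "remainder"]
  max(0, 1 - var(r) / var(s + r))
}

# A strongly seasonal built-in series scores high
s_ap <- seasonal_strength(log(AirPassengers))
s_ap
```

Assets could then be ordered with something like `sort(sapply(list_of_series, seasonal_strength), decreasing = TRUE)`, assuming each element is a monthly ts object.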
Re: [R] R won't start keeps crashing
Delete the file .RData in your working directory and try to start R again. Uwe Ligges

On 12.05.2011 11:09, Bazman76 wrote: Hi there, I am a relatively new user; I only downloaded R about a week ago. I was getting along fine, but last night I tried to select "save workspace". Since then R will not work, and I really really need it. [...] I have tried reinstalling, but it has made no difference. Please help
Re: [R] How to fit a random data into Beta distribution?
On May 11, 2011, at 11:17 PM, MikeK wrote: I am also trying to fit data to a beta distribution. In Ang and Tang, Probability Concepts in Engineering, 2nd Ed., pages 127-9, they describe a variant of the beta distribution with additional parameters beyond the standard beta distribution, enabling specification of a max and min value other than 0, 1. This would be very useful for my purposes. Any thoughts on how to fit a distribution directly to this variant of the beta distribution, without starting from scratch?

Scale your data to [0,1], fit, predict, then invert the scaling:

xscaled <- (x - min(x)) / (max(x) - min(x))
xrescaled <- min(x) + (max(x) - min(x)) * xscaled

(Better check that I made the correct order of those operations. The first attempt was wrong ... I think.) -- David Winsemius, MD West Hartford, CT
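[Editor's note: a hedged sketch of the rescale-then-fit idea, assuming the four-parameter support [lo, hi] is known rather than estimated from the sample; the data are simulated and MASS::fitdistr is used for the maximum-likelihood fit (MASS ships with R). fitdistr may emit warnings while the optimizer explores invalid shape values.]

```r
library(MASS)

set.seed(1)
lo <- 2; hi <- 10                         # known support of the 4-parameter beta
x  <- lo + (hi - lo) * rbeta(500, shape1 = 2, shape2 = 5)

u   <- (x - lo) / (hi - lo)               # map the data onto (0, 1)
fit <- suppressWarnings(
  fitdistr(u, "beta", start = list(shape1 = 1, shape2 = 1))
)
fit$estimate                              # shape estimates near the true 2 and 5
```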
[R] diversity / density
Hi all, I have a point data set (SHP) with coordinates and an attribute (i.e. type of point). These points are scattered around a fairly big area. What I would like to do is to find a sub-area where the density of points combined with the diversity of types is the biggest. Does anyone have any idea if this is somehow possible to do in R? Any idea would be greatly appreciated, m
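[Editor's note: one simple base-R idea, not a reply from the thread: grid the area into cells, then score each cell by point count times the Shannon diversity of the types in it, and pick the cell with the highest score. The simulated points and the 5x5 grid are illustrative assumptions; real SHP data would come in via a package such as sp/rgdal.]

```r
set.seed(1)
pts <- data.frame(x = runif(300), y = runif(300),
                  type = sample(letters[1:4], 300, replace = TRUE))

# Bin coordinates into a 5x5 grid of sub-areas
pts$cx <- cut(pts$x, 5)
pts$cy <- cut(pts$y, 5)

# Score = count * Shannon entropy of the type mix in the cell
score <- by(pts, list(pts$cx, pts$cy), function(d) {
  p <- prop.table(table(d$type))
  nrow(d) * -sum(p * log(p))
})
sc <- unlist(score)
which.max(sc)   # index of the highest-scoring sub-area
```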
Re: [R] How to extract information from the following dataset?
?subset

subset(dat, day == x & (time > y | time < z))

--- On Thu, 5/12/11, hwright heather.wri...@maine.edu wrote: From: hwright heather.wri...@maine.edu Subject: Re: [R] How to extract information from the following dataset? To: r-help@r-project.org Received: Thursday, May 12, 2011, 6:18 AM I use strptime to configure the date format in my time-series dataset. First check to see how the dates are read. [...] Once you extract the type of date format you want, run str(your_file) again to confirm the format change. Does this answer your question? Best, Heather A. Wright, PhD candidate, Ecology and Evolution of Plankton, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy
Re: [R] R won't start keeps crashing
OK, I did a search for the files and got: .RData, which is 206KB, and Canada.RData, which is 3KB. If I click on .RData I get the crash. If I click on Canada.RData the system starts? Also, they are stored in different places: .RData is in My Documents; Canada.RData is in My Documents\vars\vars\data. I assume I should delete .RData; should I also delete the other file? Where should these files be stored? I think the .RData was a result of my trying to save the workspace. Are the files being saved to the correct location? Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/R-won-t-start-keeps-crashing-tp3516829p3517115.html
Re: [R] Binomial
Dear R helpers, I am raising one query regarding this Binomial thread with the sole intention of learning something more, as I understand the R forum is an ocean of knowledge. I was going through all the responses, but noticed that the original query was about generating Binomial random numbers, while the R code produced so far generates Bernoulli random numbers, i.e. 0 and 1. A true Binomial random number is the number of successes in a fixed number of Bernoulli trials. As I said, I don't understand much about Statistics; I just couldn't stop myself from asking my question. Regards Sarah

--- On Thu, 5/12/11, David Winsemius dwinsem...@comcast.net wrote: From: David Winsemius dwinsem...@comcast.net Subject: Re: [R] Binomial To: Alexander Engelhardt a...@chaotic-neutral.de Cc: r-help@r-project.org, blutack x-jess-...@hotmail.co.uk Date: Thursday, May 12, 2011, 11:08 AM

On May 12, 2011, at 5:02 AM, Alexander Engelhardt wrote: Am 12.05.2011 10:46, schrieb blutack: Hi, I need to create a function which generates a Binomial random number without using the rbinom function. Do I need to use the choose function, or am I better off just using sample? Thanks. I think I remember other software that generates binomial data with e.g. pi = 0.7 by pi <- 0.7

I hope Allan knows this and is just being humorous here, but for the less experienced in the audience... Choosing a different threshold variable name might be less error prone. `pi` is one of the few built-in constants in R, and there may be code that depends on that fact.

pi
# [1] 3.141593
pi <- 0.7
pi
# [1] 0.7
rm(pi)
pi
# [1] 3.141593

x <- runif(100) > pi
summary(x)

Another method would be:

x <- sample(c(0,1), 100, replace=TRUE, prob=c(0.7, 0.3))

-- David Winsemius, MD West Hartford, CT
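[Editor's note: Sarah's point can be sketched in base R. A Binomial(size, prob) draw is the number of successes in `size` Bernoulli trials, so summing Bernoulli indicators gives a binomial generator without rbinom. The function name is illustrative.]

```r
# Binomial draws built from Bernoulli trials: sum(runif(size) < prob)
rbinom_by_hand <- function(n, size, prob) {
  replicate(n, sum(runif(size) < prob))
}

set.seed(1)
x <- rbinom_by_hand(2000, size = 10, prob = 0.3)
mean(x)   # should be close to size * prob = 3
```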
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
So what was the final verdict on this discussion? I kind of lost track; if anyone has a minute, could they summarize and critique my summary below? Apparently there were two issues: the comparison between R and Stata was one issue, and the optimum solution another. As I understand it, there was some question about R's numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as the OP stated, some function of the residuals of two (unrelated) fits. The residual vectors e1 and e2 are dotted in various combinations, creating a matrix whose determinant is (e1.e1)(e2.e2) - (e1.e2)^2, which is the result to be minimized by choice of theta. Theta, it seems, is an 8-component vector; 4 components determine e1 and the other 4 determine e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions, or else both could become arbitrarily large while keeping the error signal at zero. For fixed magnitudes, collinearity would reduce the error. The intent would appear to be to keep the residuals distributed similarly in the two (unrelated) fits. I guess my question is: did anyone determine that there is a unique solution? Or am I totally wrong here? (I haven't used these myself to any extent and just try to run some simple teaching examples; I'm asking for my own clarification as much as anything.) Thanks.

From: rvarad...@jhmi.edu To: pda...@gmail.com; alex.ols...@gmail.com Date: Sat, 7 May 2011 11:51:56 -0400 CC: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from optimx. You can get much larger log-likelihood values than for the exact solution that Peter provided.
## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
  return(-logl)
}
data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
# the warnings can be safely ignored in the optimx calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
Ravi.
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
On May 6, 2011, at 14:29, Alex Olssen wrote: Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata, but for the dynamic models I have difficulty getting convergence. My R program, which uses the same likelihood function as in Stata, has convergence problems even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated.
Better starting values would help.
In this case, almost too good values are available:
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
which appears to be the _exact_ solution. Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing maxit gets you closer but is still not satisfactory. I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably nlm and nlminb.
## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
}
p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, x1=x1,
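The det(t(e) %*% e) term in the likelihood is exactly the (e1.e1)(e2.e2) - (e1.e2)^2 quantity mentioned in the summary at the top of this thread. A quick sketch (with arbitrary made-up residual vectors, not the paper's data) confirms the identity:

```r
# Two hypothetical residual vectors, purely for illustration
set.seed(42)
e1 <- rnorm(10)
e2 <- rnorm(10)
e  <- cbind(e1, e2)

# Determinant of the 2x2 cross-product matrix ...
d1 <- det(t(e) %*% e)
# ... equals (e1.e1)(e2.e2) - (e1.e2)^2 expanded by hand
d2 <- sum(e1 * e1) * sum(e2 * e2) - sum(e1 * e2)^2

all.equal(d1, d2)  # TRUE, up to floating point
```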
[R] Simple order() data frame question.
Clearly, I don't understand what order() is doing, and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there, but what? I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor), but since this was not working I decided to simplify and order by bb to see what was happening! I'm obviously doing something stupid, but what?
(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))
# Order in ascending order by bb
(df1[order(df1[,2]),])  # seems to work fine
# Order in descending order by bb
(df1[order(df1[,-2]),])  # does not seem to work
===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages: [1] grid grDevices datasets splines graphics stats tcltk utils methods base
other attached packages: [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 [8] Hmisc_3.8-3 survival_2.36-9
loaded via a namespace (and not attached): [1] cluster_1.13.3 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
Re: [R] Simple order() data frame question.
On 05/12/2011 08:32 AM, John Kane wrote: Clearly, I don't understand what order() is doing, and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there, but what? I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor), but since this was not working I decided to simplify and order by bb to see what was happening! I'm obviously doing something stupid, but what?
(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))
# Order in ascending order by bb
(df1[order(df1[,2]),])  # seems to work fine
# Order in descending order by bb
(df1[order(df1[,-2]),])  # does not seem to work
There is a 'decreasing' option described in the help file for 'order' which does what you want:
df1 <- data.frame(aa=letters[1:10], bb=rnorm(10))
df1[order(df1[,2], decreasing=TRUE),]
   aa          bb
6   f  3.16449690
7   g  2.44362935
8   h  0.80990322
1   a  0.06365513
5   e -0.33932586
9   i -0.52119533
2   b -0.65623164
4   d -0.86918700
3   c -1.86750927
10  j -2.21178676
df1[order(df1[,1], decreasing=TRUE),]
   aa          bb
10  j -2.21178676
9   i -0.52119533
8   h  0.80990322
7   g  2.44362935
6   f  3.16449690
5   e -0.33932586
4   d -0.86918700
3   c -1.86750927
2   b -0.65623164
1   a  0.06365513
The expression 'df1[,-2]' removes the second column from df1; clearly not what you want here.
-- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky
Re: [R] Simple order() data frame question.
Try
(df1[order(-df1[,2]),])
In your code, putting the minus inside the single-bracket subscript, as in df1[,-2], leaves out that column (in this case column 2) rather than reversing the sort. See ?"[".
HTH. Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Kane Sent: donderdag 12 mei 2011 14:33 To: R R-help Subject: [R] Simple order() data frame question.
Clearly, I don't understand what order() is doing, and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there, but what? I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor), but since this was not working I decided to simplify and order by bb to see what was happening! I'm obviously doing something stupid, but what?
(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))
# Order in ascending order by bb
(df1[order(df1[,2]),])  # seems to work fine
# Order in descending order by bb
(df1[order(df1[,-2]),])  # does not seem to work
===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages: [1] grid grDevices datasets splines graphics stats tcltk utils methods base
other attached packages: [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 [8] Hmisc_3.8-3 survival_2.36-9
loaded via a namespace (and not attached): [1] cluster_1.13.3 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
Re: [R] Binomial
Am 12.05.2011 13:19, schrieb Sarah Sanchez: Dear R helpers, I am raising one query regarding this Binomial thread with the sole intention of learning something more, as I understand the R forum is an ocean of knowledge. I was going through all the responses, but wondered that the original query was about generating Binomial random numbers, while the R code produced so far generates Bernoulli random numbers, i.e. 0 and 1. A true Binomial random variable is nothing but the number of successes in a fixed number of Bernoulli trials. As I said, I am a moron and don't understand much about Statistics. Just couldn't stop from asking my stupid question.
Oh, yes. You can generate one B(20, 0.7)-distributed random variable by summing up the Bernoulli draws, like this:
pie <- 0.7
x <- runif(20)
x
 [1] 0.83108099 0.72843379 0.08862017 0.78477878 0.69230873 0.11229410
 [7] 0.64483435 0.87748373 0.17448824 0.43549622 0.30374272 0.76274317
[13] 0.34832376 0.20876835 0.85280612 0.93810355 0.65720548 0.05557451
[19] 0.88041390 0.68938009
x <- runif(20) < pie
x
 [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
[13] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
sum(x)
[1] 10
You could shorten this to
sum(runif(20) < 0.7)
[1] 12
which has the same distribution as
rbinom(1, 20, 0.7)
or even
qbinom(runif(1), 20, 0.7)
Just play around a little, and learn from the help files: ?rbinom Have fun!
--- On Thu, 5/12/11, David Winsemius dwinsem...@comcast.net wrote: From: David Winsemius dwinsem...@comcast.net Subject: Re: [R] Binomial To: Alexander Engelhardt a...@chaotic-neutral.de Cc: r-help@r-project.org, blutackx-jess-...@hotmail.co.uk Date: Thursday, May 12, 2011, 11:08 AM
I hope Allan knows this and is just being humorous here, but for the less experienced in the audience ... Choosing a different threshold variable name might be less error prone. `pi` is one of the few built-in constants in R and there may be code that depends on that fact.
pi
[1] 3.141593
He didn't, or better, he forgot.
Also, that Allan isn't related to me (I think) :) - Alex
Re: [R] Simple order() data frame question.
Ah, this never would have occurred to me. It's rather obvious now, but of course I'll forget it again. Note to self: put it in the cribsheet. Thanks very much.
--- On Thu, 5/12/11, Nick Sabbe nick.sa...@ugent.be wrote: From: Nick Sabbe nick.sa...@ugent.be Subject: RE: [R] Simple order() data frame question. To: 'John Kane' jrkrid...@yahoo.ca, 'R R-help' r-h...@stat.math.ethz.ch Received: Thursday, May 12, 2011, 8:50 AM
Try
(df1[order(-df1[,2]),])
In your code, putting the minus inside the single-bracket subscript, as in df1[,-2], leaves out that column (in this case column 2) rather than reversing the sort. See ?"[".
HTH. Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Kane Sent: donderdag 12 mei 2011 14:33 To: R R-help Subject: [R] Simple order() data frame question.
Clearly, I don't understand what order() is doing, and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there, but what? I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor), but since this was not working I decided to simplify and order by bb to see what was happening! I'm obviously doing something stupid, but what?
(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))
# Order in ascending order by bb
(df1[order(df1[,2]),])  # seems to work fine
# Order in descending order by bb
(df1[order(df1[,-2]),])  # does not seem to work
===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages: [1] grid grDevices datasets splines graphics stats tcltk utils methods base
other attached packages: [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 [8] Hmisc_3.8-3 survival_2.36-9
loaded via a namespace (and not attached): [1] cluster_1.13.3 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
Re: [R] Simple order() data frame question.
Argh. I knew it was at least partly obvious. I never have been able to read the order() help page and understand what it is saying. Thanks very much. By the way, to me it is counter-intuitive that the command is df1[order(df1[,2], decreasing=TRUE),]; for some reason I keep expecting it to be order( , df1[,2], decreasing=TRUE). So clearly I don't understand what is going on, but at least I am a lot better off. I may be able to get this graph to work.
--- On Thu, 5/12/11, Patrick Breheny patrick.breh...@uky.edu wrote: From: Patrick Breheny patrick.breh...@uky.edu Subject: Re: [R] Simple order() data frame question. To: John Kane jrkrid...@yahoo.ca Cc: R R-help r-h...@stat.math.ethz.ch Received: Thursday, May 12, 2011, 8:44 AM
On 05/12/2011 08:32 AM, John Kane wrote: Clearly, I don't understand what order() is doing, and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there, but what? I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor), but since this was not working I decided to simplify and order by bb to see what was happening! I'm obviously doing something stupid, but what?
(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))
# Order in ascending order by bb
(df1[order(df1[,2]),])  # seems to work fine
# Order in descending order by bb
(df1[order(df1[,-2]),])  # does not seem to work
There is a 'decreasing' option described in the help file for 'order' which does what you want:
df1 <- data.frame(aa=letters[1:10], bb=rnorm(10))
df1[order(df1[,2], decreasing=TRUE),]
   aa          bb
6   f  3.16449690
7   g  2.44362935
8   h  0.80990322
1   a  0.06365513
5   e -0.33932586
9   i -0.52119533
2   b -0.65623164
4   d -0.86918700
3   c -1.86750927
10  j -2.21178676
df1[order(df1[,1], decreasing=TRUE),]
   aa          bb
10  j -2.21178676
9   i -0.52119533
8   h  0.80990322
7   g  2.44362935
6   f  3.16449690
5   e -0.33932586
4   d -0.86918700
3   c -1.86750927
2   b -0.65623164
1   a  0.06365513
The expression 'df1[,-2]' removes the second column from df1; clearly not what you want here.
-- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky
Re: [R] Simple order() data frame question.
On May 12, 2011, at 8:09 AM, John Kane wrote: Argh. I knew it was at least partly obvious. I never have been able to read the order() help page and understand what it is saying. Thanks very much. By the way, to me it is counter-intuitive that the command is df1[order(df1[,2], decreasing=TRUE),]; for some reason I keep expecting it to be order( , df1[,2], decreasing=TRUE). So clearly I don't understand what is going on, but at least I am a lot better off. I may be able to get this graph to work.
John, Perhaps it may be helpful to understand that order() does not actually sort() the data. It returns a vector of indices into the data, where those indices are the sorted ordering of the elements in the vector, or in this case, the column. So you want the output of order() to be used within the brackets for the row *indices*, to reflect the ordering of the column (or columns, in the case of a multi-level sort) that you wish to use to sort the data frame rows.
set.seed(1)
x <- sample(10)
x
 [1]  3  4  5  7  2  8  9  6 10  1
# sort() actually returns the sorted data
sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10
# order() returns the indices of 'x' in sorted order
order(x)
 [1] 10  5  1  2  3  8  4  6  7  9
# This does the same thing as sort()
x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
set.seed(1)
df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))
df1
   aa         bb
1   a -0.6264538
2   b  0.1836433
3   c -0.8356286
4   d  1.5952808
5   e  0.3295078
6   f -0.8204684
7   g  0.4874291
8   h  0.7383247
9   i  0.5757814
10  j -0.3053884
# These are the indices of df1$bb in sorted order
order(df1$bb)
 [1]  3  6  1 10  2  5  7  9  8  4
# Get df1$bb in increasing order
df1$bb[order(df1$bb)]
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808
# Same thing as above
sort(df1$bb)
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808
You can't use the output of sort() to sort the data frame rows, so you need to use order() to get the ordered indices and
then use that to extract the data frame rows in the sort order that you desire:
df1[order(df1$bb), ]
   aa         bb
3   c -0.8356286
6   f -0.8204684
1   a -0.6264538
10  j -0.3053884
2   b  0.1836433
5   e  0.3295078
7   g  0.4874291
9   i  0.5757814
8   h  0.7383247
4   d  1.5952808
df1[order(df1$bb, decreasing = TRUE), ]
   aa         bb
4   d  1.5952808
8   h  0.7383247
9   i  0.5757814
7   g  0.4874291
5   e  0.3295078
2   b  0.1836433
10  j -0.3053884
1   a -0.6264538
6   f -0.8204684
3   c -0.8356286
Does that help? Regards, Marc Schwartz
Re: [R] changes in coxph in survival from older version?
On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote: Hi all, I found that two different versions of the survival package, namely 2.36-5 vs. 2.36-8 or later, give different results for the coxph function. Please see below; the data is attached. The second one was done on Linux, but Windows gave the same results. Could you please let me know which one I should trust? Thanks,
In your case, neither. Your data set has 22 events and 17 predictors; the rule of thumb for a reliable Cox model is 10-20 events per predictor, which implies no more than 2 predictors for your data set. As a result, the coefficients of your model have very wide confidence intervals; the coef for Male, for instance, has an se of 3.26, meaning the CI goes from 1/26 to 26 times the estimate; i.e., there is no biological meaning to the estimate. Nevertheless, why did coxph give a different answer? The later version 2.36-9 failed to converge (20 iterations) with a final log-likelihood of -19.94; the earlier code converges in 10 iterations to -19.91. In version 2.36-6 an extra check was put into the maximizer for coxph, in response to an exceptional data set which caused the routine to fail due to overflow of the exp function; the Newton-Raphson iteration algorithm had made a terrible guess in its iteration path, which can happen with all NR-based search methods. I put a limit on the size of the linear predictor in the Cox model of 21. The basic argument is that exp(linear predictor) = relative risk for a subject, and that there is not much biological meaning for risks less than exp(-21) ~ 1/(population of the earth). There is more to the reasoning; interested parties should look at the comments in src/coxsafe.c, a 5-line routine with 25 lines of discussion. I will happily accept input on the best value for the constant. I never expected to see a data set with both convergence of the LL and linear predictors larger than +-15.
Looking at the fit (older code)
round(fit2$linear.predictor, 2)
 [1]   2.26   0.89   4.96 -19.09 -12.10   1.39   2.82   3.10
 [9]  18.57 -25.25  22.94   8.75   5.52 -27.64  14.88 -23.41
[17]  13.70 -28.45  -1.84  10.04  12.62   2.54   6.33  -8.76
[25]   9.68   4.39   2.92   3.51   6.02 -17.24   5.97
This says that, if the model is to be believed, you have several near immortals in the data set. (Everyone else on earth will perish first). Terry Therneau
[R] separate date and time
I have a combined date and time. I would like to separate them out into two columns so I can do things such as take the mean by time across all dates.
meas <- runif(435)
nTime <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin="1970-01-01")
mat1 <- data.frame(nDateT, meas)
means1 <- aggregate(mat1$meas, list(nDateT), mean)
This doesn't do anything, as each day is different, but if I had just the time it would take the mean, outputting 48 values (one for each 30 min). Also, sometimes there is a missing meas at a specific time. Is there any way to copy the previous meas if one is missing?
- In theory, practice and theory are the same. In practice, they are not - Albert Einstein
-- View this message in context: http://r.789695.n4.nabble.com/separate-date-and-time-tp3517571p3517571.html Sent from the R help mailing list archive at Nabble.com.
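A minimal sketch of one way to do both things asked above (format() extracts the time-of-day as a grouping key; na.locf() from the zoo package - an assumption here, not mentioned in the question - carries the previous value forward over gaps):

```r
library(zoo)  # assumed installed, for na.locf()

meas   <- runif(435)
nDateT <- as.POSIXct(seq(1303975800, 1304757000, 1800), origin = "1970-01-01")

# Time-of-day as a character key, e.g. "00:00", "00:30", ...
tod <- format(nDateT, "%H:%M")

# Mean of meas per 30-minute slot, pooled across all dates: 48 rows
means_by_time <- aggregate(meas, by = list(time = tod), FUN = mean, na.rm = TRUE)

# Replace missing measurements with the previous observation
meas[c(5, 17)] <- NA                      # hypothetical gaps, for illustration
meas_filled <- na.locf(meas, na.rm = FALSE)
```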
Re: [R] R won't start keeps crashing
This is not very informative. What exactly is crashing? What is your sessionInfo() output?
On Thu, May 12, 2011 at 7:57 AM, Bazman76 h_a_patie...@hotmail.com wrote: OK, I did a search for the files and got: .Rdata, which is 206KB, and Canada.Rdata, which is 3kB. If I click on .Rdata I get the crash. If I click on Canada.Rdata the system starts? Does it? How would I know? Also, they are stored in different places? Are they? .Rdata is in My Documents; Canada.RData is in My Documents\vars\vars\data. I assume I should delete .Rdata; should I also delete the other file? Where should these files be stored? I think the .RData was a result of my trying to save the workspace. Are the files being saved to the correct location? Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/R-won-t-start-keeps-crashing-tp3516829p3517115.html Sent from the R help mailing list archive at Nabble.com.
What happens when you try the ?load command from R, like: load("C:/path/to/file")
-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.
Re: [R] mtext text size (cex) doesn't match plot
thanks for reading the manual for me :X
2011/5/12 Prof Brian Ripley rip...@stats.ox.ac.uk: On Wed, 11 May 2011, George Locke wrote: Hi, I am using mtext instead of the ylab argument in some plots because I want to move it away from the numbers in the axis. However, it can mismatch the text in the X axis. For example:
par(mar=c(5, 5.5, 4, 2))
plot(data, main="plot name", xlab="X axis", ylab="", font=2, cex.lab=1.5, font.lab=2, cex.main=1.8)
mtext("Y axis", side=2, cex=1.5, line=4, font=2)
This works fine, but if I then set par(mfrow=c(3,2)), the text produced by mtext becomes much larger than the "X axis" text produced by plot, despite their having identical cex specifications. In this case, the words "Y axis" become much larger than "plot name". Note that without par(mfrow) the sizes of "X axis" and "Y axis" match iff their cex(.lab) arguments match. How can I make mtext produce text that exactly matches the xlab? In my limited experience fiddling around with this problem, the size of the mtext does not depend on par(mfrow), whereas the size of the xlab does, so if there were a formula that relates the actual size of text,
Please do read the help! ?mtext says cex: character expansion factor. ‘NULL’ and ‘NA’ are equivalent to ‘1.0’. This is an absolute measure, not scaled by ‘par(cex)’ or by setting ‘par(mfrow)’ or ‘par(mfcol)’. so no 'limited experience fiddling around with this problem' was needed. And see ?par: ‘cex’ A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default. This starts as ‘1’ when a device is opened, and is reset when the layout is changed, e.g. by setting ‘mfrow’. ‘mfcol, mfrow’ A vector of the form ‘c(nr, nc)’. Subsequent figures will be drawn in an ‘nr’-by-‘nc’ array on the device by _columns_ (‘mfcol’), or _rows_ (‘mfrow’), respectively.
In a layout with exactly two rows and columns the base value of ‘cex’ is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66.
cex argument, and par(mfrow), then I could use that to attenuate the cex argument of mtext. Any solution will do, so long as it maintains the relative sizes of the plot and the three text fields (main, x axis label, y axis label).
library(fortunes); fortune(14) applies -- see the posting guide.
-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
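Given the point from the help pages that mtext()'s cex is absolute while cex.lab is scaled by par("cex"), one workaround (a sketch of my own, not from the thread) is to multiply the mtext size by the current par("cex"):

```r
par(mfrow = c(3, 2), mar = c(5, 5.5, 4, 2))
for (i in 1:6) {
  plot(rnorm(10), main = "plot name", xlab = "X axis", ylab = "",
       cex.lab = 1.5, font.lab = 2, cex.main = 1.8)
  # cex.lab is scaled by par("cex") (0.66 for a 3x2 layout), whereas
  # mtext()'s cex is absolute - so apply the same factor by hand
  mtext("Y axis", side = 2, line = 4, font = 2, cex = 1.5 * par("cex"))
}
```

With this scaling, the "Y axis" label drawn by mtext() matches the size of the "X axis" label in every panel, whatever the mfrow layout.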
Re: [R] Simple order() data frame question.
I was wondering whether it would be possible to make a sort() method for data.frame. I think it would be more intuitive than using the complex construction df[order(df$a),]. Is there any reason not to make it? Ivan
On 5/12/2011 15:40, Marc Schwartz wrote: On May 12, 2011, at 8:09 AM, John Kane wrote: Argh. I knew it was at least partly obvious. I never have been able to read the order() help page and understand what it is saying. Thanks very much. By the way, to me it is counter-intuitive that the command is df1[order(df1[,2], decreasing=TRUE),]; for some reason I keep expecting it to be order( , df1[,2], decreasing=TRUE). So clearly I don't understand what is going on, but at least I am a lot better off. I may be able to get this graph to work.
John, Perhaps it may be helpful to understand that order() does not actually sort() the data. It returns a vector of indices into the data, where those indices are the sorted ordering of the elements in the vector, or in this case, the column. So you want the output of order() to be used within the brackets for the row *indices*, to reflect the ordering of the column (or columns, in the case of a multi-level sort) that you wish to use to sort the data frame rows.
set.seed(1)
x <- sample(10)
x
 [1]  3  4  5  7  2  8  9  6 10  1
# sort() actually returns the sorted data
sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10
# order() returns the indices of 'x' in sorted order
order(x)
 [1] 10  5  1  2  3  8  4  6  7  9
# This does the same thing as sort()
x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
set.seed(1)
df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))
df1
   aa         bb
1   a -0.6264538
2   b  0.1836433
3   c -0.8356286
4   d  1.5952808
5   e  0.3295078
6   f -0.8204684
7   g  0.4874291
8   h  0.7383247
9   i  0.5757814
10  j -0.3053884
# These are the indices of df1$bb in sorted order
order(df1$bb)
 [1]  3  6  1 10  2  5  7  9  8  4
# Get df1$bb in increasing order
df1$bb[order(df1$bb)]
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808
# Same thing as above
sort(df1$bb)
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808
You can't use the output of sort() to sort the data frame rows, so you need to use order() to get the ordered indices and then use that to extract the data frame rows in the sort order that you desire:
df1[order(df1$bb), ]
   aa         bb
3   c -0.8356286
6   f -0.8204684
1   a -0.6264538
10  j -0.3053884
2   b  0.1836433
5   e  0.3295078
7   g  0.4874291
9   i  0.5757814
8   h  0.7383247
4   d  1.5952808
df1[order(df1$bb, decreasing = TRUE), ]
   aa         bb
4   d  1.5952808
8   h  0.7383247
9   i  0.5757814
7   g  0.4874291
5   e  0.3295078
2   b  0.1836433
10  j -0.3053884
1   a -0.6264538
6   f -0.8204684
3   c -0.8356286
Does that help? Regards, Marc Schwartz
-- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt.
Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
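In answer to Ivan's question, nothing prevents defining such a method yourself, since sort() is generic. A minimal sketch (the `by` argument and its default are my own invention, not an established interface):

```r
# A simple sort() method for data frames, picked up by S3 dispatch
sort.data.frame <- function(x, decreasing = FALSE, by = 1, ...) {
  # 'by' names (or indexes) the single column to sort on
  x[order(x[[by]], decreasing = decreasing), , drop = FALSE]
}

df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))
sort(df1, by = "bb")                     # ascending by bb
sort(df1, by = "bb", decreasing = TRUE)  # descending by bb
```

One plausible reason base R does not ship this: a data frame has no single natural sort key, so any default choice of `by` is arbitrary.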
Re: [R] mtext text size (cex) doesn't match plot
On 2011-05-12 07:16, George Locke wrote: thanks for reading the manual for me :X
For a bit more reading, you could check out ?title. You could replace your mtext() calls with
title(ylab='Y axis', cex.lab=1.5, line=4, font.lab=2)
Peter Ehlers
2011/5/12 Prof Brian Ripley rip...@stats.ox.ac.uk: On Wed, 11 May 2011, George Locke wrote: Hi, I am using mtext instead of the ylab argument in some plots because I want to move it away from the numbers in the axis. However, it can mismatch the text in the X axis. For example:
par(mar=c(5, 5.5, 4, 2))
plot(data, main="plot name", xlab="X axis", ylab="", font=2, cex.lab=1.5, font.lab=2, cex.main=1.8)
mtext("Y axis", side=2, cex=1.5, line=4, font=2)
This works fine, but if I then set par(mfrow=c(3,2)), the text produced by mtext becomes much larger than the "X axis" text produced by plot, despite their having identical cex specifications. In this case, the words "Y axis" become much larger than "plot name". Note that without par(mfrow) the sizes of "X axis" and "Y axis" match iff their cex(.lab) arguments match. How can I make mtext produce text that exactly matches the xlab? In my limited experience fiddling around with this problem, the size of the mtext does not depend on par(mfrow), whereas the size of the xlab does, so if there were a formula that relates the actual size of text,
Please do read the help! ?mtext says cex: character expansion factor. ‘NULL’ and ‘NA’ are equivalent to ‘1.0’. This is an absolute measure, not scaled by ‘par(cex)’ or by setting ‘par(mfrow)’ or ‘par(mfcol)’. so no 'limited experience fiddling around with this problem' was needed. And see ?par: ‘cex’ A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default. This starts as ‘1’ when a device is opened, and is reset when the layout is changed, e.g. by setting ‘mfrow’. ‘mfcol, mfrow’ A vector of the form ‘c(nr, nc)’.
Subsequent figures will be drawn in an ‘nr’-by-‘nc’ array on the device by _columns_ (‘mfcol’), or _rows_ (‘mfrow’), respectively. In a layout with exactly two rows and columns the base value of ‘cex’ is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66. cex argument, and par(mfrow), then I could use that to attenuate the cex argument of mtext. Any solution will do, so long as it maintains the relative sizes of the plot and the three text fields (main, x axis label, y axis label). library(fortunes); fortune(14) applies -- see the posting guide. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
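Putting the two quoted help passages together: under par(mfrow = c(3, 2)) the base cex drops to 0.66, and since mtext()'s cex is absolute, multiplying by par("cex") should keep the labels matched. A sketch (throwaway PDF device, made-up data):

```r
# Sketch: match mtext() size to cex.lab under par(mfrow) by scaling
# mtext's absolute cex by the layout's base cex. Data and device are
# illustrative only.
pdf(tempfile(fileext = ".pdf"))
par(mfrow = c(3, 2), mar = c(5, 5.5, 4, 2))
scale <- par("cex")   # 0.66 for a layout with 3 or more rows/columns
plot(1:10, main = "plot name", xlab = "X axis", ylab = "",
     cex.lab = 1.5, cex.main = 1.8)
mtext("Y axis", side = 2, line = 4, cex = 1.5 * scale)  # now matches xlab
dev.off()
```

The key point is that cex.lab is multiplied by the (reduced) base cex while mtext's cex is not, so the correction factor is exactly par("cex").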
Re: [R] lm and anova
Hi Sara, As the help page for anova.lm says, specifying a single object gives a sequential analysis of variance table. That is most likely also the answer to your second question. The anova function can be used to compare nested models, and this provides the flexibility to test arbitrary hypotheses, including all the ones given by different types of ANOVA tables. You may also find the Anova() function in the car package helpful. Best, Ista

On Thu, May 12, 2011 at 2:37 AM, Sara Sjöstedt de Luna sara.de.l...@math.umu.se wrote: Hi! We have run a linear regression model with 3 explanatory variables and get the output below. Does anyone know what type of test the anova model below does, and why we get such different results in terms of significant variables in the two tables? Thanks! /Sara

summary(model)
Call: lm(formula = log(HOBU) ~ Vole1 + Volelag + Year)
Residuals:
      Min        1Q    Median        3Q       Max
-0.757284 -0.166681  0.009478  0.181304  0.692916
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 80.041737  12.018726   6.660 1.40e-07 ***
Vole1        0.005521   0.041626   0.133   0.8953
Volelag      0.033966   0.018392   1.847   0.0738 .
Year        -0.035927   0.006027  -5.961 1.08e-06 ***

anova(model)
Analysis of Variance Table
Response: log(HOBU)
          Df Sum Sq Mean Sq F value    Pr(>F)
Vole1      1 1.7877  1.7877 13.1772 0.0009486 ***
Volelag    1 0.5817  0.5817  4.2878 0.0462831 *
Year       1 4.8205  4.8205 35.5323 1.082e-06 ***
Residuals 33 4.4769  0.1357
-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lm and anova
anova() uses sequential sums of squares (Type I); summary() uses adjusted sums of squares (Type III). Take for example the first line of each output. In summary() this tests whether vole1 is needed ASSUMING volelag and year are already in the model (conclusion would then be: it isn't needed, p = .89). Whereas in anova(), it's testing whether we need vole1 assuming nothing else is in the model (conclusion: vole1 is better than nothing, p = .0009). anova() assumes all terms above a given line are in the model but terms below it are not, so the volelag line assumes vole1 is in the model but not year. You can see how anova() changes but summary() doesn't by varying the order you put the terms in. So the final model I would fit here would probably end up being either year + volelag or just year. HTH, Paul -- View this message in context: http://r.789695.n4.nabble.com/lm-and-anova-tp3516748p3517356.html
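Paul's point is easy to verify with a small simulation (the variables below are made up, not the vole data): the sequential SS from anova() move when the term order changes, while the summary() coefficient table does not:

```r
# Sketch: sequential (Type I) SS depend on term order when predictors are
# correlated; the marginal t-tests in summary() do not.
set.seed(42)
n  <- 40
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)      # correlated predictors, so order matters
y  <- 1 + 0.5 * x2 + rnorm(n)

fit12 <- lm(y ~ x1 + x2)
fit21 <- lm(y ~ x2 + x1)

# Sequential SS for x1 differ between the two term orders...
ss12 <- anova(fit12)["x1", "Sum Sq"]
ss21 <- anova(fit21)["x1", "Sum Sq"]

# ...but the coefficient row for x1 is identical in both fits.
coef12 <- coef(summary(fit12))["x1", ]
coef21 <- coef(summary(fit21))["x1", ]
```

Fitting the model with both term orders and comparing the two anova() tables is a quick way to see how much the sequential tests depend on ordering in your own data.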
Re: [R] Binomial
Thanks a lot sir. Regards Sarah

--- On Thu, 5/12/11, Alexander Engelhardt a...@chaotic-neutral.de wrote: From: Alexander Engelhardt a...@chaotic-neutral.de Subject: Re: [R] Binomial To: Sarah Sanchez sarah_sanche...@yahoo.com Cc: David Winsemius dwinsem...@comcast.net, r-help@r-project.org Date: Thursday, May 12, 2011, 12:53 PM

Am 12.05.2011 13:19, schrieb Sarah Sanchez: Dear R helpers, I am raising one query regarding this Binomial thread with the sole intention of learning something more, as I understand the R forum is an ocean of knowledge. I was going through all the responses, but wondered that the original query was about generating Binomial random numbers, while what the R code produced so far generates Bernoulli random numbers, i.e. 0 and 1. A true Binomial variable is nothing but the number of successes in a series of Bernoulli trials. As I said, I am a moron and don't understand much about Statistics. Just couldn't stop myself from asking my stupid question.

Oh, yes. You can generate one B(20, 0.7)-distributed random variable by summing up the Bernoulli draws like this:

pie <- 0.7
x <- runif(20)
x
 [1] 0.83108099 0.72843379 0.08862017 0.78477878 0.69230873 0.11229410
 [7] 0.64483435 0.87748373 0.17448824 0.43549622 0.30374272 0.76274317
[13] 0.34832376 0.20876835 0.85280612 0.93810355 0.65720548 0.05557451
[19] 0.88041390 0.68938009
x <- runif(20) < pie
x
 [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
[13] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
sum(x)
[1] 10

You could shorten this to

sum(runif(20) < 0.7)
[1] 12

which would be the same as

rbinom(1, 20, 0.5)
[1] 6

or even

qbinom(runif(1), 20, 0.5)
[1] 12

Just play around a little, and learn from the help files: ?rbinom Have fun!
--- On Thu, 5/12/11, David Winsemius dwinsem...@comcast.net wrote: From: David Winsemius dwinsem...@comcast.net Subject: Re: [R] Binomial To: Alexander Engelhardt a...@chaotic-neutral.de Cc: r-help@r-project.org, blutackx-jess-...@hotmail.co.uk Date: Thursday, May 12, 2011, 11:08 AM

I hope Allan knows this and is just being humorous here, but for the less experienced in the audience ... Choosing a different threshold variable name might be less error prone. `pi` is one of the few built-in constants in R and there may be code that depends on that fact.

pi
[1] 3.141593

He didn't, or better, he forgot. Also, that Allan isn't related to me (I think) :) - Alex
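For completeness, the Bernoulli-sum construction can be checked against rbinom() over many replicates (p, n and the replicate count below are illustrative):

```r
# Sketch: a sum of n Bernoulli(p) draws has the same distribution as one
# Binomial(n, p) draw. Compare the two constructions by their moments.
set.seed(1)
p    <- 0.7
n    <- 20
reps <- 10000

by_hand <- replicate(reps, sum(runif(n) < p))  # n Bernoulli trials, summed
direct  <- rbinom(reps, size = n, prob = p)    # one Binomial draw per rep

# Both should have mean near n*p = 14 and variance near n*p*(1-p) = 4.2
c(mean(by_hand), mean(direct))
```

Note the avoidance of `pie`/`pi` as a variable name, per David's caution in the thread.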
Re: [R] Asking Favor For the Script of Median Filter
Here is one I wrote for the raster package. It searches a raster layer for NA's and fills each one with the median of its adjacent cells, provided the number of valid adjacent cells is at least neighbor.count. You could turn your matrix into a raster to make it work, or change the code. Hope you find it useful, Robert

neighbor.filter <- function(raster.layer, neighbor.count = 3) {
  require(raster)
  base.rast <- raster.layer
  count <- 1
  NA.ind <- which(is.na(base.rast[]))
  median.vals <- matrix(NA, length(NA.ind), 3)
  for (j in 1:length(NA.ind)) {
    row.ind.NA <- rowFromCell(base.rast, NA.ind[j])
    col.ind.NA <- colFromCell(base.rast, NA.ind[j])
    row.ind <- c(row.ind.NA - 1, row.ind.NA, row.ind.NA + 1)
    col.ind <- c(col.ind.NA - 1, col.ind.NA, col.ind.NA + 1)
    row.ind.check <- expand.grid(row.ind, col.ind)[, 1]
    col.ind.check <- expand.grid(row.ind, col.ind)[, 2]
    ind.del.1 <- c(which(row.ind.check > dim(base.rast)[1]), which(row.ind.check < 1))
    if (length(ind.del.1) > 0) {
      row.ind.check <- row.ind.check[-ind.del.1]
      col.ind.check <- col.ind.check[-ind.del.1]
    }
    ind.del.2 <- c(which(col.ind.check < 1), which(col.ind.check > dim(base.rast)[2]))
    if (length(ind.del.2) > 0) {
      row.ind.check <- row.ind.check[-ind.del.2]
      col.ind.check <- col.ind.check[-ind.del.2]
    }
    if (length(which(base.rast[cellFromRowCol(base.rast, row.ind.check, col.ind.check)] > 0)) >= neighbor.count) {
      median.vals[count, c(1:3)] <- c(NA.ind[j],
        median(base.rast[cellFromRowCol(base.rast, row.ind.check, col.ind.check)], na.rm = TRUE),
        length(which(base.rast[cellFromRowCol(base.rast, row.ind.check, col.ind.check)] > 0)))
      count <- count + 1
    }
  }
  median.vals <- median.vals[which(median.vals[, 1] > 0), ]
  base.rast[median.vals[, 1]] <- median.vals[, 2]
  return(base.rast)
}

Robert Leaf, PhD NOAA Narragansett Laboratory -- View this message in context: http://r.789695.n4.nabble.com/Asking-Favor-For-the-Script-of-Median-Filter-tp3409462p3517365.html
[R] Simple 95% confidence interval for a median
Hi! I have a data set of 86 values that are non-normally distributed (counts). The median value is 10. I want to get an estimate of the 95% confidence interval for this median value. I tried to use a one-sample Wilcoxon test:

wilcox.test(Comps, mu = 10, conf.int = TRUE)

and got the following output:

Wilcoxon signed rank test with continuity correction
data: Comps
V = 2111, p-value = 0.05846
alternative hypothesis: true location is not equal to 10
95 percent confidence interval: 10.0 17.49993
sample estimates: (pseudo)median 12.50006

I wonder if someone would mind helping me out? What am I doing wrong? What is the '(pseudo)median'? Can I get R to estimate the confidence interval around the actual median of 10? With thanks, Georgie
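For the archives, a runnable sketch on simulated counts (not Georgie's data; the Poisson draw below is purely illustrative). wilcox.test() gives a CI for the pseudomedian (the median of pairwise Walsh averages); a CI for the sample median itself can be read off the order statistics via the sign test:

```r
# Sketch: two confidence intervals for central location of count data.
set.seed(7)
counts <- rpois(86, lambda = 10)   # stand-in for the 86 observed counts

# (1) Wilcoxon signed-rank CI: this is a CI for the PSEUDOMEDIAN,
#     not the sample median, which explains the output in the question.
wt <- wilcox.test(counts, conf.int = TRUE)
wt$conf.int

# (2) Distribution-free CI for the actual median, from binomial order
#     statistics (the sign-test construction):
n <- length(counts)
s <- sort(counts)
k <- qbinom(0.025, n, 0.5)          # lower order-statistic index
c(s[k], s[n - k + 1])               # approximate 95% CI for the median
```

The order-statistic interval makes no distributional assumptions beyond independent sampling, which suits skewed count data.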
[R] (no subject)
#subject: type III sum of squares - anova() Anova() AnovaM()
#R-version: 2.12.2
#Hello everyone,
#I am currently evaluating experimental data from a two-factor experiment. To illustrate my problem I will use the following
#dummy dataset: factor T1 has 3 levels (A, B, C) and factor T2 has 2 levels (E, F). The design is
#completely balanced; each factor combination has 4 replicates.
#the dataset looks like this:
T1 <- c(rep(c("A","B","C"), each = 8))
T2 <- c(rep(rep(c("E","F"), each = 4), 3))
RESPONSE <- c(1,2,3,2,2,1,3,2,9,8,8,9,6,5,5,6,5,5,5,6,1,2,3,3)
DF <- as.data.frame(cbind(T1, T2, RESPONSE))
DF$RESPONSE <- as.numeric(DF$RESPONSE)
DF
   T1 T2 RESPONSE
1   A  E  1
2   A  E  2
3   A  E  3
4   A  E  2
5   A  F  2
6   A  F  1
7   A  F  3
8   A  F  2
9   B  E  7
10  B  E  6
11  B  E  6
12  B  E  7
13  B  F  5
14  B  F  4
15  B  F  4
16  B  F  5
17  C  E  4
18  C  E  4
19  C  E  4
20  C  E  5
21  C  F  1
22  C  F  2
23  C  F  3
24  C  F  3
library(biology)
replications(RESPONSE ~ T1*T2, data = DF)
   T1    T2 T1:T2
    8    12     4
is.balanced(RESPONSE ~ T1*T2, data = DF)
[1] TRUE
#Now I would like to know whether T1, T2 or T1*T2 have a significant effect on RESPONSE. As far as I know, the
#theory says that I should use a type III sum of squares, but the theory also says that if the design is completely
#balanced, there is no difference between type I, II or III sums of squares.
#so I first fit a linear model:
my.anov <- lm(RESPONSE ~ T1 + T2 + T1:T2)
#then I do a normal anova
anova(my.anov)
Analysis of Variance Table
Response: RESPONSE
          Df Sum Sq Mean Sq F value    Pr(>F)
T1         2  103.0  51.500  97.579 2.183e-10 ***
T2         1   24.0  24.000  45.474 2.550e-06 ***
T1:T2      2   12.0   6.000  11.368  0.000642 ***
Residuals 18    9.5   0.528
#When I do the same with the Anova() function from the car package I get the same result
Anova(my.anov)
Anova Table (Type II tests)
Response: RESPONSE
          Sum Sq Df F value    Pr(>F)
T1         103.0  2  97.579 2.183e-10 ***
T2          24.0  1  45.474 2.550e-06 ***
T1:T2       12.0  2  11.368  0.000642 ***
Residuals    9.5 18
#(type II seems to be the default, and type="I" produces an error (why?))
#yet, when I specify type="III" it gives me something completely different:
Anova(my.anov, type="III")
Anova Table (Type III tests)
Response: RESPONSE
            Sum Sq Df F value    Pr(>F)
(Intercept)   16.0  1  30.316 3.148e-05 ***
T1            84.5  2  80.053 1.100e-09 ***
T2             0.0  1   0.000      1.00
T1:T2         12.0  2  11.368  0.000642 ***
Residuals      9.5 18
#and the AnovaM() function from the biology package does the same for types I and II, and produces the following
#result for type III:
library(biology)
AnovaM(my.anov, type="III")
          Df Sum Sq Mean Sq F value   Pr(>F)
T1         2   84.5  42.250  80.053 1.10e-09 ***
T2         1   24.0  24.000  45.474 2.55e-06 ***
T1:T2      2   12.0   6.000  11.368 0.000642 ***
Residuals 18    9.5   0.528
#Is type III the type I should use, and why do the results differ if the design is balanced? I am really confused; it would
#be great if someone could help me out!
#Thanks a lot for your help!
#/Fabian
#University of Gothenburg
[R] Scale time series in a way that 90% of the data is in the -0.9 / +0.9 range
Hello, How can I scale my time series in a way that 90% of the data is in the -0.9 / +0.9 range? My approach is to first build a clean vector without those 10% far away from the mean

require(outliers)
y <- rep(c(1,1,1,1,1,9), 10)
yc <- y
ycc <- length(y) * 0.1
for (j in 1:ycc) {
  cat("Remove", j)
  yc <- rm.outlier(yc)
}

and then do my scaling based on the cleaned data

for (k in 1:length(y)) {
  y[k] <- (((y[k] - min(yc)) / (max(yc) - min(yc))) * 1.8) - 0.9
}

This works fine for the first three loops, but then strangely crashes : (

Remove 1Remove 2Remove 3Error in if (xor(((max(x, na.rm = TRUE) - mean(x, na.rm = TRUE)) (mean(x, : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In max(x, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
2: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf

Any ideas for me? Thanks in advance, Mr. Q
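An alternative sketch that skips the rm.outlier() loop entirely: map the 5th and 95th percentiles linearly onto -0.9 and +0.9, so the central 90% of the data lands in the band (assumes the two quantiles differ):

```r
# Sketch: quantile-based scaling, no iterative outlier removal needed.
# Uses the poster's toy series; any numeric vector works as long as the
# 5th and 95th percentiles are distinct.
y <- rep(c(1, 1, 1, 1, 1, 9), 10)
q <- quantile(y, c(0.05, 0.95), names = FALSE)

# Linear map: q[1] -> -0.9, q[2] -> +0.9; the middle 90% stays inside.
scaled <- (y - q[1]) / (q[2] - q[1]) * 1.8 - 0.9
```

Values beyond the 5th/95th percentiles simply land outside the band, which matches the stated goal of covering 90% rather than all of the data.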
Re: [R] package update
To run RStudio as root on Ubuntu you would just do: sudo rstudio The packages in /usr/lib/R/library are the ones that came with the base install of R. J.J. Allaire -- View this message in context: http://r.789695.n4.nabble.com/package-update-tp3507479p3517539.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about glmnet
I believe you can in this sense: use model.matrix to create X for glmnet(X,y,...). However, when dropping variables this will drop the indicators individually, not per factor, which may not be what you are looking for. Good luck, David Katz Axel Urbiz wrote: Hi, Is it possible to include factor variables as model inputs using this package? I'm quite sure it is not possible, but would like to double check. Thanks, Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/Question-about-glmnet-tp3006439p3517635.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
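A minimal base-R sketch of the model.matrix() step (the data frame and variable names below are made up; the glmnet call is left commented since it needs the package installed):

```r
# Sketch: expand a factor into dummy columns before passing a numeric
# matrix to glmnet(x, y, ...).
d <- data.frame(
  y  = rnorm(10),
  x1 = rnorm(10),
  f  = factor(rep(c("a", "b", "c"), length.out = 10))
)

# Drop the intercept column; glmnet adds its own intercept.
X <- model.matrix(~ x1 + f, data = d)[, -1]
colnames(X)   # "x1" "fb" "fc" -- each non-reference level gets a column
# fit <- glmnet::glmnet(X, d$y)   # penalizes fb and fc separately
```

As David notes, the lasso may then keep some of a factor's dummy columns and drop others; a grouped penalty would be needed to keep whole factors together.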
Re: [R] R won't start keeps crashing
http://r.789695.n4.nabble.com/file/n3517669/R_crash.jpg

OK, when I click on the .RData file I get the screen above. Also, when I start R from the desktop icon or select it from programs I get the same result. The warning message is in focus and I can not move the focus to the GUI. When I either click to dismiss the message or click to close it, the whole session shuts down. So I can not enter the commands you suggested into this corrupt version. However, when I click on the Canada.Rdata the R session starts and seems to function normally. Here are the results of the functions you requested. Not sure if they will help, given that this version is working?

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

load("C:/path/to/file")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'C:/path/to/file', probable reason 'No such file or directory'

Sorry if I gave you the wrong information previously, just let me know what you want. -- View this message in context: http://r.789695.n4.nabble.com/R-won-t-start-keeps-crashing-tp3516829p3517669.html
[R] Errors and line numbers in scripts?
Is it possible to get R to report the line number of an error when a script is called with source()? I found the following post from 2009, but it's not clear to me if this ever made it into the release version:

ws wrote: Is there a way to have R return the line number in a script when it errors out? I call my script like:

$ R --vanilla < script.R > output.txt

I seem to remember a long discussion about this at some point, but I can't remember the outcome.

The current development version returns much more information about error locations. I don't know if it will handle this case (R doesn't get told the filename in case of input redirection, for example). Generally the goal is to report error lines during interactive use; batch use is assumed to already be debugged. Duncan Murdoch

Thanks. - Elliot
Re: [R] (no subject)
Hi Fabian, You may find my discussion of types of SS helpful. My website has been down for some time, but you can retrieve it from http://psychology.okstate.edu/faculty/jgrice/psyc5314/SS_types.pdf among other places. Best, Ista

On Thu, May 12, 2011 at 10:33 AM, Fabian fabian_ro...@gmx.de wrote:
#subject: type III sum of squares - anova() Anova() AnovaM()
#R-version: 2.12.2
#Hello everyone,
#I am currently evaluating experimental data from a two-factor experiment. To illustrate my problem I will use the following
#dummy dataset: factor T1 has 3 levels (A, B, C) and factor T2 has 2 levels (E, F). The design is
#completely balanced; each factor combination has 4 replicates.
#the dataset looks like this:
T1 <- c(rep(c("A","B","C"), each = 8))
T2 <- c(rep(rep(c("E","F"), each = 4), 3))
RESPONSE <- c(1,2,3,2,2,1,3,2,9,8,8,9,6,5,5,6,5,5,5,6,1,2,3,3)
DF <- as.data.frame(cbind(T1, T2, RESPONSE))
DF$RESPONSE <- as.numeric(DF$RESPONSE)
DF
   T1 T2 RESPONSE
1   A  E  1
2   A  E  2
3   A  E  3
4   A  E  2
5   A  F  2
6   A  F  1
7   A  F  3
8   A  F  2
9   B  E  7
10  B  E  6
11  B  E  6
12  B  E  7
13  B  F  5
14  B  F  4
15  B  F  4
16  B  F  5
17  C  E  4
18  C  E  4
19  C  E  4
20  C  E  5
21  C  F  1
22  C  F  2
23  C  F  3
24  C  F  3
library(biology)
replications(RESPONSE ~ T1*T2, data = DF)
   T1    T2 T1:T2
    8    12     4
is.balanced(RESPONSE ~ T1*T2, data = DF)
[1] TRUE
#Now I would like to know whether T1, T2 or T1*T2 have a significant effect on RESPONSE. As far as I know, the
#theory says that I should use a type III sum of squares, but the theory also says that if the design is completely
#balanced, there is no difference between type I, II or III sums of squares.
#so I first fit a linear model:
my.anov <- lm(RESPONSE ~ T1 + T2 + T1:T2)
#then I do a normal anova
anova(my.anov)
Analysis of Variance Table
Response: RESPONSE
          Df Sum Sq Mean Sq F value    Pr(>F)
T1         2  103.0  51.500  97.579 2.183e-10 ***
T2         1   24.0  24.000  45.474 2.550e-06 ***
T1:T2      2   12.0   6.000  11.368  0.000642 ***
Residuals 18    9.5   0.528
#When I do the same with the Anova() function from the car package I get the same result
Anova(my.anov)
Anova Table (Type II tests)
Response: RESPONSE
          Sum Sq Df F value    Pr(>F)
T1         103.0  2  97.579 2.183e-10 ***
T2          24.0  1  45.474 2.550e-06 ***
T1:T2       12.0  2  11.368  0.000642 ***
Residuals    9.5 18
#(type II seems to be the default, and type="I" produces an error (why?))
#yet, when I specify type="III" it gives me something completely different:
Anova(my.anov, type="III")
Anova Table (Type III tests)
Response: RESPONSE
            Sum Sq Df F value    Pr(>F)
(Intercept)   16.0  1  30.316 3.148e-05 ***
T1            84.5  2  80.053 1.100e-09 ***
T2             0.0  1   0.000      1.00
T1:T2         12.0  2  11.368  0.000642 ***
Residuals      9.5 18
#and the AnovaM() function from the biology package does the same for types I and II, and produces the following
#result for type III:
library(biology)
AnovaM(my.anov, type="III")
          Df Sum Sq Mean Sq F value   Pr(>F)
T1         2   84.5  42.250  80.053 1.10e-09 ***
T2         1   24.0  24.000  45.474 2.55e-06 ***
T1:T2      2   12.0   6.000  11.368 0.000642 ***
Residuals 18    9.5   0.528
#Is type III the type I should use, and why do the results differ if the design is balanced? I am really confused; it would
#be great if someone could help me out!
#Thanks a lot for your help!
#/Fabian
#University of Gothenburg
-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
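To make the balanced-design point concrete, here is a base-R check on made-up balanced data. Note also that car::Anova(type = "III") only gives sensible main-effect tests under sum-to-zero contrasts (options(contrasts = c("contr.sum", "contr.poly"))), which is likely why the Type III output above looks odd under the default treatment contrasts:

```r
# Sketch (base R only, illustrative data): in a completely balanced design,
# sequential (Type I) sums of squares do not depend on the order of terms,
# because the main effects are orthogonal.
set.seed(3)
d <- expand.grid(T1 = factor(c("A", "B", "C")),
                 T2 = factor(c("E", "F")),
                 rep = 1:4)                       # 4 replicates per cell
d$y <- rnorm(nrow(d), mean = as.numeric(d$T1) + as.numeric(d$T2))

# Sequential SS for T1, entered first vs. entered after T2:
ss1 <- anova(lm(y ~ T1 + T2 + T1:T2, data = d))["T1", "Sum Sq"]
ss2 <- anova(lm(y ~ T2 + T1 + T2:T1, data = d))["T1", "Sum Sq"]
```

In an unbalanced design ss1 and ss2 would generally differ, and the choice among SS types would start to matter.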
Re: [R] Errors and line numbers in scripts?
On 12/05/2011 11:02 AM, Elliot Joel Bernstein wrote: Is it possible to get R to report the line number of an error when a script is called with source()? I found the following post from 2009, but it's not clear to me if this ever made it into the release version:

It does so for parse errors. It doesn't record lines of statements at the top level of a script for run-time errors, but if you define a function in your script, and call it at a different line, and an error occurs internally, then R will know which line of the function triggered the error. If you use options(error=recover) you'll get a display something like this:

source("c:/temp/test.R")
Error in f(0) : an error in f
Enter a frame number, or 0 to exit
1: source("c:/temp/test.R")
2: eval.with.vis(ei, envir)
3: eval.with.vis(expr, envir, enclos)
4: g()
5: test.R#6: f(0)

The call to g() was in the test.R script but was not recorded. However, the function g contains a call to f(0), and that one was recorded as being at line 6 of the file. Duncan Murdoch

Here's the test.R file I used:
--
f <- function(x) {
  stop("an error in f")
}

g <- function() {
  f(0)
}

g()
--

ws wrote: Is there a way to have R return the line number in a script when it errors out? I call my script like:

$ R --vanilla < script.R > output.txt

I seem to remember a long discussion about this at some point, but I can't remember the outcome.

The current development version returns much more information about error locations. I don't know if it will handle this case (R doesn't get told the filename in case of input redirection, for example). Generally the goal is to report error lines during interactive use; batch use is assumed to already be debugged. Duncan Murdoch

Thanks.
- Elliot [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple order() data frame question.
With data.table, the following is routine:

DT[order(a)]    # ascending
DT[order(-a)]   # descending, if a is numeric
DT[a > 5, sum(z), by=c][order(-V1)]   # sum of z grouped by c, just where a > 5, then show me the largest first
DT[order(-a, b)]   # order by a descending then by b ascending, if a and b are both numeric

It avoids peppering your code with $, and becomes quite natural after a short while; especially compound queries such as the 3rd example. Matthew http://datatable.r-forge.r-project.org/

Ivan Calandra ivan.calan...@uni-hamburg.de wrote in message news:4dcbec8b.6040...@uni-hamburg.de... I was wondering whether it would be possible to make a method for data.frame with sort(). I think it would be more intuitive than using the complex construction of df[order(df$a),] Is there any reason not to make it? Ivan

Le 5/12/2011 15:40, Marc Schwartz a écrit : On May 12, 2011, at 8:09 AM, John Kane wrote: Argh. I knew it was at least partly obvious. I have never been able to read the order() help page and understand what it is saying. Thanks very much. By the way, to me it is counter-intuitive that the command is df1[order(df1[,2], decreasing=TRUE),] For some reason I keep expecting it to be order( , df1[,2], decreasing=TRUE). So clearly I don't understand what is going on, but at least I am a lot better off. I may be able to get this graph to work.

John, Perhaps it may be helpful to understand that order() does not actually sort() the data. It returns a vector of indices into the data, where those indices are the sorted ordering of the elements in the vector, or in this case, the column. So you want the output of order() to be used within the brackets for the row *indices*, to reflect the ordering of the column (or columns in the case of a multi-level sort) that you wish to use to sort the data frame rows.
set.seed(1)
x <- sample(10)
x
 [1]  3  4  5  7  2  8  9  6 10  1
# sort() actually returns the sorted data
sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10
# order() returns the indices of 'x' in sorted order
order(x)
 [1] 10  5  1  2  3  8  4  6  7  9
# This does the same thing as sort()
x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10

set.seed(1)
df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))
df1
   aa         bb
1   a -0.6264538
2   b  0.1836433
3   c -0.8356286
4   d  1.5952808
5   e  0.3295078
6   f -0.8204684
7   g  0.4874291
8   h  0.7383247
9   i  0.5757814
10  j -0.3053884
# These are the indices of df1$bb in sorted order
order(df1$bb)
 [1]  3  6  1 10  2  5  7  9  8  4
# Get df1$bb in increasing order
df1$bb[order(df1$bb)]
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808
# Same thing as above
sort(df1$bb)
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808

You can't use the output of sort() to sort the data frame rows, so you need to use order() to get the ordered indices and then use that to extract the data frame rows in the sort order that you desire:

df1[order(df1$bb), ]
   aa         bb
3   c -0.8356286
6   f -0.8204684
1   a -0.6264538
10  j -0.3053884
2   b  0.1836433
5   e  0.3295078
7   g  0.4874291
9   i  0.5757814
8   h  0.7383247
4   d  1.5952808
df1[order(df1$bb, decreasing = TRUE), ]
   aa         bb
4   d  1.5952808
8   h  0.7383247
9   i  0.5757814
7   g  0.4874291
5   e  0.3295078
2   b  0.1836433
10  j -0.3053884
1   a -0.6264538
6   f -0.8204684
3   c -0.8356286

Does that help? Regards, Marc Schwartz

-- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt.
Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
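One pattern the thread does not show: order() accepts several keys, so a multi-column sort with mixed directions is a one-liner (the data below are made up for illustration):

```r
# Sketch: sort a data frame by one column ascending and another descending.
set.seed(1)
df1 <- data.frame(aa = rep(letters[1:2], 5), bb = rnorm(10))

# aa ascending as the primary key; negate the numeric bb to get it
# descending within each aa group.
out <- df1[order(df1$aa, -df1$bb), ]
out
```

For non-numeric secondary keys, negation does not work; there one can use rank() with a negative sign, or sort in two passes.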
Re: [R] separate date and time
Schatzi adele_thompson at cargill.com writes: I have a combined date and time. I would like to separate them out into two columns so I can do things such as take the mean by time across all dates.

meas <- runif(435)
nTime <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin = "1970-01-01")
mat1 <- cbind(nDateT, meas)
means1 <- aggregate(mat1$meas, list(nDateT), mean)

This doesn't do anything as each day is different, but if I had just the time, it would take the mean, outputting 48 values (one for each 30 min). Also, sometimes there is a missing meas at a specific time. Is there any way to copy the previous meas if one is missing? - In theory, practice and theory are the same. In practice, they are not - Albert Einstein -- View this message in context: http://r.789695.n4.nabble.com/separate-date-and-time-tp3517571p3517571.html

Not sure if this is what you want, but you can use substr to split nDateT into date and time, and then use aggregate() on the time column in df1.

meas <- runif(435)
nTime <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin = "1970-01-01")
date <- substr(nDateT, 1, 10)
time <- substr(nDateT, 12, 19)
df1 <- data.frame(date, time, meas)
means1 <- aggregate(df1$meas, list(df1$time), mean)

HTH, Ken
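A variant of Ken's idea that avoids fixed substr() positions: format() extracts the clock time directly, and a cumsum() trick carries the last observation forward for missing readings (the NA positions below are invented for illustration; tz = "UTC" is assumed so the half-hour slots are unambiguous):

```r
# Sketch: mean by time-of-day across dates, with last-observation-
# carried-forward for missing measurements.
nTime  <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin = "1970-01-01", tz = "UTC")
meas   <- runif(length(nDateT))
meas[c(5, 50)] <- NA                 # pretend two readings are missing

# Carry the previous value forward into each NA slot:
idx  <- cumsum(!is.na(meas))
meas <- c(NA, meas[!is.na(meas)])[idx + 1]

tod    <- format(nDateT, "%H:%M")    # clock time only, date dropped
means1 <- aggregate(meas, list(time = tod), mean)
nrow(means1)                         # one row per half-hour slot
```

format() is robust to locale-dependent printing of POSIXct, which is the weak point of indexing into the printed string with substr().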
Re: [R] Simple order() data frame question.
Thanks Matthew, I had data.table installed but totally forgot about it. I've only used it once or twice and, IIRC, that was last year. I remember thinking at the time that it was a very handy package, but lack of need for this sort of thing let me forget it.

--- On Thu, 5/12/11, Matthew Dowle mdo...@mdowle.plus.com wrote: From: Matthew Dowle mdo...@mdowle.plus.com Subject: Re: [R] Simple order() data frame question. To: r-h...@stat.math.ethz.ch Received: Thursday, May 12, 2011, 11:23 AM

With data.table, the following is routine:

DT[order(a)]    # ascending
DT[order(-a)]   # descending, if a is numeric
DT[a > 5, sum(z), by=c][order(-V1)]   # sum of z grouped by c, just where a > 5, then show me the largest first
DT[order(-a, b)]   # order by a descending then by b ascending, if a and b are both numeric

It avoids peppering your code with $, and becomes quite natural after a short while; especially compound queries such as the 3rd example. Matthew http://datatable.r-forge.r-project.org/

Ivan Calandra ivan.calan...@uni-hamburg.de wrote in message news:4dcbec8b.6040...@uni-hamburg.de... I was wondering whether it would be possible to make a method for data.frame with sort(). I think it would be more intuitive than using the complex construction of df[order(df$a),] Is there any reason not to make it? Ivan

Le 5/12/2011 15:40, Marc Schwartz a écrit : On May 12, 2011, at 8:09 AM, John Kane wrote: Argh. I knew it was at least partly obvious. I have never been able to read the order() help page and understand what it is saying. Thanks very much. By the way, to me it is counter-intuitive that the command is df1[order(df1[,2], decreasing=TRUE),] For some reason I keep expecting it to be order( , df1[,2], decreasing=TRUE). So clearly I don't understand what is going on, but at least I am a lot better off. I may be able to get this graph to work.

John, Perhaps it may be helpful to understand that order() does not actually sort() the data.
It returns a vector of indices into the data, where those indices are the sorted ordering of the elements in the vector, or in this case, the column. So you want the output of order() to be used within the brackets for the row *indices*, to reflect the ordering of the column (or columns, in the case of a multi-level sort) that you wish to use to sort the data frame rows.

set.seed(1)
x <- sample(10)
x
 [1]  3  4  5  7  2  8  9  6 10  1

# sort() actually returns the sorted data
sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10

# order() returns the indices of 'x' in sorted order
order(x)
 [1] 10  5  1  2  3  8  4  6  7  9

# This does the same thing as sort()
x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10

set.seed(1)
df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))
df1
   aa         bb
1   a -0.6264538
2   b  0.1836433
3   c -0.8356286
4   d  1.5952808
5   e  0.3295078
6   f -0.8204684
7   g  0.4874291
8   h  0.7383247
9   i  0.5757814
10  j -0.3053884

# These are the indices of df1$bb in sorted order
order(df1$bb)
 [1]  3  6  1 10  2  5  7  9  8  4

# Get df1$bb in increasing order
df1$bb[order(df1$bb)]
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808

# Same thing as above
sort(df1$bb)
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808

You can't use the output of sort() to sort the data frame rows, so you need to use order() to get the ordered indices and then use that to extract the data frame rows in the sort order that you desire:

df1[order(df1$bb), ]
   aa         bb
3   c -0.8356286
6   f -0.8204684
1   a -0.6264538
10  j -0.3053884
2   b  0.1836433
5   e  0.3295078
7   g  0.4874291
9   i  0.5757814
8   h  0.7383247
4   d  1.5952808

df1[order(df1$bb, decreasing = TRUE), ]
   aa         bb
4   d  1.5952808
8   h  0.7383247
9   i  0.5757814
7   g  0.4874291
5   e  0.3295078
2   b  0.1836433
10  j -0.3053884
1   a -0.6264538
6   f -0.8204684
3   c -0.8356286

Does that help?
Regards, Marc Schwartz

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
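One footnote to the examples above (not from the original thread): for a multi-column sort with mixed directions where a column is not numeric, xtfrm() converts a vector to a numeric that sorts the same way, so it can be negated to reverse direction inside order(). A small sketch with made-up data:

```r
df <- data.frame(grp = c("b", "a", "b", "a"), val = c(2, 3, 1, 4))

# grp ascending, val descending: negate xtfrm(val) inside order()
res <- df[order(df$grp, -xtfrm(df$val)), ]
res
#   grp val
# 4   a   4
# 2   a   3
# 1   b   2
# 3   b   1
```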
Re: [R] R won't start keeps crashing
I have some suggestions inline below. My biggest suggestion would be to read the help files that came with R, especially the section Invoking R in An Introduction to R.

On Thu, May 12, 2011 at 10:24 AM, Bazman76 h_a_patie...@hotmail.com wrote: http://r.789695.n4.nabble.com/file/n3517669/R_crash.jpg OK, when I click on the .RData file I get the screen above. Also, when I start R from the desktop icon or select it from Programs I get the same result.

This may mean that you answered yes to the exit prompt asking you to save your workspace, especially if you didn't deliberately save a .RData file.

The warning message is in focus and I cannot move the focus to the GUI. When I either click to dismiss the message or click to close it, the whole session shuts down. So I cannot enter the commands you suggested into this corrupt version. However, when I click on Canada.Rdata the R session starts and seems to function normally. Here are the results of the functions you requested. Not sure if they will help given that this version is working?

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

load("C:/path/to/file")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file 'C:/path/to/file', probable reason 'No such file or directory'

Replace path/to/file with the path to your file.

Sorry if I gave you the wrong information previously, just let me know what you want.
--
===
Jon Daily
Technician
===
#!/usr/bin/env outside
# It's great, trust me.
Re: [R] R won't start keeps crashing
On May 12, 2011, at 10:24 AM, Bazman76 wrote: http://r.789695.n4.nabble.com/file/n3517669/R_crash.jpg OK, when I click on the .RData file I get the screen above. Also, when I start R from the desktop icon or select it from Programs I get the same result.

Why do you even have this file around anymore? You have been told that it is corrupt and that trashing it is the way to go forward. (I suppose you could chase down the error message regarding the missing `vars` package and see if that restores civil order, but if .Rdata doesn't have useful information that would be difficult to rebuild, then just trash it.) -- David.

The warning message is in focus and I cannot move the focus to the GUI. When I either click to dismiss the message or click to close it, the whole session shuts down. So I cannot enter the commands you suggested into this corrupt version. However, when I click on Canada.Rdata the R session starts and seems to function normally. Here are the results of the functions you requested. Not sure if they will help given that this version is working?

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

load("C:/path/to/file")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file 'C:/path/to/file', probable reason 'No such file or directory'

Sorry if I gave you the wrong information previously, just let me know what you want.
David Winsemius, MD
West Hartford, CT
Re: [R] separate date and time
That is wonderful. Thank you. Adele

Ken Takagi wrote: Schatzi adele_thompson at cargill.com writes: I have a combined date and time. I would like to separate them out into two columns so I can do things such as take the mean by time across all dates.

meas <- runif(435)
nTime <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin = "1970-01-01")
mat1 <- cbind(nDateT, meas)
means1 <- aggregate(mat1$meas, list(nDateT), mean)

This doesn't do anything, as each day is different; but if I had just the time, it would take the mean, outputting 48 values (one for each 30 min). Also, sometimes a meas is missing for a specific time. Is there any way to copy the previous meas if one is missing?

Not sure if this is what you want, but you can use substr() to split nDateT into date and time, and then use aggregate() on the time column in df1:

meas <- runif(435)
nTime <- seq(1303975800, 1304757000, 1800)
nDateT <- as.POSIXct(nTime, origin = "1970-01-01")
date <- substr(nDateT, 1, 10)
time <- substr(nDateT, 12, 19)
df1 <- data.frame(date, time, meas)
means1 <- aggregate(df1$meas, list(df1$time), mean)

HTH, Ken

- In theory, practice and theory are the same. In practice, they are not - Albert Einstein
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem converting character to dates
Hi all, I've searched this problem and still I can't understand my results, so here goes: I have some time series imported from Excel, with the dates in text format. Data was imported with RODBC's sqlQuery() function. I have these dates:

adates
 [1] 01/2008 02/2008 03/2008 04/2008 05/2008 06/2008 07/2008
 [8] 08/2008 09/2008 10/2008 11/2008 12/2008 13/2008 14/2008

I want the format week/year, so I do:

as.Date(adates, format = "%W/%y")

and get

 [1] 2020-05-12 2020-05-12 2020-05-12 2020-05-12 2020-05-12
 [6] 2020-05-12 2020-05-12 2020-05-12 2020-05-12 2020-05-12

everything is equal to this: 2020-this month-today. If I use strptime(adates, "%W/%y") it's the same. Can you explain why this happens and how to solve it? Thanks in advance, Assu
[R] log transformation and mean question
I have a question about log2 transformation and taking means on log2 data. I am doing analysis of ELISA data. The OD values and the concentration values for the standards were log2 transformed before performing the lm. The OD values for the samples were log2 transformed, and the coefficients of the lm were applied to get the log2 concentration values. I then backtransformed these log2 concentrations, and the trouble started: when I take the mean of the log2 concentrations, the value is different from the mean of the backtransformed concentrations.

100+1000/2
[1] 600
2^((log2(100)+log2(1000))/2)
[1] 316.2278

What am I doing wrong to get these different values?
[R] problem with mediation
Hello! I have a problem with mediation analysis. I can do it with the function mediate() when I have one mediator. But how can I do it if I have one independent variable and one dependent variable but 4 mediators? I have tried the function mediations(), but it doesn't work. If I use mediate() 4 times, once for each mediator, is it the same? I want to know the total mediation effect for the 4 mediators. t.Mete
[R] Change font size in Windows
My day for dumb questions. How do I increase the type size in the Rgui console in Windows? (R-2.13.0, Windows 7) It looked to me that I just needed to change the font spec in Rconsole, but that does not seem to be working. The R FAQ for Windows has a reference in Q3.4 to changing fonts (Q5.2), but I don't see anything relevant there. Rconsole originally was:

font = TT Courier New
points = 10
style = normal # Style can be normal, bold, italic

I changed this to points = 14 and then, in desperation, to points = 18 with no effect. I have both restarted R and done a complete reboot to no avail. What am I missing? Any advice would be most gratefully received.

===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils methods base
other attached packages:
[1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9
loaded via a namespace (and not attached):
[1] cluster_1.13.3 grid_2.13.0 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
Re: [R] lm and anova
On May 12, 2011, at 15:30, Paul Chatfield wrote:

anova uses sequential sums of squares (type 1),

Yes.

summary adjusted sums of squares (type 3)

No. Type III SS is a considerably stranger beast. summary() looks at the s.e. of individual coefficients. For 1 DF effects, this is often equivalent to Type II tests (not III, except when they happen to be equal), except when looking at main-effect terms in the presence of interactions (in which case things get parametrization-dependent).

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
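The order dependence of sequential (Type I) sums of squares is easy to see with a built-in dataset rather than the poster's data; this small sketch fits the same two correlated predictors in both orders:

```r
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(mpg ~ hp + wt, data = mtcars)

# Sequential SS: wt tested first (ignoring hp) vs. last (adjusted for hp)
a1 <- anova(fit1)
a2 <- anova(fit2)
a1["wt", "Sum Sq"]   # differs from the value below
a2["wt", "Sum Sq"]

# The coefficient t-tests from summary() do not depend on the term order
coef(summary(fit1))["wt", ]
coef(summary(fit2))["wt", ]
```

Because wt and hp are correlated, the two anova() tables attribute different sums of squares to wt, while summary() gives identical coefficient tests either way.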
[R] Saving misclassified records into dataframe within a loop
Greetings R world, I know some version of this question has been asked before, but I need to save the output of a loop into a data frame, to eventually be written to a Postgres database with dbWriteTable. Some background: I have developed classification models to help identify problem accounts. The logic is this: if the model classifies the record as including variable X and it turns out that record does not have X, then it should be reviewed (i.e. I need the row number/ID saved to a database). Generally I want to look at the misclassified records. This is a little hack, I know; anyone got a better idea, please let me know. Here is an example:

library(rpart)
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
# predict
prediction <- predict(fit, kyphosis)

# misclassification index function
predict.function <- function(x) {
  for (i in 1:length(kyphosis$Kyphosis)) {
    # the idea is that if the record is absent but the prediction is otherwise, then show me that record
    if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i, 1] == 1)) == 0) {  # THIS WORKS
      print(row.names(kyphosis[c(i), ]))
    }
  }
}
predict.function(x)

Now my issue is that I want to save these IDs to a data.frame so I can later save them to a database. Is this an incorrect approach? Can I save each ID to the Postgres instance as it is found? I have an ignorant fear of lapply, but it seems it may hold the key. I've tried

predict.function <- function(x) {
  results <- as.data.frame(1)
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i, 1] == 1)) == 0) {
      results[i, ] <- as.data.frame(row.names(kyphosis[c(i), ]))
    }
  }
}

but this does not work; the results object does not get saved. Any help would be greatly appreciated.
Thanks, John Dennison
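A loop-free way to collect the misclassified row IDs into a data frame is to compare predicted classes directly with the observed classes; this is a sketch of the same idea (it uses type = "class" rather than the probability column):

```r
library(rpart)

fit  <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
pred <- predict(fit, kyphosis, type = "class")

# Rows where the predicted class disagrees with the observed class
misclassified <- data.frame(id = row.names(kyphosis)[pred != kyphosis$Kyphosis])
```

The resulting one-column data frame can then be handed to dbWriteTable() in one call, instead of writing IDs one at a time inside a loop. (Note also that the second function in the message above loses its results because the function never returns them; ending it with return(results) and assigning the call's value would fix that.)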
Re: [R] Simple 95% confidence interval for a median
Contrary to the commonly held assumption, the Wilcoxon test does not deal with medians in general. There are some specific cases/assumptions where the test/interval would apply to the median; if I remember correctly, the assumptions include that the population distribution is symmetric and the only alternatives considered are shifts of the distribution (both assumptions that go contrary to what I would believe in most situations where I would want to use the Wilcoxon test). If you want an actual confidence interval on the true median, then you either need to make some assumptions about the distribution that the data comes from, or use a tool like the bootstrap.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Georgina Imberger Sent: Thursday, May 12, 2011 7:36 AM To: r-help@r-project.org Subject: [R] Simple 95% confidence interval for a median

Hi! I have a data set of 86 values that are non-normally distributed (counts). The median value is 10. I want to get an estimate of the 95% confidence interval for this median value. I tried to use a one-sample Wilcoxon test:

wilcox.test(Comps, mu = 10, conf.int = TRUE)

and got the following output:

Wilcoxon signed rank test with continuity correction
data: Comps
V = 2111, p-value = 0.05846
alternative hypothesis: true location is not equal to 10
95 percent confidence interval:
 10.0 17.49993
sample estimates:
(pseudo)median
      12.50006

I wonder if someone would mind helping me out? What am I doing wrong? What is the '(pseudo)median'? Can I get R to estimate the confidence around the actual median of 10?
With thanks, Georgie
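The bootstrap approach Greg mentions takes only a few lines in base R. This is a sketch on simulated counts standing in for the poster's 86 values (the rpois() data is an assumption for illustration, not the real data):

```r
set.seed(42)
x <- rpois(86, 12)   # stand-in for the poster's 86 count values

# Percentile bootstrap CI for the median: resample, take medians,
# and read off the 2.5% and 97.5% quantiles
boot_meds <- replicate(10000, median(sample(x, replace = TRUE)))
ci <- quantile(boot_meds, c(0.025, 0.975))
ci
```

With the real data, replace x by the Comps vector; the boot package (boot() plus boot.ci()) offers the same idea with more refined interval types such as BCa.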
Re: [R] log transformation and mean question
On 12-May-11 15:15:00, 1Rnwb wrote: I have a question about log2 transformation and taking means on log2 data. I am doing analysis of ELISA data. The OD values and the concentration values for the standards were log2 transformed before performing the lm. The OD values for the samples were log2 transformed, and the coefficients of the lm were applied to get the log2 concentration values. I then backtransformed these log2 concentrations, and the trouble started: when I take the mean of the log2 concentrations, the value is different from the mean of the backtransformed concentrations.

100+1000/2
[1] 600
2^( ( log2(100)+log2(1000) )/2 )
[1] 316.2278

What am I doing wrong to get these different values?

Apart from the fact that I think your first line should be

(100+1000)/2
# [1] 550

you are doing nothing whatever wrong! The difference is an inevitable result of the fact that, for any set of positive numbers X = c(x1, x2, ..., xn), not all equal,

mean(log(X)) < log(mean(X))

This is because the curve of y = log(x) lies below the tangent to the curve at any given point. If that point is mean(X), and the tangent is y = a + b*x, then

mean(log(X)) < mean(a + b*X) = a + b*mean(X) = log(mean(X))

since y = a + b*x is tangent to y = log(x) at x = mean(X). This is a special case of a general result called Jensen's Inequality. Your second line is

2^mean(log2(X)) < 2^log2(mean(X)) = mean(X)

where X = c(100, 1000). Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 12-May-11 Time: 17:37:45
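The two quantities in the question are simply the arithmetic and geometric means, which can be checked directly:

```r
x <- c(100, 1000)

mean(x)            # arithmetic mean: 550
2^mean(log2(x))    # geometric mean: 316.2278
sqrt(100 * 1000)   # the geometric mean of two numbers, computed directly
```

By the AM-GM inequality (the special case of Jensen's inequality described above), the geometric mean never exceeds the arithmetic mean, with equality only when all values are equal.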
[R] group length
Hi, I have four groups:

y1 = c(1.214, 1.180, 1.199)
y2 = c(1.614, 1.710, 1.867, 1.479)
y3 = c(1.361, 1.270, 1.375, 1.299)
y4 = c(1.459, 1.335)

Is there a function that can give me the length of each, like the made-up example below?

function(length(y1:y2)
[1] 3 4 4 2
Re: [R] group length
require(plyr)
laply(list(y1, y2, y3, y4), length)

Scott

On Thursday, May 12, 2011 at 11:50 AM, Asan Ramzan wrote: Hi, I have four groups: y1, y2, y3, y4. Is there a function that can give me the length of each?
Re: [R] group length
sapply...

y1 = c(1.214, 1.180, 1.199)
y2 = c(1.614, 1.710, 1.867, 1.479)
y3 = c(1.361, 1.270, 1.375, 1.299)
y4 = c(1.459, 1.335)

sapply(list(y1, y2, y3, y4), length)
[1] 3 4 4 2

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Asan Ramzan Sent: Thursday, May 12, 2011 12:50 PM To: r-help@r-project.org Subject: [R] group length
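A later addition to base R (lengths(), available from R 3.2.0 onward, so not in the R 2.13 of this thread) does the same thing without an explicit apply:

```r
y1 <- c(1.214, 1.180, 1.199)
y2 <- c(1.614, 1.710, 1.867, 1.479)
y3 <- c(1.361, 1.270, 1.375, 1.299)
y4 <- c(1.459, 1.335)

# length of each list element, as an integer vector
lengths(list(y1, y2, y3, y4))   # 3 4 4 2
```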
Re: [R] R won't start keeps crashing
I will read the intro to R. When closing, I got a pop-up asking me if I wanted to save the workspace; I just clicked yes. Here is what I got:

load("C:/Documents and Settings/Hugh/My Documents/vars/vars/data")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file 'C:/Documents and Settings/Hugh/My Documents/vars/vars/data', probable reason 'Permission denied'
Re: [R] R won't start keeps crashing
Will delete it, just wanted to try and sort out the bug.
[R] Survival Rate Estimates
Dear List, Is there an automated way to use the survival package to generate survival rate estimates and their standard errors? To be clear, *not* the survivorship estimates (which are cumulative), but the survival *rate* estimates... Thank you in advance for any help. Best, Brian
Re: [R] Change font size in Windows
One simple way: Run R (the gui version) Click on the Edit menu Click on the GUI Preferences item. Select the font, size, style, colors, etc. that you want. If you click on Save then these become the new default. If you click on Apply, but don't save then they will last that session but not be the new defaults. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of John Kane Sent: Thursday, May 12, 2011 10:25 AM To: R R-help Subject: [R] Change font size in Windows My day for dumb questions. How do I increase the type size in the Rgui console in Windows? (R-2.13.0, Windows 7) It looked to me that I just needed to change the font spec in Rconsole but that does not seem to be working. The R FAQ for Windows has a reference in Q3.4 to changing fonts, (Q5.2), but I don't see anything relevant there. Rconsole originally was: font = TT Courier New points = 10 style = normal # Style can be normal, bold, italic I changed this to points=14 and then in desperation to points=18 with no effect. I have both restarted R and done a complete reboot to no avail. What am I missing? 
Any advice would be most gratefully received.

===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils methods base
other attached packages:
[1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9
loaded via a namespace (and not attached):
[1] cluster_1.13.3 grid_2.13.0 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
Re: [R] Change font size in Windows
On 12.05.2011 18:25, John Kane wrote: My day for dumb questions. How do I increase the type size in the Rgui console in Windows? (R-2.13.0, Windows 7) It looked to me that I just needed to change the font spec in Rconsole, but that does not seem to be working. The R FAQ for Windows has a reference in Q3.4 to changing fonts (Q5.2), but I don't see anything relevant there. Rconsole originally was: font = TT Courier New, points = 10, style = normal. I changed this to points = 14 and then, in desperation, to points = 18 with no effect. I have both restarted R and done a complete reboot to no avail. What am I missing?

You probably edited *one* of at least two Rconsole files. The one in your home directory (probably C:/Users/Username/Documents/Rconsole) takes precedence.

Uwe Ligges
Re: [R] package update
On 9 May 2011 at 12:57, Uwe Ligges wrote:
| On 08.05.2011 19:54, eric wrote:
| I tried to update my packages using update.packages()
| I got the following message:
| The downloaded packages are in '/tmp/RtmpyDYdTX/downloaded_packages'
| Warning in install.packages(update[instlib == l, Package], l, contriburl = contriburl, :
|   'lib = /usr/lib/R/library' is not writable
| Error in install.packages(update[instlib == l, Package], l, contriburl = contriburl, :
|   unable to install package
| How do I fix this?
|
| If you want to update packages in R's default library in /usr/lib/R/library, you will need root permissions.
| Uwe Ligges

The way it is meant to work is that you (eric, the user) become a member of the group owning that directory -- and I picked group 'staff' for that. In the postinst (of the Debian / Ubuntu R packages), this directory is created as

# edd 03 Apr 2003 cf Section 10.1.2 of Debian Policy
if [ ! -e /usr/local/lib/R ]; then
    if mkdir /usr/local/lib/R 2>/dev/null; then
        chown root:staff /usr/local/lib/R
        chmod 2775 /usr/local/lib/R
    fi
fi
if [ ! -e /usr/local/lib/R/site-library ]; then
    if mkdir /usr/local/lib/R/site-library 2>/dev/null; then
        chown root:staff /usr/local/lib/R/site-library
        chmod 2775 /usr/local/lib/R/site-library
    fi
fi

We could conceivably be fancier and create an R group on the system, but I felt this is best left to local admins. Alternatively, if you make that directory owned by 'you' then you don't need root either. You can check which groups you are part of via 'id'. On my Ubuntu box, I am a member of a few groups:

edd@max:~$ id
uid=1000(edd) gid=1000(edd) groups=1000(edd),4(adm),20(dialout),24(cdrom),27(sudo),44(video),46(plugdev),50(staff),107(lpadmin),115(admin),122(sambashare),124(libvirtd)
edd@max:~$

Hope this helps, Dirk
--
Gauss once played himself in a zero-sum game and won $50. -- #11 at http://www.gaussfacts.com
Re: [R] problem converting character to dates
On 12.05.2011 17:40, Assu wrote: Hi all, I've searched this problem and still I can't understand my results, so here goes: I have some time series I've imported from Excel, with the dates in text format. Data was imported with RODBC's sqlQuery() function. I have these dates:

    adates
    [1] 01/2008 02/2008 03/2008 04/2008 05/2008 06/2008 07/2008
    [8] 08/2008 09/2008 10/2008 11/2008 12/2008 13/2008 14/2008

I want the format week/year, so I do as.Date(adates, format = "%W/%y") and get

    [1] 2020-05-12 2020-05-12 2020-05-12 2020-05-12 2020-05-12
    [6] 2020-05-12 2020-05-12 2020-05-12 2020-05-12 2020-05-12

everything is equal to this: 2020-this month-today. It's the same if I use strptime(adates, "%W/%y"). Can you explain why this happens and how to solve it? Thanks in advance, Assu

[Uwe Ligges replied:] 1. You need an upper-case Y. 2. You need a weekday (otherwise the result is not well defined and you get today as the default). Hence

    strptime(paste(0, adates, sep = "/"), "%w/%W/%Y")

should work. Uwe Ligges

-- View this message in context: http://r.789695.n4.nabble.com/problem-converting-character-to-dates-tp3517918p3517918.html Sent from the R help mailing list archive at Nabble.com.
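A side note on the %w/%W/%Y trick: R's ?strptime warns that on some platforms the week-number fields are accepted but effectively ignored on input, so the result is worth checking. A platform-independent fallback is plain date arithmetic -- a sketch, assuming week 01 is taken to start on 1 January (shift the origin for ISO or US week conventions):

```r
# Week/year strings converted by arithmetic rather than strptime's %W,
# assuming week 01 begins on 1 January (adjust to your week convention).
adates <- c("01/2008", "02/2008", "13/2008")
wk <- as.integer(sub("/.*", "", adates))   # week number before the slash
yr <- as.integer(sub(".*/", "", adates))   # four-digit year after it
d  <- as.Date(paste(yr, "01", "01", sep = "-")) + 7 * (wk - 1)
d
# [1] "2008-01-01" "2008-01-08" "2008-03-25"
```

This also sidesteps the original %y/%Y confusion entirely, since the year is parsed as a plain integer.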
[R] confint.multinom() slow?
Dear R-helpers, I'm doing a bivariate analysis with two factors, both with relatively many levels: 1. clustering, a factor with 35 levels; 2. country, a factor with 24 levels; n = 12,855.

    my.fit <- multinom(clustering ~ country, maxit = 300)

converges after 280 iterations. I would like to get CIs for the odds ratios, and have tried confint():

    my.cis <- confint(my.fit)

I started confint() a few hours ago, but now I'm getting suspicious, since it hasn't terminated yet. Perhaps I just lack reasonable patience, but is such a long computation time for confint() to be expected here? Hans Ekbrand __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
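For scale: with 35 response levels and 24 predictor levels the model carries 34 x 24 = 816 coefficients, so confint()'s default route through the covariance matrix has to handle an 816 x 816 object -- hours is plausible. A cheaper stand-in is to form Wald intervals from summary() output directly. The sketch below uses iris with nnet::multinom since the poster's data isn't available, and exponentiates to the odds-ratio scale he asked about; it is an illustration of the call shape, not a drop-in replacement:

```r
# Hedged sketch: 95% Wald CIs from summary() instead of confint(),
# on a toy multinom fit (iris stands in for the poster's data).
library(nnet)

fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)
s   <- summary(fit)
z   <- qnorm(0.975)
lower <- s$coefficients - z * s$standard.errors   # lower Wald bound
upper <- s$coefficients + z * s$standard.errors   # upper Wald bound
# odds-ratio scale, as requested in the post:
or.ci <- exp(cbind(lower = as.vector(lower), upper = as.vector(upper)))
```

Note that summary() itself must compute the Hessian, which is also slow for very large fits, but it only does that work once.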
Re: [R] group length
On Thu, May 12, 2011 at 10:00 AM, Ledon, Alain alain.le...@ally.com wrote: sapply...

    y1 <- c(1.214, 1.180, 1.199)
    y2 <- c(1.614, 1.710, 1.867, 1.479)
    y3 <- c(1.361, 1.270, 1.375, 1.299)
    y4 <- c(1.459, 1.335)
    sapply(list(y1, y2, y3, y4), length)
    [1] 3 4 4 2

Or, if you don't want to name each object individually:

    sapply(mget(paste("y", 1:4, sep = ""), sys.frame()), length)
    y1 y2 y3 y4
     3  4  4  2

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Change font size in Windows
Definitely. I edited the one in the program files. I think i saw a ref to the home file but it did not sink in. Thanks --- On Thu, 5/12/11, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: From: Uwe Ligges lig...@statistik.tu-dortmund.de Subject: Re: [R] Change font size in Windows To: John Kane jrkrid...@yahoo.ca Cc: R R-help r-h...@stat.math.ethz.ch Received: Thursday, May 12, 2011, 1:09 PM On 12.05.2011 18:25, John Kane wrote: My day for dumb questions. How do I increase the type size in the Rgui console in Windows? (R-2.13.0, Windows 7) It looked to me that I just needed to change the font spec in Rconsole but that does not seem to be working. The R FAQ for Windows has a reference in Q3.4 to changing fonts, (Q5.2), but I don't see anything relevant there. Rconsole originally was: font = TT Courier New points = 10 style = normal # Style can be normal, bold, italic I changed this to points=14 and then in desperation to points=18 with no effect. I have both restarted R and done a complete reboot to no avail. What am I missing? You probably edited *one* of at least two Rconsole files. The one in your home directory (probably C:/Users/Username/Documents/Rconsole) takes precedence. 
Uwe Ligges Any advice would be most gratefully received === sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252 attached base packages: [1] grDevices datasets splines graphics stats tcltk utils methods base other attached packages: [1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9 loaded via a namespace (and not attached): [1] cluster_1.13.3 grid_2.13.0 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple 95% confidence interval for a median
On May 12, 2011, at 18:33 , Greg Snow wrote: Contrary to the commonly held assumption, the Wilcoxon test does not deal with medians in general. There are some specific cases/assumptions where the test/interval would apply to the median; if I remember correctly, the assumptions include that the population distribution is symmetric and that the only alternatives considered are shifts of the distribution (both assumptions that go contrary to what I would believe in most situations where I would want to use the Wilcoxon test).

Yes. Notice that the signed-rank Wilcoxon test does in fact assume symmetry under the null hypothesis, which does make sense when looking at differences, but less so away from the null. As far as I remember, the pseudomedian minimizes the absolute value of the signed-rank test statistic, but to be sure, read the reference on the help page.

If you want an actual confidence interval on the true median, then you either need to make some assumptions about the distribution that the data come from, or use a tool like the bootstrap.

You can invert the binomial. Since 95 percent of the binomial distribution with p = .5, n = 86 is between 35 and 52, you can generate a 95% CI for the median as sort(x)[c(34, 53)]. There are a few demons lurking in the details, and it is easy to be off by one, but you get the picture. Try this:

    ci <- replicate(5000, { x <- rexp(86); sort(x)[c(34, 53)] })
    m <- qexp(.5)
    ci <- ci[, order(apply(ci, 2, sum))]
    matplot(t(ci), pch = ".")
    abline(h = m)
    sum(ci[1, ] > m)
    sum(ci[2, ] < m)

(I get about 2% error in either direction, so slightly conservative. Taking c(35, 52), I get 3% both ways, so I suppose I got the cutoff right. A bit earlier in the day and I might even be able to prove it...) BTW, I'm sure someone has improved on this with some sort of interpolation. -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Georgina Imberger Sent: Thursday, May 12, 2011 7:36 AM To: r-help@r-project.org Subject: [R] Simple 95% confidence interval for a median

Hi! I have a data set of 86 values that are non-normally distributed (counts). The median value is 10. I want to get an estimate of the 95% confidence interval for this median value. I tried to use a one-sample Wilcoxon test:

    wilcox.test(Comps, mu = 10, conf.int = TRUE)

and got the following output:

    Wilcoxon signed rank test with continuity correction
    data: Comps
    V = 2111, p-value = 0.05846
    alternative hypothesis: true location is not equal to 10
    95 percent confidence interval: 10.0 17.49993
    sample estimates: (pseudo)median 12.50006

I wonder if someone would mind helping me out? What am I doing wrong? What is the '(pseudo)median'? Can I get R to estimate the confidence around the actual median of 10? With thanks, Georgie

-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
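The binomial inversion described above can be wrapped into a small helper. This is a sketch only -- the function name is made up, and the off-by-one "demons" mentioned in the thread are real, so check the achieved coverage by simulation before trusting the exact indices:

```r
# Order-statistic CI for a median by inverting Binomial(n, 0.5):
# qbinom picks which sorted values bracket the median at the given level.
# Index conventions vary by one between references -- verify coverage!
median.ci <- function(x, conf = 0.95) {
  n <- length(x)
  alpha <- 1 - conf
  lo <- qbinom(alpha / 2, n, 0.5)          # lower order statistic
  hi <- qbinom(1 - alpha / 2, n, 0.5) + 1  # upper order statistic
  sort(x)[c(lo, hi)]
}
set.seed(1)
median.ci(rexp(86))   # for n = 86 this selects sort(x)[c(34, 53)]
```

The interval requires no distributional assumptions beyond an i.i.d. sample, which is exactly why it suits the non-normal count data in the question.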
Re: [R] Saving misclassified records into dataframe within a loop
John - In your example, the misclassified observations (as defined by your predict.function) will be

    kyphosis[kyphosis$Kyphosis == 'absent' & prediction[, 1] != 1, ]

so you could start from there. - Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley, spec...@stat.berkeley.edu

On Thu, 12 May 2011, John Dennison wrote: Greetings R world, I know some version of this question has been asked before, but I need to save the output of a loop into a data frame, eventually to be written to a Postgres database with dbWriteTable. Some background: I have developed classification models to help identify problem accounts. The logic is this: if the model classifies the record as including variable X and it turns out that record does not have X, then it should be reviewed (i.e. I need the row number/ID saved to a database). Generally I want to look at the misclassified records. This is a little hack, I know; anyone got a better idea, please let me know. Here is an example:

    library(rpart)
    # grow tree
    fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
    # predict
    prediction <- predict(fit, kyphosis)
    # misclassification index function
    predict.function <- function(x) {
      for (i in 1:length(kyphosis$Kyphosis)) {
        # the idea is that if the record is "absent" but the prediction is
        # otherwise, then show me that record
        if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i, 1] == 1)) == 0) {  # THIS WORKS
          print(row.names(kyphosis[c(i), ]))
        }
      }
    }
    predict.function(x)

Now my issue is that I want to save these IDs to a data.frame so I can later save them to a database. Is this an incorrect approach? Can I save each ID to the Postgres instance as it is found? I have an ignorant fear of lapply, but it seems it may hold the key.
I've tried

    predict.function <- function(x) {
      results <- as.data.frame(1)
      for (i in 1:length(kyphosis$Kyphosis)) {
        # the idea is that if the record is "absent" but the prediction is
        # otherwise, then show me that record
        if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i, 1] == 1)) == 0) {
          results[i, ] <- as.data.frame(row.names(kyphosis[c(i), ]))
        }
      }
    }

but this does not work: the results object does not get saved. Any help would be greatly appreciated. Thanks, John Dennison __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
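Following up on Phil's vectorized subset: the poster's results vanished because it was assigned only inside the function's local environment and never returned. A loop-free sketch that puts the IDs straight into a data frame, ready for dbWriteTable (the column name 'id' and connection 'con' are made up for illustration):

```r
# Vectorized collection of misclassified row IDs -- no loop, no function,
# so nothing gets lost in a local environment.
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
prediction <- predict(fit, kyphosis)   # matrix of class probabilities

# observed "absent" but not predicted "absent" with probability 1:
mis    <- kyphosis$Kyphosis == "absent" & prediction[, "absent"] != 1
review <- data.frame(id = row.names(kyphosis)[mis])
# then e.g. dbWriteTable(con, "review_queue", review)  -- 'con' is hypothetical
```

If a loop is kept anyway, returning `results` as the last expression of the function (and assigning the call's value) would also fix the original code.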
Re: [R] Change font size in Windows
I looked at that yesterday and totally missed the font settings! I'm blaming the new glasses. Thank you. I'll probably do the Rconsole change, but it's nice to know about this one if I'm using R on another machine. That way I cannot mess up someone else's setup.

--- On Thu, 5/12/11, Greg Snow greg.s...@imail.org wrote: From: Greg Snow greg.s...@imail.org Subject: RE: [R] Change font size in Windows To: John Kane jrkrid...@yahoo.ca, R R-help r-h...@stat.math.ethz.ch Received: Thursday, May 12, 2011, 1:07 PM

One simple way: Run R (the GUI version). Click on the Edit menu. Click on the GUI Preferences item. Select the font, size, style, colors, etc. that you want. If you click on Save then these become the new defaults. If you click on Apply but don't save, then they will last that session but not become the new defaults. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of John Kane Sent: Thursday, May 12, 2011 10:25 AM To: R R-help Subject: [R] Change font size in Windows

My day for dumb questions. How do I increase the type size in the Rgui console in Windows? (R-2.13.0, Windows 7) It looked to me that I just needed to change the font spec in Rconsole, but that does not seem to be working. The R FAQ for Windows has a reference in Q3.4 to changing fonts (Q5.2), but I don't see anything relevant there. Rconsole originally was:

    font = TT Courier New
    points = 10
    style = normal # Style can be normal, bold, italic

I changed this to points = 14 and then, in desperation, to points = 18 with no effect. I have both restarted R and done a complete reboot to no avail. What am I missing?
Any advice would be most gratefully received === sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 [4] LC_NUMERIC=C LC_TIME=English_Canada.1252 attached base packages: [1] grDevices datasets splines graphics stats tcltk utils methods base other attached packages: [1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9 loaded via a namespace (and not attached): [1] cluster_1.13.3 grid_2.13.0 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] do.call and applying na.rm=TRUE
Hi all! I need to do something really simple using do.call. If I want to call the mean function inside do.call, how do I apply the condition na.rm=TRUE? So, I use do.call(mean, list(x)) where x is my data. This works fine if there are no NAs. Thanks, John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] do.call and applying na.rm=TRUE
?do.call Second argument is a list of arguments to pass. Try do.call(mean, list(x, na.rm = T)) On Thu, May 12, 2011 at 1:57 PM, John Kerpel john.ker...@gmail.com wrote: Hi all! I need to do something really simple using do.call. If I want to call the mean function inside do.call, how do I apply the condition na.rm=TRUE? So, I use do.call(mean, list(x)) where x is my data. This works fine if there are no NAs. Thanks, John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
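Spelled out with a tiny vector -- the only change is adding the named argument to the list:

```r
x <- c(1, NA, 3)
do.call(mean, list(x))                # NA: the missing value propagates
do.call(mean, list(x, na.rm = TRUE))  # 2
```

Any named element of the list is matched to the function's arguments exactly as if it had been written in a direct call, so `list(x, na.rm = TRUE)` is equivalent to `mean(x, na.rm = TRUE)`.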
Re: [R] Survival Rate Estimates
On May 12, 2011, at 12:40 PM, Brian McLoone wrote: Dear List, Is there an automated way to use the survival package to generate survival rate estimates and their standard errors? To be clear, *not* the survivorship estimates (which are cumulative), but the survival *rate* estimates...

Not entirely clear, but from context I suspect you mean instantaneous hazard? (Survival is not a rate but rather a proportion. Mortality can be a rate. The instantaneous hazard is the decrement in survival per unit time divided by the survival to that time.) So at each death the non-parametric estimate would divide current deaths (often 1, but ties are possible) by time since last death and then divide by proportion surviving. Or if you have a semi-parametric estimated function for survival (such as might be output from `basehaz`, which calls `survfit`) take: -delta_survival/delta_time/survival

    tdata <- data.frame(time   = c(1,1,1,2,2,2,3,3,3,4,4,4),
                        status = rep(c(1,0,2), 4),
                        n      = c(12,3,2,6,2,4,2,0,2,3,3,5))
    fit <- survfit(Surv(time, time, status, type = 'interval') ~ 1,
                   data = tdata, weight = n)
    T <- c(0, fit$time)
    S <- c(1, fit$surv)
    (-diff(S)/diff(T))/fit$surv
    [1] 0.8602308 0.8247746 0.4044324 1.2115931

I don't know if Therneau's opinion about estimating smoothed hazards has changed: http://finzi.psych.upenn.edu/Rhelp10/2009-March/193104.html There is also a muhaz package, which may generate standard errors for its estimates, but I have read elsewhere that it does not do Cox models. http://finzi.psych.upenn.edu/R/library/muhaz/html/00Index.html -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] new package 'mvmeta' to perform multivariate meta-analysis
Dear R Community, I am pleased to announce the release of a new package called 'mvmeta', now available on CRAN (version 0.2.0). The package mvmeta provides functions to perform fixed- and random-effects multivariate meta-analysis and meta-regression. This modelling framework is exploited to pool multiple correlated outcomes across studies, and has already been applied in different fields: meta-analysis of randomized controlled trials reporting more than 1 outcome, and multi-site observational studies estimating multi-parameterized associations, among others. The package is fully documented through help pages. A package vignette will hopefully be added soon. I hope that this package will be useful to your work. Any kind of feedback (questions, suggestions, bug reports, etc.) is appreciated. Sincerely, Antonio Gasparrini London School of Hygiene & Tropical Medicine Department of Social and Environmental Health Research 15-17 Tavistock Place, London WC1H 9SH, UK Office: 0044 (0)20 79272406 Mobile: 0044 (0)79 64925523 Skype contact: a.gasparrini http://www.lshtm.ac.uk/people/gasparrini.antonio ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] problem with mediation
Hi: Try this:

    library(sos)  # install first if you don't have it already
    findFn('mediation')

You should find at least a half dozen packages from which to choose, at least three of which appear to be devoted to mediation analysis. HTH, Dennis

On Thu, May 12, 2011 at 8:56 AM, Mervi Virtanen mervi.virta...@uta.fi wrote: Hello! I have a problem with mediation analysis. I can do it with the function mediate() when I have one mediator. But how can I do it if I have one independent variable and one dependent variable but 4 mediators? I have tried the function mediations(), but it doesn't work. If I use mediate() 4 times, once for each mediator, is it the same? I want to know the total mediation effect for the 4 mediators. t.Mete __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fw: Help with PLSR
Hi, I am attempting to use plsr, which is part of the pls package in R. I am conducting analysis on datasets to identify which proteins/peptides are responsible for the variance between sample groups (biomarker spotting) in a multivariate fashion. I have a dataset in R called FullDataListTrans. As you can see below, the structure of the data is 40 rows, each representing a sample, and 94,727 columns, each representing a peptide.

    str(FullDataListTrans)
    num [1:40, 1:94727] 42 40.9 65 56 61.7 ...
    - attr(*, "dimnames")=List of 2
    ..$ : chr [1:40] "X" "X.1" "X.12" "X.13" ...
    ..$ : NULL

I have also created a vector GroupingList which gives the group names for each respective sample (row).

    GroupingList
    [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
    [39] 4 4
    str(GroupingList)
    int [1:40] 1 1 1 1 1 1 1 1 1 1 ...

I am now stuck while conducting the plsr. I have tried various methods of creating structured lists etc. and have got nowhere. I have also tried many incarnations of

    BHPLS1 <- plsr(GroupingList ~ PCIList, ncomp = FeaturePresenceExpected[1],
                   data = FullDataListTrans, validation = "LOO")

Where am I going wrong? Also, what is the easiest method to identify which of the 94,000 peptides are most important to the variance between groups? Thanks in advance for any help, Amit Patel
[R] Read.xls in gdata
All, When I use gdata::read.xls to read in an Excel file it seems to round the data to three decimal places and also converts the dates to factors. Does anyone know how to 1) get more precision in the numeric data and 2) prevent the dates from being converted to levels or factors? I tried setting as.is=TRUE, but that didn't help. Thanks, Roger *** This message is for the named person's use only. It may\...{{dropped:20}}
[R] Assistance R
Assistance R, When trying to read in data in txt format already set up for R, the following error appears:

    Erro em scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
    linha 1 não tinha 10 elementos

(in English: "line 1 did not have 10 elements"). I would like to know how to fix this error so I can proceed with my analysis, because I need it urgently. Att, Carlos Magno -- View this message in context: http://r.789695.n4.nabble.com/Assistance-R-tp3518289p3518289.html Sent from the R help mailing list archive at Nabble.com.
[R] extract integers from string
I have a vector with a long list of sentences that contain integers. I would like to extract the integers in a manner such that they are separate and manipulable. For example:

    x[i]   <- "sally has 20 dollars in her pocket and 3 marbles"
    x[i+1] <- "30 days ago john had a 400k house"

All sentences are different and contain a mixture of both integers and characters. I would like to get a conditional matrix such that:

    y[i, j]     <- 20
    y[i, j+1]   <- 3
    y[i+1, j]   <- 30
    y[i+1, j+1] <- 400

Based on some criteria (i.e. order, string length, keyword, etc...) the integers are sorted. Most of my trouble is with finding the correct way to use gsub() or strsplit() such that the strings become integers that can be put into a matrix. Thanks.
[R] Sensitivity Analysis: Morris method - argument scale
Dear R-users, I have a question on the logical argument scale in the morris-function from the sensitivity package. Should it be set to TRUE or FALSE? Thanks, Chris [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extract integers from string
Try this:

    library(gsubfn)
    strapply(x, "\\d+", as.numeric, simplify = rbind)

On Thu, May 12, 2011 at 3:06 PM, Alon Honig honey...@gmail.com wrote: I have a vector with a long list of sentences that contain integers. I would like to extract the integers in a manner such that they are separate and manipulable. For example:

    x[i]   <- "sally has 20 dollars in her pocket and 3 marbles"
    x[i+1] <- "30 days ago john had a 400k house"

All sentences are different and contain a mixture of both integers and characters. I would like to get a conditional matrix such that:

    y[i, j]     <- 20
    y[i, j+1]   <- 3
    y[i+1, j]   <- 30
    y[i+1, j+1] <- 400

Based on some criteria (i.e. order, string length, keyword, etc...) the integers are sorted. Most of my trouble is with finding the correct way to use gsub() or strsplit() such that the strings become integers that can be put into a matrix. Thanks.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
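For anyone who would rather not install gsubfn, the same extraction can be done in base R: gregexpr()/regmatches() pull out the digit runs, and NA-padding makes the rows bindable into the matrix the poster described. A sketch:

```r
# Extract every integer from each sentence with base R, then pad the
# shorter rows with NA so they fit into one matrix.
x <- c("sally has 20 dollars in her pocket and 3 marbles",
       "30 days ago john had a 400k house")
nums <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)
k <- max(lengths(nums))
y <- t(vapply(nums, function(v) c(v, rep(NA_real_, k - length(v))), numeric(k)))
# y[1, ] is c(20, 3); y[2, ] is c(30, 400)
```

Note that "400k" yields 400, exactly as in the requested output; if suffixed numbers should be treated differently, the pattern needs refining.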
Re: [R] Assistance R
On 12.05.2011 20:14, Carlosmagno wrote: Assistance R, When trying to insert data in txt format already set up, R gives the following error: Erro em scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : linha 1 não tinha 10 elementos. I would like to know how to fix this error so I can proceed with my analysis, because I need it urgently.

We need more information. What command do you use to read the file? What do the first 5 lines of your data.txt look like? -- Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Assistance R
As Alexander Engelhardt says, we need more information. Please give us the code you are using and a sample of the data. However, one thing you might want to do is check what the separator for the data is. You may be reading something like tab-delimited data when you think it is comma-delimited, or something similar.

--- On Thu, 5/12/11, Carlosmagno carlosmagno...@ig.com.br wrote: From: Carlosmagno carlosmagno...@ig.com.br Subject: [R] Assistance R To: r-help@r-project.org Received: Thursday, May 12, 2011, 2:14 PM Assistance R, When trying to insert data in txt format already set up, R gives the following error: Erro em scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : linha 1 não tinha 10 elementos. I would like to know how to fix this error so I can proceed with my analysis, because I need it urgently. Att Carlos Magno -- View this message in context: http://r.789695.n4.nabble.com/Assistance-R-tp3518289p3518289.html Sent from the R help mailing list archive at Nabble.com.
[R] DCC-GARCH model and AR(1)-GARCH(1,1) regression model
Hello, I have a rather complex problem... I will have to explain everything in detail because I cannot solve it by myself... I just ran out of ideas. So here is what I want to do: I take quotes of two indices - S&P 500 and DJ. My first aim is to estimate coefficients of the DCC-GARCH model for them. This is how I do it:

    library(tseries)
    p1 = get.hist.quote(instrument = "^gspc", start = "2005-01-07",
                        end = "2009-09-04", compression = "w", quote = "AdjClose")
    p2 = get.hist.quote(instrument = "^dji", start = "2005-01-07",
                        end = "2009-09-04", compression = "w", quote = "AdjClose")
    p = cbind(p1, p2)
    y = diff(log(p)) * 100
    y[,1] = y[,1] - mean(y[,1])
    y[,2] = y[,2] - mean(y[,2])
    T = length(y[,1])
    library(ccgarch)
    library(fGarch)
    f1 = garchFit(~ garch(1,1), data = y[,1], include.mean = FALSE)
    f1 = f1@fit$coef
    f2 = garchFit(~ garch(1,1), data = y[,2], include.mean = FALSE)
    f2 = f2@fit$coef
    a = c(f1[1], f2[1])
    A = diag(c(f1[2], f2[2]))
    B = diag(c(f1[3], f2[3]))
    dccpara = c(0.2, 0.6)
    dccresults = dcc.estimation(inia = a, iniA = A, iniB = B, ini.dcc = dccpara,
                                dvar = y, model = "diagonal")
    dccresults$out
    DCCrho = dccresults$DCC[,2]
    matplot(DCCrho, type = 'l')

dccresults$out delivers the estimated coefficients of the DCC-GARCH model. And here is my first question: how can I check whether these coefficients are significant or not? How can I test them for significance? My second question: is it true that matplot(DCCrho, type = 'l') shows the conditional correlation between the two indices in question? OK, this would be it when it comes to DCC-GARCH. Now, using the conditional correlation obtained from the DCC-GARCH model, I want to test for structural shifts in conditional correlations. To be precise, I want to test whether the conditional correlations significantly increase in the turmoil period / during the subprime crisis.
The regression model is AR(1)-GARCH(1,1), using a dummy variable specified as: *** the equations, you can find in the attachment *** where the first equation is the conditional correlation among the two indices during the subprime crisis, Dt is a dummy variable for the turmoil period, and the second equation (hij,t) is the conditional variance of eij,t. The aim is, of course, to find the estimates of the regression model on structural shifts in the conditional correlations obtained in the DCC-GARCH model. I found information saying there is no ready-made function for an AR(1)-GARCH(1,1) regression model, so it has to be done in two steps: 1) estimate the AR parameters; 2) estimate the GARCH part of the model on the residuals from the AR model. And this would be my rather poor idea of how to do it...

    library(timeSeries)
    library(fSeries)
    step1 = arma(DCCrho, order = c(1,0), include.intercept = TRUE)
    step1$res
    step11 = na.remove(step1$res)
    step2 = garch(step11, order = c(1,1), include.intercept = TRUE)

To be honest, I have no clue how to do it. I don't even know why I get a missing value as the first residual of step1 (step1$res[1]) and how to account for it. Above, I just removed it, but then I have a smaller number of observations... and this is probably wrong. And then these GARCH estimates on the residuals... does that make sense at all? I know the mail is quite long, but hopefully someone will find time to give me a hand, because I have to solve the problem and I have reached the point where I cannot move forward without someone's help. There is not much information on how to apply the DCC-GARCH model and the AR(1)-GARCH(1,1) regression model on the Internet. Hopefully, some of you are familiar with it. Thank you very much in advance, people of good will, for looking at what I wrote and helping me.
Best regards Marcin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] foreach(): how to do calculations between two foreach loops?
Hi, On Wed, May 11, 2011 at 5:44 PM, Marius Hofert m_hof...@web.de wrote: Dear expeRts, is it possible to carry out calculations between different foreach() calls? As with nested loops, you want to carry out calculations not depending on the inner loop only once, and not for each iteration of the innermost loop. Cheers, Marius

library(foreach)
foreach(i=1:3) %:% foreach(j=1:2) %do% { i <- i+1; print(paste(i,j)) }
foreach(i=1:3) %:%
  i <- i+1 # lengthy calculation which has to be done only once, not for each j
  foreach(j=1:2) %do% { print(paste(i,j)) }

If I understand your question well, one solution might be to break it into two parallelized parts, e.g.:

R> lengthy.calculation.i <- foreach(i=1:3) %dopar% someIntenseFunction(i)
R> foreach(i=1:3) %:% foreach(j=1:3) %dopar% { anotherIntenseFunction(lengthy.calculation.i[[i]], j) }

I believe the left and right hand sides of the %:% operator both need a foreach object, so your original incantation wouldn't work. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
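[Steve's two-part suggestion, spelled out as a self-contained sketch. The function is made up for illustration, and %do% is used so it runs without a registered parallel backend; swap in %dopar% once one is registered.]

```r
library(foreach)
## Hoist the expensive per-i work out of the inner loop: compute it once
## per i, then let the nested loop reuse the stored result.
expensive <- function(i) i^2                  # stand-in for the lengthy calculation
once.per.i <- foreach(i = 1:3) %do% expensive(i)

res <- foreach(i = 1:3, .combine = c) %:%
  foreach(j = 1:2, .combine = c) %do% {
    once.per.i[[i]] + j                       # inner body only does the cheap part
  }
res   # 2 3 5 6 10 11
```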
[R] Mixed Ordinal logistic regression: marginal probabilities and standard errors for the marginal probabilities
Dear R-list helpers: I am trying to run an ordinal logistic regression using the lmer function (I think that is the correct function, although I haven't tested it out yet); it is going to be a mixed model with the second level as random. Instead of the regular estimates, I need to get the marginal probabilities and the standard errors of those marginal probabilities. Which option in lmer would give me those two kinds of values? Thank you. Ya Ma
[R] assigning creating missing rows and values
I have a dataset with missing times (11:00 and 16:00). I would like the output to include the missing times so that the final time vector looks like realt, with each missing time taking the previous time's value. Ex. If meas at time 15:30 is 0.45, then the meas for time 16:00 will also be 0.45. meas are the measurements and times are the times at which they were taken.

meas <- runif(18)
times <- c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:30","17:00","17:30","18:00")
output <- data.frame(meas, times)
realt <- c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00","16:30","17:00","17:30","18:00")

- In theory, practice and theory are the same. In practice, they are not - Albert Einstein
Re: [R] Survival Rate Estimates
On May 12, 2011, at 2:19 PM, David Winsemius wrote: On May 12, 2011, at 12:40 PM, Brian McLoone wrote: Dear List, Is there an automated way to use the survival package to generate survival rate estimates and their standard errors? To be clear, *not* the survivorship estimates (which are cumulative), but the survival *rate* estimates... Not entirely clear, but from context I suspect you mean instantaneous hazard? (Survival is not a rate but rather a proportion. Mortality can be a rate. The instantaneous hazard is the decrement in survival per unit time divided by the survival to that time.) So at each death the non-parametric estimate would divide current deaths (often 1, but ties are possible) by time since last death, and then divide by the proportion surviving. Or if you have a semi-parametric estimated function for survival (such as might be output from `basehaz`, which calls `survfit`), take: -delta_survival/delta_time/survival

tdata <- data.frame(time = c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status = rep(c(1,0,2), 4),
                    n = c(12,3,2,6,2,4,2,0,2,3,3,5))
fit <- survfit(Surv(time, time, status, type='interval') ~ 1, data=tdata, weight=n)
T <- c(0, fit$time)

I was doing something else in this session and realized that using 'T' was _not_ a good choice here.

T == TRUE
[1] FALSE TRUE FALSE FALSE FALSE

I (almost) always spell out TRUE, but not everyone does. Better to use 'sT' or almost anything else. (But don't use: c, df, C, F, pi, rm, t, qt, pt, rt, dt, rf, qf, ...)

rm(T)
T == TRUE
[1] TRUE

-- David.

S <- c(1, fit$surv)
(-diff(S)/diff(T))/fit$surv
[1] 0.8602308 0.8247746 0.4044324 1.2115931

I don't know if Therneau's opinion about estimating smoothed hazards has changed: http://finzi.psych.upenn.edu/Rhelp10/2009-March/193104.html There is also a muhaz package which may generate standard errors for its estimates, but I have read elsewhere that it does not do Cox models. 
http://finzi.psych.upenn.edu/R/library/muhaz/html/00Index.html -- David Winsemius, MD West Hartford, CT
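[The nonparametric hazard recipe above, restated as a self-contained sketch on the survival package's bundled lung data -- an illustration only, not the original poster's data.]

```r
library(survival)
## Fit a Kaplan-Meier curve, then estimate the instantaneous hazard as
## -delta_survival / delta_time / survival, per the recipe above:
fit <- survfit(Surv(time, status) ~ 1, data = lung)
sT  <- c(0, fit$time)        # 'sT', not 'T', per the naming warning above
S   <- c(1, fit$surv)
haz <- (-diff(S) / diff(sT)) / fit$surv   # decrement per unit time / survival
head(cbind(time = fit$time, hazard = haz))
```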
Re: [R] assigning creating missing rows and values
On May 12, 2011, at 4:33 PM, Schatzi wrote: I have a dataset with missing times (11:00 and 16:00). I would like the output to include the missing time so that the final time vector looks like realt and has the previous time's value. Ex. If meas at time 15:30 is 0.45, then the meas for time 16:00 will also be 0.45. meas are the measurements and times are the times at which they were taken.

meas <- runif(18)
times <- c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:30","17:00","17:30","18:00")
output <- data.frame(meas, times)
realt <- c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00","16:30","17:00","17:30","18:00")

Package 'zoo' has an 'na.locf' function, which I believe stands for NA's last observation carried forward. So make a regular set of times, merge, and carry forward. I'm pretty sure you can find many examples in the Archive. Gabor is very good about spotting places where his many contributions can be successfully deployed. -- David Winsemius, MD West Hartford, CT
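[David's merge-and-carry-forward recipe, spelled out as a sketch with zoo. A character time index is used for simplicity, since all times here are zero-padded HH:MM and so sort correctly; POSIXct would be more robust for real data.]

```r
library(zoo)
set.seed(1)
meas  <- runif(18)
times <- c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30",
           "13:00","13:30","14:00","14:30","15:00","15:30","16:30","17:00",
           "17:30","18:00")
realt <- c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00",
           "12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00",
           "16:30","17:00","17:30","18:00")
## Merge the observed series onto the full time grid, then carry the last
## observation forward into the gaps (11:00 and 16:00):
z      <- zoo(meas, times)            # observed values, indexed by time
full   <- merge(z, zoo(, realt))      # zoo(, realt) = empty series on the grid
filled <- na.locf(full)
## 16:00 now holds 15:30's value:
coredata(filled)[index(filled) == "16:00"] ==
  coredata(filled)[index(filled) == "15:30"]
```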
Re: [R] assigning creating missing rows and values
... But beware: Last observation carried forward is a widely used but notoriously bad (biased) way to impute missing values; and, of course, inference based on such single imputation is bogus (how bogus depends on how much imputation, among other things, of course). Unfortunately, dealing with such data well requires considerable statistical sophistication, which is why statisticians are widely employed in the clinical trial business, where missing data in longitudinal series are relatively common. You may therefore find it useful to consult a local statistician if one is available. As an extreme -- and unrealistic -- example of the problem, suppose your series consisted of 12 hours of data measured every half hour and that one series had only two measurements, the first and the last. The first value is 10 and the last is 1. LOCF would fill in the missings as all 10's. Obviously, a dumb thing to do. For real data, the problem would not be so egregious, but the fundamental difficulty is the same. (Apologies to those for whom my post is a familiar, boring refrain. Unfortunately, I do not have the imagination to offer better). Cheers, Bert On Thu, May 12, 2011 at 1:43 PM, David Winsemius dwinsem...@comcast.net wrote: On May 12, 2011, at 4:33 PM, Schatzi wrote: I have a dataset where I have missing times (11:00 and 16:00). I would like the outputs to include the missing time so that the final time vector looks like realt and has the previous time's value. Ex. If meas at time 15:30 is 0.45, then the meas for time 16:00 will also be 0.45. meas are the measurements and times are the times at which they were taken. 
meas <- runif(18)
times <- c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:30","17:00","17:30","18:00")
output <- data.frame(meas, times)
realt <- c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00","16:30","17:00","17:30","18:00")

Package 'zoo' has an 'na.locf' function, which I believe stands for NA's last observation carried forward. So make a regular set of times, merge, and carry forward. I'm pretty sure you can find many examples in the Archive. Gabor is very good about spotting places where his many contributions can be successfully deployed. -- David Winsemius, MD West Hartford, CT -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
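[Bert's extreme 12-hour example can be made concrete -- a sketch only; na.approx is shown purely as a contrast to illustrate the bias, not as a recommendation, since principled (multiple) imputation is a different matter entirely.]

```r
library(zoo)
## 12 hours at half-hour steps = 25 time points; only the first and last
## values are observed (10 and 1). LOCF fills every gap with 10:
x <- c(10, rep(NA, 23), 1)
na.locf(x)     # 10, 10, ..., 10, 1 -- the egregious fill Bert describes
na.approx(x)   # linear interpolation at least tracks the decline
```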
Re: [R] assigning creating missing rows and values
I am still working on the weights problem. If the animals do not eat (like after sunset), then no new feed weight will be calculated and no new row will be entered. Thus, if I just use the previous value, it should be correct for how much cumulative feed was eaten that day up to that point. I will play around with that package and try getting it to work for me. Thank you. -Original Message- From: gunter.ber...@gene.com [mailto:gunter.ber...@gene.com] Sent: Thursday, May 12, 2011 04:13 PM To: dwinsem...@comcast.net Cc: Thompson, Adele - adele_thomp...@cargill.com; r-help@r-project.org Subject: Re: [R] assigning creating missing rows and values
Re: [R] Saving misclassified records into dataframe within a loop
Having poked the problem a couple more times, it appears my issue is that the object I save within the loop is not available after the function ends. I have no idea why it is acting in this manner.

library(rpart)
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
# predict
prediction <- predict(fit, kyphosis)
# misclassification index function
results <- as.data.frame(1)
predict.function <- function(x){
  j <- 0
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i,1] == 1)) == 0){
      j <- j+1
      results[j,] <- row.names(kyphosis[c(i),])
      print(row.names(kyphosis[c(i),]))
    }
  }
  {
    print(results)
    save(results, file = "results")
  }
}

I can load results from the file and my output is there. However, if I just type results I get the original 1. What in the Lord's name is occurring? Thanks John On Thu, May 12, 2011 at 1:50 PM, Phil Spector spec...@stat.berkeley.edu wrote: John - In your example, the misclassified observations (as defined by your predict.function) will be kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1, ] so you could start from there. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Thu, 12 May 2011, John Dennison wrote: Greetings R world, I know some version of this question has been asked before, but I need to save the output of a loop into a data frame, to eventually be written to a Postgres database with dbWriteTable. Some background: I have developed classification models to help identify problem accounts. The logic is this: if the model classifies the record as including variable X, and it turns out that record does not have X, then it should be reviewed (i.e. I need the row number/ID saved to a database). Generally I want to look at the misclassified records. This is a little hack, I know; anyone got a better idea, please let me know. 
Here is an example:

library(rpart)
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
# predict
prediction <- predict(fit, kyphosis)
# misclassification index function
predict.function <- function(x){
  for (i in 1:length(kyphosis$Kyphosis)) {
    # the idea is that if the record is absent but the prediction is otherwise, then show me that record
    if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i,1] == 1)) == 0){
      # THIS WORKS
      print(row.names(kyphosis[c(i),]))
    }
  }
}
predict.function(x)

Now my issue is that I want to save these IDs to a data.frame so I can later save them to a database. Is this an incorrect approach? Can I save each ID to the Postgres instance as it is found? I have an ignorant fear of lapply, but it seems it may hold the key. I've tried:

predict.function <- function(x){
  results <- as.data.frame(1)
  for (i in 1:length(kyphosis$Kyphosis)) {
    # the idea is that if the record is absent but the prediction is otherwise, then show me that record
    if (((kyphosis$Kyphosis[i] == "absent") == (prediction[i,1] == 1)) == 0){
      # THIS WORKS
      results[i,] <- as.data.frame(row.names(kyphosis[c(i),]))
    }
  }
}

This does not work; the results object does not get saved. Any help would be greatly appreciated. Thanks John Dennison
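[One way past the scoping trap in the thread above, as a hedged sketch restating Phil's point: assignment inside a function modifies a local copy that vanishes when the function returns, so return the value instead. The helper name is made up; rpart and its kyphosis data ship with R.]

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
prediction <- predict(fit, kyphosis)   # matrix of class probabilities

## Vectorized: compare observed class to the highest-probability class,
## and RETURN the row names rather than assigning inside a loop.
misclassified.ids <- function(data, pred) {
  predicted <- colnames(pred)[max.col(pred)]
  row.names(data)[data$Kyphosis != predicted]
}
ids <- misclassified.ids(kyphosis, prediction)
ids   # visible outside the function, ready for dbWriteTable
```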
[R] R development master class: SF, June 8-9
Hi all, I hope you don't mind the slightly off-topic email, but I'm going to be teaching an R development master class in San Francisco on June 8-9. The basic idea of the class is to help you write better code, focused on the mantra of do not repeat yourself. On day one you will learn powerful new tools of abstraction, allowing you to solve a wider range of problems with fewer lines of code. Day two will teach you how to make packages, the fundamental unit of code distribution in R, allowing others to save time by using your code. To get the most out of this course, you should have some experience programming in R already: you should be familiar with writing functions and with the basic data structures of R: vectors, matrices, arrays, lists and data frames. You will find the course particularly useful if you're an experienced R user looking to take the next step, or if you're moving to R from other programming languages and you want to quickly get up to speed with R's unique features. Both days will incorporate a mix of lectures and hands-on learning. Expect to learn about a topic and then immediately put it into practice with a small example. Plenty of help will be available if you get stuck - there will be one assistant for every 10 attendees. You'll receive a printed copy of all slides, as well as electronic access to the slides, code and data. The material covered in the course is currently being turned into a book. You can access the current draft at https://github.com/hadley/devtools/wiki/. More information, including a complete session outline for the two days, is available at: http://www.revolutionanalytics.com/products/training/public/r-development.php Regards, Hadley PS. 
I'm also offering an internet version of day one through statistics.com - http://www.statistics.com/courses/using-r/r-program-adv/ -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/