Re: [R] How to test if a slope is different than 1?
Hi Greg and others,

Thanks for your replies. Okay, I'm convinced that the offset is the best approach, and I wonder if you might have a quick look at what I did.

Here's the original model, containing the slope (0.56) that I'd like to test against 1.0:

    model1 <- glm(log(data$AB.obs + 1, 10) ~ log(data$SIZE, 10) + data$YEAR)

and its coefficients:

                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -1.18253    0.09119 -12.967  < 2e-16 ***
    log(data$SIZE, 10)   0.56001    0.02564  21.843  < 2e-16 ***
    data$YEAR2008        0.16823    0.04366   3.853 0.000152 ***
    data$YEAR2009        0.20299    0.04707   4.313  2.4e-05 ***

And here's the model with an offset term:

    model2 <- glm(log(data$AB.obs + 1, 10) ~ log(data$SIZE, 10) + offset(log(data$SIZE, 10)) + data$YEAR)

and its coefficients:

                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -1.18253    0.09119 -12.967  < 2e-16 ***
    log(data$SIZE, 10)  -0.43999    0.02564 -17.162  < 2e-16 ***
    data$YEAR2008        0.16823    0.04366   3.853 0.000152 ***
    data$YEAR2009        0.20299    0.04707   4.313  2.4e-05 ***

So, if I understand correctly, the small P-value for the SIZE coefficient in model2 indicates that the slope of 0.56 in model1 is significantly different from 1.0, right?

If I may ask one more question: could I use the offset to test whether the slope of 0.56 is different from yet another value, e.g., 0.5?

Much appreciated. Many thanks,
Mark Na

On Wed, Apr 25, 2012 at 3:27 PM, Greg Snow 538...@gmail.com wrote:

> Doesn't the p-value from using offset work for you, if you really need a p-value?
>
> The confint method is a quick and easy way to see if the slope is significantly different from 1 (see Rolf's response), but it does not provide an exact p-value. I guess you could do confidence intervals at different confidence levels until you find the level such that one of the limits is close enough to 1, but that seems like way too much work.
>
> You could also compute the p-value by taking the slope minus 1, dividing by the standard error, and plugging that into the pt function with the correct degrees of freedom. You could even write a function to do that for you, but it still seems more work than adding the offset to the formula.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
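Both follow-up ideas in this message (testing against a value other than 1, and computing the p-value by hand with `pt`) can be sketched on simulated data. This is only an illustration: the data frame `d`, its columns, and the helper `slope_p` are made up here and are not part of the original analysis.

```r
set.seed(1)
# Simulated stand-in for the real data: the true slope is 0.56
d <- data.frame(x = runif(200, 1, 4),
                year = factor(sample(2007:2009, 200, replace = TRUE)))
d$y <- -1.2 + 0.56 * d$x + 0.2 * (d$year == "2009") + rnorm(200, sd = 0.28)

fit <- lm(y ~ x + year, data = d)

# Test H0: slope = b0 by moving b0 * x into an offset;
# the coefficient reported for x is then (slope - b0)
slope_p <- function(b0) {
  fit0 <- lm(y ~ x + offset(b0 * x) + year, data = d)
  coef(summary(fit0))["x", "Pr(>|t|)"]
}
p_vs_1    <- slope_p(1)
p_vs_half <- slope_p(0.5)

# The same test by hand: (estimate - b0) / SE, referred to a t distribution
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
p_manual <- 2 * pt(-abs((est - 1) / se), df = df.residual(fit))
```

Under these assumptions `p_manual` and `p_vs_1` agree, because the offset only shifts the coefficient and leaves its standard error unchanged.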
Re: [R] How to test if a slope is different than 1?
Hi Greg.

Thanks for your reply. Do you know if there is a way to use the confint function to get a p-value on this test?

Thanks,
Mark

On Mon, Apr 23, 2012 at 3:10 PM, Greg Snow 538...@gmail.com wrote:

> One option is to subtract the continuous variable from y before doing the regression (this works with any regression package/function). The probably better way in R is to use the 'offset' function:
>
>     formula = I(log(data$AB.obs + 1, 10) - log(data$SIZE, 10)) ~ log(data$SIZE, 10) + data$Y
>     formula = log(data$AB.obs + 1, 10) ~ offset(log(data$SIZE, 10)) + log(data$SIZE, 10) + data$Y
>
> Or you can use a function like 'confint' to find the confidence interval for the slope and see if 1 is in the interval.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
[R] How to test if a slope is different than 1?
Dear R-helpers,

I would like to test if the slope corresponding to a continuous variable in my model (summary below) is different than one. I would appreciate any ideas for how I could do this in R, after having specified and run this model.

Many thanks,
Mark Na

    Call:
    lm(formula = log(data$AB.obs + 1, 10) ~ log(data$SIZE, 10) + data$Y)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.94368 -0.13870  0.04398  0.17825  0.63365

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -1.18282    0.09120 -12.970  < 2e-16 ***
    log(data$SIZE, 10)   0.56009    0.02564  21.846  < 2e-16 ***
    data$Y2008           0.16825    0.04366   3.854 0.000151 ***
    data$Y2009           0.20310    0.04707   4.315 2.38e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.2793 on 228 degrees of freedom
    Multiple R-squared: 0.6768, Adjusted R-squared: 0.6726
    F-statistic: 159.2 on 3 and 228 DF, p-value: < 2.2e-16
[R] Problem with plotting a square 1 x 3 plot and placement of outer margin text
Dear R-helpers,

Please see the attached plot. The problem is that I have too much space between the x-axis label (which is mtext in an outer margin) and the plots.

My par settings for this plot are:

    par(mfrow=c(1,3), oma=c(2,2,2,2), mar=c(5.1,4.1,4.1,2.1), pty="s")
    # here is the code that produces the three plots, which I have deleted for simplicity
    mtext("Log Wetland Area", side=1, outer=TRUE)

It works fine (less space between the plots and the outer margin text) when I set pty="m", but then I get very long and skinny rectangular plots. I would like to keep the square plots.

Any help would be much appreciated!

Many thanks,
Mark
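One possible workaround, offered as a sketch rather than a confirmed fix for this plot: with `pty = "s"` the square panels leave unused space above and below them, so a negative `line` value in `mtext` pulls the outer label back toward the panels, and a device whose shape suits three square panels reduces the gap in the first place. The device size and the `line = -2` value below are guesses that would need tuning.

```r
# Draw to a null PDF device so the sketch runs without opening a window
pdf(NULL, width = 9, height = 4)   # wide, short page suits three square panels
par(mfrow = c(1, 3), oma = c(2, 2, 2, 2), mar = c(5.1, 4.1, 4.1, 2.1), pty = "s")
for (i in 1:3) plot(rnorm(20), rnorm(20), xlab = "", ylab = "")
# A negative 'line' moves the outer-margin text up toward the panels
mtext("Log Wetland Area", side = 1, outer = TRUE, line = -2)
settings <- par("mfrow", "pty")    # record the settings that were in effect
dev.off()
```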
[R] Post hoc test for lm() or glm() ?
Hi R-helpers, TukeyHSD() works for models fitted with aov(), but could anyone point me to a function that performs a similar post hoc test for models fitted with lm() or glm()? Thanks in advance, Mark
Re: [R] Post hoc test for lm() or glm() ?
Thank you Richard and Frank for your very quick and helpful replies.

Cheers,
Mark

On Thu, Feb 2, 2012 at 2:58 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:

> The R multcomp package provides one general approach to multiplicity correction. For general contrasts in lm and glm, the rms package's ols and Glm functions make this even easier to use.
>
> Frank
>
> -
> Frank Harrell
> Department of Biostatistics, Vanderbilt University
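A minimal sketch of the multcomp route Frank mentions, assuming the package is installed (the code skips the comparison otherwise). The data frame `d` and its columns are simulated for illustration only.

```r
set.seed(42)
d <- data.frame(group = factor(rep(c("a", "b", "c"), each = 20)),
                x = rnorm(60))
shift <- c(a = 0, b = 0.2, c = 1)[as.character(d$group)]
d$y <- 1 + 0.5 * d$x + unname(shift) + rnorm(60, sd = 0.5)

fit <- lm(y ~ x + group, data = d)

if (requireNamespace("multcomp", quietly = TRUE)) {
  # All pairwise (Tukey-type) contrasts among the levels of 'group',
  # adjusted for the covariate x
  tuk <- multcomp::glht(fit, linfct = multcomp::mcp(group = "Tukey"))
  print(summary(tuk))
}
```

The same call should accept a model fitted with `glm()`.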
[R] Post-hoc test on ANCOVA
Dear R-helpers, I have an ANCOVA with a significant effect of the factor, which has three levels. I wish to determine which of the levels are different from each other but, because my model was fitted with lm(), I cannot use TukeyHSD. For some reason, I get different results (no significant effect of the factor) when I fit the model using aov() so, for the moment, I am using lm(). Could anyone point me to a test and associated R function that will work on a fitted lm() or glm()? Many thanks, Mark Na
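One possible explanation for the lm()/aov() discrepancy, which the post itself does not confirm: the table printed for an aov fit uses sequential sums of squares, so the test for the factor depends on whether it enters before or after the covariate. The sketch below, on simulated data with a covariate that is correlated with the factor, contrasts the sequential tables with the order-independent marginal tests from `drop1`.

```r
set.seed(7)
d <- data.frame(group = factor(rep(c("a", "b", "c"), each = 20)))
d$x <- rnorm(60, mean = as.integer(d$group))  # covariate correlated with the factor
d$y <- 2 * d$x + 0.8 * (d$group == "c") + rnorm(60)

fit_gx <- lm(y ~ group + x, data = d)  # factor entered first
fit_xg <- lm(y ~ x + group, data = d)  # factor entered last

# Sequential tables: the row for 'group' depends on term order
seq_gx <- anova(fit_gx)
seq_xg <- anova(fit_xg)

# Marginal tests: each term adjusted for all the others
m_gx <- drop1(fit_gx, test = "F")
m_xg <- drop1(fit_xg, test = "F")
```

If this is what happened, entering the factor last in the aov() formula should reproduce the marginal test.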
[R] How to write a list object's name to a new dataframe in that list object
Hello R-helpers,

I have a list that contains only dataframes. Each element of the list (i.e., each dataframe) has a unique name ("one" through "ten"). I wish to add a new column (called NAME) to each list element (i.e., each dataframe), and I want that column to contain the name of its list element. For example, the list element (i.e., dataframe) called "one" would get a new column called NAME that would contain the word "one" in every row.

Could anyone help with that?

Many thanks,
Mark Na
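A minimal sketch of one way to do this with `Map`, which walks over the dataframes and their names in parallel. The two-element list `lst` is a made-up stand-in for the real ten-element list.

```r
# Hypothetical stand-in for the real list of named dataframes
lst <- list(one = data.frame(a = 1:3), two = data.frame(a = 4:5))

lst <- Map(function(df, nm) {
  df$NAME <- nm   # recycled to every row of this dataframe
  df
}, lst, names(lst))

lst$one
```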
[R] Poisson GLM with a logged dependent variable...just asking for trouble?
Dear R-helpers,

I'm using a GLM with Poisson errors to model integer count data as a function of one non-integer covariate. The model is:

    glm(log(DV) ~ log(IV, 10), family = poisson)

I'm getting a warning because the logged DV is no longer an integer. I have three questions:

1) Can I ignore the warning, or is logging the DV (resulting in non-integers) a serious violation of the Poisson error structure?

2) If the answer to #1 is "no, don't ignore it, it's serious", then can I use a quasipoisson error structure instead (it does not give the same warning), and if so, are there any pitfalls to using the quasipoisson model? Are there any better alternatives for count data where the counts must be logged? Or should I just abandon logging the DV? In that case, how could I compare the fit of a Poisson model (without logging the DV) to that of a GLM with normal errors (with a logged DV)? AIC would not be valid because the DVs are different, right?

3) The quasipoisson model doesn't return an AIC value. Why, and is there anything I can do to calculate AIC manually that would allow me to compare this model to other models?

Many thanks in advance for your help!

Cheers,
Mark
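A sketch of the unlogged alternative raised in question 2, on simulated counts (the variables and coefficients are invented for illustration). The Poisson family's default log link already models the log of the expected count, so the response can stay an integer. The sketch also shows the behaviour behind question 3: a quasipoisson fit gives the same coefficients but reports `NA` for AIC, since it is based on a quasi-likelihood rather than a full likelihood.

```r
set.seed(3)
IV <- runif(150, 1, 1000)
DV <- rpois(150, lambda = exp(0.5 + 0.8 * log10(IV)))

# Counts stay unlogged; the log link handles the log scale
fit_pois  <- glm(DV ~ log10(IV), family = poisson)
fit_quasi <- glm(DV ~ log10(IV), family = quasipoisson)

coef(fit_pois)                  # point estimates are identical in both fits
summary(fit_quasi)$dispersion   # estimated dispersion
AIC(fit_pois)
AIC(fit_quasi)                  # NA for quasi families
```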
Re: [R] Compare two dataframes
Hi Petr,

Many thanks for your help. I like your solution because (and I did not know this) the unique function works on ALL the data at once (i.e., across all of the columns), which means I don't have to make a unique ID field by pasting together all of the columns, or run through all of the columns iteratively (say, by using a loop).

However, if the dataframe contains non-unique rows (two rows with exactly the same values in each column), then the unique function will delete one of them, and that may not be desirable. So, caution is required.

Thanks again for the time you took to help me better understand the unique function. Much appreciated. Děkuji (thank you)!

Mark

On Fri, Dec 17, 2010 at 2:27 AM, Petr Savicky savi...@cs.cas.cz wrote:

> On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:
>> Hello, I have two dataframes DF1 and DF2 that should be identical but are not (DF1 has some rows that aren't in DF2, and vice versa). I would like to produce a new dataframe DF3 containing rows in DF1 that aren't in DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).
>
> The function unique(DF) removes duplicated rows of DF and keeps the unique rows in the order of their first occurrence. So, if DF1 does not contain duplicated rows, then unique(rbind(DF1, DF2)) contains first DF1 and then the rows that are unique to DF2, if there are any. The order of the rows in the result depends on the order of the original data frames, and if DF2 contains several instances of a row that is not in DF1, we get only the first instance of this row in the difference.
>
>>     #MAKE SOME DATA
>>     cars$id <- paste(cars$speed, cars$dist, sep="") #create unique ID field by pasting all columns together
>>     cars1 <- cars[1:35, ]
>>     cars2 <- cars[16:50, ]
>>     #EXTRACT UNIQUE ROWS
>>     cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ] #rows unique to cars1 (i.e., not in cars2)
>>     cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ] #rows unique to cars2
>
>     cars1_set <- unique(cars1)
>     cars2_set <- unique(cars2)
>     cars1_plus <- unique(rbind(cars1_set, cars2_set))
>     cars2_plus <- unique(rbind(cars2_set, cars1_set))
>     cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ]
>     cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ]
>     all(cars1_unique == cars1_diff) # [1] TRUE
>     all(cars2_unique == cars2_diff) # [1] TRUE
>
> Petr Savicky.
[R] Compare two dataframes
Hello,

I have two dataframes DF1 and DF2 that should be identical but are not (DF1 has some rows that aren't in DF2, and vice versa). I would like to produce a new dataframe DF3 containing rows in DF1 that aren't in DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).

I have a solution for this problem (see the self-contained example below), but it's awkward: it requires making a new ID column by pasting together all of the columns in each DF and then comparing the two DFs based on this unique ID. Is there a better way?

Many thanks for your help,
Mark

    #compare two dataframes and extract uncommon rows
    #MAKE SOME DATA
    cars$id <- paste(cars$speed, cars$dist, sep="") #create unique ID field by pasting all columns together
    cars1 <- cars[1:35,]
    cars2 <- cars[16:50,]
    #EXTRACT UNIQUE ROWS
    cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id),] #rows unique to cars1 (i.e., not in cars2)
    cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id),] #rows unique to cars2
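A variation on the same idea that avoids storing an ID column and works for any number of columns, shown as a sketch on two tiny made-up dataframes. The helper `row_key` is hypothetical; it builds one key per row on the fly, and unlike an approach based on `unique` it leaves duplicated rows in place.

```r
DF1 <- data.frame(speed = c(4, 7, 8, 9), dist = c(2, 4, 16, 10))
DF2 <- data.frame(speed = c(8, 9, 10),   dist = c(16, 10, 18))

# One key per row, built from every column; the separator guards against
# accidental collisions such as "1" + "23" versus "12" + "3"
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

DF3 <- DF1[!row_key(DF1) %in% row_key(DF2), ]  # rows in DF1 but not in DF2
DF4 <- DF2[!row_key(DF2) %in% row_key(DF1), ]  # rows in DF2 but not in DF1
```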
[R] How to left or right truncate a character string?
Hi R-helpers,

I have a character string, for example:

    "lm(y ~ X2 + X3 + X4)"

from which I would like to strip off the leading and trailing quotation marks, resulting in this:

    lm(y ~ X2 + X3 + X4)

I have tried using gsub() but I can't figure out how to specify the quotation mark using a regular expression.

Alternatively, I would like a function that lets me delete the leading (or trailing) X characters, and in this case X=1 (but it could be used more flexibly to delete several leading or trailing characters).

I would appreciate help with either of these potential solutions (gsub and regex, or delete leading/trailing characters).

Many thanks!
Mark
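Both routes can be sketched in base R, assuming the quotation marks are literal characters inside the string (as opposed to the quotes R prints around any character value). The helper name `trim_chars` is made up for this example.

```r
x <- "\"lm(y ~ X2 + X3 + X4)\""   # the string itself starts and ends with a quote

# Regex route: remove a double quote at the very start or the very end
no_quotes <- gsub('^"|"$', "", x)

# Positional route: drop n_left leading and n_right trailing characters
trim_chars <- function(s, n_left = 0, n_right = 0) {
  substr(s, n_left + 1, nchar(s) - n_right)
}
trimmed <- trim_chars(x, 1, 1)
```

Writing the pattern in single quotes avoids having to escape the double quote inside it.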
[R] How to bind models into a list of models?
Hi R-helpers,

I have a character object called dd that has 32 elements, each of which is a model formula contained within quotation marks. Here's what it looks like:

    > dd
     [1] "lm(y ~ 1,data=Cement)"                     "lm(y ~ X,data=Cement)"
     [3] "lm(y ~ X1,data=Cement)"                    "lm(y ~ X2,data=Cement)"
     [5] "lm(y ~ X3,data=Cement)"                    "lm(y ~ X4,data=Cement)"
     [7] "lm(y ~ X + X1,data=Cement)"                "lm(y ~ X + X2,data=Cement)"
     [9] "lm(y ~ X + X3,data=Cement)"                "lm(y ~ X + X4,data=Cement)"
    [11] "lm(y ~ X1 + X2,data=Cement)"               "lm(y ~ X1 + X3,data=Cement)"
    [13] "lm(y ~ X1 + X4,data=Cement)"               "lm(y ~ X2 + X3,data=Cement)"
    [15] "lm(y ~ X2 + X4,data=Cement)"               "lm(y ~ X3 + X4,data=Cement)"
    [17] "lm(y ~ X + X1 + X2,data=Cement)"           "lm(y ~ X + X1 + X3,data=Cement)"
    [19] "lm(y ~ X + X1 + X4,data=Cement)"           "lm(y ~ X + X2 + X3,data=Cement)"
    [21] "lm(y ~ X + X2 + X4,data=Cement)"           "lm(y ~ X + X3 + X4,data=Cement)"
    [23] "lm(y ~ X1 + X2 + X3,data=Cement)"          "lm(y ~ X1 + X2 + X4,data=Cement)"
    [25] "lm(y ~ X1 + X3 + X4,data=Cement)"          "lm(y ~ X2 + X3 + X4,data=Cement)"
    [27] "lm(y ~ X + X1 + X2 + X3,data=Cement)"      "lm(y ~ X + X1 + X2 + X4,data=Cement)"
    [29] "lm(y ~ X + X1 + X3 + X4,data=Cement)"      "lm(y ~ X + X2 + X3 + X4,data=Cement)"
    [31] "lm(y ~ X1 + X2 + X3 + X4,data=Cement)"     "lm(y ~ X + X1 + X2 + X3 + X4,data=Cement)"

I would like to convert this object into a list called Cand.models with 32 list elements, each of which would contain one of the above model formulae. When I print the list, the models should run, so the first few elements of the list would look like this (see below, output from a list I created by hand).

Many thanks for any help you can provide!

Mark

    > Cand.models
    [[1]]

    Call:
    lm(formula = y ~ 1, data = Cement)

    Coefficients:
    (Intercept)
          95.42

    [[2]]

    Call:
    lm(formula = y ~ X, data = Cement)

    Coefficients:
    (Intercept)            X
         82.308        1.874

    [[3]]

    Call:
    lm(formula = y ~ X1, data = Cement)

    Coefficients:
    (Intercept)           X1
         81.479        1.869
Re: [R] How to bind models into a list of models?
Many thanks Phil. This is perfect. I usually forget about lapply and try something more complicated. Your solution works really well.

Best,
Mark

On Tue, Dec 14, 2010 at 3:45 PM, Phil Spector spec...@stat.berkeley.edu wrote:

> Mark -
> I believe
>
>     lapply(dd, function(m) eval(parse(text=m)))
>
> will do what you want.
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
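An alternative sketch, not taken from the thread, for the case where only the right-hand sides of the formulae are stored as text: build each formula with `as.formula` and fit it inside `lapply`, which avoids evaluating arbitrary strings. The data frame `dat` is a simulated stand-in for the Cement data.

```r
# Simulated stand-in for the Cement data
set.seed(11)
dat <- data.frame(X1 = rnorm(30), X2 = rnorm(30))
dat$y <- 1 + 2 * dat$X1 + rnorm(30)

rhs <- c("1", "X1", "X2", "X1 + X2")   # right-hand sides only
Cand.models <- lapply(rhs, function(r) {
  lm(as.formula(paste("y ~", r)), data = dat)
})
names(Cand.models) <- rhs

coef(Cand.models[["X1 + X2"]])
```

One trade-off: the printed `Call:` shows the `as.formula(...)` expression rather than the formula itself, so Phil's one-liner is the closer match if the printed call matters.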
[R] How to plot effect of x1 while controlling for x2
Hello R-helpers,

Please see a self-contained example below, in which I attempt to plot the effect of x1 on y, while controlling for x2.

Is there a function that does the same thing, without having to specify that x2 should be held at its mean value? It works fine for this simple example, but might be cumbersome if the model was more complex (e.g., lots of x variables, and/or interactions).

Many thanks,
Mark

    #make some random data
    x1 <- rnorm(100)
    x2 <- rnorm(100, 2, 1)
    y <- 0.75*x1 + 0.35*x2

    #fit a model
    model1 <- lm(y ~ x1 + x2)

    #predict the effect of x1 on y, while controlling for x2
    xv1 <- seq(min(x1), max(x1), 0.1)
    yhat_x1 <- predict(model1, list(x1=xv1, x2=rep(mean(x2), length(xv1))), type="response")

    #plot the predicted values
    plot(y ~ x1, xlim=c(min(x1), max(x1)), ylim=c(min(y), max(y)))
    lines(xv1, yhat_x1)
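Base R's `termplot()` draws per-term effect plots for a fitted model and may already cover this. If more control is wanted, the approach in the post can be generalized with a small helper. The sketch below is an assumption-laden illustration: `effect_grid` is a made-up name, it handles numeric predictors only, and it ignores interactions.

```r
set.seed(5)
x1 <- rnorm(100)
x2 <- rnorm(100, 2, 1)
y  <- 0.75 * x1 + 0.35 * x2 + rnorm(100, sd = 0.2)
dat <- data.frame(y, x1, x2)
model1 <- lm(y ~ x1 + x2, data = dat)

# Predict along a grid of one focal variable while holding every other
# numeric predictor at its mean
effect_grid <- function(model, data, focal, n = 50) {
  preds <- all.vars(formula(model))[-1]   # drop the response
  grid <- as.data.frame(lapply(data[preds], function(v) rep(mean(v), n)))
  grid[[focal]] <- seq(min(data[[focal]]), max(data[[focal]]), length.out = n)
  grid$fit <- predict(model, newdata = grid)
  grid
}

eff <- effect_grid(model1, dat, "x1")
# To draw it: plot(y ~ x1, data = dat); lines(fit ~ x1, data = eff)
```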
[R] Please help with min()
Hello,

I have two vectors of length = 10:

    x <- c(2,14,79,27,3,126,15,1,12,4)
    y <- rep(4,10)

and I would like to create a third vector of length = 10 that contains the smallest value at each position in the two above vectors. I have tried:

    z <- min(x,y)

but that doesn't work. With the example data above, the third vector would look like this:

    > z
     [1] 2 4 4 4 3 4 4 1 4 4

Any help with this would be much appreciated, thanks!

mark
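`min()` collapses all of its arguments into a single overall minimum, which is why the attempt returns one number. The element-wise ("parallel") version in base R is `pmin()`:

```r
x <- c(2, 14, 79, 27, 3, 126, 15, 1, 12, 4)
y <- rep(4, 10)

z <- pmin(x, y)   # smallest value at each position
z
```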
[R] Conditional replacement of NA depending on value in the previous column
Dear R-helpers,

I have a dataframe like this:

    ID X1 X2 X3 X4 X5 X6
    49  1  1  1  0 NA NA
    50  1  1  1  1 NA  1

I would like to convert a missing value (NA) that follows a 0 (zero) or another missing value (NA) into a 0 (zero). So, the above lines would be converted to:

    ID X1 X2 X3 X4 X5 X6
    49  1  1  1  0  0  0
    50  1  1  1  1 NA  1

I have been struggling with this all morning, so any help you could provide would be much appreciated.

Thank you!
Mark Na
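A sketch of one reading of the rule, chosen because it reproduces the example: zeros propagate to the right, so an NA becomes 0 when the column before it holds a 0, either originally or after an earlier replacement. Under this assumption an NA that follows a 1 stays NA, and so does a run of NAs that starts after a 1.

```r
df <- data.frame(ID = c(49, 50),
                 X1 = c(1, 1), X2 = c(1, 1), X3 = c(1, 1),
                 X4 = c(0, 1), X5 = c(NA_real_, NA_real_), X6 = c(NA, 1))

cols <- paste0("X", 1:6)
# Sweep left to right so that a 0 written into one column is seen
# when the next column is processed
for (j in seq_along(cols)[-1]) {
  prev <- df[[cols[j - 1]]]
  fix  <- is.na(df[[cols[j]]]) & !is.na(prev) & prev == 0
  df[[cols[j]]][fix] <- 0
}
df
```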
[R] Please help with loop, thanks
Dear R helpers,

I would like to write a loop that makes 4 objects (called A, B, C, and D), each of which contains ten random numbers. This attempt:

    individuals <- c("A","B","C","D")
    for(i in 1:length(individuals)) {
      individuals[i] <- rnorm(10)
    }

does not work, because individuals[i] is not the proper way to extract each letter from the object called individuals (rather, it tries to assign the random numbers to various positions within individuals).

So, my question is: what should be to the left of the gets operator in the third line?

Many thanks,
Mark Na
Re: [R] Please help with loop, thanks
Many thanks to everyone who helped me solve this problem. I think I must have described my problem poorly, but Phil, Patrick and Jim were able to see through the haze and suggest that I use a list to contain the output from my loop. This solution works very well.

Thanks again for your help,
Mark

--
Mark Na
University of Saskatchewan
Saskatoon, Canada
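The replies themselves are not reproduced in this message, so the following is only a sketch of what a list-based version of the loop might look like, with one named element per individual:

```r
individuals <- c("A", "B", "C", "D")

results <- vector("list", length(individuals))
names(results) <- individuals

for (nm in individuals) {
  results[[nm]] <- rnorm(10)   # [[ ]] addresses one list element by name
}

results$A
```

`assign(nm, rnorm(10))` would create four separate objects instead, but a list keeps them together for later use with `lapply`.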
[R] Please help with a basic function
Hello,

I am learning how to use functions, but I'm running into a roadblock. I would like my function to do two things: 1) convert an object to a dataframe, and 2) then subset the dataframe. Both of these commands work fine outside the function, but I would like to wrap them in a function so I can apply the code iteratively to many such objects.

Here's what I wrote, but it doesn't work:

    convert <- function(d) {
      d <- data.frame(d);                                   #convert object to dataframe
      d <- subset(d, select=c(time,coords.x1,coords.x2))    #select some columns
    }
    convert(data) #the problem is that data is the same as it was before running the function

The objects being processed through my function are SpatialPointsDataFrames, but I'm quite sure that's not my problem, as I can process these outside of the function (using the above code) ... it's when I try to wrap the code in a function that it doesn't work.

Thanks,
Mark
Re: [R] Please help with a basic function
Many thanks for the replies to my call for help this morning. I didn't know about return() and that helped quite a bit.

Best,
Mark

--
Mark Na
University of Saskatchewan
Saskatoon, Canada

On Fri, Dec 11, 2009 at 10:00 AM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:

> Hi Mark,
>
> This question would probably be better suited for the r-sig-geo mailing list. In addition, please read the posting guide and provide a piece of code that reproduces the problem.
>
>     library(sp)
>     convert <- function(d) {
>       d <- data.frame(d)                        # convert object to dataframe
>       d <- subset(d, select = c(zinc, x, y))    # select some columns
>       d   # <- add this, or alternatively 'return(d)'
>     }
>     data(meuse)
>     coordinates(meuse) = ~x+y
>     convert(meuse)
>
> But maybe better, subsetting a SPDF can be done using:
>
>     meuse["zinc"]                                       # Remains an SPDF
>     data.frame(coordinates(meuse), zinc = meuse$zinc)   # Returns a data.frame
>
> And some unrequested advice :). To process multiple files, take a look at lapply, both for reading and processing.
>
>     all_data = lapply(list_of_files, function(file) {
>       bla = read.table(file)
>       coordinates(bla) = ~coords.x1 + coords.x2
>       return(bla)
>     })
>     # all_data is now a list with the SPDFs
>     processed_data = lapply(all_data, function(dat) {
>       return(data.frame(coordinates(dat), zinc = dat$zinc))
>     })
>
> Of course you can include the latter lapply stuff inside the first 'loading' lapply.
>
>     all_data = lapply(list_of_files, function(file) {
>       bla = read.table(file)
>       bla = subset(bla, select = c(time, coords.x1, coords.x2))
>       coordinates(bla) = ~coords.x1 + coords.x2
>       return(bla)
>     })
>
> hope this helps and good luck,
> Paul
>
> --
> Drs. Paul Hiemstra
> Department of Physical Geography
> Faculty of Geosciences
> University of Utrecht
> Heidelberglaan 2
> P.O. Box 80.115
> 3508 TC Utrecht
> Phone: +3130 274 3113 Mon-Tue
> Phone: +3130 253 5773 Wed-Fri
> http://intamap.geo.uu.nl/~paul
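The underlying point can be shown without any spatial classes: an R function works on its own copy of the argument, so the caller's object changes only if the returned value is assigned. A small sketch with a plain dataframe follows; the `cols` argument is an addition for this example and is not part of the original function.

```r
convert <- function(d, cols) {
  d <- as.data.frame(d)
  d[, cols, drop = FALSE]   # value of the last expression is returned
}

raw <- data.frame(time = 1:3, a = 4:6, b = 7:9)

convert(raw, c("time", "a"))                # prints a result; 'raw' is unchanged
converted <- convert(raw, c("time", "a"))   # assign to keep the result
```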
[R] How to apply five lines of code to ten dataframes?
Hello R-helpers,

I have 10 dataframes (named data1, data2, ... data10) and I would like to add 5 new columns to each dataframe using the following code:

    data1$LogDepth<-log10(data1[,2]/data1[,4])
    data1$LogArea<-log10(data1[,3]/data1[,5])
    data1$p<-2*data1[,6]/data1[,7]
    data1$Exp<-data1[,2]^(2/data1[,8])
    data1$s<-data1[,3]/data1[,9]

...but I would prefer not to repeat this chunk of code 10 times! I have struggled with setting up a loop to apply these 5 lines of code to each of the 10 dataframes, but I'm not having much luck. Any help would be much appreciated.

Thank you, Mark
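One possible way to avoid the repetition is to wrap the five lines in a function and apply it to a list of the data frames. The following is only a minimal sketch: the helper name `add_cols` and the toy data (nine random numeric columns per data frame) are assumptions made for illustration, while the column positions are the ones used in the post.

```r
# Hypothetical helper wrapping the five lines from the post
add_cols <- function(d) {
  d$LogDepth <- log10(d[, 2] / d[, 4])
  d$LogArea  <- log10(d[, 3] / d[, 5])
  d$p        <- 2 * d[, 6] / d[, 7]
  d$Exp      <- d[, 2]^(2 / d[, 8])
  d$s        <- d[, 3] / d[, 9]
  d   # return the modified data frame
}

# Toy stand-ins for data1 ... data10 (assumed: nine numeric columns each)
set.seed(1)
for (i in 1:10) {
  assign(paste0("data", i), as.data.frame(matrix(runif(27, 1, 10), ncol = 9)))
}

datalist <- mget(paste0("data", 1:10))   # collect the data frames by name
datalist <- lapply(datalist, add_cols)   # run the same code on each one
```

Because `mget()` returns a named list, the results stay labelled data1 to data10 and can be pulled out again with `datalist$data3` and so on.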
[R] Very basic R workflow question for Windows users
Hi all,

If you use the R Editor (not another text editor), please read on...

So, my usual R workflow involves having two windows open (R Console, R Editor), writing a line of code in the R Editor and then typing Ctrl-R to run that line. Then, quite frequently, I want to run a new command to check my output (e.g., dim(dataframe) or str(dataframe) or head(dataframe)) but I *do not want* to write that command in the R Editor, rather I want to edit it directly in the R Console.

My (basic) question is: how do you use the keyboard (not mouse) to move the cursor from the Editor to the Console? The only (cumbersome) way I know is to type Alt-Tab and then Tab-Tab-Tab to get to the Console. Is there a better way?

Thanks, Mark
[R] Convert a list of N dataframes to N dataframes
Hello,

I have used the following command:

    datalist<-list(data1,data2,data3,data4,data5,data6)

to make a list of my dataframes, which I then manipulated with these commands:

    datalist<-llply(datalist,LogDepth)
    datalist<-llply(datalist,LogArea)
    datalist<-llply(datalist,p)
    datalist<-llply(datalist,Exp)
    datalist<-llply(datalist,s)

This worked very nicely (thanks for plyr, Hadley) but now I would like to unlist my list into the individual dataframes, preferably with their original names (data1, etc). I've tried to do this with:

    ldply(datalist,unlist)

but that's not working. Any help with this would be much appreciated.

Thanks, Mark
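A sketch of one way this could be done with base R's `list2env()`, which assigns each element of a named list to an object of the same name. It assumes the list elements carry the original names (the `list(data1, data2, ...)` call in the post creates an unnamed list, so the names are set explicitly here); the toy data frames and the column `y` are made up for the example.

```r
# Toy data frames standing in for data1, data2, ...
data1 <- data.frame(x = 1:3)
data2 <- data.frame(x = 4:6)

# Name the list elements after the original objects
datalist <- list(data1 = data1, data2 = data2)

# Some manipulation applied to every element (illustrative only)
datalist <- lapply(datalist, function(d) transform(d, y = x * 2))

# Write each element back to an object named after its list name
list2env(datalist, envir = environment())

names(data1)
```

After the `list2env()` call, `data1` and `data2` are overwritten by their processed versions.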
[R] How to identify the rows in my dataframe with a negative value in any column?
Dear R-helpers,

I have a dataframe that should not contain any negative values, but it does. I wish to print the rows from my dataframe that contain a negative value in any column. I've tried this:

    dataframe[dataframe<0,]

but it just returns a row of NAs. I would very much appreciate any help with this you could provide.

Thanks, Mark
Re: [R] How to identify the rows in my dataframe with a negative value in any column?
Thanks Steve, this works very well!

Mark

On Tue, Nov 24, 2009 at 2:07 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote:

Hi,

On Nov 24, 2009, at 2:58 PM, Mark Na wrote:

Dear R-helpers, I have a dataframe that should not contain any negative values, but it does. I wish to print the rows from my dataframe that contain a negative value in any column. I've tried this:

    dataframe[dataframe<0,]

but it just returns a row of NAs. I would very much appreciate any help with this you could provide.

Imagine you had a data.frame like this:

    R> df <- data.frame(a=1:10, b=c(1:3,-4, 5:10), c=c(-1, 2:10))

This will return you a boolean vector of which rows have negative values:

    R> has.neg <- apply(df, 1, function(row) any(row < 0))

If you want the actual index numbers:

    R> which(has.neg)
    [1] 1 4

HTH, -steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

-- Mark Na University of Saskatchewan Saskatoon, Canada
Re: [R] Change positions of columns in data frame
Hi Joel,

The answers you've received already, suggesting subscripting, are good because they strengthen your understanding of R subscripting. However, sometimes these methods produce strange column names. So, what I usually do is use the subset command. You don't have to provide anything for the subset argument (i.e., you'll keep all the rows) but you can re-order the columns by listing them in the order you want, within the select argument, like this:

    DF<-subset(DF,select=c(var3,var2,var1))

HTH, Mark

2009/10/23 Joel Fürstenberg-Hägg joel_furstenberg_h...@hotmail.com

Hi all,

Probably a simple question, but I just can't find a simple answer in the older threads or anywhere else. I've added some new vectors as columns in a data frame using cbind(). As they're all put as the last columns in the data frame, I would like to move them to specific positions. How do you change the position of a column in a data frame?

I know I can use

    fieldTrial0809=data.frame(Sample_ID=as.factor(fieldTrial0809$Sample_ID), Plant_ID=as.factor(fieldTrial0809$Plant_ID), ...)

to create a new data frame with the given columns in the specified order, but there must be an easier way..?

All the best, Joel

-- Mark Na University of Saskatchewan Saskatoon, Canada
[R] List of Windows time zones?
Hello,

I would like to adapt the following code:

    data$datetime3<-format(data$datetime2, tz="EST")

for different time zones. For example, North America's CST and MDT (but those codes don't work).

I have read ?Sys.timezone but I'm afraid it isn't very helpful. I'm just looking for a list of the timezones, not a history of how time zones have been dealt with throughout R's history. Previously asked messages (and replies) on R-help provide confusing and sometimes contradictory advice.

Is there a simple list of timezones, for R on Windows? Or, can they be calculated on the fly (but e.g., GMT+6 does not work in the above code)?

Any help would be much appreciated as I am getting a bit frustrated, thanks!

Mark Na
[R] Make a blank dataframe with given dimensions
Hi R-helpers,

I would like to make a blank dataframe, i.e. a dataframe without any rows. I would like the blank dataframe (which is to be called merged) to have 0 rows and 32 columns. Once I've made the dataframe, I'll specify the column names using:

    names(merged)<-c("GRIDCODE",paste("VALUE",0:3,sep="_"),paste("VALUE",5:30,sep="_"),"AREA")

Then I'll add rows to it, using the loop (which is working fine).

Thanks for any help! Mark
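One way this might be done, sketched with base R only: build a matrix with zero rows and the required number of columns, convert it to a data frame, and then set the names from the post. The intermediate name `col_names` is an assumption made for readability.

```r
# Column names as given in the post (1 + 4 + 26 + 1 = 32 names)
col_names <- c("GRIDCODE",
               paste("VALUE", 0:3, sep = "_"),
               paste("VALUE", 5:30, sep = "_"),
               "AREA")

# A numeric matrix with 0 rows and 32 columns, turned into a data frame
merged <- data.frame(matrix(numeric(0), nrow = 0, ncol = length(col_names)))
names(merged) <- col_names

dim(merged)
```

Growing a data frame row by row inside a loop can be slow; collecting the pieces in a list and binding them once with `do.call(rbind, ...)` is a common alternative.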
[R] How to rename columns that start with numbers?
Hello,

My dataframe has new columns that start with the number 1 or 2 (resulting from a reshape cast command). Instead of having these columns automatically renamed by R to start with the letter X, I would like to rename these columns to start with the characters SURV_ (e.g., SURV_1, SURV_2). I can't seem to use grep() to identify and rename the columns starting with either 1 or 2.

Any help would be much appreciated, thanks!

(I know I could rename these manually, but the above is a simpler statement of the actual problem, which involves several dozen columns, so that's why I'd prefer not to do it manually.)

Mark Na
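A minimal sketch of one approach: anchor the pattern to the start of the name with `^`, find the matching names with `grepl()`, and prefix only those. The toy data frame `df` and its columns are assumptions for illustration.

```r
# Toy data frame whose names mimic the output of a cast
df <- data.frame(ID = 1:2, a = 3:4, b = 5:6)
names(df) <- c("ID", "1", "2")

# TRUE for names whose first character is 1 or 2
starts_with_digit <- grepl("^[12]", names(df))

# Prefix only the matching names
names(df)[starts_with_digit] <- paste0("SURV_", names(df)[starts_with_digit])

names(df)
```

The same result can be had in one step with `sub("^([12])", "SURV_\\1", names(df))`, which keeps the matched digit via the backreference.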
[R] Compare lm() to glm(family=poisson)
Dear R-helpers, I would like to compare the fit of two models, one of which I fit using lm() and the other using glm(family=poisson). The latter doesn't provide r-squared, so I wonder how to go about comparing these models (they have the same formula). Thanks very much, Mark Na
[R] Testing year effects in lm()
Dear R-helpers,

I have a linear model with a year effect (year is coded as a factor), i.e. the parameter estimates for each level of my year variable have significant P values (see some output below) and I am interested in testing:

a) the overall effect of year;
b) the significance of each year vis-a-vis every other year (the model output only tests each year against the baseline year).

I'd appreciate any help with how to perform these post-hoc tests in R.

Many thanks, Mark Na

    Call:
    lm(formula = data$SR.obs ~ log(data$AREA, 10) + data$YEAR, subset = (data$AREA = 14.5))

    Residuals:
        Min      1Q  Median      3Q     Max
    -5.3412 -1.3140  0.1108  1.1972  4.3126

    Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -9.4606     0.6144 -15.399  < 2e-16 ***
    log(data$AREA, 10)   3.9261     0.1734  22.644  < 2e-16 ***
    data$YEAR2008        1.0750     0.2854   3.767 0.000211 ***
    data$YEAR2009        1.5884     0.3073   5.169 5.18e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.822 on 226 degrees of freedom
    Multiple R-squared: 0.6945, Adjusted R-squared: 0.6905
    F-statistic: 171.3 on 3 and 226 DF,  p-value: < 2.2e-16

    [1] "AIC= 934.557"
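A possible approach to both questions, sketched with base R on simulated data (the data frame `d`, its column names and the simulated effects are assumptions, not the poster's data). For (a), the models with and without the year factor are compared with an F test via `anova()`. For (b), the model is refitted with a different baseline year using `relevel()` to obtain the comparison the default output does not show, and the three pairwise p-values are adjusted for multiplicity with `p.adjust()`; Holm's method is just one reasonable choice.

```r
# Simulated stand-in for the poster's data (assumed names and effects)
set.seed(42)
d <- data.frame(AREA = runif(90, 15, 500),
                YEAR = factor(rep(c(2007, 2008, 2009), each = 30)))
d$SR <- -9 + 4 * log10(d$AREA) + c(0, 1, 1.6)[as.integer(d$YEAR)] + rnorm(90)

# (a) overall year effect: F test of the model with YEAR against the one without
full    <- lm(SR ~ log10(AREA) + YEAR, data = d)
reduced <- lm(SR ~ log10(AREA), data = d)
overall <- anova(reduced, full)

# (b) pairwise comparisons: refit with 2008 as baseline to get 2009 vs 2008
d$YEAR_b <- relevel(d$YEAR, ref = "2008")
refit <- lm(SR ~ log10(AREA) + YEAR_b, data = d)

p_raw <- c(
  "2008 vs 2007" = coef(summary(full))["YEAR2008", "Pr(>|t|)"],
  "2009 vs 2007" = coef(summary(full))["YEAR2009", "Pr(>|t|)"],
  "2009 vs 2008" = coef(summary(refit))["YEAR_b2009", "Pr(>|t|)"]
)
p_adj <- p.adjust(p_raw, method = "holm")   # multiplicity adjustment

overall
p_adj
```

With many years, refitting for every baseline becomes tedious, and a contributed package for multiple comparisons may be more convenient; that is outside this sketch.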
[R] Make my plots bigger and reduce white space around panels?
Hi,

I have made a plot with panels (attached) using R code (below) and I'd like to increase the size of each panel and decrease the white space, especially the white space between:

1. rows of panels
2. the top panel and its title (which contains info on r2 and N)
3. each panel and its x label.

I've dug around in the plot help files but can't seem to find how to do this. Any help much appreciated, thanks!

Mark Na

    #ELEPHANT SPECIES RICHNESS
    par(mfrow=c(3,4),oma=c(0,0,2,0))
    models<-list(data$SR.elephant.obs~data$AREA,
                 log(data$SR.elephant.obs+1,10)~log(data$AREA,10),
                 data$SR.elephant.obs~log(data$AREA,10),
                 log(data$SR.elephant.obs+1,10)~data$AREA)
    for (i in 1:length(models)){ #SCATTERPLOT
      model<-lm(models[[i]])
      plot(models[[i]],ylab="Elephant SR"); abline(model);
      title(main=paste("r2=",round(summary(model)$r.squared,digits=3),", N=",dim(data)[1]))
    }
    for (i in 1:length(models)){ #RESIDUALS VS FITTED VALUES PLOT
      model<-lm(models[[i]])
      plot.lm(model,which=1,sub.caption=NA)
    }
    for (i in 1:length(models)){ #Q-Q PLOT
      model<-lm(models[[i]])
      plot.lm(model,which=2,sub.caption=NA)
    }
    title(main="ELEPHANT SPECIES RICHNESS",outer=TRUE); savePlot("SR_elephant.emf",type="emf"); dev.off()
Re: [R] Make my plots bigger and reduce white space around panels?
The plot is attached this time...

-- Mark Na University of Saskatchewan Saskatoon, Canada
[R] Reordering the columns of my dataframe
Hi R-helpers,

I have written this line of code:

    data<-cbind(data[,1],data[,2:6],data[,18],data[,7:17])

to reorder the columns of my dataframe, but I'm losing the column names of my 1st and 18th columns (they are now named data[,1] and data[,18] respectively).

Can I use cbind to do this (without losing my column names) or is there another way?

Many thanks, Mark Na
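One alternative that keeps the names is to index the columns in the desired order in a single step. A minimal sketch, using a made-up 18-column data frame with names V1 to V18:

```r
# Toy data frame with 18 named columns
data <- as.data.frame(matrix(1:36, nrow = 2,
                             dimnames = list(NULL, paste0("V", 1:18))))

# Same order as the cbind() call in the post: 1, 2:6, 18, 7:17
data <- data[, c(1:6, 18, 7:17)]

names(data)
```

The names get lost in the `cbind()` version because `data[,1]` and `data[,18]` each select a single column and therefore drop to plain vectors; `data[, 1, drop = FALSE]` would keep them as one-column data frames.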
[R] How to perform a calculation in each element of my list?
Hi R-helpers, I have a list containing 10 elements, each of which is a dataframe. I wish to add a new column to each list element (dataframe) containing the product of the last two columns of each dataframe. I'd appreciate any pointers, thanks! Mark Na
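A sketch of how `lapply()` could handle this: the anonymous function looks up the number of columns, multiplies the last two, and returns the modified data frame. The list `dflist` and the new column name `product` are assumptions for the example.

```r
# Toy list of data frames with different numbers of columns
dflist <- list(
  a = data.frame(x = 1:3, y = 4:6),
  b = data.frame(id = 1:2, u = c(2, 3), v = c(10, 20))
)

dflist <- lapply(dflist, function(d) {
  n <- ncol(d)                       # position of the last column
  d$product <- d[[n - 1]] * d[[n]]   # product of the last two columns
  d                                  # return the modified data frame
})

dflist$b
```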
[R] How to extract the upper xlim and ylim of my plot?
Dear R-helpers,

I wish to place some text in a plot, at approx 10% of my upper xlim and approx 90% of my upper ylim, i.e.

    plot(log(all$SR,10)~log(all$AREA,10))
    text(.1*max(xlim),.9*max(ylim),"text to be placed")

(I know how to give absolute coordinates for text location, but I wish to use relative coordinates.)

My code (above) doesn't work because I don't know how to properly extract the upper xlim and ylim values. Does anyone know how I could extract the upper xlim and ylim values (without using max(x-variable) or max(y-variable))? I wish to keep this as general as possible and not point to the original data.

Thanks in advance, Mark
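After a plot has been drawn, `par("usr")` returns the extremes of the plotting region as `c(x1, x2, y1, y2)` in user coordinates, so relative positions can be computed without touching the data. A minimal sketch follows; the toy plot and the names `x_pos` and `y_pos` are assumptions, and the `pdf(NULL)` line is there only so the example runs without opening a window.

```r
pdf(NULL)                      # off-screen device, only for a non-interactive run
plot(1:10, (1:10)^2)

usr <- par("usr")              # c(x1, x2, y1, y2) of the plotting region

# 10% of the way along the x axis, 90% of the way up the y axis
x_pos <- usr[1] + 0.10 * (usr[2] - usr[1])
y_pos <- usr[3] + 0.90 * (usr[4] - usr[3])

text(x_pos, y_pos, "text to be placed")
dev.off()
```

Positions are measured from the lower axis limits rather than from zero, which keeps the placement stable when the axes do not start at the origin.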
[R] Averaging dataframes that are stored in a list
Dear R-helpers,

I have a list containing 5000 elements, each element is a dataframe containing one ID column (identical over the 5000 dataframes) and 9 numeric variables, e.g.

    ID VAR1 VAR2 VAR3 ... VAR9

I would like to create a new dataframe containing the ID column and the mean values of the 9 numeric variables. So, the structure of this new dataframe would be identical to the structure of the dataframes stored in my list (and the ID column would also be identical) but the values would be mean values.

I've been attempting to do this with rowMeans and subscripting the list using double square brackets, but I can't get it to work.

I'd appreciate any pointers to get me going, thanks!

Mark Na
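One way to get the element-wise mean, sketched under the assumptions that all data frames share the same row order, that ID is the first column, and that there are no missing values: convert the numeric parts to matrices, add them up with `Reduce()`, and divide by the number of data frames. The toy list below has 5 data frames and 2 variables instead of 5000 and 9.

```r
# Toy list of data frames sharing the same ID column
set.seed(1)
dflist <- lapply(1:5, function(i) {
  data.frame(ID = 1:4, VAR1 = rnorm(4), VAR2 = rnorm(4))
})

# Numeric part of each data frame as a matrix (ID column dropped)
num <- lapply(dflist, function(d) as.matrix(d[, -1]))

# Element-wise sum over the list, divided by the number of data frames
mean_mat <- Reduce(`+`, num) / length(num)

# Same structure as the inputs: ID column plus the averaged variables
averaged <- data.frame(ID = dflist[[1]]$ID, mean_mat)
averaged
```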
Re: [R] How to combine two rows (in a dataframe) into a third row?
Hi Henrique & other R-helpers,

Thank you for helping me last week. I used Henrique's suggestion to develop some code (below) to combine two rows in my dataframe into a third row, and then delete the original two rows. It works well. My solution is not very elegant however; if there's a function (or a better way) to accomplish this in 1-2 lines (rather than my 6) I'd appreciate knowing about it.

Many thanks, Mark Na

    #make some data for this example
    data<-data.frame(c("1A","1B"),c(10,15))
    names(data)<-c("id","value")
    data$value<-as.numeric(as.character(data$value))

    #combine two lines into one by summing their values in the value column
    fixed<-data.frame() #create empty data frame to hold fixed rows
    fixed<-rbind(fixed, aggregate(data["value"],list(substr(data[,"id"],1,1)), sum)) #copy previous line as necessary for other fixes
    names(fixed)<-c("id","value") #fix column names

    #bind the fixed line to the main dataframe and delete the original lines
    data<-rbind(data,fixed) #add fixed lines to data
    data<-data[!(data$id %in% c("1A","1B")),] #delete lines from data
    rownames(data) <- 1:nrow(data) #renumber rows

On Thu, Jul 9, 2009 at 5:58 PM, Henrique Dallazuanna www...@gmail.com wrote:

Try this:

    aggregate(x["VALUE"], list(substr(x[,"ID"], 1, 1)), sum)

On Thu, Jul 9, 2009 at 7:27 PM, Mark Na mtb...@gmail.com wrote:

Dear R-helpers,

I have two rows in my dataframe:

    ID  VALUE
    1A  10
    1B  15

and I would like to combine these two rows into a single (new) row in my dataframe:

    ID  VALUE
    1   25

...simply by specifying a new value for ID and summing the two VALUES.

I have been trying to do this with rbind, but it's not working. I'd appreciate any pointers.

Thanks, Mark Na

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
Re: [R] Randomizing a dataframe
Greg's reply was just what I needed to get me going. I used his advice to produce a program which does just what I need. In case it helps someone else, my program is below.

Mark Na

    library(reshape)
    data<-read.csv("data.csv")
    datam<-melt(data,id=("TREE")) #value = number of individuals
    datam<-datam[rep(1:nrow(datam), datam$value),] #expand rows based on number of individuals
    rownames(datam) <- 1:nrow(datam) #fix rownames
    datam<-subset(datam,select=c(TREE,variable)) #drop columns
    names(datam)<-c("TREE","SPECIES") #rename columns
    datap<-data.frame(sample(datam$TREE),datam$SPECIES) #randomly permute TREE
    names(datap)<-c("TREE","SPECIES") #rename columns
    datat<-data.frame(table(datap)) #collapse rows based on number of individuals = Freq
    datac<-cast(datat,TREE~SPECIES,value="Freq") #the final permuted table

On Wed, Jul 8, 2009 at 11:28 AM, Greg Snow greg.s...@imail.org wrote:

Here is one approach (there are others, some that are probably better, but this can get you started):

1. rearrange your data so that every insect is a single row with 2 columns: the tree id and the species (this new dataset will have as many rows as the sum of the values in the old dataset). The reshape package may be able to help with this step (you may also need the rep function).
2. randomly permute one of the 2 columns (see ?sample).
3. restructure the permuted data back to the original (the table function may be enough here, the reshape package will give more options).

Hope this helps,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Na
Sent: Wednesday, July 08, 2009 9:54 AM
To: r-help@r-project.org
Subject: [R] Randomizing a dataframe

Hi R-helpers,

I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE and each species has a column labeled SPEC1, SPEC2, SPEC3, etc...

I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals).

I will eventually want to do this many times, but I would appreciate help getting started with the randomization.

Thank you, Mark Na
[R] How to combine two rows (in a dataframe) into a third row?
Dear R-helpers,

I have two rows in my dataframe:

    ID  VALUE
    1A  10
    1B  15

and I would like to combine these two rows into a single (new) row in my dataframe:

    ID  VALUE
    1   25

...simply by specifying a new value for ID and summing the two VALUES.

I have been trying to do this with rbind, but it's not working. I'd appreciate any pointers.

Thanks, Mark Na
[R] Randomizing a dataframe
Hi R-helpers, I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE and each species has a column labeled SPEC1, SPEC2, SPEC3, etc... I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals). I will eventually want to do this many times, but I would appreciate help getting started with the randomization. Thank you, Mark Na
[R] How to wrap my (working) code in a loop or function? (loop/function newbie alert)
Dear R-helpers,

I have split a dataframe into a list with five elements, with the following code:

    datalist<-split(data,data$UNIT)

I would now like to run some code (below) on each element of the list to extract rows from the list elements; then I would like to rbind the extracted rows into a new dataframe containing all of the extracted rows from all of the list elements.

I don't need any help with the code itself, it works fine for one chunk of data (e.g., a single dataframe). The code is:

    t0<-match(times$START_DT, data$DATETIME) #MAKE A VECTOR OF START TIMES
    t1<-match(times$STOP_DT, data$DATETIME) #MAKE A VECTOR OF STOP TIMES
    indices<-mapply(FUN = ":", t0, t1) #MAKES A LIST, EACH ELEMENT CONTAINS INDICES OF TIMES CORRESPONDING TO ONE WETLAND
    idex<-times[rep(1:nrow(times), sapply(indices, length)), c("POND_ID","OBS","REP","PID"), drop = FALSE] #MAKES A DATAFRAME
    tm<-data[unlist(indices), ] #FLATTENS THE LIST OF INDICES INTO A DATAFRAME
    extracted<-cbind(idex, tm) #BIND IDEX AND TM

But now that I've split my data into a list with five elements, what I don't know how to do is wrap my code in a loop or function so I can run it on each of the five list elements and then rbind the extracted rows together into a new dataframe. (What I have now is 5 replicates of the above code, and I would like to replace that with a loop or function.)

I have spent all morning on this, without much progress, so would appreciate any help you might be able to provide.

Thanks! Mark Na
[R] How to select partially (not completely) unique rows?
Dear R-helpers,

I know how to use unique to select unique rows, e.g.

    unique.rows<-unique(dataframe)

but I would like to select those rows that are unique on only TWO of my dataframe's columns (so, two rows with the same value on these two columns would not be kept, even if they had different values in other columns).

For example, I have a dataframe with 10 columns, two of which are LATITUDE and LONGITUDE. I wish to keep only one row per unique combination of these two columns, so I've tried:

    unique.latlong<-extracted[unique(paste(extracted$latitude,extracted$longitude)),]

but this is returning a dataframe of missing values (NAs).

Could anyone point me in the right direction? Thanks!

Mark Na
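A sketch of one way to do this with `duplicated()`, which flags every row whose combination of the two columns has already appeared; negating it keeps the first row of each combination. The toy data frame below borrows the name `extracted` from the post, and its values are made up.

```r
# Toy data: rows 1 and 2 share the same latitude/longitude pair
extracted <- data.frame(latitude  = c(50, 50, 51, 51),
                        longitude = c(-105, -105, -105, -106),
                        value     = 1:4)

# Keep the first row for each latitude/longitude combination
keep <- !duplicated(extracted[, c("latitude", "longitude")])
unique.latlong <- extracted[keep, ]

unique.latlong
```

The attempt in the post returns NAs because the pasted strings are used as row names for indexing, and no rows carry those names.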
[R] How to avoid ifelse statement converting factor to character
Hi R-helpers,

Please see the below R output. The problem is that after running the ifelse statement, data$SOCIAL_STATUS is converted from a factor to a character. Is there some way I can avoid this conversion?

Thanks in advance, Mark Na

    > str(data)
    'data.frame': 2100 obs. of 11 variables:
     $ DATE           : Factor w/ 5 levels "4-Jun-09","7-May-09",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ POND_ID        : Factor w/ 113 levels "10","18","19",..: 8 8 8 8 8 8 8 8 8 8 ...
     $ STATUS         : num 1 1 1 1 1 1 1 1 1 1 ...
     $ SPECIES        : Factor w/ 25 levels "AGWT","AMCO",..: 10 10 7 7 3 5 5 5 5 2 ...
     $ SOCIAL_STATUS  : Factor w/ 8 levels "A","B","D","E",..: 4 1 4 1 4 4 4 4 1 6 ...
     $ COUNT_OF_GROUPS: num 1 1 1 1 1 3 3 3 1 2 ...
     $ MALE           : num 1 1 1 1 1 1 1 1 1 0 ...
     $ FEMALE         : num 1 0 1 0 1 1 1 1 0 0 ...
     $ NOSEX          : num 0 0 0 0 0 0 0 0 0 2 ...
     $ UPLAND         : num 0 0 0 0 0 0 0 0 0 0 ...
     $ TAG            : num 0 0 0 0 0 0 0 0 0 0 ...
    > data$SOCIAL_STATUS<-ifelse(data$SOCIAL_STATUS=="B" & data$MALE>4, "C", data$SOCIAL_STATUS)
    > str(data)
    'data.frame': 2100 obs. of 11 variables:
     $ DATE           : Factor w/ 5 levels "4-Jun-09","7-May-09",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ POND_ID        : Factor w/ 113 levels "10","18","19",..: 8 8 8 8 8 8 8 8 8 8 ...
     $ STATUS         : num 1 1 1 1 1 1 1 1 1 1 ...
     $ SPECIES        : Factor w/ 25 levels "AGWT","AMCO",..: 10 10 7 7 3 5 5 5 5 2 ...
     $ SOCIAL_STATUS  : chr "4" "1" "4" "1" ...
     $ COUNT_OF_GROUPS: num 1 1 1 1 1 3 3 3 1 2 ...
     $ MALE           : num 1 1 1 1 1 1 1 1 1 0 ...
     $ FEMALE         : num 1 0 1 0 1 1 1 1 0 0 ...
     $ NOSEX          : num 0 0 0 0 0 0 0 0 0 2 ...
     $ UPLAND         : num 0 0 0 0 0 0 0 0 0 0 ...
     $ TAG            : num 0 0 0 0 0 0 0 0 0 0 ...
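One way to keep the column a factor is to skip `ifelse()` and assign to the selected elements directly, after making sure the new value is among the factor's levels. A minimal sketch on a made-up four-row data frame (the name `recode` is an assumption):

```r
# Toy data with the two columns involved in the recode
data <- data.frame(SOCIAL_STATUS = factor(c("A", "B", "B", "E")),
                   MALE = c(1, 5, 2, 6))

# Make sure "C" is a valid level before assigning it
levels(data$SOCIAL_STATUS) <- union(levels(data$SOCIAL_STATUS), "C")

# Rows to recode: status B with more than 4 males
recode <- data$SOCIAL_STATUS == "B" & data$MALE > 4
data$SOCIAL_STATUS[recode] <- "C"

str(data$SOCIAL_STATUS)
```

In the output shown in the post, `ifelse()` took the factor's underlying integer codes for the rows left unchanged, which is why the recoded column holds "4" and "1" rather than level labels.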
[R] Apply as.factor (or as.numeric etc) to multiple columns
Hi R-helpers,

I have a dataframe with 60 columns and I would like to convert several columns to factor, others to numeric, and yet others to dates. Rather than having 60 lines like this:

    data$Var1<-as.factor(data$Var1)

I wonder if it's possible to write one line of code (per data type, e.g. factor) that would apply a function (e.g., as.factor) to several (non-contiguous) columns. So, I could then use 3 or 4 lines of code (for 3 or 4 data types) instead of 60.

I have tried writing an apply function, but it failed.

Thanks for any help you might be able to provide.

Mark Na
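A sketch of the one-line-per-type idea: select the columns by name (or position), run the conversion over them with `lapply()`, and assign the result back to the same columns. The toy data frame and the three vectors of column names are assumptions for illustration.

```r
# Toy data frame read in as character columns
data <- data.frame(Var1 = c("a", "b"),
                   Var2 = c("1", "2"),
                   Var3 = c("x", "y"),
                   Var4 = c("2009-06-04", "2009-05-07"),
                   stringsAsFactors = FALSE)

# Columns grouped by target type (they need not be contiguous)
factor_cols  <- c("Var1", "Var3")
numeric_cols <- "Var2"
date_cols    <- "Var4"

# One line per data type
data[factor_cols]  <- lapply(data[factor_cols], as.factor)
data[numeric_cols] <- lapply(data[numeric_cols], as.numeric)
data[date_cols]    <- lapply(data[date_cols], as.Date)

str(data)
```

`lapply()` is used rather than `apply()` because `apply()` first converts the data frame to a matrix, which forces every column to a single type.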
[R] Problem with ifelse statement
Hi R-helpers,

I am trying to use this ifelse statement to recode a variable:

    data$SOCIAL_STATUS<-ifelse(data$SOCIAL_STATUS=="B" & data$MALE>4, "C", "B")

(i.e., if social status is B and there are more than 4 males, then recode social status to C; otherwise, leave it B)

But, it's not working. See the below R output. Notice that there were 71 B observations before the re-code but 2098 B observations after the re-code. The only thing my code should do is REDUCE the number of B observations, not increase them.

Can anyone see what I'm doing wrong?

Thanks, Mark Na

    > str(data)
    'data.frame': 2100 obs. of 13 variables:
     $ DATE           :Class 'Date' num [1:2100] 14399 14399 14399 14399 14399 ...
     $ OBS            : Factor w/ 7 levels "AJG","LEB","MB",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ POND_ID        : Factor w/ 118 levels "1","10","100",..: 86 86 86 86 86 86 86 86 86 86 ...
     $ STATUS         : num 1 1 1 1 1 1 1 1 1 1 ...
     $ SPECIES        : Factor w/ 25 levels "AGWT","AMAV",..: 16 16 12 12 4 7 7 7 7 3 ...
     $ SOCIAL_STATUS  : Factor w/ 9 levels "","A","B","D",..: 5 2 5 2 5 5 5 5 2 8 ...
     $ COUNT_OF_GROUPS: num 1 1 1 1 1 3 3 3 1 2 ...
     $ MALE           : num 1 1 1 1 1 1 1 1 1 0 ...
     $ FEMALE         : num 1 0 1 0 1 1 1 1 0 0 ...
     $ NOSEX          : num 0 0 0 0 0 0 0 0 0 2 ...
     $ UPLAND         : num 0 0 0 0 0 0 0 0 0 0 ...
     $ TAG            : num 0 0 0 0 0 0 0 0 0 0 ...
     $ COMMENT        : chr ...
    > length(which(data$SOCIAL_STATUS=="B"))
    [1] 71
    > data$SOCIAL_STATUS<-ifelse(data$SOCIAL_STATUS=="B" & data$MALE>4, "C", "B")
    > length(which(data$SOCIAL_STATUS=="B"))
    [1] 2098
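The third argument of `ifelse()` is the value used for every row where the condition is FALSE, so writing "B" there turns all non-matching rows (A, D, E and so on) into B. A minimal sketch of a version that leaves the other rows unchanged, using a made-up four-row data frame and an assumed helper variable `old`:

```r
# Toy data with the two columns involved in the recode
data <- data.frame(SOCIAL_STATUS = factor(c("A", "B", "B", "E")),
                   MALE = c(1, 5, 2, 6))

# Work on the labels, not the factor codes
old <- as.character(data$SOCIAL_STATUS)

# Recode B with more than 4 males to C; keep the existing value otherwise
data$SOCIAL_STATUS <- factor(ifelse(old == "B" & data$MALE > 4, "C", old))

table(data$SOCIAL_STATUS)
```

In this toy example one of the two B rows becomes C, and the A and E rows are untouched.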
[R] SAS-like method of recoding variables?
Dear R-helpers, I am helping a SAS user run some analyses in R that she cannot do in SAS and she is complaining about R's peculiar (to her!) way of recoding variables. In particular, she is wondering if there is an R package that allows this kind of SAS recoding: IF TYPE='TRUCK' and count=12 THEN VEHICLES=TRUCK+((CAR+BIKE)/2.2); Thanks for any help or suggestions you might be able to provide! Mark Na
[R] Calculating row standard deviations
Hi R-helpers,

I have been struggling with calculating row and column statistics, e.g. standard deviation. I know that

    datac$Mean <- rowMeans(datac, na.rm=TRUE)

will give me row means. I have tried to replicate those row means with the apply function:

    datac$Mean2 <- apply(datac, 2, mean)

so that I can replace the function argument with sd (instead of mean) to get standard deviations. But, I'm running into this error:

> dim(datac)
[1]  17 271
> datac$Mean2 <- apply(datac, 2, mean)
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

Can anyone see what I'm doing wrong?

Thanks!
Mark Na
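A minimal sketch of the row-wise version, assuming a small hypothetical `datac`. In `apply()`, a margin of 1 runs the function over rows and a margin of 2 over columns, so `apply(datac, 2, mean)` yields one value per column (271 in the question) rather than one per row (17). Computing every statistic from a fixed set of value columns also keeps a freshly added `Mean` column from leaking into the next calculation.

```r
# Hypothetical stand-in for datac: 3 rows, 3 value columns, one missing value.
datac <- data.frame(a = c(1, 2, NA), b = c(3, 6, 9), c = c(5, 10, 15))

# Remember which columns hold the measurements before adding summary columns.
value_cols <- names(datac)

# MARGIN = 1 applies the function to each row.
datac$Mean <- rowMeans(datac[value_cols], na.rm = TRUE)
datac$SD   <- apply(datac[value_cols], 1, sd, na.rm = TRUE)

datac
```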
[R] How to change ONLY the first character of each variable name
Dear R-helpers,

I would like to adapt the following code

    names(data) <- sub("M", "MOLE", names(data))

which changes any occurrence of M (in my variable names) to MOLE such that it ONLY operates on the first character of each variable name, i.e. M will only be changed to MOLE if it's the first character of a variable.

I would appreciate any help you might provide.

Thanks!
Mark Na
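A short sketch using a regular-expression anchor: `^` matches the start of the string, so the pattern `"^M"` only matches an M in first position. The variable names below are invented for illustration.

```r
# Hypothetical variable names: only some start with "M".
data <- data.frame(MASS = 1, TIME = 2, M1 = 3, AMB = 4)

# "^M" matches M only at the beginning of each name.
names(data) <- sub("^M", "MOLE", names(data))

names(data)
```

Names such as `TIME` and `AMB`, which contain an M elsewhere, are left untouched.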
[R] How to translate a dataframe into the R code that makes that dataframe?
Hi,

I am helping another R user (off list) and I would like to email her an R script containing the data she needs and the code to solve her problem. I have made a small dummy dataset, but instead of sending her a CSV I would prefer to send the data embedded in the script, so there would be a line in the script like:

    my.df <- c(etc, etc, etc

I have made the dataframe (in a spreadsheet) and imported it into R (using read.csv) and now I wonder if there is a function to produce the code that makes that dataframe.

Thanks for any help you can provide.

Mark Na
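Base R's `dput()` prints an R expression that recreates an object, which seems to be what is wanted here. The sketch below uses a made-up `my.df`; the round trip through `deparse()` and `parse()` is only there to show that the printed code rebuilds the same data frame.

```r
# Hypothetical small data frame standing in for the imported spreadsheet.
my.df <- data.frame(
  ID = 1:3,
  KIND = c("CAR", "TRUCK", "BUS"),
  stringsAsFactors = FALSE
)

# Print the code that recreates my.df; paste the output into the script
# after "my.df <- ".
dput(my.df)

# Round trip: turn the object into code text, then evaluate that text.
code_text <- paste(deparse(my.df), collapse = "\n")
rebuilt <- eval(parse(text = code_text))
```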
Re: [R] how to read in only some columns of a data file
Use the colClasses argument in combination with "NULL" for the columns you don't want to read. For example, this code reads the first three columns as character data and does not import the remaining 10 columns:

    yourdata <- read.csv("yourdata.csv", colClasses=c(rep("character",3), rep("NULL",10)))

HTH,
Mark Na

On Wed, Jun 17, 2009 at 2:59 PM, liujb liujul...@yahoo.com wrote:

Hello, I have a data file (.csv) that has a size of about 2.6 GB. I am not able to read in the whole data set because of the memory limit. I actually only need some columns (3 columns) of the data set, is there a way to read in specified columns? I am using windows. Thanks, Julia
[R] How to extract all rows that contain the value of X in any column?
Hi R-helpers,

I'm trying to use this code

    pvh_dnv <- pvh[sapply(pvh=="dnv"),]

to make a new dataframe containing the rows from pvh that contain the value of dnv in ANY column. But, it's not working. I get this error

    Error in match.fun(FUN) : element 1 is empty; the part of the args list of 'is.function' being evaluated was: (FUN)

which, to me, is cryptic.

I'd appreciate any help you might provide, thanks!

Mark Na
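A sketch of one way to select such rows, assuming that dnv is a character value stored in the cells (the miniature `pvh` below is hypothetical). Comparing a data frame with a value gives a logical matrix, and `rowSums()` counts the matches in each row. The error in the question arises because `sapply()` was called without a function argument.

```r
# Hypothetical miniature of pvh; the measurement columns are character
# because they mix numbers with the string "dnv".
pvh <- data.frame(
  POND_ID = c(101, 102, 103),
  d1 = c("0.15", "0", "1"),
  d2 = c("0", "dnv", "1"),
  d3 = c("dnv", "dnv", "1"),
  stringsAsFactors = FALSE
)

# TRUE for every row with at least one cell equal to "dnv".
has_dnv <- rowSums(pvh == "dnv", na.rm = TRUE) > 0

pvh_dnv <- pvh[has_dnv, ]
pvh_dnv
```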
[R] How to subset my dataframe? (a bit tricky)
Hi R-helpers,

I would like to subset my dataframe, keeping only those rows which satisfy the following conditions:

1) the string dnv is found in at least one column;
2) the value in the column previous to the one dnv is found in is not 0

Here's what my data look like:

    POND_ID 2009-05-07 2009-05-15 2009-05-21 2009-05-28 2009-06-04
4       101       0.15          0        dnv        dnv        dnv
7       102          0        dnv        dnv        dnv        dnv
87      103       0.15        dnv          1          1          1
99      104        dnv       0.25          1          1       0.75

So, for above example, the new dataframe would not contain POND_ID 101 or 102 (because there is a 0 before the dnv) but it WOULD contain POND_ID 103 (because there is a 0.15 before the dnv) and 104 (because dnv occurs in the first column, so cannot be preceded by a 0).

One extra twist: I would like to retain rows in the new dataframe which satisfy the above conditions even if they also have a 0 then dnv sequence preceding or following the problem, e.g., the following rows would be retained in the new dataframe

    POND_ID 2009-05-07 2009-05-15 2009-05-21 2009-05-28 2009-06-04
100     105       0.15        dnv          1          0        dnv
101     106          0        dnv          1       0.15        dnv

Thanks in advance for any help you might provide. (I hope I've provided enough of an example; I could also provide a .csv file if that would help.)

Mark Na
[R] How to fix my nested conditional IF ELSE code?
Hi,

I've been struggling most of the morning with an IF ELSE problem, and I wonder if someone might be able to sort me out. Here's what I need to do (dummy example, my data are more complicated):

If type = A or B or C
    and status = a then count = 1
    and status = b then count = 2
    and status = c then count = 3
Else if type = D or E or F
    and status = a then count = 9
    and status = b then count = 8
    and status = c then count = 7
End

Seems simple when I write it like that, but the R code is escaping me.

Thanks!
Mark Na
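One way to write the pseudocode above with `ifelse()`, as a sketch only: the data frame `DF` and the lookup vectors `first_counts` and `second_counts` are hypothetical names, and rows that match neither type group are given NA here as an assumption.

```r
# Hypothetical dummy data with the two columns named in the question.
DF <- data.frame(
  type   = c("A", "B", "D", "F", "C", "E"),
  status = c("a", "b", "a", "c", "c", "b"),
  stringsAsFactors = FALSE
)

# Named lookup vectors: status -> count for each group of types.
first_counts  <- c(a = 1, b = 2, c = 3)   # types A, B, C
second_counts <- c(a = 9, b = 8, c = 7)   # types D, E, F

in_first  <- DF$type %in% c("A", "B", "C")
in_second <- DF$type %in% c("D", "E", "F")

# Outer ifelse picks the group; indexing by status name picks the count.
DF$count <- ifelse(in_first, first_counts[DF$status],
            ifelse(in_second, second_counts[DF$status], NA))

DF
```

Keeping the status-to-count mapping in lookup vectors means a new status only needs a new element, not another nested branch.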
Re: [R] How to fix my nested conditional IF ELSE code?
Thanks Gabor, this is quite clever and it's nice to see another way of doing it (without ifelse).

Mark

On Sun, Jun 14, 2009 at 6:51 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Note that TRUE and FALSE become 1 and 0 when used in arithmetic formulae so:

    result <- with(DF,
      (type %in% c("A", "B", "C")) * (1 * (status == "a") + 2 * (status == "b") + 3 * (status == "c")) +
      (type %in% c("D", "E", "F")) * (9 * (status == "a") + 8 * (status == "b") + 7 * (status == "c")))

If none of the conditions hold for row i then result[i] will be 0.
[R] Expand a contingency table based on the value in one column
Hi R-helpers,

I have the following (dummy) dataframe:

> test
  DATE LOCATION  KIND CLASS COUNT
1    1        1   CAR     A     2
2    1        1 TRUCK     D     3
3    1        1   BUS     E     4
4    1        2   CAR     E     2
5    1        2 TRUCK     A     7
6    1        2   BUS     F     1

That I would like to turn into this:

> test2
   DATE LOCATION  KIND CLASS
1     1        1   CAR     A
2     1        1   CAR     A
3     1        1 TRUCK     D
4     1        1 TRUCK     D
5     1        1 TRUCK     D
6     1        1   BUS     E
7     1        1   BUS     E
8     1        1   BUS     E
9     1        1   BUS     E
10    1        2   CAR     E
11    1        2   CAR     E
12    1        2 TRUCK     A
13    1        2 TRUCK     A
14    1        2 TRUCK     A
15    1        2 TRUCK     A
16    1        2 TRUCK     A
17    1        2 TRUCK     A
18    1        2 TRUCK     A
19    1        2   BUS     F

So, basically it's a case of expanding (adding rows to) the first dataframe by the value in the COUNT column. I have solved this problem with the following code:

    test2 <- with(test, data.frame(DATE=rep(DATE,COUNT), LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT)))

but I'm unsatisfied with that solution because it's verbose and I think there must be a more elegant way. If I had more variables than 4 (which I do in my real data) it would be a nuisance to repeat each column within the rep function. I would prefer to do this with Base R or package(reshape) than relying on another package.

Any ideas?

Thanks!
Mark Na
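A base R sketch that avoids naming each column: repeat the row indices instead of the columns, then subset. The data frame below rebuilds the dummy `test` from the question.

```r
# The dummy data frame from the question.
test <- data.frame(
  DATE = 1,
  LOCATION = c(1, 1, 1, 2, 2, 2),
  KIND = c("CAR", "TRUCK", "BUS", "CAR", "TRUCK", "BUS"),
  CLASS = c("A", "D", "E", "E", "A", "F"),
  COUNT = c(2, 3, 4, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Repeat each row index COUNT times, and drop the COUNT column.
row_index <- rep(seq_len(nrow(test)), times = test$COUNT)
test2 <- test[row_index, names(test) != "COUNT"]

# Replace the repeated row names (1, 1.1, 2, ...) with 1..n.
rownames(test2) <- NULL

test2
```

This works unchanged however many other columns the data frame has.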
[R] Specifying data type when creating a dataframe using RODBC
Hi R-helpers,

I am using the following code to make a dataframe from an Excel spreadsheet:

    library(RODBC)
    channel <- odbcConnectExcel("Spreadsheet.xls")
    Data <- sqlFetch(channel, "Tab1")
    odbcClose(channel)

One column (several, actually) in the spreadsheet contains integers in its first few rows but later values in these columns contain a mixture of numbers, letters and symbols (it's an ID variable, containing e.g., 12, 14, 19, 19B, 19C, 19/20).

R creates this column as a numeric variable (I think because its first few values are numbers) but as soon as R gets to the non-numeric values (e.g., 19/20) it replaces them with NA.

So, my question is: how can I specify that certain columns are to be read as character variables BEFORE the dataframe is created? I have tried using as.character() in the third line (above) but it creates a very long first column containing all of my data...

Thanks for any help you might provide,

Mark Na
[R] Read many .csv files into an R session automatically, without specifying filenames
Hi R-helpers,

I would like to read into R all the .csv files that are in my working directory, without having to use a read.csv statement for each file. Each .csv would be read into a separate dataframe which would acquire the filename of the .csv. As an example:

    Mark <- read.csv("Mark.csv")

...but the code (or command) would do this automatically for every .csv in the working directory without my specifying each file.

I'd appreciate any help or ideas you might have.

Thanks!
Mark Na
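A sketch of one approach with `list.files()` and `lapply()`. To keep the example self-contained it writes two small files into a temporary directory (`csv_dir`, a hypothetical stand-in for the working directory); with real data, `csv_dir` would be `"."`.

```r
# Set up a throwaway directory with two small .csv files (illustration only).
csv_dir <- tempfile("csvdemo")
dir.create(csv_dir)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "Mark.csv"), row.names = FALSE)
write.csv(data.frame(y = 3:5), file.path(csv_dir, "Anna.csv"), row.names = FALSE)

# Find every file ending in .csv and read each one into a list element.
files  <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read.csv)

# Name each data frame after its file, without the directory or extension.
names(frames) <- tools::file_path_sans_ext(basename(files))

names(frames)
```

Keeping the data frames together in a named list (`frames$Mark`, `frames$Anna`) is usually easier to work with; if separate objects are really wanted, `list2env(frames, envir = globalenv())` would create them.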
[R] Set working directory by dragging text file onto R shortcut? [WinXP, unfortunately]
Hi R-helpers, I must use WinXP at work, and I'm missing a particular feature that's available in R on my Mac at home... In WinXP I would like to launch R by dragging and releasing a text file on top of the R shortcut on my desktop. Then, I would like the working directory to be automatically set to the location of that text file. That's what works in MacOS, but I can't figure out if it's possible in WinXP. Any ideas? I'd appreciate it... Thanks, Mark Na PS What I do now is make a new R shortcut for each project, store that shortcut in the directory containing that project's datafiles and R programs (which are in text files), and set the Start in value to that directory...but this has to be set up uniquely for each project...
[R] Relative subscripts?
Dear R-helpers,

I have a dataframe with several columns, one of which is called LENGTH. I would like to make a new column called DIFF containing the value of LENGTH minus LENGTH in the previous row, like this:

  ID LENGTH DIFF
1  1     10   NA
2  2     15    5
3  3     20    5
4  4     12   -8
5  5     18    6

I'd like to think there are relative subscripts in R but I can't find any reference to such a thing. Any help solving this problem would be much appreciated, thanks!

Mark Na
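A minimal sketch using base R's `diff()`, which returns the lagged differences of a vector. It is one element shorter than its input, so an NA is prepended for the first row. The data frame `df` reproduces the example table.

```r
# The example data from the question.
df <- data.frame(ID = 1:5, LENGTH = c(10, 15, 20, 12, 18))

# diff() gives LENGTH[i] - LENGTH[i - 1] for i = 2..n; pad the first row with NA.
df$DIFF <- c(NA, diff(df$LENGTH))

df
```

The same result can be written with explicit shifted subscripts: `df$LENGTH - c(NA, df$LENGTH[-nrow(df)])`.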
[R] Please help me subset this dataframe, thanks...
Dear R-helpers,

I have a dataframe called trackpoints with several columns including a column called time, eg:

> trackpoints
        time
1   12:00:00
2   12:00:01
3   12:00:02
. . .
298 12:04:57
299 12:04:58
300 12:04:59

I also have a dataframe called data that contains columns called ID, start and stop, eg:

> data
  ID    start     stop
1  1 12:00:00 12:01:30
2  2 12:02:16 12:03:01
3  3 12:03:58 12:04:31

I wish to make a dataframe called extracted containing only the rows in trackpoints with a value of time bounded by the times in data$start and data$stop and a column called ID containing the value from data$ID, eg:

> extracted
       Time ID
1  12:00:00  1
2  12:00:01  1
3  12:00:02  1
. . .
89 12:01:28  1
90 12:01:29  1
91 12:01:30  1

I have the vague notion that I might have to loop this, but I think it would be cleaner to use logical subscripts, if possible.

I'd appreciate any help you might be able to provide.

Thanks!
Mark Na
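A sketch that combines logical subscripts with a short `lapply()` over the intervals. It assumes the times are stored as zero-padded "HH:MM:SS" character strings within a single day, so that ordinary string comparison orders them correctly; real data with dates or factors would first need converting. The data frames rebuild the example from the question.

```r
# 300 one-second trackpoints from 12:00:00 to 12:04:59, as character strings.
start_time  <- as.POSIXct("2009-01-01 12:00:00", tz = "UTC")
trackpoints <- data.frame(time = format(start_time + 0:299, "%H:%M:%S"),
                          stringsAsFactors = FALSE)

data <- data.frame(ID = 1:3,
                   start = c("12:00:00", "12:02:16", "12:03:58"),
                   stop  = c("12:01:30", "12:03:01", "12:04:31"),
                   stringsAsFactors = FALSE)

# For each interval, keep the trackpoints inside it and tag them with the ID.
pieces <- lapply(seq_len(nrow(data)), function(i) {
  inside <- trackpoints$time >= data$start[i] & trackpoints$time <= data$stop[i]
  data.frame(time = trackpoints$time[inside],
             ID = rep(data$ID[i], sum(inside)),
             stringsAsFactors = FALSE)
})

# Stack the per-interval pieces into one data frame.
extracted <- do.call(rbind, pieces)

head(extracted, 3)
```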
[R] How to summarise several models in a single table
Dear R-helpers, I have produced several models, named model1, model2, model3, etc... I would like to extract several elements from each model's object, e.g. at minimum the estimates, SEs, and P values of each model's intercept and slopes, model R-squared, and AIC... ...and then produce a new object (a table) that summarises all of my models, with models in rows and extracted model elements in columns. Before reinventing the wheel, I wonder if there is a package or function that does what I need? Thank you! Mark Na
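A base R sketch of building such tables by hand, assuming the models are `lm` fits; the two models below use the built-in `mtcars` data purely as placeholders. Because models can have different numbers of coefficients, the sketch keeps model-level statistics in one table (one row per model) and coefficients in a second, long table (one row per term).

```r
# Placeholder models; any list of lm fits would do.
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
models <- list(model1 = model1, model2 = model2)

# One row per model: R-squared and AIC.
fit_table <- data.frame(
  model     = names(models),
  r.squared = sapply(models, function(m) summary(m)$r.squared),
  AIC       = sapply(models, AIC),
  row.names = NULL
)

# One row per model term: estimate, standard error, t value and P value.
coef_table <- do.call(rbind, lapply(names(models), function(nm) {
  ct <- coef(summary(models[[nm]]))
  data.frame(model = nm, term = rownames(ct), ct,
             row.names = NULL, check.names = FALSE)
}))

fit_table
coef_table
```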
[R] Logical subset of the columns in a dataframe
Hi R-helpers, I've been struggling with a problem for most of the day (!) so am finally resorting to R-help. I would like to subset the columns of my dataframe based on the frequency with which the columns contain non-zero values. For example, let's say that I want to retain only those columns which contain non-zero values in at least 1% of their rows. In Excel I would calculate a row at the bottom of my data sheet and use the following function =countif(range,0) to identify the number of non-zero cells in each column. Then, I would divide that by the number of rows to obtain the frequency of non-zero values in each column. Then, I would delete those columns with frequencies < 0.01. But, I'd like to do this in R. I think the missing link is an analog to Excel's countif function. Any ideas? Thanks! Mark
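A sketch of the R analogue: a comparison such as `df != 0` gives a logical matrix, and since TRUE counts as 1, `colSums()` plays the role of countif and `colMeans()` gives the frequency directly. The data frame and its column names are hypothetical.

```r
# Hypothetical data: 200 rows, three columns with different non-zero rates.
df <- data.frame(
  rare   = c(1, rep(0, 199)),     # 0.5% non-zero
  common = c(rep(1, 4), rep(0, 196)),  # 2% non-zero
  always = rep(2, 200)            # 100% non-zero
)

# Proportion of non-zero values in each column.
nonzero_freq <- colMeans(df != 0)

# Keep only the columns that are non-zero in at least 1% of rows.
kept <- df[, nonzero_freq >= 0.01, drop = FALSE]

names(kept)
```

`drop = FALSE` keeps the result a data frame even if only one column passes the threshold.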
[R] How to add a data line (series) to a plot using add=TRUE
Hello,

I'd like to use the add=TRUE parameter to add a second data line (series) to an existing plot, but R is giving me an error (see below).

This code:

    rap <- plot(aspen_sort, ylim=c(1,1), log="y")

...produces the plot to which I'd like to add the second line. But this code:

    rap <- plot(pine_sort, add = TRUE)

...produces this error:

Warning messages:
1: In plot.window(...) : "add" is not a graphical parameter
2: In plot.xy(xy, type, ...) : "add" is not a graphical parameter
3: In axis(side = side, at = at, labels = labels, ...) : "add" is not a graphical parameter
4: In axis(side = side, at = at, labels = labels, ...) : "add" is not a graphical parameter
5: In box(...) : "add" is not a graphical parameter
6: In title(...) : "add" is not a graphical parameter

I have successfully used add=TRUE (in the same program!) with no errors, so I reckon the problem must be related to my data structures, but I can't see how.

Any ideas?

Thanks!
Mark
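A sketch of the usual alternative: the default `plot()` method for a numeric vector has no `add` argument (only some other methods and functions, such as `curve()` and `hist()`, accept it), so a second series is normally added with `lines()` or `points()`. The two vectors below are invented stand-ins for `aspen_sort` and `pine_sort`, and `pdf(NULL)` is used only so the example runs without opening a window or writing a file.

```r
# Invented stand-ins for the two sorted series.
aspen_sort <- sort(c(120, 45, 300, 8, 60), decreasing = TRUE)
pine_sort  <- sort(c(90, 30, 15, 200, 5), decreasing = TRUE)

pdf(NULL)  # off-screen device for this example only

# First series creates the plot; ylim covers both series so nothing is clipped.
plot(aspen_sort, type = "l", log = "y",
     ylim = range(aspen_sort, pine_sort), ylab = "value")

# Second series is drawn onto the existing plot.
lines(pine_sort, lty = 2)

invisible(dev.off())
```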
[R] How to update a column in a dataframe, more simply...
Hello,

I would like to be able to update an existing column in a dataframe, like this...

    data$score[data$type==1 & data$year==2001] <- data$score * 0.111
    data$score[data$type==1 & data$year==2002] <- data$score * 0.222
    data$score[data$type==1 & data$year==2003] <- data$score * 0.333

...but, if possible, using simpler code. I've got several dozen lines of code like this (type 2, type3, etc. for the same years) so it would be great if I could reduce each set of three lines of code to one line.

Any help much appreciated, thanks!

Mark
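A sketch of one way to collapse the three lines: keep the year-specific multipliers in a named lookup vector and index it by year. The data frame and the name `multiplier` are hypothetical. Note that the right-hand side is subset with the same condition as the left-hand side, so each selected row is multiplied by its own score.

```r
# Hypothetical data: three type-1 rows (one per year) and one type-2 row.
data <- data.frame(type  = c(1, 1, 1, 2),
                   year  = c(2001, 2002, 2003, 2001),
                   score = c(10, 10, 10, 10))

# Multipliers for type 1, named by year.
multiplier <- c("2001" = 0.111, "2002" = 0.222, "2003" = 0.333)

# One update for all three years: look up each row's multiplier by its year.
is_type1 <- data$type == 1
data$score[is_type1] <- data$score[is_type1] *
  multiplier[as.character(data$year[is_type1])]

data
```

With several types, the lookup could become a small data frame of type, year and multiplier joined to the data with `merge()`.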
[R] How to order some of my columns (not rows) alphabetically
Hello,

I have a dataframe with 9 columns, and I would like to sort (order) the right-most seven of them alphabetically, i.e.:

    ID1 ID2 F G A B C E D

would become

    ID1 ID2 A B C D E F G

Right now, I'm using this code:

    attach(data)
    data <- data.frame(ID1, ID2, data[, sort(colnames(data)[3:9])])
    detach(data)

but that's not very elegant. Ideally I could specify which columns to sort and which to leave as is (but my attempts to do so have failed).

Thank you,
Mark
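A sketch without `attach()`: build the desired vector of column names (fixed columns first, then the sorted remainder) and use it to index the data frame. The data frame below mimics the column layout in the question.

```r
# Hypothetical data frame with the column order from the question.
data <- data.frame(ID1 = 1:2, ID2 = 3:4,
                   F = 0, G = 0, A = 0, B = 0, C = 0, E = 0, D = 0)

# Columns to leave in place, followed by the remaining columns sorted by name.
fixed_cols  <- c("ID1", "ID2")
sorted_cols <- sort(setdiff(names(data), fixed_cols))

data <- data[, c(fixed_cols, sorted_cols)]

names(data)
```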
[R] How can I comment out whole chunks of code?
Hello, I know this has been discussed, but I haven't found an answer in the archives. Basically, I'd like to be able to comment out chunks of code (which may or may not be syntactically correct) without having to put the # symbol in front of each line (and, if possible, without having to adopt a new text editor). My current R setup (XP) is very simple. I always have three windows open: the R console, my working directory, and a Notepad window containing my program. Adopting Tinn-R would probably solve this problem, but for simplicity I'd rather not move beyond Notepad (if possible). Thanks for any help you can provide, Mark
[R] Multiple logical operations in a subscript
Hello,

I would like to select cases using multiple logical operations (e.g. X or Y or Z) without having to repeat the dataframe$variable within the subscript. My working code (with a single logical operator) currently looks like this:

    dataframe$newvariable[data$oldvariable=="X"] <- "group1"

I thought this next line of code might do what I wanted, but it doesn't:

    dataframe$newvariable[data$oldvariable=="X" | "Y" | "Z"] <- "group1"

I'd appreciate any suggestions. I've tried playing around with grep, but can't make it work.

Thanks!
Mark
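A sketch using the `%in%` operator, which tests each element of the left-hand vector for membership in the right-hand set and so replaces a chain of `==` comparisons joined by `|`. The data frame is a hypothetical stand-in, and a single data frame name is used throughout.

```r
# Hypothetical data with a mix of values to be grouped.
dataframe <- data.frame(oldvariable = c("X", "Q", "Z", "Y", "R"),
                        stringsAsFactors = FALSE)
dataframe$newvariable <- NA_character_

# TRUE wherever oldvariable is X, Y or Z.
dataframe$newvariable[dataframe$oldvariable %in% c("X", "Y", "Z")] <- "group1"

dataframe
```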
[R] Turn off save workspace prompt (WinXP, R 2.7.2)
Hi, I'd like R to no longer prompt me to save my workspace every time I quit. I seem to recall seeing this option (maybe in the OS X console?) but I can't seem to find it in WinXP. Can anyone help? Thanks! Mark
[R] k-sample Kolmogorov-Smirnov test?
Hello, I would like to conduct a k-sample K-S test, but cannot find reference to its implementation in R. Does anyone have experience with this? Thanks, Mark
[R] xls to csv conversion via WinXP's context menu?
Frequently I need to convert a .xls to a .csv (for import into R) and I do this by opening the file in Excel and saving it as a csv. I would rather do this using WinXP's context menu (right click the xls, choose convert to csv) but I don't know of a utility that does this. Any ideas? Thanks, Mark
[R] How many parameters does my model (gls) have?
Hello, Is there a way to output the number of parameters in my model (gls)? I can count the number of estimates, but I'd like to use the number of parameters as an R object. Thanks, Mark
[R] How to run a model 1000 times, while saving coefficients each time?
Hello,

We have written a program (below) to model the effect of a covariate on observed values of a response variable (using only 80% of the rows in our dataframe) and then use that model to calculate predicted values for the remaining 20% of the rows. Then, we compare the observed vs. predicted values using a linear model and inspect that model's coefficients and its R2 value.

We wish to run this program 1000 times, and to save the coefficients and R2 values into a separate dataframe called results.

We have a looping structure (also below) but we do not know how to save the coefficients and R2 values. We are missing some code (indicated).

Any assistance would be greatly appreciated.

Thanks,

    library(sampling)
    mall <- read.csv("mall.csv")
    for (j in 1:1000) {
      s <- srswor(2840, 3550)
      mall80 <- mall[s==1,]
      mall20 <- mall[s==0,]
      model1 <- lm(count~habitat, data=mall80)
      summary(model1)
      mall20$predicted <- predict(model1, newdata=mall20)
      model2 <- lm(count~predicted, data=mall20)
      # MISSING CODE: SAVE MODEL COEFFICIENTS AND R2 VALUE TO A DATAFRAME CALLED RESULTS
    }
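A sketch of the missing step, under several assumptions: the data are simulated here in place of mall.csv, `habitat` is treated as numeric, base R's `sample()` is swapped in for `sampling::srswor()` so the example needs no extra package, and the loop runs 100 times rather than 1000 to keep it quick. The idea is to create `results` with one row per run before the loop and fill row `j` on each pass.

```r
set.seed(1)

# Simulated stand-in for mall.csv (an assumption for this example).
mall <- data.frame(habitat = runif(200))
mall$count <- 2 + 3 * mall$habitat + rnorm(200)

n_runs <- 100  # the question uses 1000

# Pre-allocate one row per run.
results <- data.frame(intercept = numeric(n_runs),
                      slope     = numeric(n_runs),
                      r.squared = numeric(n_runs))

for (j in seq_len(n_runs)) {
  # Random 80/20 split (sample() used here instead of srswor()).
  train_rows <- sample(nrow(mall), size = 0.8 * nrow(mall))
  in_train   <- seq_len(nrow(mall)) %in% train_rows
  mall80 <- mall[in_train, ]
  mall20 <- mall[!in_train, ]

  model1 <- lm(count ~ habitat, data = mall80)
  mall20$predicted <- predict(model1, newdata = mall20)
  model2 <- lm(count ~ predicted, data = mall20)

  # Save this run's intercept, slope and R-squared into row j.
  results[j, ] <- c(coef(model2), summary(model2)$r.squared)
}

head(results)
```

Filling a pre-allocated data frame avoids growing an object inside the loop, which becomes slow as the number of runs increases.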
[R] A repeated measures, linear mixed model (lme) WITHOUT random effects...
Hello,

I am trying to fit a repeated measures linear mixed model (using lme) but I don't want to include any random effects. I'm having trouble (even after consulting Pinheiro & Bates 2000) figuring out how to specify the repeated measure without including it in the specification of a random effect.

My data consist of repeated counts in plots that I wish to model as a function of habitat. This attempt:

    model <- lme(count~habitat-1, data=dataframe, method="ML")

doesn't consider the repeated nature of my counts (i.e., that there are multiple rows in my dataframe for each plot). I know how to include plot as a random effect, but I don't wish to do that, and I can't see how to include it without doing so.

I'll appreciate any help you can provide.

Thanks,
Mark
[R] lme questions re: repeated measures covariance structure
Hello,

We are attempting to use nlme to fit a linear mixed model to explain bird abundance as a function of habitat:

    lme(abundance~habitat-1, data=data, method="ML", random=~1|sampleunit)

The data consist of repeated counts of birds in sample units across multiple years, and we have two questions:

1) Is it necessary (and, if so, how) to specify the repeated measure (years)? As written, the above code does not.

2) How can we specify a Toeplitz heterogeneous covariance structure for this model? We have searched the help file for lme, and the R-help archives, but cannot find any pertinent information. If that's not possible, can we adapt an existing covariance structure, and if so how?

Thanks,
Mark
Re: [R] lme questions re: repeated measures covariance structure
Great, thanks for your reply! I've just tracked down a copy of Pinheiro & Bates (2000) so I'll look at that, too.

Thanks again,
Mark

On Fri, Aug 22, 2008 at 9:56 AM, Christoph Scherber [EMAIL PROTECTED] wrote:

Dear Mark,

I would include the repeated measure as the smallest stratum in the random effects specification:

    random=~1|sampleunit/year

Setting up user-defined variance structures should be possible using for example:

    weights=varPower(form=~habitat)

or also try out the available corStruct() classes (found in Pinheiro and Bates 2000)

HTH
Christoph

--
Dr. rer.nat. Christoph Scherber
University of Goettingen
DNPW, Agroecology
Waldweg 26
D-37073 Goettingen
Germany
phone +49 (0)551 39 8807
fax +49 (0)551 39 8806
Homepage http://www.gwdg.de/~cscherb1