Re: [R] aligning axis labels in a colorkey from levelplot
Deepayan Sarkar <deepayan.sar...@gmail.com> writes:

> You can specify a fixed-width fontfamily if that helps:
>
> levelplot(matrix(seq(4, 120, l = 9), 3, 3),
>           colorkey = list(at = seq(0, 120, 20),
>                           labels = list(labels = c(" 0", " 20", " 40", " 60",
>                                                    " 80", "100", "120"),
>                                         fontfamily = "courier", font = 1)))

Thanks Deepayan; I think I finally found a solution, which was much easier than I expected:

## Thanks to "R Graphics", 2nd ed.; Paul Murrell shows on page 250 how to
## edit an existing plot.
levelplot(matrix(-90:89, 20, 20))
grid.edit("[.]colorkey.labels$", grep = TRUE, just = "right",
          global = TRUE, x = unit(0.95, "npc"))

I can live with adjusting the x position by hand.

Stephen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Drawing (lon,lat) coordinates onto the image of a world
Given a set of latitude and longitude coordinate pairs (stored in the variables latitudevals and longitudevals), I would like to plot them onto the image of an equirectangular world map, drawing each coordinate pair as a red circle if possible. Does anyone have any suggestions on how to go about this, whether using R or another program like Google Maps?

Thank you,
Steve
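No approach was named in the question itself; one common option (my assumption, not something the poster mentioned) is the 'maps' package, whose default "world" database is drawn on plain longitude/latitude axes, which matches an equirectangular layout. The coordinates below are hypothetical stand-ins for the poster's latitudevals and longitudevals:

```r
library(maps)

# Hypothetical example coordinates (stand-ins for the poster's vectors)
longitudevals <- c(-71.06, 2.35, 139.69)
latitudevals  <- c(42.36, 48.86, 35.69)

map("world")                          # world outline on lon/lat axes
points(longitudevals, latitudevals,
       col = "red", pch = 1)          # one red circle per coordinate pair
```

Note that points() expects x (longitude) first, then y (latitude).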
[R] increase the usage of CPU and Memory
Dear All,

I have spent almost a whole day searching online for help making my R code more efficient, but found no solution to my case. If anyone could give any clue to solve my problem, I would greatly appreciate your help. Thanks in advance.

Here is my issue: my desktop has an i7-950 quad-core CPU with 24 GB of memory and an NVIDIA GTX 480 graphics card, and I am using 64-bit R under 64-bit Windows. I am running a for loop to generate a 461 x 5 matrix, which holds the coefficients of 5 models. The loop produces 5 values per iteration and runs 461 times in total. Running the code inside the loop just once takes almost 10 seconds, so the whole loop should take about 4610 seconds, almost an hour and a half, which is indeed exactly what it takes. But I have to run this kind of loop for 30 data sets!

Although I thought my desktop was not bad at all, I checked the usage of CPU and memory while running the R code and found that the whole job used just 15% of the CPU and 10% of the memory. Does anyone have the same issue, or know of methods to shorten the running time and increase the usage of CPU and memory?

Many thanks,
Xi
[R] Packaging Error
I was trying to byte-compile a package that I made. The package installs successfully with ByteCompile set to FALSE. When I set ByteCompile to TRUE, I receive the following error message while doing R CMD INSTALL:

/usr/lib/R/bin/INSTALL: line 34:  9964 Done                    echo 'tools:::.install_packages()'
      9965 Segmentation fault      | R_DEFAULT_PACKAGES= LC_COLLATE=C ${R_HOME}/bin/R $myArgs --slave --args ${args}

I have not been able to understand the problem. Can someone help me understand it so that it can be fixed?

Thanks,
Mayank
Re: [R] Save rgl plot3d Graph as Image
In the rgl package, rgl.postscript can save 3D scatter plots you have generated using the plot3d command. For example:

open3d()
x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x, y)
plot3d(x, y, z, col = rainbow(1000))
rgl.postscript("persp3dd.pdf", "pdf")
Re: [R] Loop for multiple plots in figure
This solution works really nicely, and I learned much by working through it. However, I am having trouble with subplot formatting: setting main=d$Subject results in the correct title over each plot, but repeated multiple times. Also, I can't seem to format the axis labels and numbers to reduce the space between them and the plot. Any more thoughts appreciated.

revised code:

tC <- textConnection("
Subject Xvar Yvar param1 param2
bob      9  100 1 100
bob      0  110 1 200
steve    2  250 1  50
bob     -5  175 0  35
dave    22  260 0 343
bob      3  180 0  74
steve    1  290 1 365
kevin    5  380 1 546
bob      8  185 0  76
dave     2  233 0 343
steve  -10  230 0 556
dave   -10  233 1 400
steve   -7  250 1 388
dave     3  568 0 555
kevin   10  380 0  57
kevin    4  390 0  50
bob      6  115 1 600
")
data <- read.table(header = TRUE, tC)
close.connection(tC)
rm(tC)

plot_one <- function(d){
  with(d, plot(Xvar, Yvar, type = "n", tck = 0.02, main = d$Subject,
               xlim = c(-14, 14), ylim = c(0, 600)))      # set limits
  with(d[d$param1 == 0, ], points(Xvar, Yvar, col = 1))   # first line
  with(d[d$param1 == 1, ], points(Xvar, Yvar, col = 2))   # second line
}
par(mfrow = c(2, 2))
plyr::d_ply(data, "Subject", plot_one)
Re: [R] Drawing (lon,lat) coordinates onto the image of a world
Hello,

What do you mean by "image"? A file (jpeg, bmp, ...)?

Best Regards

On 26/06/2012 10:47, Steven Winter wrote:
> Given a set of latitude and longitude coordinate pairs (stored in the
> variables latitudevals and longitudevals), I would like to plot them onto
> the image of an equirectangular world map, drawing each coordinate pair as
> a red circle if possible. Does anyone have any suggestions on how to go
> about this, whether using R or another program like Google Maps? [...]
Re: [R] increase the usage of CPU and Memory
See the vignette for package 'parallel' to make use of your 4 cores.

On 26/06/2012 01:07, Xi wrote:
> [...] I am running a for loop to generate a 461 x 5 matrix, which holds the
> coefficients of 5 models. [...] I checked the usage of CPU and memory while
> running the R code and found that the whole job used just 15% of the CPU
> and 10% of the memory. Does anyone know some methods to shorten the running
> time and increase the usage of CPU and memory? [...]

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
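To make the pointer to the 'parallel' vignette concrete, here is a minimal sketch of moving such a loop onto multiple cores. The per-iteration function is a hypothetical stand-in, since the poster's actual model-fitting code was not posted:

```r
library(parallel)

# Hypothetical stand-in for one iteration of the poster's loop:
# fit a model for iteration i and return a small coefficient vector.
fit_one <- function(i) {
  set.seed(i)
  x <- rnorm(100)
  y <- 1 + 2 * x + rnorm(100)
  coef(lm(y ~ x))
}

cl <- makeCluster(4)                    # one worker per physical core
res <- parLapply(cl, 1:461, fit_one)    # iterations run concurrently
stopCluster(cl)

out <- do.call(rbind, res)              # 461 rows, one per iteration
```

Because each iteration is independent, the 461 fits can run on all 4 cores at once; CPU usage should rise accordingly, and the same pattern applies across the 30 data sets.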
Re: [R] combineLimits and Dates
On 2012-06-25 15:30, Duncan Mackay wrote:
> Hi Elliot
>
> This works on Win 7, R 2.15:
>
> useOuterStrips(combineLimits(
>   xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
>          scales = list(y = list(relation = "free"),
>                        x = list(at = seq(from = as.Date("2011-01-01"),
>                                          to   = as.Date("2011-10-01"),
>                                          by   = "3 month"),
>                                 labels = format(seq(from = as.Date("2011-01-01"),
>                                                     to   = as.Date("2011-10-01"),
>                                                     by   = "3 month"), "%b"))),
>          auto.key = TRUE)
> ))

This works because the x-limits don't require combining in this example; all panels have the same xlims. See below for a solution when the xlims are not equal.

> Amend the seq as required, and the format if required; see ?strptime for
> format codes.
>
> HTH
> Duncan
>
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> Armidale NSW 2351
> Email: home: mac...@northnet.com.au
>
> At 02:28 26/06/2012, you wrote:
>> I'm having some trouble using the latticeExtra 'combineLimits' function
>> with a Date x-variable:
>>
>> require(lattice)
>> set.seed(12345)
>> dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
>> dat <- data.frame(d = rep(dates, 4),
>>                   g = factor(rep(rep(c(1, 2), each = length(dates)), 2)),
>>                   h = factor(rep(c("a", "b"), each = length(dates) * 2)),
>>                   x = rnorm(4 * length(dates)),
>>                   y = rnorm(4 * length(dates)))
>> plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
>>                scales = list(relation = "free"), auto.key = TRUE)
>> plt1 <- useOuterStrips(plt1)
>> plt1 <- combineLimits(plt1)
>>
>> The x-axis labels are right after the call to 'useOuterStrips', but they
>> get converted to numeric after the call to 'combineLimits'. How do I keep
>> them as date labels?

After combineLimits(plt1), the plt1 object will have an x.limits component that has the dates converted to numeric form. You can just modify that component with:

plt1$x.limits <- lapply(plt1$x.limits, as.Date, origin = "1970-01-01")

and then plot it.

Peter Ehlers

>> Thanks.
>> - Elliot
>>
>> --
>> Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
>> 134 Mount Auburn Street | Cambridge, MA | 02138
>> Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com
Re: [R] increase the usage of CPU and Memory
Hello Xi,

If a program does input or output to disk or network, this may cause it to wait and not use the available CPU. Output is usually buffered, but may cause delays if the buffer gets full (I'm not sure, though, whether this is an issue with plenty of memory available).

Take care
Oliver

On Mon, Jun 25, 2012 at 8:07 PM, Xi <amzhan...@gmail.com> wrote:
> [...] I checked the usage of CPU and memory while running the R code and
> found that the whole job used just 15% of the CPU and 10% of the memory.
> Does anyone know some methods to shorten the running time and increase the
> usage of CPU and memory? [...]

--
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics
  (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
  (http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
Re: [R] graph displays
Good morning,

Thanks for the help. I can explain better what I am trying to do. I'm trying to read data from a tab-separated file with the following code:

Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt",
                      sep = "\t", quote = "\"", header = TRUE)
View(Dataset)

dput(Dataset)
structure(list(Source = structure(1:3, .Label = c("A", "B", "C"),
    class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L,
    64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L),
    X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L),
    X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L),
    X3s = c(131L, 186L, 186L), X1s = c(131L, 186L, 186L)),
    .Names = c("Source", "X1000s", "X600s", "X500s", "X250s", "X100s",
    "X50s", "X10s", "X5s", "X3s", "X1s"), class = "data.frame",
    row.names = c(NA, -3L))

Dataset
  Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1      A     47    63    75   116   125  129  131 131 131 131
2      B     37    64    45    11    25   19   61 131 186 186
3      C     17    62    25    66    12   29   91 171 186 186

The idea is to get a graph like this one from Excel, but in R; as I'm still in the learning phase of R, I have little knowledge of how to do it:
http://imageshack.us/photo/my-images/51/testlt.png/
[R] Error in mice
Hi all,

I am imputing missingness in 90 columns of a data frame using mice, but mice gives back:

Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE,
  softmax = TRUE, : too many (1100) weights

Any idea to solve this error is welcome,
Anera
[R] compare one field of dataframe with excel sheet using R
I have a data frame consisting of three columns (name of compound, ppm, and frequency). Name contains string values; ppm and frequency contain numeric values with up to four decimal digits.

I also have an Excel sheet which acts like a library. Its first column contains the names of compounds, and the remaining columns contain the ppm values of each compound that satisfy certain rules. The number of ppm values varies per compound from 4 to 700.

I need to compare the ppm values from the data frame with the ppm values in the Excel sheet and report whether they are similar.
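No code or data was posted, so here is one possible sketch, under two assumptions of mine: the library sheet has been exported from Excel (e.g. as CSV) and read into a data frame with NA-padded ppm columns, and "similar" means equal to four decimal places. All object and column names below are hypothetical:

```r
# Hypothetical version of the poster's three-column data frame
peaks <- data.frame(name      = c("cmpdA", "cmpdB"),
                    ppm       = c(1.2345, 7.8901),
                    frequency = c(400.13, 600.25))

# Hypothetical library sheet: name, then reference ppm values (NA-padded)
lib <- data.frame(name = c("cmpdA", "cmpdB"),
                  ppm1 = c(1.2345, 3.1000),
                  ppm2 = c(2.5000, 7.8901))

# For each peak, does any reference ppm for that compound match
# after rounding both sides to four decimals?
matches <- sapply(seq_len(nrow(peaks)), function(i) {
  ref <- unlist(lib[lib$name == peaks$name[i], -1])
  any(round(ref, 4) == round(peaks$ppm[i], 4), na.rm = TRUE)
})
data.frame(name = peaks$name, match = matches)
```

Reading the sheet directly (rather than via CSV) would need a package such as gdata or XLConnect, which were the usual options at the time.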
[R] How to estimate variance components with lmer for models with random effects and compare them with lme results
Hi,

I performed an experiment where I raised different families coming from two different source populations, where each family was split across different treatments. After the experiment I measured several traits on each individual. To test for an effect of either treatment or source, as well as their interaction, I used a linear mixed-effects model with family as a random factor, i.e.

lme(fixed = Trait ~ Treatment * Source, random = ~1 | Family, method = "ML")

So far so good. Now I have to calculate the relative variance components, i.e. the percentage of variation that is explained by either treatment or source, as well as the interaction. Without a random effect, I could easily use the sums of squares (SS) to calculate the variance explained by each factor. But for a mixed model (with ML estimation) there are no SS, hence I thought I could use Treatment and Source as random effects too, to estimate the variance, i.e.

lme(fixed = Trait ~ 1, random = ~(Treatment * Source) | Family, method = "REML")

However, in some cases lme does not converge, hence I used lmer from the lme4 package:

lmer(Trait ~ 1 + (Treatment * Source | Family), data = DATA)

where I extract the variances from the model using the summary function:

model <- lmer(Trait ~ 1 + (Treatment * Source | Family), data = regrexpdat)
results <- model@REmat
variances <- results[, 3]

I get the same values as with the VarCorr function. I then use these values to calculate the actual percentage of variation, taking the sum as the total variation.

Where I am struggling is with the interpretation of the results from the initial lme model (with treatment and source as fixed effects) and the random model used to estimate the variance components (with treatment and source as random effects). I find that in most cases the percentage of variance explained by each factor does not correspond to the significance of the fixed effect. For example, for the trait HD, the initial lme suggests a tendency for the interaction as well as significance for Treatment. Using a backward procedure, I find that Treatment has a close-to-significant tendency. However, estimating variance components, I find that Source has the highest variance, making up 26.7% of the total variance.

anova(lme(fixed = HD ~ as.factor(Treatment) * as.factor(Source),
          random = ~1 | as.factor(Family), method = "ML", data = test), type = "m")
                                       numDF denDF  F-value p-value
(Intercept)                                1   426 0.044523  0.8330
as.factor(Treatment)                       1   426 5.935189  0.0153
as.factor(Source)                          1    11 0.042662  0.8401
as.factor(Treatment):as.factor(Source)     1   426 3.754112  0.0533

summary(lmer(HD ~ 1 + (as.factor(Treatment) * as.factor(Source) | Family),
             data = regrexpdat))
Linear mixed model fit by REML
Formula: HD ~ 1 + (as.factor(Treatment) * as.factor(Source) | Family)
   Data: regrexpdat
    AIC    BIC logLik deviance REMLdev
 -103.5 -54.43  63.75   -132.5  -127.5
Random effects:
 Groups   Name                                    Variance  Std.Dev. Corr
 Family   (Intercept)                             0.0113276 0.106431
          as.factor(Treatment)                    0.0063710 0.079819  0.405
          as.factor(Source)                       0.0235294 0.153393 -0.134 -0.157
          as.factor(Treatment)L:as.factor(Source) 0.0076353 0.087380 -0.578 -0.589 -0.585
 Residual                                         0.0394610 0.198648
Number of obs: 441, groups: Family, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.02740    0.03237  -0.846

Hence my question is: is what I am doing correct? Or should I use another way to estimate the amount of variance explained by each factor (i.e. Treatment, Source, and their interaction)? For example, would effect sizes be a more appropriate way to go?

Thanks!
Kay Lucek
[R] MuMIn - assessing variable importance following model averaging, z-stats/p-values or CI?
Dear R users,

Recent changes to the MuMIn package mean that the model-averaging command (model.avg) no longer returns confidence intervals, but instead returns z-values and corresponding p-values for the fixed effects included in the models. Previously I have used this package for model selection/averaging following Grueber et al. (2011), where it is suggested that one should use the confidence intervals from model averaging to assess whether the fixed effects have an effect or not (if the confidence intervals do not span zero, the variable has an effect).

Can anyone tell me why MuMIn now gives z-stats and p-values, and whether these should be used to assess the 'significance'/importance of variables when model averaging?

Here's example code of what I'm doing:

#-------------------------------------------------#
ps <- lmer(tranPS ~ (Sex + Age.Cat2 + TOTAL + Propfarm + Maize +
                     TOTAL:Propfarm + Maize:TOTAL + Maize:Propfarm +
                     (1 | Socialgroup) + (1 | Year) + (1 | Tattoo)),
           REML = FALSE, data = propspec)
pss <- standardize(ps, standardize.y = FALSE)
psdrg <- dredge(pss)
summary(model.avg(get.models(psdrg, subset = delta < 2)))
#-------------------------------------------------#

Ref: Grueber, C.E., Nakagawa, S., Laws, R.J. & Jamieson, I.G. (2011) Multimodel inference in ecology and evolution: challenges and solutions. Journal of Evolutionary Biology, 24, 699-711.

Any help would be much appreciated.

Regards,

Andrew Robertson
PhD student
Centre for Ecology and Conservation
University of Exeter, Cornwall Campus
Tremough, Cornwall, TR10 9EZ, UK
Tel: 01326 371852
Email: ar...@exeter.ac.uk
Web page: http://biosciences.exeter.ac.uk/staff/postgradresearch/andrewrobertson/
Re: [R] Loop for multiple plots in figure
Try this alternative solution using only base functions:

# split the data into one data.frame per subject
l <- split(data, data$Subject)
names(l)

# set up the graph parameters
par(mfrow = n2mfrow(length(l)), mar = c(4, 4, 1, 1), mgp = c(2, 1, 0))

# good old for loop over the subject names
for (n in names(l)) {
  d <- l[[n]]                                            # temporary data.frame for convenience
  with(d, plot(Xvar, Yvar, type = "n"))                  # set limits
  with(d[d$param1 == 0, ], lines(Xvar, Yvar, lty = 1))   # first line
  with(d[d$param1 == 1, ], lines(Xvar, Yvar, lty = 2))   # second line
  title(n)                                               # here n is just a string
}

HTH,
b.

On 25 June 2012 23:45, Marcel Curlin <cemar...@u.washington.edu> wrote:
> This solution works really nicely, and I learned much by working through
> it. However, I am having trouble with subplot formatting: setting
> main=d$Subject results in the correct title over each plot, but repeated
> multiple times. Also, I can't seem to format the axis labels and numbers
> to reduce the space between them and the plot. Any more thoughts
> appreciated. [...]
[R] clean Email format data
Dear all,

I am now going to do some text analysis using R. However, the data is very noisy, so I need to clean it first. I don't have much experience with the text-cleaning process; is anyone able to help with this? If you could provide some similar code that was done before, it would be greatly appreciated.

My content is mainly feedback data through two channels:

Phone call records: usually the structure looks like the example below.

Email: the common email correspondence, usually with a lot of history, and also footnotes such as "if you are not the intended recipient ..." etc.

I know it's quite a complex problem and cannot be solved by a single answer, so some tips would also be very good.

One example of the data:

# Fyna. g-cc...@adfae.com 24/06/2012 09:15 AM
To g-cc...@adfae.com
cc g-cc...@adfae.com
Subject ase Mewrr asdffID:dde_20120624_15988015_11653024   (keep this part)
CUSTOMER DETAILS
Name : Mr dffa
Company : da
Address : ff
Home No. :
Office No. :
Payphone Ext :
Mobile No. :
Fax No. :
Email :
CASE DETAILS
Division : dsaf (RIM)     (keep this part)
Category 1 : dsaf (RIM)   (keep this part)
Category 2 : dsaf (RIM)   (keep this part)
Category 3 :
Veh Reg Num :
COMMENTS
24/06/2012 09:15:23 AM (Name) - Location @Ddaferdsdaf Rd Caller feedback Content..   (This part I need to keep)
INFORMANT STATES
Date Time : 24/06/2012 09:15:31 AM
CSO ID : dasf
https://MSCCasdfEB/LsdfA/Madsf.htm?pardsnDc?0pAsdoE9.=cS0eiIcp9m
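As the poster says, no single answer covers this; a sketch of the kind of cleaning steps involved, using base R regular expressions (the example lines and patterns are illustrative assumptions, not tuned to the real records):

```r
# Hypothetical lines of the kind described in the question
raw <- c("Subject ase Mewrr asdffID:dde_20120624_15988015_11653024",
         "If you are not the intended recipient, please delete this email.",
         "24/06/2012 09:15:23 AM (Name) - Caller feedback Content..")

# 1. Drop common legal-footer lines
clean <- raw[!grepl("intended recipient", raw, ignore.case = TRUE)]

# 2. Strip leading timestamps of the form dd/mm/yyyy hh:mm:ss AM/PM
clean <- sub("^\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2} [AP]M ?", "", clean)

clean
```

Real cleaning would need many more such rules (quoted history, headers, boilerplate sections), usually built up incrementally by inspecting what each pattern removes.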
Re: [R] Error in mice
On 26/06/2012 08:59, Anera Salucci wrote:
> I am imputing missingness in 90 columns of a data frame using mice, but
> mice gives back:
>
> Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE,
>   softmax = TRUE, : too many (1100) weights
>
> Any idea to solve this error is welcome,

See ?nnet (in package nnet).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
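For context (my addition, not part of the reply): the error comes from nnet, whose fitting routine refuses to estimate more than MaxNWts weights (1000 by default), and mice's "polyreg" method for categorical variables calls nnet::multinom internally. A sketch of the relevant knob, on hypothetical data; whether and how MaxNWts can be forwarded through mice itself depends on the mice version:

```r
library(nnet)

# Hypothetical data: a 5-level factor outcome and several predictors.
# With many predictors or levels (as in the poster's 90-column frame),
# the weight count exceeds multinom's default cap of MaxNWts = 1000.
set.seed(1)
df <- data.frame(y = factor(sample(letters[1:5], 300, replace = TRUE)),
                 matrix(rnorm(300 * 10), 300, 10))

# Passing a larger MaxNWts (forwarded to nnet) lifts the cap
fit <- multinom(y ~ ., data = df, MaxNWts = 2000, trace = FALSE)
```

The alternative hinted at by "See ?nnet" is to reduce the problem size instead, e.g. by imputing fewer predictors per model or collapsing factor levels.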
Re: [R] How to estimate variance components with lmer for models with random effects and compare them with lme results
1. This is not an R question; it is a statistical issue. 2. R-sig-mixed-models is the appropriate list, not r-help. -- Bert On Tue, Jun 26, 2012 at 3:28 AM, KL sticklena...@gmail.com wrote: Hi, I performed an experiment where I raised different families coming from two different source populations, where each family was split up into a different treatments. After the experiment I measured several traits on each individual. To test for an effect of either treatment or source as well as their interaction, I used a linear mixed effect model with family as random factor, i.e. lme(fixed=Trait~Treatment*Source,random=~1|Family,method=ML) so far so good, Now I have to calculate the relative variance components, i.e. the percentage of variation that is explained by either treatment or source as well as the interaction. Without a random effect, I could easily use the sums of squares (SS) to calculate the variance explained by each factor. But for a mixed model (with ML estimation), there are no SS, hence I thought I could use Treatment and Source as random effects too to estimate the variance, i.e. lme(fixed=Trait~1,random=~(Treatment*Source)|Family, method=REML) However, in some cases, lme does not converge, hence I used lmer from the lme4 package: lmer(Trait~1+(Treatment*Source|Family),data=DATA) Where I extract the variances from the model using the summary function: model-lmer(Trait~1+(Treatment*Source|Family),data=regrexpdat) results-model@REmat variances-results[,3] I get the same values as with the VarCorr function. I use then these values to calculate the actual percentage of variation taking the sum as the total variation. Where I am struggling is with the interpretation of the results from the initial lme model (with treatment and source as fixed effects) and the random model to estimate the variance components (with treatment and source as random effect). 
I find in most cases that the percentage of variance explained by each factor does not correspond to the significance of the fixed effect. For example for the trait HD, The initial lme suggests a tendency for the interaction as well as a significance for Treatment. Using a backward procedure, I find that Treatment has a close to significant tendency. However, estimating variance components, I find that Source has the highest variance, making up to 26.7% of the total variance. anova(lme(fixed=HD~as.factor(Treatment)*as.factor(Source),random=~1|as.factor(Family),method=ML,data=test),type=m) numDF denDF F-value p-value (Intercept)1 426 0.044523 0.8330 as.factor(Treatment) 1 426 5.935189 0.0153 as.factor(Source) 111 0.042662 0.8401 as.factor(Treatment):as.factor(Source) 1 426 3.754112 0.0533 summary(lmer(HD~1+(as.factor(Treatment)*as.factor(Source)|Family),data=regrexpdat)) Linear mixed model fit by REML Formula: HD ~ 1 + (as.factor(Treatment) * as.factor(Source) | Family) Data: regrexpdat AICBIC logLik deviance REMLdev -103.5 -54.43 63.75 -132.5 -127.5 Random effects: Groups Name Variance Std.Dev. Corr Family (Intercept) 0.0113276 0.106431 as.factor(Treatment) 0.0063710 0.079819 0.405 as.factor(Source) 0.0235294 0.153393 -0.134 -0.157 as.factor(Treatment)L:as.factor(Source) 0.0076353 0.087380 -0.578 -0.589 -0.585 Residual 0.0394610 0.198648 Number of obs: 441, groups: Family, 13 Fixed effects: Estimate Std. Error t value (Intercept) -0.027400.03237 -0.846 Hence my question is, is it correct what I am doing? Or should I use another way to estimate the amount of variance explained by each factor (i.e. Treatment, Source and their interaction). For example, would the effect sizes be a more appropriate way to go? Thanks! Kay Lucek -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-variance-components-with-lmer-for-models-with-random-effects-and-compare-them-with-ls-tp4634492.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
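[A minimal, hypothetical sketch of the variance-component percentages discussed in this thread, using the supported VarCorr() accessor rather than the older `model@REmat` slot from the question; the data frame `regrexpdat` and its columns are the poster's, so this will only run against comparable data:]

```r
# Sketch only: relative variance components via VarCorr(), assuming a data
# frame 'regrexpdat' with columns Trait, Treatment, Source and Family.
library(lme4)
m  <- lmer(Trait ~ 1 + (Treatment * Source | Family), data = regrexpdat)
vc <- as.data.frame(VarCorr(m))     # columns: grp, var1, var2, vcov, sdcor
keep <- is.na(vc$var2)              # variances only (drop covariances)
v  <- vc$vcov[keep]
names(v) <- ifelse(is.na(vc$var1[keep]), "Residual", vc$var1[keep])
round(100 * v / sum(v), 1)          # percentage of total variance per term
```

Whether these percentages answer the original question is exactly the statistical issue Bert raises; the sketch only shows the mechanics of extracting them.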
Re: [R] graph displays
On 06/26/2012 06:24 PM, MSousa wrote: Good morning, Thanks for the help. I can explain better what I am trying to do. I'm trying to read data from a file, separated by a tab, with the following code. Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt", sep="\t", quote="\"", header = TRUE) View(Dataset) dput(Dataset) structure(list(Source = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L), X1s = c(131L, 186L, 186L)), .Names = c("Source", "X1000s", "X600s", "X500s", "X250s", "X100s", "X50s", "X10s", "X5s", "X3s", "X1s"), class = "data.frame", row.names = c(NA, -3L)) Dataset Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s 1 A 47 63 75 116 125 129 131 131 131 131 2 B 37 64 45 11 25 19 61 131 186 186 3 C 17 62 25 66 12 29 91 171 186 186 the idea is to get a graph like this Excel one, but in R; as I'm still in the learning phase of R, I have little knowledge of how to do it http://imageshack.us/photo/my-images/51/testlt.png/ Hi MSousa, Try this: library(plotrix) barp(Dataset[,-1], names.arg=rep("", 10), col=2:4) staxlab(1, at=1:10, labels=names(Dataset)[-1]) legend(2, 170, Dataset$Source, fill=2:4) Jim
Re: [R] Drawing (lon,lat) coordinates onto the image of a world
Hi Steve, On Mon, Jun 25, 2012 at 9:47 PM, Steven Winter stevenwinte...@yahoo.com wrote: Given a set of latitude and longitude coordinate pairs (stored in variables latitudevals and longitudevals), I would like to plot them onto the image of an equirectangular world map. I would like to plot each coordinate pair with a red circle, if possible. Does anyone have any suggestions as to how I go about doing this, whether using R or using another program like Google maps? This might help: library(maps) map("world") lon <- c(-75, -70, 10) lat <- c(42, -45, 50) points(lon, lat, col="red", pch=19) Sarah Thank you, Steve -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] compare one field of dataframe with excel sheet using R
It would help if you provided an example of your data frame, an example of your spreadsheet, and more information on how to judge whether the ppm values are similar. Maybe this code will help you get started ... # Here's an example data frame mydf <- data.frame(compound=letters[1:10], ppm=abs(round(rnorm(10), 4)), frequency=abs(round(rnorm(10), 4))) # Here's an example data frame representing data from your spreadsheet # You can read the data from the spreadsheet into R using the package XLConnect # library(XLConnect) # mysheet <- readWorksheet(loadWorkbook("C:\\Temp\\Compounds.xlsx"), sheet="Sheet1", startRow=1) mysheet <- data.frame(compound=letters[sample(1:10, 100, replace=TRUE)], libppm=abs(round(rnorm(100), 4))) # combine the two example data frames both <- merge(mydf, mysheet) # list the compounds in mydf that had ppm values within 0.1 of those in the spreadsheet both$diff <- abs(both$ppm - both$libppm) both[both$diff < 0.1, ] Jean sathya7priya sathya7pr...@gmail.com wrote on 06/26/2012 03:34:22 AM: I have a data frame consisting of three columns (name of compound, ppm and frequency). Name contains string values; ppm and frequency contain numeric values with decimal points up to four digits. I have an excel sheet which is like a library. The first column contains the name of compounds and the remaining columns contain the ppm values of the compound which satisfy certain rules. The number of ppm values varies for each compound from 4 to 700. I need to compare the ppm values from the data frame with the ppm values in the excel sheet and report whether they are similar.
Re: [R] significance level (p) for t-value in package zelig
My point was just that the situation in a cumulative link model is not much different from a binomial glm - the binomial glm is even a special case of the clm with only two response categories. And just like summary(glm(, family=binomial)) reports z-values and computes p-values by using the normal distribution as reference, one can do the same in a cumulative link model by applying the same asymptotic arguments. In both models the variance is determined implicitly by the mean, so a t-distribution is never involved. Cheers, Rune On 25 June 2012 11:05, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 25/06/2012 09:32, Rune Haubo wrote: According to standard likelihood theory these are actually not t-values, but z-values, i.e., they asymptotically follow a standard normal distribution under the null hypothesis. This means that you Whose 'standard'? It is conventional to call a value of t-like statistic (i.e. a ratio of the form value/standard error) a 't-value'. And that is nothing to do with 'likelihood theory' (t statistics predate the term 'likelihood'!). The separate issue is whether a t statistic is even approximately t-distributed (and if so, on what df?), and another is if it is asymptotically normal. For the latter you have to say what you mean by 'asymptotic': we have lost a lot of the context, but as this does not appear to be IID univariate observations: - 'standard likelihood theory' is unlikely to apply. - standard asymptotics may well not be a good approximation (in regression modelling, people tend to fit more complex models to large datasets, which is often why a large dataset was collected). - even for IID observations the derivation of the t distribution assumes normality. The difference between a t distribution and a normal distribution is practically insignificant unless the df is small. 
And if the df is small, one can rarely rely on the CLT for approximate normality could use pnorm instead of pt to get the p-values, but an easier solution is probably to use the clm-function (for Cumulative Link Models) from the ordinal package - here you get the p-values automatically. Cheers, Rune On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote: This advice is almost certainly false! A t-statistic can be calculated, but the distribution will not necessarily be student's t nor will the df be those of the rse. See, for example, rlm() in MASS, where values of the t-statistic are given without p values. If Brian Ripley says that p values cannot be straightforwardly calculated by pt(), then believe it! -- Bert On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote: Michael, Try ?pt Best Ozgur -- View this message in context: http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. 
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Rune Haubo Bojesen Christensen Ph.D. Student, M.Sc. Eng. Phone: (+45) 45 25 33 63 Mobile: (+45) 30 26 45 54 DTU Informatics, Section for Statistics Technical University of Denmark, Build. 305, Room 122, DK-2800 Kgs. Lyngby, Denmark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
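[A small illustrative sketch of the point under discussion: a two-sided p-value for a value/standard-error ratio uses pnorm under the asymptotic-normal (z) reference, and pt when a t reference with known df is justified. The values below are invented for illustration, not taken from any model in the thread:]

```r
# Illustrative only: two-sided p-values for a hypothetical estimate/SE ratio.
z <- 2.1                               # made-up value/standard-error ratio
p_normal <- 2 * pnorm(-abs(z))         # normal (z) reference, as in summary.glm
p_t10    <- 2 * pt(-abs(z), df = 10)   # t reference with 10 df, for comparison
c(p_normal = p_normal, p_t10 = p_t10)
```

As the thread notes, the two references differ very little unless the df is small, and with small df the normal approximation itself is in doubt.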
Re: [R] graph displays
Sorry I misunderstood what you wanted. Using ggplot2 and reshape2, which I imagine you will have to install, this should give you what you want: library(ggplot2) library(reshape2) xx1 <- melt(Dataset, id = c("Source")) p <- ggplot(xx1, aes(variable, value, fill = Source)) + geom_bar(stat = "identity", position = "dodge") + scale_y_continuous("Scale Values") + scale_x_discrete("X values") + opts(title = "Graphing Exercise") p John Kane Kingston ON Canada -Original Message- From: ricardosousa2...@clix.pt Sent: Tue, 26 Jun 2012 01:24:17 -0700 (PDT) To: r-help@r-project.org Subject: Re: [R] graph displays Good morning, Thanks for the help. I can explain better what I am trying to do. I'm trying to read data from a file, separated by a tab, with the following code. Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt", sep="\t", quote="\"", header = TRUE) View(Dataset) dput(Dataset) structure(list(Source = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L), X1s = c(131L, 186L, 186L)), .Names = c("Source", "X1000s", "X600s", "X500s", "X250s", "X100s", "X50s", "X10s", "X5s", "X3s", "X1s"), class = "data.frame", row.names = c(NA, -3L)) Dataset Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s 1 A 47 63 75 116 125 129 131 131 131 131 2 B 37 64 45 11 25 19 61 131 186 186 3 C 17 62 25 66 12 29 91 171 186 186 the idea is to get a graph like this Excel one, but in R; as I'm still in the learning phase of R, I have little knowledge of how to do it http://imageshack.us/photo/my-images/51/testlt.png/ -- View this message in context: http://r.789695.n4.nabble.com/graph-displays-tp4634448p4634488.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] rrdf package for mac not working
Please contact the package maintainer. Best, Uwe Ligges On 26.06.2012 00:41, Ricardo Pietrobon wrote: rrdf is incredibly helpful, but I've notice that the rrdf package for mac hasn't been working for some time: http://goo.gl/5Ukpn . wondering if there is still a plan to maintain that in the long run, or if there is some other alternative to read RDF files. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] MuMIn - assessing variable importance following model averaging, z-stats/p-values or CI?
Please contact the package maintainer. Best, Uwe Ligges On 26.06.2012 12:46, Robertson, Andrew wrote: Dear R users, Recent changes to the MuMIn package now mean that the model averaging command (model.avg) no longer returns confidence intervals, but instead returns z-values and corresponding p-values for fixed effects included in models. Previously I have used this package for model selection/averaging following Grueber et al. (2011), where it is suggested that one should use confidence intervals from model averaging to assess whether your fixed effects have an effect or not (if the confidence intervals do not span zero then the variable has an effect). Can anyone tell me why MuMIn now gives z-stats and p-values and whether these should be used to assess the 'significance'/importance of variables when model averaging? Here's the example code of what I'm doing: #-# ps <- lmer(tranPS ~ Sex + Age.Cat2 + TOTAL + Propfarm + Maize + TOTAL:Propfarm + Maize:TOTAL + Maize:Propfarm + (1|Socialgroup) + (1|Year) + (1|Tattoo), REML=FALSE, data=propspec) pss <- standardize(ps, standardize.y = FALSE) psdrg <- dredge(pss) summary(model.avg(get.models(psdrg, subset = delta < 2))) #-# Ref - Grueber, C.E., Nakagawa, S., Laws, R.J. & Jamieson, I.G. (2011) Multimodel inference in ecology and evolution: challenges and solutions. Journal of Evolutionary Biology, 24, 699-711. Any help would be much appreciated. Regards Andrew Robertson PhD student Centre for Ecology and Conservation University of Exeter, Cornwall Campus Tremough, Cornwall. TR10 9EZ UK Tel: 01326 371852 Email: ar...@exeter.ac.uk Web page: http://biosciences.exeter.ac.uk/staff/postgradresearch/andrewrobertson/
Re: [R] Packaging Error
On 26.06.2012 08:54, Mayank Bansal wrote: I was trying to ByteCompile a package that I made. The package compiles successfully with byte compile set to FALSE. When I set ByteCompile to TRUE, I receive the following error message while doing R CMD INSTALL /usr/lib/R/bin/INSTALL: line 34: 9964 Done echo 'tools:::.install_packages()' 9965 Segmentation fault | R_DEFAULT_PACKAGES= LC_COLLATE=C ${R_HOME}/bin/R $myArgs --slave --args ${args} I have not been able to understand the problem. Can someone help me understand the problem so that it can be fixed? Not without your package to try it out. Best, Uwe Ligges Thanks, Mayank This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] plotting two histograms on one plot with hist function
I would like to plot two data sets (frequency (y-axis) of mean values for 0-1 (x-axis)) on a single histogram for comparison. hist() only allows the overlay of two histograms, and although barplot() allows beside=TRUE, it does not show frequency values (like hist) but rather all of the values. Is there any way that I can use hist() to plot two data sets similar to barplot()? Any help or advice will be appreciated! Kind regards, Marguerite
[R] rms package-superposition prediction curve of ols and data points
Hello, I have a question about the "plot.Predict" function in Frank Harrell's rms package. Do you know how to superpose in the same graph the prediction curve from ols and the raw data points? Put most simply, I would like to combine these two graphs: fit_linear <- ols(y4 ~ rcs(x2, c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE) p <- Predict(fit_linear, x2, conf.int=FALSE) plot(p, ylim=c(-2,0.5), xlim=c(0,100)) # graph n°1 z <- plot(x2, y4, ylim=c(-2,0.5), xlim=c(0,100), type="p", lwd=6, col="blue") # graph n°2 Thanks all, Agnès -- View this message in context: http://r.789695.n4.nabble.com/rms-package-superposition-prediction-curve-of-ols-and-data-points-tp4634503.html Sent from the R help mailing list archive at Nabble.com.
[R] shapiro.test()
Hey, today I wanted to use the shapiro.test() on data containing 3 numerical values per group. It is the first time that an NA was given back for some of the groups. In the follwing an example of code and output is shown: shapiro.test(c(0.000637806, 0.00175561, 0.001196708)) Shapiro-Wilk normality test data: c(0.000637806, 0.00175561, 0.001196708) W = 1, p-value = NA I am not able to find the bug in our data, so I think there might be a problem with the shapiro.test(). I use the following technical background: platform x86_64-pc-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 2 minor 14.1 year 2011 month 12 day22 svn rev57956 language R version.string R version 2.14.1 (2011-12-22) Thanks, Judith __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Remove empty levels in subset
Hi, I have exactly the same question (how to remove empty levels in my subset), but in my case the factor command does not work, because my data frame is not atomic. Try this: test2$a <- factor(test2$a) R gives me the error message: Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? Do you have advice? Thank you -- View this message in context: http://r.789695.n4.nabble.com/Remove-empty-levels-in-subset-tp873967p4634499.html Sent from the R help mailing list archive at Nabble.com.
[R] Intersection
Hello. I have a problem with 2 data frames. Each has 2 columns - value and dates. The data frames have different dimensions, and some of the dates coincide. I need to intersect them by dates and obtain as output two data frames with identical date columns and new dimensions; the values should be kept in correspondence with the dates. Regards, Aleksander.
Re: [R] rms package-superposition prediction curve of ols and data points
You could use points() instead of plot() for the second command. Sarah On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be wrote: Hello, I have a question about the "plot.Predict" function in Frank Harrell's rms package. Do you know how to superpose in the same graph the prediction curve from ols and the raw data points? Put most simply, I would like to combine these two graphs: fit_linear <- ols(y4 ~ rcs(x2, c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE) p <- Predict(fit_linear, x2, conf.int=FALSE) plot(p, ylim=c(-2,0.5), xlim=c(0,100)) # graph n°1 z <- plot(x2, y4, ylim=c(-2,0.5), xlim=c(0,100), type="p", lwd=6, col="blue") # graph n°2 Thanks all, Agnès -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Intersection
That sounds like a job for merge(), but it's hard to be sure because you didn't provide the information requested in the posting guide. Sarah On Tue, Jun 26, 2012 at 11:03 AM, Васильченко Александр vasilchenko@gmail.com wrote: Hello. I have a problem with 2 dataframes. There are 2 columns - value and dates. These dataframes have different dimension. Some dates coincide. And I need to intersect them by dates and have on output two dataframes with identical columns dates and new dimension . value have to recieve in compliance with dates. Regards, Aleksander. -- Sarah Goslee http://www.functionaldiversity.org
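[A minimal sketch of the merge() approach Sarah suggests. The column names `date` and `value` and the dates themselves are hypothetical, since the original post gave no data:]

```r
# Two hypothetical data frames of different sizes sharing a 'date' column
df1 <- data.frame(date = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
                  value = 1:3)
df2 <- data.frame(date = as.Date(c("2012-01-02", "2012-01-03", "2012-01-04")),
                  value = 4:6)
# inner join: keeps only the dates present in both data frames
common <- merge(df1, df2, by = "date", suffixes = c(".1", ".2"))
# split back into two data frames with identical date columns
out1 <- common[, c("date", "value.1")]
out2 <- common[, c("date", "value.2")]
```

Here `out1` and `out2` have the same number of rows and the same dates, with each value matched to its date.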
Re: [R] Remove empty levels in subset
Hi, On Tue, Jun 26, 2012 at 8:06 AM, svo s.vanom...@uu.nl wrote: Hi, I have exactly the same question (how to remove empty levels in my subset), but in my case the factor command does not work, because my data frame is not atomic. Try this: test2$a <- factor(test2$a) R gives me the error message: Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? Do you have advice? I have two pieces of advice. 1. Don't try to use factor() on your entire data frame, but only on a single column at a time, as shown in the example you included. 2. Provide an example of your data using something like dput(head(mydata, 10)) so we can offer actual working code. Sarah -- Sarah Goslee http://www.functionaldiversity.org
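[A short sketch of the same fix using base R's droplevels(), which works per column or on a whole data frame; the data frame `test2` and its columns here are invented for illustration:]

```r
# Hypothetical data frame whose subset carries an unused factor level
test2 <- data.frame(a = factor(c("x", "y", "z")), v = 1:3)
test2 <- subset(test2, a != "z")
levels(test2$a)                   # still "x" "y" "z" - the empty level remains
test2$a <- droplevels(test2$a)    # drop unused levels from one column
test2   <- droplevels(test2)      # or drop unused levels across all factor columns
levels(test2$a)                   # now "x" "y"
```

Like factor(test2$a), this never needs to be applied to a non-atomic object, which avoids the sort.list error above.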
[R] data.table vs plyr reg output
Hello. The data.table package is very helpful in terms of speed, but I am having trouble actually using the output from linear regression. Is there any way to get the data.table output to be as pretty/useful as that from the plyr package? Below is an example. library('data.table'); library('plyr'); REG <- data.table(ID=c(rep('Frank',5),rep('Tony',5),rep('Ed',5)), y=rnorm(15), x=rnorm(15), z=rnorm(15)); REG; # The ddply function from the plyr package produces very neat and useful output ddply(REG, .(ID), function(x) coef(lm(y ~ x + z, data=x))); # The data.table output is fast, but not very neat (in terms of the order of the coefficient estimates). Is there any way to get the data.table output to look more like the plyr/ddply output (without making a list for each coef and running the regression two times)? REG[, coef(lm(y ~ x + z)), by=ID]; Thank you! Geoff
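[One way to get ddply-like output from data.table (a sketch, not a reply from the thread) is to wrap the coefficient vector in as.list(), so each named coefficient becomes its own column and the regression runs once per group:]

```r
library(data.table)
set.seed(1)  # reproducible example data, mirroring the post above
REG <- data.table(ID = c(rep('Frank', 5), rep('Tony', 5), rep('Ed', 5)),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))
# as.list() turns the named coefficient vector into one row of named columns,
# giving one row per ID with columns (Intercept), x and z
REG[, as.list(coef(lm(y ~ x + z))), by = ID]
```

This keeps data.table's speed while matching the wide, one-row-per-group layout that ddply produces.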
Re: [R] Intersection
Hi. Try the following functions: ?intersect ?"%in%" ?"[" Perhaps someone will provide you more help if you read and follow the posting guide http://www.R-project.org/posting-guide.html Andrija On Tue, Jun 26, 2012 at 5:03 PM, Васильченко Александр vasilchenko@gmail.com wrote: Hello. I have a problem with 2 dataframes. There are 2 columns - value and dates. These dataframes have different dimension. Some dates coincide. And I need to intersect them by dates and have on output two dataframes with identical columns dates and new dimension . value have to recieve in compliance with dates. Regards, Aleksander.
Re: [R] shapiro.test()
See ?shapiro.test: ...the number of non-missing values must be between 3 and 5000. By the way, how reasonable is it to test normality with only 3 values? Best, Ozgur -- View this message in context: http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634520.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] plotting two histograms on one plot with hist function
Why not just plot the two histograms on the same scale in a 2-panel plot? John Kane Kingston ON Canada -Original Message- From: mb...@sun.ac.za Sent: Tue, 26 Jun 2012 15:24:55 +0200 To: r-help@r-project.org Subject: [R] plotting two histograms on one plot with hist function I would like to plot two data sets (frequency (y-axis) of mean values for 0-1 (x-axis)) on a single histogram for comparison. hist() only allows the overlay of two histograms, and although barplot() allows beside=TRUE, it does not show frequency values (like hist) but rather all of the values. Is there any way that I can use hist() to plot two data sets similar to barplot()? Any help or advice will be appreciated! Kind regards, Marguerite
Re: [R] rms package-superposition prediction curve of ols and data points
On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote: You could use points() instead of plot() for the second command. Ummm. Maybe not. I think that plot.Predict uses lattice graphics. You may need to use trellis.focus() followed by lpoints(). Or use the + operation with suitable objects. -- David. Sarah On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be wrote: Hello, I have a question about the "plot.Predict" function in Frank Harrell's rms package. Do you know how to superpose in the same graph the prediction curve from ols and the raw data points? Put most simply, I would like to combine these two graphs: fit_linear <- ols(y4 ~ rcs(x2, c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE) p <- Predict(fit_linear, x2, conf.int=FALSE) plot(p, ylim=c(-2,0.5), xlim=c(0,100)) # graph n°1 z <- plot(x2, y4, ylim=c(-2,0.5), xlim=c(0,100), type="p", lwd=6, col="blue") # graph n°2 Thanks all, Agnès -- Sarah Goslee http://www.functionaldiversity.org David Winsemius, MD West Hartford, CT
Re: [R] plotting two histograms on one plot with hist function
On Tue, Jun 26, 2012 at 10:02 AM, John Kane jrkrid...@inbox.com wrote: Why not just plot the two histograms on the same scale in a 2 panel plot? I think the OP's request was for comparison. Two panels may do, but why not a barplot of the histograms in the same panel? barplot( rbind( hist(rbeta(30,2,4), breaks=seq(0,1,.1), plot=FALSE)$counts, hist(rbeta(30,6,8), breaks=seq(0,1,.1), plot=FALSE)$counts), beside=TRUE) See str(hist(yourdata)) or ?hist. Cheers Ilai John Kane Kingston ON Canada -----Original Message----- From: mb...@sun.ac.za Sent: Tue, 26 Jun 2012 15:24:55 +0200 To: r-help@r-project.org Subject: [R] plotting two histograms on one plot with hist function I would like to plot two data sets (frequency on the y-axis of mean values for 0-1 on the x-axis) on a single histogram for comparison. hist() only allows the overlay of two histograms, and although barplot() allows beside=TRUE, it does not show frequency values (like hist) but rather all of the values. Is there any way that I can use hist() to plot two data sets similar to barplot()? Any help or advice will be appreciated! Kind regards, Marguerite
Re: [R] shapiro.test()
Actually, your sample size is 3. Sorry for that. Ozgur
Re: [R] increase the usage of CPU and Memory
On 26-06-2012 16:33, Oliver Ruebenacker wrote: Hello Xi, If a program does input or output to disk or network, this may cause it to wait and not use the available CPU. Output is usually buffered, but may cause delay if the buffer gets full (I'm not sure, though, whether this is an issue with plenty of memory available). Take care Oliver On Mon, Jun 25, 2012 at 8:07 PM, Xi amzhan...@gmail.com wrote: Dear All, I have been searching online for almost a whole day for help on making my R code more efficient; however, there is no solution for my case, so if anyone could give any clue to solve my problem, I would be very appreciative of your help. Thanks in advance. Here is my issue: My desktop has an i7-950 quad-core CPU with 24 GB memory and an NVIDIA GTX 480 graphics card, and I am using a 64-bit version of R under 64-bit Windows. I am running a for loop to generate a 461*5 matrix of data, which comes from the coefficients of 5 models. The loop produces 5 values at a time, and it will run 461 times in total. I have tried running the code inside the loop just once; it costs almost 10 seconds, so the whole loop should cost about 4610 seconds, equal to almost one and a half hours, which is indeed exactly what the whole loop takes. But I have to run this kind of loop for 30 data sets! Although I thought I am using a not-bad-at-all desktop, I checked the usage of CPU and memory while running my R code and found out the whole run used just 15% of CPU and 10% of memory. Does anyone have the same issue? Or does anyone know some methods to shorten the running time and increase the usage of CPU and memory? Many thanks, Xi
Hi Oliver, can you please give some details on what you mean by 'Output is usually buffered'? Thanks and regards,
Re: [R] Intersection
Hi, Try this: dat1 <- data.frame(value=c(15,20,25,30,45,50), dates=c("2005-05-25","2005-06-25","2005-07-25","2005-08-25","2005-09-25","2005-10-25")) dat2 <- data.frame(value=c(15,20,25,50), dates=c("2005-05-25","2005-06-25","2005-07-25","2005-10-25")) merge(dat1, dat2, by="dates") dates value.x value.y 1 2005-05-25 15 15 2 2005-06-25 20 20 3 2005-07-25 25 25 4 2005-10-25 50 50 or subset(dat1, (dates %in% dat2$dates)) value dates 1 15 2005-05-25 2 20 2005-06-25 3 25 2005-07-25 6 50 2005-10-25 I hope this is what you meant. You mentioned the datasets have different dimensions; not sure what you meant by that. A.K. ----- Original Message ----- From: Васильченко Александр vasilchenko@gmail.com To: r-help@r-project.org Cc: Sent: Tuesday, June 26, 2012 11:03 AM Subject: [R] Intersection Hello. I have a problem with 2 dataframes. There are 2 columns - value and dates. These dataframes have different dimensions. Some dates coincide. I need to intersect them by dates and get as output two dataframes with identical dates columns and new dimensions; value has to be kept in correspondence with dates. Regards, Aleksander.
Re: [R] How To Setup hunspell in R
Did you make any progress in solving this? I'm having the same struggle. Thanks.
[R] Ljung-Box test (Box.test)
I fit a simple linear model y = bX to a data set today, and that produced 24 residuals (I have 24 data points, one for each year from 1984-2007). I would like to test the time-independence of the residuals of my model, and I was recommended by my supervisor to use the Ljung-Box test. The Box.test function in R takes 4 arguments: x, a numeric vector or univariate time series; lag, the statistic will be based on lag autocorrelation coefficients; type, the test to be performed (partial matching is used); and fitdf, the number of degrees of freedom to be subtracted if x is a series of residuals. Unfortunately, I never took a statistics class where I learned the Ljung-Box test, and information about it online is hard to find. What does lag mean, and what value would you recommend I use for the test? Also, what does fitdf represent, and what would the value for that parameter be in my case? Finally, the value of x is a vector of my 24 residuals, correct? Thank you all so much. I apologize for the basic nature of the question. Steven
Re: [R] shapiro.test()
On Jun 26, 2012, at 16:43, r...@uni-potsdam.de wrote: Hey, today I wanted to use shapiro.test() on data containing 3 numerical values per group. It is the first time that an NA was given back for some of the groups. In the following, an example of code and output is shown: shapiro.test(c(0.000637806, 0.00175561, 0.001196708)) Shapiro-Wilk normality test data: c(0.000637806, 0.00175561, 0.001196708) W = 1, p-value = NA I am not able to find the bug in our data, so I think there might be a problem with shapiro.test(). The clue is that diff(sort(c(0.000637806, 0.00175561, 0.001196708))) [1] 0.000558902 0.000558902 which is either an extreme coincidence or a sign that your data are not independent samples from a continuous distribution. Since the normal quantiles are also equidistant, you get a correlation of W=1 in the QQ-plot, and apparently this triggers the NA p-value. I suppose returning p=1.0 would arguably be a better choice for this case, but it _is_ pretty extreme. -pd I use the following technical background: platform x86_64-pc-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 2 minor 14.1 year 2011 month 12 day 22 svn rev 57956 language R version.string R version 2.14.1 (2011-12-22) Thanks, Judith -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] increase the usage of CPU and Memory
Hello Christofer, If a process has data to write to the hard disk, the data is usually written to a buffer in memory, and from there it is written to the hard disk independently of the CPU. Since writing to memory is much faster than writing to the hard disk, this allows the process to run faster. To the process, it appears as if the data is already on disk. If, however, the buffer runs full, an attempt by a process to write more data will cause the process to wait until space is available in the buffer. If a process spends time waiting, it means it does not use all the CPU it otherwise could. I don't know how much input is buffered, but since only the process knows where it will request input from next, this limits the ways input can be buffered. I'm assuming, though, that if you open a file and read the first few bytes, some more bytes may be read into a buffer, since the process is likely to request them next. But in any case, input from disk or network is almost certain to cause waiting times and therefore decreases the CPU time used. Take care Oliver On Tue, Jun 26, 2012 at 1:53 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: On 26-06-2012 16:33, Oliver Ruebenacker wrote: Hello Xi, If a program does input or output to disk or network, this may cause it to wait and not use the available CPU. Output is usually buffered, but may cause delay if the buffer gets full (I'm not sure, though, whether this is an issue with plenty of memory available). Take care Oliver On Mon, Jun 25, 2012 at 8:07 PM, Xi amzhan...@gmail.com wrote: Dear All, I have been searching online for almost a whole day for help on making my R code more efficient; however, there is no solution for my case, so if anyone could give any clue to solve my problem, I would be very appreciative of your help. Thanks in advance. Here is my issue: My desktop has an i7-950 quad-core CPU with 24 GB memory and an NVIDIA GTX 480 graphics card, and I am using a 64-bit version of R under 64-bit Windows.
I am running a for loop to generate a 461*5 matrix of data, which comes from the coefficients of 5 models. The loop produces 5 values at a time, and it will run 461 times in total. I have tried running the code inside the loop just once; it costs almost 10 seconds, so the whole loop should cost about 4610 seconds, equal to almost one and a half hours, which is indeed exactly what the whole loop takes. But I have to run this kind of loop for 30 data sets! Although I thought I am using a not-bad-at-all desktop, I checked the usage of CPU and memory while running my R code and found out the whole run used just 15% of CPU and 10% of memory. Does anyone have the same issue? Or does anyone know some methods to shorten the running time and increase the usage of CPU and memory? Many thanks, Xi Hi Oliver, can you please give some details on what you mean by 'Output is usually buffered'? Thanks and regards, -- Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant President and Founder of Knowomics (http://www.knowomics.com/wiki/Oliver_Ruebenacker) Consultant at Predictive Medicine (http://predmed.com/people/oliverruebenacker.html) SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
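A side note on the CPU question: a plain R loop runs on a single core, so roughly 15% usage on a quad-core (hyper-threaded) i7 is expected. If the 461 iterations are independent, they can be spread across cores with the base parallel package (available since R 2.14). A hedged sketch, where fit_one() is a hypothetical stand-in for the real per-iteration model fitting:

```r
library(parallel)

fit_one <- function(i) {
  # placeholder for the real work: fit 5 models, return their coefficients
  rnorm(5, mean = i)
}

cl <- makeCluster(detectCores() - 1)   # leave one core for the OS
res <- parSapply(cl, 1:461, fit_one)   # 5 x 461 matrix of results
stopCluster(cl)
dim(t(res))                            # 461 x 5, as in the original loop
```

On Windows this uses a socket cluster, so any data or packages the worker function needs must be exported with clusterExport()/clusterEvalQ().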
[R] Compile C files
Hello: Sorry, this might look like a beginner question, but I'm just starting to work on the C and R interface. I'm trying to compile a C file (with a function) to load into an R function, but at the command line I keep getting a lot of errors, like: C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected declaration specifiers before 'SEXP' I've been able to compile this file before. I'm using Windows 7 on a 64-bit computer. Best regards, Frederico
Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?
I'm finally back from vacation and looking at your email. 1. The primary mistake is in your call, where you say fit <- survfit(mod.allison.5, newdata.1, id="Id") This will use the character string "Id" as the value of the identifier, not the data. The effect is exactly the same as the difference between print(x) and print('x'). 2. In reply to John's comment that all the id values are the same: it is correct. Normally the survfit routine is used to produce multiple curves, one curve per line of the input data, for time-independent variables. The presence of an id argument is used to tell it that there are multiple lines per subject in the data, e.g. time-dependent covariates. So even though there is only one curve being produced, we need an id statement to trigger the behavior. If you only want one curve for one individual, then individual=TRUE is an alternative, as John pointed out. 3. "It's very important to specify the Surv object and the formula directly in the coxph function ..." Yes, I agree. I always use your suggested form because it gives better documentation -- variable names are directly visible in the coxph call. I don't understand the attraction of the other form, but lots of people use it. Why did it go wrong? Because the survfit function was evaluating Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + age + age:stop + prio, data=newdata.1 The lengths of the variables will be different. The error message comes from the R internals, not my program. Terry Therneau On 06/16/2012 08:04 AM, Jürgen Biedermann wrote: Dear Mr. Therneau, Mr. Fox, or whoever has some time... I don't find a solution to use the survfit function (package: survival) for a defined pattern of covariates with a Cox model including a time-dependent interaction term. Somehow the definition of my newdata argument seems to be erroneous. I already googled the problem and found many persons having the same or a similar problem, but still no solution.
I want to stress that my time-dependent covariate does not depend on the failure of an individual (in that case it wouldn't seem sensible to predict a survivor function for an individual). Rather, one of my effects declines with time (time-dependent coefficient). For illustration, I use the example from John Fox's paper "Cox Proportional-Hazards Regression for Survival Data": http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below. Thanks very much in advance, Jürgen Biedermann # Code Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=TRUE) Rossi.2 <- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (page 14) mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5, Id=1, arrest.time=0) fit <- survfit(mod.allison.5, newdata.1, id=Id) Error message: Fehler in model.frame.default(data = newdata.1, id = Id, formula = Surv(start, : Variablenlängen sind unterschiedlich (gefunden für '(id)') -- i.e., the lengths of the variables are different (found for '(id)').
#-----
fold <- function(data, time, event, cov,
    cov.names=paste('covariate', '.', 1:ncovs, sep=""), suffix='.time',
    cov.times=0:ncov, common.times=TRUE, lag=0){
  vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
  xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
  all.cov <- unlist(cov)
  if (!is.list(cov)) cov <- list(cov)
  ncovs <- length(cov)
  nrow <- nrow(data)
  ncol <- ncol(data)
  ncov <- length(cov[[1]])
  nobs <- nrow*ncov
  if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1)
    stop(paste("all elements of cov must be of the same length and\n",
      "cov.times must have one more entry than each element of cov."))
  var.names <- names(data)
  subjects <- rownames(data)
  omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
  keep.cols <- (1:ncol)[-omit.cols]
  nkeep <- length(keep.cols)
  if (is.numeric(event)) event <- var.names[event]
  times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=TRUE)
    else data[, cov.times]
  new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
  rownames <- rep("", nobs)
  colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""),
    var.names[-omit.cols], cov.names)
  end.row <- 0
  for (i in 1:nrow){
    start.row <- end.row + 1
    end.row <- end.row + ncov
    start <-
Re: [R] Ljung-Box test (Box.test)
Hello, That's a statistics question, but it's also about using an R function. The Ljung-Box test isn't supposed to be used in such a context, to test the residuals of an OLS fit y = bX + e. It is used to test the time independence of the original series or of the residuals of an ARMA(p, q) fit. In both cases you are right, 'x' is a series. 'lag' can be explained as follows: you have a time series and want to know if the value observed today depends on what was observed in the past. Then, a linear regression of today on yesterday could be X[t] = b[1]*X[t-1] + e[t], e ~ Normal(0, sigma^2). A linear regression on two time units in the past would be X[t] = b[1]*X[t-1] + b[2]*X[t-2] + e[t], e ~ Normal(0, sigma^2), etc. This is a regression of the series on itself lagged by a certain number of time units; the present is regressed on the past. Function ar() fits this kind of model to a time series. In the first case, the order is p=1; in the second, p=2. Now, in the first case, is there second-order serial correlation? Test the residuals with lag=2, fitdf=1, the value of p. Third order? lag=3, fitdf=p=1, etc. You are NOT fitting this type of model, so the Ljung-Box test is misused. Test the original series with default parameters, lag=1. If there is serial correlation, fit an AR (Auto-Regressive) model with ar(). See the help page ?ar. And see a statistician with experience in time series. It's a world of its own; I haven't even mentioned seasonality, or almost everything else about time series. Do ask someone near you. Hope this helps, Rui Barradas Em 26-06-2012 19:01, Steven Winter escreveu: I fit a simple linear model y = bX to a data set today, and that produced 24 residuals (I have 24 data points, one for each year from 1984-2007). I would like to test the time-independence of the residuals of my model, and I was recommended by my supervisor to use the Ljung-Box test. The Box.test function in R takes 4 arguments: x, a numeric vector or univariate time series.
lag, the statistic will be based on lag autocorrelation coefficients; type, the test to be performed (partial matching is used); fitdf, the number of degrees of freedom to be subtracted if x is a series of residuals. Unfortunately, I never took a statistics class where I learned the Ljung-Box test, and information about it online is hard to find. What does lag mean, and what value would you recommend I use for the test? Also, what does fitdf represent, and what would the value for that parameter be in my case? Finally, the value of x is a vector of my 24 residuals, correct? Thank you all so much. I apologize for the basic nature of the question. Steven
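To make Rui's advice concrete, a small sketch on simulated data (the argument values are illustrative only):

```r
set.seed(1)
x <- arima.sim(list(ar = 0.6), n = 100)  # a series with first-order autocorrelation

# test the original series for serial correlation
Box.test(x, lag = 1, type = "Ljung-Box")

# fit an AR(1) model, then test its residuals;
# fitdf must equal the number of fitted AR parameters, here p = 1
fit <- ar(x, order.max = 1, aic = FALSE)
Box.test(na.omit(fit$resid), lag = 2, type = "Ljung-Box", fitdf = 1)
```

Note that lag must exceed fitdf for the test statistic to have positive degrees of freedom.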
[R] Storing whole regression results
Hello seasoned R users, Is it possible to store a complete regression result in an array? I've already been able to save individual regression coefficients, but I would like to store the whole regression results in different arrays through a loop, so that under different quantile regressions I would be able to create a loop and store the full regression result each time in a different array for printing. The only way I can think of is to pre-generate a whole set of arrays and matrices to individually store each regression coefficient one at a time. Thank you, Kevin Master of Science Student | University of Guelph Department of Food, Resource and Agricultural Economics J.D. MacLachlan Building - Room 002 Guelph, ON N1G 2W1 Webpage: http://fare.uoguelph.ca/users/kchang01 Email: kchan...@uoguelph.ca Mobile: 226-979-2813
Re: [R] Storing whole regression results
You can store entire regression results in a list, then use lapply() to retrieve individual coefficients as desired. Lists are very powerful for managing odd data formats, and no loops needed. Sarah On Tue, Jun 26, 2012 at 4:19 PM, Kevin Chang kchan...@uoguelph.ca wrote: Hello seasoned R users, Is it possible to store a complete regression result in an array? I've already been able to save individual regression coefficients, but I would like to store the whole regression results in different arrays through a loop, so that under different quantile regressions I would be able to create a loop and store the full regression result each time in a different array for printing. The only way I can think of is to pre-generate a whole set of arrays and matrices to individually store each regression coefficient one at a time. Thank you, Kevin -- Sarah Goslee http://www.functionaldiversity.org
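A sketch of the list approach, using the quantreg package since quantile regressions were mentioned (the engel data set ships with quantreg; names here are illustrative):

```r
library(quantreg)
data(engel)

taus <- c(0.25, 0.50, 0.75)
# one full rq fit per quantile, stored as list elements
fits <- lapply(taus, function(tau) rq(foodexp ~ income, tau = tau, data = engel))

coefs <- sapply(fits, coef)  # matrix of coefficients, one column per quantile
summary(fits[[2]])           # full result for the median regression
```

Each element of fits is a complete model object, so anything print(), summary(), or predict() can do remains available later.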
[R] chisq.test
Dear list! I would like to run chisq.test on a simple data set with 70 observations, but the output comes with a warning: Warning message: In chisq.test(tabele) : Chi-squared approximation may be incorrect Here is an example: tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE) dimnames(tabele) <- list(SEX = c("M","F"), HAIR = c("Brown", "Black", "Red", "Blonde")) addmargins(tabele) prop.table(tabele) chisq.test(tabele) Please give me advice / a suggestion / a recommendation. Thanks a lot to all, OV
[R] Zero inflated: is there a limit to the level of inflation
Hello, I have count data that illustrate the presence or absence of individuals in my study population. I created a grid of cells across the study area and calculated a count value for each individual per season per year for each grid cell. The count value is the number of times an individual was present in each grid cell. For illustration, my data columns look something like this and are repeated for each individual:
Cell_ID Param1 Param2 Param3 Param4 COUNT Name Year Season Cov
1 160.565994 729.08 1503 7930.3 0 AA 2010 AUT Open
1 160.565994 729.08 1503 7930.3 22 AA 2011 SPR Open
1 160.565994 729.08 1503 7930.3 12 AA 2009 SUM Open
1 160.565994 729.08 1503 7930.3 0 AA 2010 SUM Open
2 169.427001 491.87 1503.31 5101.09 0 AA 2010 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 16 AA 2011 SPR oldHard
2 169.427001 491.87 1503.31 5101.09 0 AA 2009 SUM oldHard
2 169.427001 491.87 1503.31 5101.09 0 AA 2010 SUM oldHard
…
563 86.777099 612.69 977 4474.6 62 AA 2010 AUT Water
563 86.777099 612.69 977 4474.6 12 AA 2011 SPR Water
563 86.777099 612.69 977 4474.6 55 AA 2009 SUM Water
1 160.565994 729.08 1503 7930.3 0 BB 2010 SUM Open
2 169.427001 491.87 1503.31 5101.09 72 BB 2010 SUM oldHard
5 160.75 614.95 1503.31 2878.98 16 BB 2010 SUM medHard
6 170.404998 510.58 1489.44 743.14 0 BB 2010 SUM Water
…
563 86.777099 612.69 977 4474.6 0 BB 2010 SUM Water
1 160.565994 729.08 1503 7930.3 14 C 2005 AUT Open
1 160.565994 729.08 1503 7930.3 0 C 2006 AUT Open
1 160.565994 729.08 1503 7930.3 0 C 2006 SPR Open
1 160.565994 729.08 1503 7930.3 56 C 2007 SPR Open
1 160.565994 729.08 1503 7930.3 0 C 2006 SUM Open
2 169.427001 491.87 1503.31 5101.09 124 C 2005 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 231 C 2006 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 889 C 2006 SPR oldHard
2 169.427001 491.87 1503.31 5101.09 0 C 2007 SPR oldHard
…
563 86.777099 612.69 977 4474.6 0 C 2005 AUT Water
563 86.777099 612.69 977 4474.6 231 C 2006 AUT Water
563 86.777099 612.69 977 4474.6 185 C 2006 SPR Water
563 86.777099 612.69 977 4474.6 123 C 2007 SPR Water
563 86.777099 612.69 977 4474.6 52 C 2006 SUM Water
I have 563 grid cells across my study area, and each individual has 1-563 cells associated with it for each year and each season the individual was monitored. Therefore my grid cells are repeated. I end up with 71,000 records; 925 records have a Count value > 0, which means 70,075 records have a Count value = 0. I wanted to run a zero-inflated Poisson model to determine mixed effects (of parameters) with individual as the random effect. But I have been advised two things: 1. I cannot run a zero-inflated Poisson model because my data are too extremely inflated (i.e. 70,075 vs 925), and 2. I cannot run the model with each cell repeated for each individual; I am told the model doesn't recognize that Cell_ID #1 for individual A is the same Cell_ID #1 for individual B. Does anyone know if either or both of these points are true? I would appreciate any thoughts, advice, or suggestions. Thanks! -Stephanie
[R] Figuring out encodings of PDFs in R
Dear list, I am currently scraping some text data from several PDFs using the readPDF() function in the tm package. This all works very well, and in most cases the encoding seems to be latin1 - in some, however, it is not. Is there a good way in R to check character encodings? I found the functions is.utf8() and is.locale() in the tau package, but that obviously only gets me so far. Thanks.
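Two base-R checks that may help, shown as a sketch (for guessing an unknown encoding, the stringi package's stri_enc_detect() is another option, though it requires installing an extra package):

```r
x <- "caf\xe9"  # bytes that are latin1, not UTF-8
Encoding(x) <- "latin1"
Encoding(x)     # the *declared* encoding, not a detection

# round-trip through iconv: an NA result means the bytes are not
# valid in the 'from' encoding -- a cheap validity test for UTF-8
is.na(iconv(x, from = "UTF-8", to = "UTF-8"))
```

Note that Encoding() only reports what R has been told; it cannot detect the true encoding of arbitrary bytes.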
Re: [R] chisq.test
On 2012-06-26 11:27, Omphalodes Verna wrote: Dear list! I would like to run chisq.test on a simple data set with 70 observations, but the output comes with a warning: Warning message: In chisq.test(tabele) : Chi-squared approximation may be incorrect Here is an example: tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE) dimnames(tabele) <- list(SEX = c("M","F"), HAIR = c("Brown", "Black", "Red", "Blonde")) addmargins(tabele) prop.table(tabele) chisq.test(tabele) Please give me advice / a suggestion / a recommendation. Do this: ct <- chisq.test(tabele) ct$expected If that does not give you a sufficient hint, then you need to review the assumptions underlying the chi-square test. Peter Ehlers Thanks a lot to all, OV
Re: [R] chisq.test
The warning means that you have several cells with expected values less than 5 (4 of the 8 cells in this case), so the chi-square estimate may be inflated. The good news is that the probability of the inflated chi-square is .0978, which you probably would not consider significant anyway. If you want a simulated p-value using Monte Carlo simulation (see the references on the manual page for chisq.test), just change the call to chisq.test(tabele, simulate.p.value=TRUE, B=2000) When I run this five times, I get probability estimates ranging from .09795 to .1089. Alternatively, get more data. -- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-boun...@r-project.org On Behalf Of Omphalodes Verna Sent: Tuesday, June 26, 2012 1:28 PM To: r-help@r-project.org Subject: [R] chisq.test Dear list! I would like to run chisq.test on a simple data set with 70 observations, but the output comes with a warning: Warning message: In chisq.test(tabele) : Chi-squared approximation may be incorrect Here is an example: tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE) dimnames(tabele) <- list(SEX = c("M","F"), HAIR = c("Brown", "Black", "Red", "Blonde")) addmargins(tabele) prop.table(tabele) chisq.test(tabele) Please give me advice / a suggestion / a recommendation. Thanks a lot to all, OV
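Since the table is small, Fisher's exact test is another standard option when expected counts are low; it avoids the large-sample approximation entirely. A sketch with the matrix from the original post:

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
chisq.test(tabele)$expected  # shows which cells fall below 5
fisher.test(tabele)          # exact p-value, no chi-square approximation
```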
[R] flatten lists
I am looking for a function to flatten a list to a list only 1 level deep. Very similar to unlist, however I don't want to turn it into a vector because then everything will be cast to character vectors:
x <- list(name = "Jeroen", age = 27, married = FALSE, home = list(country = "Netherlands", city = "Utrecht"))
unlist(x)
This function sort of does it:
flatlist <- function(mylist){ lapply(rapply(mylist, enquote, how = "unlist"), eval) }
flatlist(x)
However it is a bit slow. Is there a more native way?
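For what it's worth, here is a small recursive sketch (my own code, not from the thread) that repeatedly splices sub-lists until nothing is a list, building unlist-style dotted names; whether it beats the rapply/enquote trick will depend on the data:

```r
# Flatten a nested list to depth 1 without coercing element types.
# Names are joined with "." in the style of unlist().
flatten_list <- function(x) {
  while (any(vapply(x, is.list, logical(1)))) {
    x <- do.call(c, lapply(seq_along(x), function(i) {
      el <- x[[i]]
      if (is.list(el)) {
        names(el) <- paste(names(x)[i], names(el), sep = ".")
        el                # splice the sub-list's elements in place
      } else {
        x[i]              # keep as a one-element list, preserving the name
      }
    }))
  }
  x
}

x <- list(name = "Jeroen", age = 27, married = FALSE,
          home = list(country = list(name = "Netherlands", short = "NL"),
                      city = "Utrecht"))
str(flatten_list(x))  # 6 elements, none of them lists; age stays numeric
```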
Re: [R] plotting two histograms on one plot with hist function
Oh, I had not thought of it in those terms. It does make sense now. John Kane Kingston ON Canada -Original Message- From: ke...@math.montana.edu Sent: Tue, 26 Jun 2012 10:57:31 -0600 To: jrkrid...@inbox.com Subject: Re: [R] plotting two histograms on one plot with hist function On Tue, Jun 26, 2012 at 10:02 AM, John Kane jrkrid...@inbox.com wrote: Why not just plot the two histograms on the same scale in a 2-panel plot? I think the OP's request was for comparison. Two panels may do, but why not a barplot of the histograms in the same panel?
barplot(rbind(
  hist(rbeta(30, 2, 4), breaks = seq(0, 1, .1), plot = FALSE)$counts,
  hist(rbeta(30, 6, 8), breaks = seq(0, 1, .1), plot = FALSE)$counts),
  beside = TRUE)
See str(hist(yourdata)) or ?hist. Cheers Ilai John Kane Kingston ON Canada -Original Message- From: mb...@sun.ac.za Sent: Tue, 26 Jun 2012 15:24:55 +0200 To: r-help@r-project.org Subject: [R] plotting two histograms on one plot with hist function I would like to plot two data sets (frequency (y-axis) of mean values for 0-1 (x-axis)) on a single histogram for comparison. hist() only allows the overlay of two histograms, and although barplot() allows beside=TRUE, it does not show frequency values (like hist) but rather all of the values. Is there any way that I can use hist() to plot two data sets similar to barplot()? Any help or advice will be appreciated! Kind regards, Marguerite
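Ilai's barplot-of-counts suggestion in this thread, made reproducible; the seed, bar labels, and legend text are my additions:

```r
# Two histograms' counts side by side in one panel (sketch of the
# suggestion above; seed and labels added for reproducibility).
set.seed(1)
b  <- seq(0, 1, 0.1)
h1 <- hist(rbeta(30, 2, 4), breaks = b, plot = FALSE)$counts
h2 <- hist(rbeta(30, 6, 8), breaks = b, plot = FALSE)$counts
barplot(rbind(h1, h2), beside = TRUE,
        names.arg = head(b, -1),
        legend.text = c("beta(2,4)", "beta(6,8)"))
```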
Re: [R] Zero inflated: is there a limit to the level of inflation
On Jun 26, 2012, at 2:10 PM, SSimek wrote: Hello, I have count data that illustrate the presence or absence of individuals in my study population. I created a grid cell across the study area and calculated a count value for each individual per season per year for each grid cell. The count value is the number of times an individual was present in each grid cell. For illustration my data columns look something like this and are repeated for each individual:
Cell_ID Param1 Param2 Param3 Param4 COUNT Name Year Season Cov
1 160.565994 729.08 15037930.3 0 AA 2010 AUT Open
1 160.565994 729.08 15037930.3 22 AA 2011 SPR Open
1 160.565994 729.08 15037930.3 12 AA 2009 SUM Open
1 160.565994 729.08 15037930.3 0 AA 2010 SUM Open
2 169.427001 491.87 1503.31 5101.09 0 AA 2010 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 16 AA 2011 SPR oldHard
2 169.427001 491.87 1503.31 5101.09 0 AA 2009 SUM oldHard
2 169.427001 491.87 1503.31 5101.09 0 AA 2010 SUM oldHard
…
563 86.777099 612.69 977 4474.6 62 AA 2010 AUT Water
563 86.777099 612.69 977 4474.6 12 AA 2011 SPR Water
563 86.777099 612.69 977 4474.6 55 AA 2009 SUM Water
1 160.565994 729.08 15037930.3 0 BB 2010 SUM Open
2 169.427001 491.87 1503.31 5101.09 72 BB 2010 SUM oldHard
5 160.75 614.95 1503.31 2878.98 16 BB 2010 SUM medHard
6 170.404998 510.58 1489.44 743.14 0 BB 2010 SUM Water
…
563 86.777099 612.69 977 4474.6 0 BB 2010 SUM Water
1 160.565994 729.08 15037930.3 14 C 2005 AUT Open
1 160.565994 729.08 15037930.3 0 C 2006 AUT Open
1 160.565994 729.08 15037930.3 0 C 2006 SPR Open
1 160.565994 729.08 15037930.3 56 C 2007 SPR Open
1 160.565994 729.08 15037930.3 0 C 2006 SUM Open
2 169.427001 491.87 1503.31 5101.09 124 C 2005 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 231 C 2006 AUT oldHard
2 169.427001 491.87 1503.31 5101.09 889 C 2006 SPR oldHard
2 169.427001 491.87 1503.31 5101.09 0 C 2007 SPR oldHard
…
563 86.777099 612.69 977 4474.6 0 C 2005 AUT Water
563 86.777099 612.69 977 4474.6 231 C 2006 AUT Water
563 86.777099 612.69 977 4474.6 185 C 2006 SPR Water
563 86.777099 612.69 977 4474.6 123 C 2007 SPR Water
563 86.777099 612.69 977 4474.6 52 C 2006 SUM Water
I have 563 grid cells across my study area and each individual has 1-563 cells associated for each year and each season the individual was monitored. Therefore my grid cells are repeated. I end up with 71,000 records, and 925 records have a Count value > 0, which means 70,075 records have a Count value = 0. I wanted to run a zero-inflated Poisson model to determine mixed effects (of parameters) with individual as the random effect. But I have been advised two things: 1. I cannot run a zero-inflated Poisson model because my data are too extremely inflated (i.e. 70,075 vs 925), and 2. I cannot run the model with each cell repeated for each individual. I am told the model doesn't recognize that Cell_ID #1 for individual A is the same Cell_ID #1 for individual B. Does anyone know if either or both of these points are true? I would appreciate any thoughts, advice, or suggestions. Thanks! -Stephanie
Hi Stephanie, Some comments: 1. You should think about or at least be open to a zero-inflated negative binomial distribution rather than zero-inflated Poisson. 2. You should at least review the vignette for the pscl CRAN package, which provides standard fixed effects models and related functions for count based data and importantly,
Re: [R] Zero inflated: is there a limit to the level of inflation
On Tue, 26 Jun 2012, Marc Schwartz wrote: On Jun 26, 2012, at 2:10 PM, SSimek wrote: [original question and data quoted in full; snipped — see the previous message] Hi Stephanie, Some comments: 1. You should think about or at least be open to a zero-inflated negative binomial distribution rather than zero-inflated Poisson. 2. You should at least review the vignette for the pscl CRAN package, which provides standard fixed effects models and related functions for count based data and, importantly, some good conceptual content: http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf 3. Given the repeated measures framework and correlation issues you likely have, you should subscribe to and re-post your query to the R-sig-mixed-models list: https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models which will avail you of experts in the field. 4. There is also a draft FAQ for mixed models here: http://glmm.wikidot.com/faq which I believe is maintained by Ben Bolker,
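For reference, a hedged sketch of the kind of fixed-effects fit the pscl vignette describes, on simulated data with roughly 90% structural zeros; the column names only mimic Stephanie's, and the random effect for individual would still need the mixed-models tools from point 3:

```r
# Zero-inflated negative binomial on heavily zero-inflated simulated data.
# This sketches pscl::zeroinfl usage; it is not Stephanie's actual analysis.
set.seed(42)
n   <- 500
dat <- data.frame(Param1 = rnorm(n))
zero <- rbinom(n, 1, 0.9)                      # ~90% structural zeros
dat$COUNT <- ifelse(zero == 1, 0,
                    rnbinom(n, size = 1, mu = exp(1 + 0.5 * dat$Param1)))
if (requireNamespace("pscl", quietly = TRUE)) {
  # count part ~ Param1; intercept-only zero-inflation part after the "|"
  fit <- pscl::zeroinfl(COUNT ~ Param1 | 1, data = dat, dist = "negbin")
  print(summary(fit))
}
```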
Re: [R] flatten lists
do.call(c, x) maybe? On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote: I am looking for a function to flatten a list to a list only 1 level deep. Very similar to unlist, however I don't want to turn it into a vector because then everything will be cast to character vectors:
x <- list(name = "Jeroen", age = 27, married = FALSE, home = list(country = "Netherlands", city = "Utrecht"))
unlist(x)
This function sort of does it:
flatlist <- function(mylist){ lapply(rapply(mylist, enquote, how = "unlist"), eval) }
flatlist(x)
However it is a bit slow. Is there a more native way?
Re: [R] flatten lists
Hmm, that doesn't seem to work if the original list is nested more than 2 levels deep. I should probably have given a better example:
x <- list(name = "Jeroen", age = 27, married = FALSE, home = list(country = list(name = "Netherlands", short = "NL"), city = "Utrecht"))
On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote: do.call(c, x) maybe? [original message quoted in full; snipped]
Re: [R] flatten lists
Alright, but I need something recursive for lists of arbitrary depth. On Tue, Jun 26, 2012 at 3:37 PM, arun smartpink...@yahoo.com wrote: Hi, Try:
do.call(c, do.call(c, x))
x1 <- do.call(c, do.call(c, x))
x2 <- flatlist(x)
identical(x1, x2)
[1] TRUE
A.K. [earlier messages in the thread quoted in full; snipped]
Re: [R] chisq.test
On Jun 26, 2012, at 2:27 PM, Omphalodes Verna wrote: Dear list! I would like to calculate chisq.test on a simple data set with 70 observations, but the output is a warning: Warning message: In chisq.test(tabele) : Chi-squared approximation may be incorrect. Here is an example:
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
dimnames(tabele) <- list(SEX = c("M", "F"), HAIR = c("Brown", "Black", "Red", "Blonde"))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)
Please give me advice / a suggestion / a recommendation.
Read any introductory stats book regarding small cell sizes:
     [,1] [,2] [,3] [,4]
[1,]   11    3    3   18
[2,]    3    6    5   21
-- David Winsemius, MD West Hartford, CT
Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?
Dear Duncan, Thanks for your suggestion, but I really need sparse matrices: I have implemented various graph algorithms based on adjacency matrices. For large graphs, storing all the 0's in an adjacency matrix becomes uneconomical, and therefore I thought I would use sparse matrices, but the speed of [i,j] will slow down the algorithms. However, using RcppEigen it is possible to mimic [i,j] with a slowdown of only a factor 16, which is much better than what is obtained when using [i,j]:
benchmark(lookup(mm, `[`), lookup(MM, `[`), lookup(MM, Xiijj),
+   columns=c("test", "replications", "elapsed", "relative"), replications=5)
               test replications elapsed relative
1   lookup(mm, `[`)            5    0.05      1.0
2   lookup(MM, `[`)            5   23.54    470.8
3 lookup(MM, Xiijj)            5    0.84     16.8
The code for producing the result is given below. Best regards, Søren -
library(inline)
library(RcppEigen)
library(rbenchmark)
library(Matrix)
src <- '
using namespace Rcpp;
typedef Eigen::SparseMatrix<double> MSpMat;
const MSpMat X(as<MSpMat>(XX_));
int i = as<int>(ii_) - 1;
int j = as<int>(jj_) - 1;
double ans = X.coeff(i, j);
return(wrap(ans));
'
Xiijj <- cxxfunction(signature(XX_="matrix", ii_="integer", jj_="integer"), body=src, plugin="RcppEigen")
mm <- matrix(c(1,0,0,0,0,0,0,0), nr=100, nc=100)
MM <- as(mm, "Matrix")
object.size(mm)
object.size(MM)
lookup <- function(mat, func){
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      v <- func(mat, i, j)
    }
  }
}
benchmark(lookup(mm, `[`), lookup(MM, `[`), lookup(MM, Xiijj),
          columns=c("test", "replications", "elapsed", "relative"), replications=5)
-Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: 25. juni 2012 11:27 To: Søren Højsgaard Cc: r-help@r-project.org Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives? On 12-06-24 4:50 PM, Søren Højsgaard wrote: Dear all, Indexing matrices from the Matrix package with [i,j] seems to be very slow. For example:
library(rbenchmark)
library(Matrix)
mm <- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
MM <- as(mm, "Matrix")
lookup <- function(mat){
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      mat[i,j]
    }
  }
}
benchmark(lookup(mm), lookup(MM), columns=c("test", "replications", "elapsed", "relative"), replications=50)
        test replications elapsed relative
1 lookup(mm)           50    0.01        1
2 lookup(MM)           50    8.77      877
I would have expected a small overhead when indexing a matrix from the Matrix package, but this result is really surprising... Does anybody know if there are faster alternatives to [i,j]? There's also a large overhead when indexing a dataframe, though Matrix appears to be slower. It's designed to work on whole matrices at a time, not single entries.
[R] mixture distribution with positive and negative probabilities
Hi! Any ideas on which package (e.g. mixdist, flexmix, etc.) I could use to fit a mixture of, say, 3 Gaussian functions where 2 have their proportions, means, and sigmas, and the third has a mean and sigma but a negative proportion? Basically I'm trying to fit a mixture model to a distribution that I know is the sum of 3 distributions, where one inhibits the other two. Is there such a thing? Thanks in advance! Yakir Gagnon cell +1 919 886 3877 office +1 919 684 7188 Johnsen Lab Biology Department Box 90338 Duke University Durham, NC 27708 BioSci Building Room 307 http://fds.duke.edu/db/aas/Biology/postdoc/yg32 http://www.biology.duke.edu/johnsenlab/people/yakir.html
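I'm not aware of a mixture package that accepts a negative component weight (mixture weights are normally constrained to be nonnegative and sum to 1), so one hedged workaround is to treat the signed sum of Gaussians as an ordinary parametric curve and fit it by least squares, e.g. to a binned density estimate. Everything below is illustrative, not a standard method:

```r
# Signed sum of three Gaussians: two positive components, one subtracted.
# abs() on the sigmas keeps the optimizer from wandering into negative sd's.
signed_mix <- function(x, p) {
  p[["w1"]] * dnorm(x, p[["m1"]], abs(p[["s1"]])) +
  p[["w2"]] * dnorm(x, p[["m2"]], abs(p[["s2"]])) -
  p[["w3"]] * dnorm(x, p[["m3"]], abs(p[["s3"]]))
}

set.seed(7)
xg    <- seq(-4, 8, length.out = 200)
truth <- c(w1 = 0.8, m1 = 0, s1 = 1,
           w2 = 0.5, m2 = 4, s2 = 1,
           w3 = 0.3, m3 = 2, s3 = 0.5)
y     <- signed_mix(xg, truth) + rnorm(200, sd = 0.002)  # noisy "observed" curve

start <- truth * 1.2   # deliberately perturbed starting values
fit   <- optim(start, function(p) sum((y - signed_mix(xg, p))^2))
round(fit$par, 2)
```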
Re: [R] flatten lists
Frankly, I'm not sure what you mean, but presumably unlist(yourlist, recursive=FALSE) is not it, right? -- Bert On Tue, Jun 26, 2012 at 2:25 PM, Jeroen Ooms jeroen.o...@stat.ucla.edu wrote: [original message quoted in full; snipped] -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?
Duncan, I should probably add that I am aware that my code is not the solution, and also that the relative gain of my code probably decreases with the problem size until eventually it will perform worse than [i,j] (because of copying, I suppose). So my point is just: it would be nice if [i,j] were faster... Regards Søren PS: For a 2000 x 2000 matrix I get:
               test replications elapsed relative
1   lookup(mm, `[`)            5   14.85     1.00
2 lookup(MM, Xiijj)            5  133.66 9.000673
Using the modified code
src <- '
using namespace Rcpp;
typedef Eigen::MappedSparseMatrix<double> MSpMat;
const MSpMat X(as<MSpMat>(XX_));
int i = as<int>(ii_) - 1;
int j = as<int>(jj_) - 1;
double ans = X.coeff(i, j);
return(wrap(ans));
'
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Søren Højsgaard Sent: 27. juni 2012 01:20 To: Duncan Murdoch Cc: r-help@r-project.org Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives? [earlier messages in the thread quoted in full; snipped]
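Duncan's densify-first advice in code form (my sketch, not from the thread): when a block of [i,j] lookups is coming, convert once with as.matrix and index the dense copy.

```r
library(Matrix)

mm <- matrix(0, 200, 200)
mm[1, 1] <- 1
MM <- Matrix(mm, sparse = TRUE)

# Sum every entry via [i,j]; densifying once avoids the per-lookup
# S4 dispatch overhead of the sparse class.
lookup_dense <- function(mat) {
  m <- as.matrix(mat)            # one-time conversion
  s <- 0
  for (i in seq_len(nrow(m)))
    for (j in seq_len(ncol(m)))
      s <- s + m[i, j]
  s
}
lookup_dense(MM)  # same answer as looping over MM directly, much faster
```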
Re: [R] Compile C files
On 12-06-26 2:48 PM, Frederico Mestre wrote: Hello: Sorry, this might look like a beginner question, but I'm just starting to work on the C and R interface. I'm trying to compile a C file (with a function) to load into an R function, but on the command line I keep getting a lot of errors, like: You'll need to tell us what you did before you can expect us to interpret the error messages. Duncan Murdoch C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected declaration specifiers before 'SEXP' I've been able to compile this file before, so I… I'm using Windows 7 on a 64-bit computer. Best regards, Frederico
Re: [R] Figuring out encodings of PDFs in R
On 12-06-26 3:28 PM, Jonas Michaelis wrote: Dear list, I am currently scraping some text data from several PDFs using the readPDF() function in the tm package. This all works very well, and in most cases the encoding seems to be latin1 - in some, however, it is not. Is there a good way in R to check character encodings? I found the functions is.utf8() and is.locale() in the tau package, but that obviously only gets me so far. There are heuristics for guessing encodings, but I don't think they are built into R. I think the way to do what you want is to read the PDF spec to find out how the strings are encoded in the source file, and believe that. Duncan Murdoch
[R] RES: Compile C files
Hello: I just reinstalled R and Rtools. It works perfectly now. Thanks, Frederico -Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: Wednesday, 27 June 2012 01:06 To: Frederico Mestre Cc: r-help@r-project.org Subject: Re: [R] Compile C files [original message quoted in full; snipped — see above]
Re: [R] rms package-superposition prediction curve of ols and data points
This is what the addpanel argument to plot.Predict is for, something along the lines of
ap <- function(...) lpoints(age, weight)
plot(Predict(. . .), addpanel = ap)
Frank David Winsemius wrote: On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote: You could use points() instead of plot() for the second command. Ummm. Maybe not. I think that plot.Predict uses lattice graphics. You may need to use trellis.focus() followed by lpoints(). Or use the + operation with suitable objects. -- David. Sarah On Tue, Jun 26, 2012 at 8:37 AM, achaumont <agnes.chaumont@> wrote: Hello, I have a question about the "plot.predict" function in Frank Harrell's rms package. Do you know how to superpose in the same graph the prediction curve of ols and the raw data points? Put most simply, I would like to combine these two graphs:
fit_linear <- ols(y4 ~ rcs(x2, c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
p <- Predict(fit_linear, x2, conf.int=FALSE)
plot(p, ylim=c(-2,0.5), xlim=c(0,100))  # graph n°1
z <- plot(x2, y4, ylim=c(-2,0.5), xlim=c(0,100), type="p", lwd=6, col="blue")  # graph n°2
Thanks all, Agnès -- Sarah Goslee http://www.functionaldiversity.org - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/rms-package-superposition-prediction-curve-of-ols-and-data-points-tp4634503p4634566.html Sent from the R help mailing list archive at Nabble.com.
[R] question about formatting Dates
Dear R People: I have dates as factors in the following:

poudel.df$DATE
[1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011  1/8/2011
[8] 1/9/2011  1/10/2011
Levels: 1/10/2011 1/2/2011 1/4/2011 1/6/2011 1/7/2011 1/8/2011 1/9/2011

I want them to be regular dates which can be sorted, etc. But when I did this:

as.character(poudel.df$DATE)
[1] "1/2/2011"  "1/4/2011"  "1/4/2011"  "1/4/2011"  "1/6/2011"  "1/7/2011"
[7] "1/8/2011"  "1/9/2011"  "1/10/2011"

and

as.Date(as.character(poudel.df$DATE), "%m/%d/$Y")
[1] NA NA NA NA NA NA NA NA NA

presumably because the dates do not have leading zeros. There are approximately 30 years of nearly daily data in the entire set. Any suggestions would be much appreciated. Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] chisq.test
Hi, The error is due to less than 5 observations in some cells. You can try:

fisher.test(tabele)

	Fisher's Exact Test for Count Data
data:  tabele
p-value = 0.0998
alternative hypothesis: two.sided

A.K.

- Original Message - From: Omphalodes Verna omphalodes.ve...@yahoo.com To: r-help@r-project.org Cc: Sent: Tuesday, June 26, 2012 2:27 PM Subject: [R] chisq.test

Dear list! I would like to calculate chisq.test on a simple data set with 70 observations, but the output includes the warning: In chisq.test(tabele) : Chi-squared approximation may be incorrect. Here is an example:

tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
dimnames(tabela) <- list(SEX = c("M", "F"), HAIR = c("Brown", "Black", "Red", "Blonde"))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)

Please, give me an advice / suggestion / recommendation. Thanks a lot to all, OV
Re: [R] flatten lists
Hi, Try:

do.call(c, do.call(c, x))

x1 <- do.call(c, do.call(c, x))
x2 <- flatlist(x)
identical(x1, x2)
[1] TRUE

A.K.

- Original Message - From: Jeroen Ooms jeroen.o...@stat.ucla.edu To: Neal Fultz nfu...@gmail.com Cc: r-help@r-project.org Sent: Tuesday, June 26, 2012 6:23 PM Subject: Re: [R] flatten lists

Hmm, that doesn't seem to work if the original list is nested more than 2 levels deep. I should have probably given a better example:

x <- list(name="Jeroen", age=27, married=FALSE, home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))

On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote: do.call(c, x) maybe?

On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote: I am looking for a function to flatten a list to a list of only 1 level deep. Very similar to unlist, however I don't want to turn it into a vector because then everything will be cast to character vectors:

x <- list(name="Jeroen", age=27, married=FALSE, home=list(country="Netherlands", city="Utrecht"))
unlist(x)

This function sort of does it:

flatlist <- function(mylist){ lapply(rapply(mylist, enquote, how="unlist"), eval) }
flatlist(x)

However it is a bit slow. Is there a more native way?
Re: [R] Zero inflated: is there a limit to the level of inflation
Thank you both for your quick response and input. I will consider all of your points and see what we are able to derive from there. Thank you again for your time and expertise. -Stephanie --- Stephanie L. Simek Carnivore Ecology Lab Forest and Wildlife Research Center Mississippi State University Box 9690 Mississippi State, MS 39762 Cell: (850) 591-1430 Email: ssi...@cfr.msstate.edu

-Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: Tuesday, June 26, 2012 4:46 PM To: Marc Schwartz Cc: Stephanie L. Simek; r-help@r-project.org Subject: Re: [R] Zero inflated: is there a limit to the level of inflation

On Tue, 26 Jun 2012, Marc Schwartz wrote: On Jun 26, 2012, at 2:10 PM, SSimek wrote: Hello, I have count data that illustrate the presence or absence of individuals in my study population. I created a grid cell across the study area and calculated a count value for each individual per season per year for each grid cell. The count value is the number of times an individual was present in each grid cell. For illustration, my data columns look something like this and are repeated for each individual:

Cell_ID Param1     Param2 Param3  Param4  COUNT Name Year Season Cov
1       160.565994 729.08 1503    7930.3  0     AA   2010 AUT    Open
1       160.565994 729.08 1503    7930.3  22    AA   2011 SPR    Open
1       160.565994 729.08 1503    7930.3  12    AA   2009 SUM    Open
1       160.565994 729.08 1503    7930.3  0     AA   2010 SUM    Open
2       169.427001 491.87 1503.31 5101.09 0     AA   2010 AUT    oldHard
2       169.427001 491.87 1503.31 5101.09 16    AA   2011 SPR    oldHard
2       169.427001 491.87 1503.31 5101.09 0     AA   2009 SUM    oldHard
2       169.427001 491.87 1503.31 5101.09 0     AA   2010 SUM    oldHard
...
563     86.777099  612.69 977     4474.6  62    AA   2010 AUT    Water
563     86.777099  612.69 977     4474.6  12    AA   2011 SPR    Water
563     86.777099  612.69 977     4474.6  55    AA   2009 SUM    Water
1       160.565994 729.08 1503    7930.3  0     BB   2010 SUM    Open
2       169.427001 491.87 1503.31 5101.09 72    BB   2010 SUM    oldHard
5       160.75     614.95 1503.31 2878.98 16    BB   2010 SUM    medHard
6       170.404998 510.58 1489.44 743.14  0     BB   2010 SUM    Water
...
563     86.777099  612.69 977     4474.6  0     BB   2010 SUM    Water
1       160.565994 729.08 1503    7930.3  14    C    2005 AUT    Open
1       160.565994 729.08 1503    7930.3  0     C    2006 AUT    Open
1       160.565994 729.08 1503    7930.3  0     C    2006 SPR    Open
1       160.565994 729.08 1503    7930.3  56    C    2007 SPR    Open
1       160.565994 729.08 1503    7930.3  0     C    2006 SUM    Open
2       169.427001 491.87 1503.31 5101.09 124   C    2005 AUT    oldHard
2       169.427001 491.87 1503.31 5101.09 231   C    2006 AUT    oldHard
2       169.427001 491.87 1503.31 5101.09 889   C    2006 SPR    oldHard
2       169.427001 491.87 1503.31 5101.09 0     C    2007 SPR    oldHard
...
563     86.777099  612.69 977     4474.6  0     C    2005 AUT    Water
563     86.777099  612.69 977     4474.6  231   C    2006 AUT    Water
563     86.777099  612.69 977     4474.6  185   C    2006 SPR    Water
563     86.777099  612.69 977     4474.6  123   C    2007 SPR    Water
563     86.777099  612.69 977     4474.6  52    C    2006 SUM    Water

I have 563 grid cells across my study area and each individual has 1-563 cells associated for each year and each season the individual was monitored. Therefore my grid cells are repeated. I end up with 71,000 records and 925 records have a Count value > 0; which means 70,075 records have a Count value = 0. I wanted to run a zero inflated poisson model to determine mixed effects (of parameters) with individual as the random effect. But I have been advised two things: 1. I cannot run a zero inflated poisson model because my data are too extremely inflated (i.e. 70,075 vs 925) and 2. I cannot run the model with each cell repeated for each individual. I am told the model doesn't recognize that Cell_ID #1 for individual A is the same Cell_ID #1 for individual B. Does anyone know if either or both of these points are true? I would appreciate any thoughts, advice, or suggestions. Thanks! -Stephanie

Hi Stephanie, Some comments: 1. You should think about or at least be open to a zero inflated negative binomial distribution rather than zero inflated poisson. 2. You should at least review the vignette for the pscl CRAN package, which provides standard fixed effects models and
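As a sketch of the pscl suggestion above (the data-frame name and the choice of regressors are hypothetical, following the column names in the post; note that zeroinfl() itself has no random-effects machinery, so "Name" enters below only as a fixed effect):

```r
library(pscl)  # assumed installed; provides zeroinfl()

# Zero-inflated negative binomial: count model left of "|",
# zero-inflation model right of "|" (here intercept-only).
fit <- zeroinfl(COUNT ~ Param1 + Param2 + Season + Cov + Name | 1,
                data = griddata, dist = "negbin")
summary(fit)
```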
Re: [R] flatten lists
Hi, I hope this helps. Tested to some depth.

x1 <- list(name="Jeroen", age=27, married=FALSE, home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))
x2 <- list(name="Jeroen", age=27, married=FALSE, home=list(country=list(name=list(Country1="Netherlands", Country2="Spain"), short=list("NL","SP")), city="Utrecht"))
x3 <- list(name="Jeroen", age=27, married=FALSE, home=list(country=list(name=list(Countrygroup=list("Netherlands","Germany"), Country2="Spain"), short=list("NL","SP")), city="Utrecht"))

# recursive flattening
x4 <- lapply(do.call(c, c(x3, list(recursive=TRUE))), FUN=unlist)
x4[2] <- as.numeric(x4[2])
x4[3] <- as.logical(x4[3])
x4
$name
[1] "Jeroen"
$age
[1] 27
$married
[1] FALSE
$home.country.name.Countrygroup1
[1] "Netherlands"
$home.country.name.Countrygroup2
[1] "Germany"
$home.country.name.Country2
[1] "Spain"
$home.country.short1
[1] "NL"
$home.country.short2
[1] "SP"
$home.city
[1] "Utrecht"

identical(x4, flatlist(x3))
[1] TRUE

A.K.

- Original Message - From: Jeroen Ooms jeroen.o...@stat.ucla.edu To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 26, 2012 6:55 PM Subject: Re: [R] flatten lists

Alright, but I need something recursive for lists with arbitrary deepness.

On Tue, Jun 26, 2012 at 3:37 PM, arun smartpink...@yahoo.com wrote: Hi, Try:

do.call(c, do.call(c, x))

x1 <- do.call(c, do.call(c, x))
x2 <- flatlist(x)
identical(x1, x2)
[1] TRUE

A.K.

- Original Message - From: Jeroen Ooms jeroen.o...@stat.ucla.edu To: Neal Fultz nfu...@gmail.com Cc: r-help@r-project.org Sent: Tuesday, June 26, 2012 6:23 PM Subject: Re: [R] flatten lists

Hmm, that doesn't seem to work if the original list is nested more than 2 levels deep. I should have probably given a better example:

x <- list(name="Jeroen", age=27, married=FALSE, home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))

On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote: do.call(c, x) maybe?

On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote: I am looking for a function to flatten a list to a list of only 1 level deep. Very similar to unlist, however I don't want to turn it into a vector because then everything will be cast to character vectors:

x <- list(name="Jeroen", age=27, married=FALSE, home=list(country="Netherlands", city="Utrecht"))
unlist(x)

This function sort of does it:

flatlist <- function(mylist){ lapply(rapply(mylist, enquote, how="unlist"), eval) }
flatlist(x)

However it is a bit slow. Is there a more native way?
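For arbitrary nesting depth, a small recursive helper is another option (a sketch: it relies on c()'s default behavior of joining names with ".", and unlike the rapply/unlist approaches it preserves each element's type without any coercion step):

```r
# Recursively flatten a list to depth 1, keeping element types intact.
flatten <- function(x) {
  if (!is.list(x)) return(list(x))   # wrap leaves so c() can splice them
  do.call(c, lapply(x, flatten))     # c() prefixes names with "." per level
}

x <- list(name = "Jeroen", age = 27, married = FALSE,
          home = list(country = list(name = "Netherlands", short = "NL"),
                      city = "Utrecht"))
str(flatten(x))
# List of 6
#  $ name              : chr "Jeroen"
#  $ age               : num 27
#  $ married           : logi FALSE
#  $ home.country.name : chr "Netherlands"
#  $ home.country.short: chr "NL"
#  $ home.city         : chr "Utrecht"
```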
Re: [R] question about formatting Dates
On Tue, Jun 26, 2012 at 10:54 PM, Erin Hodgess erinm.hodg...@gmail.com wrote: Dear R People: I have dates as factors in the following:

poudel.df$DATE
[1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011  1/8/2011
[8] 1/9/2011  1/10/2011
Levels: 1/10/2011 1/2/2011 1/4/2011 1/6/2011 1/7/2011 1/8/2011 1/9/2011

I want them to be regular dates which can be sorted, etc. But when I did this:

as.character(poudel.df$DATE)
[1] "1/2/2011"  "1/4/2011"  "1/4/2011"  "1/4/2011"  "1/6/2011"  "1/7/2011"
[7] "1/8/2011"  "1/9/2011"  "1/10/2011"

and

as.Date(as.character(poudel.df$DATE), "%m/%d/$Y")

Right about there ^ -- the "$Y" should be a percent sign instead of a dollar sign. Also, it probably can't hurt to use a named argument (but I don't think that's the problem here). In the future, dput()-ery would be much appreciated. Michael

[1] NA NA NA NA NA NA NA NA NA

presumably because the dates do not have leading zeros. There are approximately 30 years of nearly daily data in the entire set. Any suggestions would be much appreciated. Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
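With the dollar-sign typo that Michael points out fixed, the format string handles the dates directly: strptime-style numeric fields like %m and %d do not require leading zeros, so no padding step is needed.

```r
# "%Y", not "$Y" -- and non-padded months/days parse fine
d <- as.Date(c("1/2/2011", "1/10/2011", "12/31/2011"), format = "%m/%d/%Y")
d
# [1] "2011-01-02" "2011-01-10" "2011-12-31"

sort(d)   # Date objects sort chronologically, as wanted
```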
[R] A solution for question about formatting Dates
Hello again: Here is a solution to the dates without leading zeros:

pou1 <- function(x) {
  # Note: x is a data frame
  # Assume that Column 1 has the date
  # Column 2 has station
  # Column 3 has min
  # Column 4 has max
  library(stringr)
  w <- character(length=nrow(x))
  z <- str_split(x[,1], "/")
  for(i in 1:nrow(x)) {
    u <- str_pad(z[[i]][1:3], width=2, pad="0")
    w[i] <- paste(u, sep="", collapse="/")
  }
  a <- as.Date(w, "%m/%d/%Y")
  a
}

This is not particularly elegant, but it does the trick. Thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] chisq.test
On 27/06/12 08:54, arun wrote: Hi, The error is due to less than 5 observations in some cells.

NO, NO, NO. It's not the observations that matter, it is the ***EXPECTED COUNTS***. These must all be at least 5 in order for the null distribution of the test statistic to be adequately approximated by a chi-squared distribution. cheers, Rolf Turner

You can try:

fisher.test(tabele)

	Fisher's Exact Test for Count Data
data:  tabele
p-value = 0.0998
alternative hypothesis: two.sided

A.K.

- Original Message - From: Omphalodes Verna omphalodes.ve...@yahoo.com To: r-help@r-project.org Cc: Sent: Tuesday, June 26, 2012 2:27 PM Subject: [R] chisq.test

Dear list! I would like to calculate chisq.test on a simple data set with 70 observations, but the output includes the warning: In chisq.test(tabele) : Chi-squared approximation may be incorrect. Here is an example:

tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
dimnames(tabela) <- list(SEX = c("M", "F"), HAIR = c("Brown", "Black", "Red", "Blonde"))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)

Please, give me an advice / suggestion / recommendation. Thanks a lot to all, OV
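Rolf's point is easy to check directly, since chisq.test() returns the expected counts it used (the table is rebuilt here from the original post, with dimnames attached in one step):

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX  = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

res <- suppressWarnings(chisq.test(tabele))
res$expected            # the Black and Red columns fall below 5
any(res$expected < 5)   # TRUE -- this is what triggers the warning
```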
Re: [R] A solution for question about formatting Dates
Please don't change subject lines for follow-on comments. It messes up threading in most readers: e.g., https://stat.ethz.ch/pipermail/r-help/2012-June/thread.html Michael

On Tue, Jun 26, 2012 at 11:57 PM, Erin Hodgess erinm.hodg...@gmail.com wrote: Hello again: Here is a solution to the dates without leading zeros:

pou1 <- function(x) {
  # Note: x is a data frame
  # Assume that Column 1 has the date
  # Column 2 has station
  # Column 3 has min
  # Column 4 has max
  library(stringr)
  w <- character(length=nrow(x))
  z <- str_split(x[,1], "/")
  for(i in 1:nrow(x)) {
    u <- str_pad(z[[i]][1:3], width=2, pad="0")
    w[i] <- paste(u, sep="", collapse="/")
  }
  a <- as.Date(w, "%m/%d/%Y")
  a
}

This is not particularly elegant, but it does the trick. Thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] Remove empty levels in subset
Thank you very much. The advice I followed (and which, for some reason, I do not see here right now) was to use 'droplevels'. I needed the command for several variables at the same time, so this was very convenient.

Hello, Have you tried 'droplevels'?

test <- data.frame(a=as.factor(rep(c("f1","f2","f3"), 10)), b=rep(c(1,2,3), 10))
test2 <- subset(test, test$a=="f1")
summary(test2)
   a           b
 f1:10   Min.   :1
 f2: 0   1st Qu.:1
 f3: 0   Median :1
         Mean   :1
         3rd Qu.:1
         Max.   :1

test3 <- droplevels(test2)
summary(test3)
   a           b
 f1:10   Min.   :1
         1st Qu.:1
         Median :1
         Mean   :1
         3rd Qu.:1
         Max.   :1

A.K. -- View this message in context: http://r.789695.n4.nabble.com/Remove-empty-levels-in-subset-tp873967p4634550.html Sent from the R help mailing list archive at Nabble.com.
[R] selecting rows by maximum value of one variable in dataframe nested by another variable
How could I select the rows of a dataset that have the maximum value of one variable, nested within another variable? It is a dataframe in long format with repeated measures per subject. I was not successful using aggregate, because one of the columns has character values (and/or possibly for another reason). I would like to transform something like this:

subject time.ms V3
1       1       stringA
1       12      stringB
1       22      stringC
2       1       stringB
2       14      stringC
2       25      stringA
...

into something like this:

subject time.ms V3
1       22      stringC
2       25      stringA
...

Thank you very much for your help! Miriam
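One base-R approach, as a sketch (column names follow the example above): split the data frame by subject, keep the row with the largest time.ms in each piece, and rbind the pieces back together. This sidesteps aggregate()'s trouble with the character column, because whole rows are kept rather than summarized.

```r
df <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                 time.ms = c(1, 12, 22, 1, 14, 25),
                 V3 = c("stringA", "stringB", "stringC",
                        "stringB", "stringC", "stringA"),
                 stringsAsFactors = FALSE)

# which.max() picks the row index of the maximum within each subject
res <- do.call(rbind, lapply(split(df, df$subject),
                             function(d) d[which.max(d$time.ms), ]))
rownames(res) <- NULL
res
#   subject time.ms      V3
# 1       1      22 stringC
# 2       2      25 stringA
```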