Re: [R] fitting cosine curve
On Wed, 21 Jun 2017, J C Nash wrote: Using a more stable nonlinear modeling tool will also help, but key is to get the periodicity right. The model is linear up to omega after transformation as Don and I noted. Taking a guess that 2*pi/240 = 0.0262 is about right for omega: rsq <- function(x) {t2<-t*x;summary(lm(y~cos(t2)+sin(t2)))$r.squared} vrsq <- Vectorize(rsq) optimise(rsq, c(0.8,1.2)*2*pi/240,maximum=TRUE) $maximum [1] 0.02794878 $objective [1] 0.8127072 curve(vrsq,0.025,0.03) Isn't this stable enough? And as you note plot(lm(y~cos(t*0.0279)+sin(t*0.0279))) reveals lack-of-fit. Of course there are some other issues not addressed here such as possible autoregression. HTH, Chuck y=c(16.82, 16.72, 16.63, 16.47, 16.84, 16.25, 16.15, 16.83, 17.41, 17.67, 17.62, 17.81, 17.91, 17.85, 17.70, 17.67, 17.45, 17.58, 16.99, 17.10) t=c(7, 37, 58, 79, 96, 110, 114, 127, 146, 156, 161, 169, 176, 182, 190, 197, 209, 218, 232, 240) lidata <- data.frame(y=y, t=t) #I use the method to fit a curve, but it is different from the real curve, #which can be seen in the figure. linFit <- lm(y ~ cos(t)) library(nlsr) #fullFit <- nls(y ~ A*cos(omega*t+C) + B, #start=list(A=coef(linFit)[1],B=coef(linFit)[2],C=0,omega=.4)) #omega cannot be set to 1, don't know why. fullFit <- nlxb(y ~ A*cos(omega*t+C) + B, data=lidata, start=list(A=coef(linFit)[1],B=coef(linFit)[2],C=0,omega=.04), trace=TRUE) co <- coef(fullFit) fit <- function(x, a, b, c, d) {a*cos(b*x+c)+d} plot(x=t, y=y) curve(fit(x, a=co['A'], b=co['omega'], c=co['C'],d=co['B']), add=TRUE ,lwd=2, col="steelblue") jstart <- list(A=20, B=100, C=0, omega=0.01) jfit <- nlxb(y ~ A*cos(omega*t+C) + B, data=lidata, start=jstart, trace=TRUE) co <- coef(jfit) fit <- function(x, a, b, c, d) {a*cos(b*x+c)+d} plot(x=t, y=y) curve(fit(x, a=co['A'], b=co['omega'], c=co['C'],d=co['B']), add=TRUE ,lwd=2, col="steelblue") JN On 2017-06-21 12:06 AM, lily li wrote: I'm trying the different parameters, but don't know what the error is: Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates Thanks for any suggestions. On Tue, Jun 20, 2017 at 7:37 PM, Don Cohen <don-r-h...@isis.cs3-inc.com> wrote: If you know the period and want to fit phase and amplitude, this is equivalent to fitting a * sin + b * cos > >>> > I don't know how to set the approximate starting values. I'm not sure what you meant by that, but I suspect it's related to phase and amplitude. > >>> > Besides, does the method work for sine curve as well? sin is the same as cos with a different phase Any combination of a and b above = c * sin (theta + d) for some value of c and d and = e * cos (theta + f) for some value of e and f. Also for any c,d and for any e,f there is an a,b. the c and e are what I'm calling amplitude, the d and f are what I'm calling phase. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry Dept of Family Medicine & Public Health cberry at ucsd edu UC San Diego / La Jolla, CA 92093-0901 http://biostat.ucsd.edu/ccberry.htm __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitting cosine curve
On Tue, 20 Jun 2017, lily li wrote: Hi R users, I have a question about fitting a cosine curve. I don't know how to set the approximate starting values. See Y.L. Tong (1976) Biometrics 32:85-94 The method is known as `cosinor' analysis. It takes advantage of the *intrinsic* linearity of y = a + b * cos( omega*t - c ), when omega is given. It you are scratching your head saying 'that thing is not linear', you need to go back to your linear models text and review what `linearity' actually refers to. Also, reading the Tong paper is recomended as you will need the transformations given there in any case. What you end up doing is fitting fit <- lm(y~cos(t.times.omega)+sin(t.times.omega)) and then transforming coef(fit) to get back a, b, and c. So, you only need to have omega. If it is not obvious what value to use, then that will be more of a challenge. The paper gives asymptotics for the dispersion matrix of (a, b, c), I recall. Besides, does the method work for sine curve as well? Seriously? See https://en.wikipedia.org/wiki/List_of_trigonometric_identities#Shifts_and_periodicity HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Determining which.max() within groups
On Tue, 6 Jun 2017, Morway, Eric wrote: Using the dataset below, I got close to what I'm after, but not quite all the way there. Any suggestions appreciated: Daily <- read.table(textConnection(" Date wyrQ 1911-04-01 1990 4.530695 1911-04-02 1990 4.700596 1911-04-03 1990 4.898814 1911-04-04 1990 5.097032 1911-04-05 1991 5.295250 1911-04-06 1991 6.569508 1911-04-07 1991 5.861587 1911-04-08 1991 5.153666 1911-04-09 1992 4.445745 1911-04-10 1992 3.737824 1911-04-11 1992 3.001586 1911-04-12 1992 3.001586 1911-04-13 1993 2.350298 1911-04-14 1993 2.661784 1911-04-16 1993 3.001586 1911-04-17 1993 2.661784 1911-04-19 1994 2.661784 1911-04-28 1994 3.369705 1911-04-29 1994 3.001586 1911-05-20 1994 2.661784"),header=TRUE) aggregate(Q ~ wyr, data = Daily, which.max) # gives: #wyr Q # 1 1990 4 # 2 1991 2 # 3 1992 1 # 4 1993 3 # 5 1994 2 I can 'see' that it is returning the which.max() relative to each grouping. Is there a way to instead return the absolute position (row) of the max value within each group. i.e.: # Would instead like to have # wyr Q # 1 1990 4 # 2 1991 6 # 3 1992 9 # 4 1993 15 # 5 1994 18 The icing on the cake would be to get the Julien Day corresponding to the date on which each year's maximum occurs? Like this: which.max.by.wyr <- with(Daily, which( ave( Q, wyr, FUN=max) == Q)) cbind( Daily[ which.max.by.wyr, ], index=which.max.by.wyr ) Date wyrQ index 4 1911-04-04 1990 5.097032 4 6 1911-04-06 1991 6.569508 6 9 1911-04-09 1992 4.445745 9 15 1911-04-16 1993 3.00158615 18 1911-04-28 1994 3.36970518 If there are ties in Q and you do not want more than one max value listed, you can add a litle fuzz to randomly pick one. i.e. fuzz <- runif(nrow(Daily), 0, 1e-10) which.max.by.wyr <- with(Daily, which(ave(Q+fuzz,wyr,FUN=max)==Q+fuzz)) If you want the first tied value, then sort fuzz before determining which.max.by.wyr. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data import R: some explanatory variables not showing up correctly in summary
On Thu, 1 Jun 2017, Rui Barradas wrote: Hello, In order for us to help we need to know how you've imported your data. What was the file type? What instructions have you used to import it? Did you use base R or a package? Give us a minimal but complete code example that can reproduce your situation. Hope this helps, Rui Barradas Absolutely. It would also help to see what the unique values of each column *really* are. To that end run and report the results of this: lapply(your.data.frame, function(x) unique(as.character(x))) I'll bet you have both "combination" and "combination " as values or something similar where two different strings look to your eye to be the same when printed by summary(). HTH, Chuck Em 01-06-2017 11:02, Tara Adcock escreveu: Hi, I have a question regarding data importing into R. When I import my data into R and review the summary, some of my explanatory variables are being reported as if instead of being one variable, they are two with the same name. See below for an example; Behav person Behav dog Position **combination : 38 combination : 4** Bank:372 **combination : 7 combination : 4** **Island :119** fast :123 fast : 15 **Island : 11** slow :445 slow : 95 Land: 3 stat :111 stat : 14 Water :230 Also, all of the distances I have imported are showing up in the summary along with a line entitled "other". However, I haven't used any other distances? DistanceDistance.dog 2-10m :184 <50m : 35 <50m :156 2-10m : 27 10-20m :156 20-30m : 23 20-30m : 91 30-40m : 16 40-50m : 57 10-20m : 13 **(Other): 82 (Other): 18** I have checked my data sheet over and over again and I think standardised the data, but the issue keeps arising. I'm assuming I need to clean the data set but as a nearly complete novice in R I am not certain how to do this. Any help at all with this would be much appreciated. Thanks so much. Kind Regards, Tara Adcock. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry Dept of Family Medicine & Public Health cberry at ucsd edu UC San Diego / La Jolla, CA 92093-0901 http://biostat.ucsd.edu/ccberry.htm __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odfWeave - A loop of the "same" data
On Thu, 1 Jun 2017, POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST) via R-help wrote: Before I go and do this another way - can I check if anyone has a way of looping through data in odfWeave (or possibly sweave) to do a repeating analysis on subsets of data? For simplicity lets use mtcars dataset in R to explain. Dataset looks like this: mtcars mpg cyl disp hp drat wt ... Mazda RX4 21.0 6 160 110 3.90 2.62 ... Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 ... Datsun 71022.8 4 108 93 3.85 2.32 ... Say I wanted to have a 'catalogue' style report from mtcars, where on each page I would perhaps have the Rowname as a heading and then plot a graph of mpg highlighting that specific car Then add a page break and *do the same for the next car*. I can manually do this of course, but it is effectively a loop something like this: for (n in length(mtcars$mpg)) { barplot (mtcars$mpg, col=c(rep(1,n-1),2,rep(1,length(mtcars$mpg)-n))) } There is a odfWeave page break function so I can do that sort of thing (I think). But I don't think I can output more than one image can I? In reality I will want several images and a table per "catalogue" page. At the moment I think I need to create a master odt document, and create individual catalogue pages. And merge them into one document - but that feels clunky (unless I can script the merge!) Anyone got a better way? For a complex template inside a loop, I'd probably do as Jeff suggests and use a knitr child document for ease of developing and debugging the template. But for the simple case you describe I'd use a brew script to unroll the loop. You would write your input file as usual, but put a brew script in the right place, then run brew on the input file to produce an intermediate file that unrolls the loop, then weave the intermediate file to get your desired result. Here is a simple example of such you can run in an R session (assuming the brew package is installed) and see the results printed out. --8<---cut here---start->8--- brew::brew(text=" Everything before the loop <% for (i in 1:10) { %> Print the value of i <% print(i) %> or better yet \\Sexpr{<%= i %>} <% } %> everything after ") --8<---cut here---end--->8--- The double backslash is needed in the literal string used here. If you put that script in a file using an editor, you would just use a single backslash. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Make sure a data frame has been "fun through" a function
On Tue, 21 Feb 2017, stephen sefick wrote: Sorry for not being clear. I have never used S3 methods before. Below is some R code that sketches out my idea. Is this a sensible solution? Sure. See comments (untested) inline. Chuck test_data <- data.frame(a=1:10, b=1:10, c=1:10) functionA <- function(x, impossible_genotype){ ##some data processing y <- x ##return S3 to be able to use impossible genotype later class(y) <- append(class(y),"genotypes") class(y) <- c("genotypes",class(y)) attr(y, "impossible_genotype") <- impossible_genotype return(y) } test_data_genotypes <- functionA(test_data, impossible_genotype="Ref") functionB <- function(x){ ##stop if pre-processed with functionA if(sum(class(x)=="genotypes")!=1){stop("Need to pre-process data with functionA")} if(!(inherits("genotypes")){ stop("Need to pre-process data with functionA")} or in functionA you could skip the class()<- and just set the "impossible_genotypes" attribute to FALSE when there are none such. Then here test if (is.null(attr(x,"impossible_genotypes"))){ stop("Need to pre-process data with functionA") } else { return(alleles) } ##use this later in functionB to impossible_genotype <- attributes(x)$impossible_genotype impossible_genotype <- attr(x,"impossible_genotype") alleles <- c("Ref", "Alt") coded_genotype <- alleles[alleles!=impossible_genotype] maybe `!is.element(alleles,impossible_genotype)' is safer than `!=' return(coded_genotype) } ##stop if not pre-processed with functionA functionB(test_data) ##processed with functionA functionB(test_data_genotypes) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Make sure a data frame has been "fun through" a function
On Mon, 20 Feb 2017, stephen sefick wrote: Hello, I would like to add something to a data frame that is 1) invisible to the user, 2) has no side effects, and 3) I can test for in a following function. Is this possible? I am exploring classes and attributes and I have thought about using a list (but 1 and 2 not satisfied). Any help would be greatly appreciated. Depends on exactly what you mean by `invisible' and `side effects'. You can do this (but I am not necessarily recommending this): add.stuff <- function(x,...){ + class(x)<- c("more.stuff",class(x)) + attr(x,"stuff")<- list(...) + x} And printing and model functions will be unaffected: df <- data.frame(a=1:3,b=letters[1:3]) df2 <- add.stuff(df,comment="wow", length="3 rows") df2 a b 1 1 a 2 2 b 3 3 c attr(df2,"stuff") $comment [1] "wow" $length [1] "3 rows" all.equal(lm(a~b,df),lm(a~b,df2)) # only call should differ [1] "Component “call”: target, current do not match when deparsed" And if you need some generics to take account of the "stuff" attribute, you can write the methods to do that. --- Another solution is to put your data.framne in a package and then have other objects hold the 'stuff' stuff. Once your package is loaded or imported, the user will have access to the data in a way that might be said to be `invisible' in ordinary usage. --- But seriously, you should say *why* you want to do this. There are probably excellent solutions that do not involve directly altering the data.frame and may not involve putting together a package. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [ESS] feature request : completion of available packages name
On Tue, 3 Jan 2017, Martin Maechler wrote: On Tue, Jan 3, 2017 at 6:30 PM, Samuel BARRETOwrote: Hi, Do you think it would be difficult to add some kind of completion backend to complete the package names when typing `library(` ? [deleted] More seriously: We do something like that already in ESS for quite a while now but I don't recall the details. I suspect Martin knows this, but ... C-c C-e C-l runs ess-load-library. With `ido' enabled, it puts a list in the minibuffer where fuzzy matching will narrow it down to the package one wants. Hitting TAB will display all the package names that fuzzily match in a completion buffer. Thanks for reminding me of the nifty feature! Chuck __ ESS-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/ess-help
Re: [R] MC_CORES and mc.cores for parallel package
On Wed, 7 Dec 2016, Marc Girondot via R-help wrote: Hi, From the documentation of ?options Options set in package parallel These will be set when package parallel (or its namespace) is loaded if not already set. mc.cores: a integer giving the maximum allowed number of additional R processes allowed to be run in parallel to the current R process. Defaults to the setting of the environment variable MC_CORES if set. Most applications which use this assume a limit of 2 if it is unset. As advertised. - Start R with no MC_CORES specified: - check environment var - set environment var - check options - THEN load parallel - check option again Sys.getenv("MC_CORES") [1] "" Sys.setenv("MC_CORES"=3L) options("mc.cores") $mc.cores NULL library(parallel) options("mc.cores") $mc.cores [1] 3 --- I think you confused things by loading parallel *before* setting the environment var. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Frequency of a character in a string
On Mon, 14 Nov 2016, Marc Schwartz wrote: On Nov 14, 2016, at 11:26 AM, Charles C. Berry <ccbe...@ucsd.edu> wrote: On Mon, 14 Nov 2016, Bert Gunter wrote: [stuff deleted] Hi, Both gsub() and strsplit() are using regex based pattern matching internally. That being said, they are ultimately calling .Internal code, so both are pretty fast. For comparison: ## Create a 1,000,000 character vector set.seed(1) Vec <- paste(sample(letters, 100, replace = TRUE), collapse = "") nchar(Vec) [1] 100 ## Split the vector into single characters and tabulate table(strsplit(Vec, split = "")[[1]]) a b c d e f g h i j k l 38664 38442 38282 38496 38540 38623 38548 38288 38143 38493 38184 38621 m n o p q r s t u v w x 38306 38725 38705 38144 38529 38809 38575 38355 38386 38364 38904 38310 y z 38265 38299 ## Get just the count of "a" table(strsplit(Vec, split = "")[[1]])["a"] a 38664 nchar(gsub("[^a]", "", Vec)) [1] 38664 ## Check performance system.time(table(strsplit(Vec, split = "")[[1]])["a"]) user system elapsed 0.100 0.007 0.107 system.time(nchar(gsub("[^a]", "", Vec))) user system elapsed 0.270 0.001 0.272 So, the above would suggest that using strsplit() is somewhat faster than using gsub(). However, as Chuck notes, in the absence of more exhaustive benchmarking, the difference may or may not be more generalizable. Whether splitting on fixed strings rather than treating them as regex'es (i.e.`fixed=TRUE') makes a big difference seems to depend on what you split: First repeating what Marc did... system.time(table(strsplit(Vec, split = "",fixed=TRUE)[[1]])["a"]) user system elapsed 0.132 0.010 0.139 system.time(table(strsplit(Vec, split = "",fixed=FALSE)[[1]])["a"]) user system elapsed 0.130 0.010 0.138 ... fixed=TRUE hardly matters. But the idiom I proposed... system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=TRUE)) - 1)) user system elapsed 0.017 0.000 0.018 system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1)) user system elapsed 0.104 0.000 0.104 ... is 5 times faster with fixed=TRUE for this case. This result matchea Marc's count: sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1) [1] 38664 Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Frequency of a character in a string
On Mon, 14 Nov 2016, Bert Gunter wrote: Yes, but it need some help, since nchar gives the length of the *entire* string; e.g. ## to count "a" 's : x <-(c("abbababba","bbabbabbaaaba")) nchar(gsub("[^a]","",x)) [1] 4 6 This is one of about 8 zillion ways to do this in base R if you don't want to use a specialized package. Just for curiosity: Can anyone comment on what is the most efficient way to do this using base R pattern matching? Most efficient? There probably is no uniformly most efficient way to do this as the timing will depend on the distribution of "a" in the atoms of any vector as well as the length of the vector. But here is one way to avoid the regular expression matching: lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1 Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Alternative to apply in base R
On Tue, 8 Nov 2016, Doran, Harold wrote: Without reaching out to another package in R, I wonder what the best way is to speed enhance the following toy example? Over the years I have become very comfortable with the family of apply functions and generally not good at finding an improvement for speed. This toy example is small, but my real data has many millions of rows and the same operations is repeated many times and so finding a less expensive alternative would be helpful. mm <- matrix(rnorm(100), ncol = 10) rn <- apply(mm, 1, prod) If the real example has only 10 columns, try this: y <- mm[,1] for (i in 2:10) y[] <- y*mm[,i] all.equal(y,rn) If it has many more columns, I would `reach out' to the inline package and write 3 lines of C or Fortran to do the operation. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dealing with -Inf in a maximisation problem.
On Mon, 7 Nov 2016, Rolf Turner wrote: On 07/11/16 13:07, William Dunlap wrote: Have you tried reparameterizing, using logb (=log(b)) instead of b? Uh, no. I don't think that that makes any sense in my context. The "b" values are probabilities and must satisfy a "sum-to-1" constraint. To accommodate this constraint I re-parametrise via a "logistic" style parametrisation --- basically b_i = exp(z_i)/[sum_j exp(z_j)], j = 1, ... n with the parameters that the optimiser works with being z_1, ..., z_{n-1} (and with z_n == 0 for identifiability). The objective function is of the form sum_i(a_i * log(b_i)), This is sum_i(a_i * z_i) - sum(a_i)*log(sum_j(exp(z_j)), isn't it? So you don't need to evaluate b_i here, do you? Large values of z_j will lead to exp(z_j) == Inf, but using sum_i(a_i * (z_i-max.z)) - sum(a_i)*log(sum_j(exp(z_j-max.z)) will handle that. HTH, Chuck p.s. Regarding "advice from younger and wiser heads", I probably cannot claim to be either. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function ave() with seq_along returning char sequence instead of numeric
On Mon, 31 Oct 2016, Jeff Newmiller wrote: The help page describes the first argument x as a numeric... it is not designed to accept character, Actually it is so designed, but not advertised as such. See below. so the fact that you get anything even close to right is just a bonus. As the doctor says, "if it hurts, don't do that". ave( rep( 1, length( v ), v, FUN=seq_along ) -- [snip] Reading the code of `ave` and then `split<-.default`, you will see subset replacement, "x[i]<- ...", on the argument 'x'. So, the issue is having FUN and that replacement (and possible coercion) yield something useful/sensible. In other words, class(x) need not be "numeric". For instance, operating on "Date" objects: # start at 2016-01-02, step 10 days, ... x <- as.Date("2016-01-01")+seq(1,1000,by=10) z <- rep(1:10, 10) class(ave(x,z)) # Date class is preserved [1] "Date" ave(x,z) # mean date [1] "2017-03-27" "2017-04-06" "2017-04-16" "2017-04-26" ... ave(x,z,FUN=min) # earliest date [1] "2016-01-02" "2016-01-12" "2016-01-22" "2016-02-01" ... However, trying to describe this feature in the help page without a lot of detail and examples might confuse more users than it would enlighten. -- Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Escaping quotes WAS: Storing long string with white space in variable
On Wed, 19 Oct 2016, g.maub...@weinwolf.de wrote: Hi All, I would like to store a long string with white space in a variable: -- cut -- # Create README.md readme <- "--- title: "Your project title here" author: "Author(s) name(s) here" date: "Current date here" output: html_document --- [snip] " cat(readme, file = "README.md") -- cut -- I am looking for an equivalent to Pythons """ """ long string feature. Whitespace isn;t the issue here. Embedded quotes are the problem. See ?Quotes In this case, if your replace the first and last double quotes with single quotes, you will get a single string of 464 characters, which is what you want. In other cases where you have embedded single quotes, you can escape them and the double quotes and get what you want. [snip] PS: This is a template for a project folder for each project. I would like to create it with R script instead of distributing it as a template file. This way one needs only the R script to setup a project like this: [rest deleted] There are templating procedures available. Package brew might be worth a look. It can take a template string with embedded R code and return a string with the R results spliced in. Package knitr has some ability to deal with templates, too, and I think there are other packages. Jeff Newmiller's suggestion to make a package seems right to me whether you use brew or not. Long term it will probably be the easiest path. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Looping through data tables (or data frames) by removing previous individuals
On Mon, 3 Oct 2016, Frank S. wrote: Dear R users, [deleted] I want to get a list of "k" data tables (or data frames) so that each contains those individuals who for the first time are at least 65, looping on each of the dates of vector "v". Let's consider the following example with 5 individuals: dt <- data.table( id = 1:5, fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07", "1943-09-07", "1943-12-31")), sex = as.factor(rep(c(0, 1), c(2, 3))) ) v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4 I would expect to obtain k=4 data tables so that: dt_p1: contains id = 1 (he is for the first time at least 65 on date v[1]) dt_p2: is NULL (no subject reach for the first time 65 on date v[2]) dt_p3: contains id = 2 & id = 3 (they are for the first time at least 65 on v[3]) dt_p4: contains id = 4 & id = 5 (they are for the first time at least 65 on v[4]) Here is a start (using a data.frame for dt): vp <- as.POSIXlt( c( as.Date("1000-01-01"), v )) vp$year <- vp$year-65 dt.cut <- as.numeric(cut(as.POSIXlt(dt$fborn),vp)) split(dt,factor(dt.cut, 1:length(v))) $`1` id fborn sex 1 1 1935-07-25 0 $`2` [1] idfborn sex <0 rows> (or 0-length row.names) $`3` id fborn sex 2 2 1942-10-05 0 3 3 1942-09-07 1 $`4` id fborn sex 4 4 1943-09-07 1 5 5 1943-12-31 1 See ?as.POSIXlt ?cut.POSIXt ?split HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improve code efficient with do.call, rbind and split contruction
On Sat, 3 Sep 2016, Bert Gunter wrote: Chuck et. al.: As I said previously, my intuition about the relative efficiency of tapply() and duplicated() in the context of this thread was wrong. My `intuition' was wrong, too. But tapply() uses split() which runs quite fast. So not a big surprise, but if you look thru tapply() you'll notice it is well crafted in other ways. In particular, the way the `f' arg of split is constructed makes a big difference in timing (using a for loop to build up a mixed radix number). In fact interaction(f,g) needs about 3 times the time of tapply(f,list(f,g)) for just building an index. Thanks for following up. Best, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improve code efficient with do.call, rbind and split contruction
On Fri, 2 Sep 2016, Bert Gunter wrote: [snip] The "trick" is to use tapply() to select the necessary row indices of your data frame and forget about all the do.call and rbind stuff. e.g. I agree the way to go is "select the necessary row indices" but I get there a different way. See below. set.seed(1001) df <- data.frame(f =factor(sample(LETTERS[1:4],100,rep=TRUE)), + g <- factor(sample(letters[1:6],100,rep=TRUE)), + y = runif(100)) ix <- seq_len(nrow(df)) ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)])) ix a b c d e f A 94 69 100 59 80 87 B 89 57 65 90 75 88 C 85 92 86 95 97 62 D 47 73 72 74 99 96 jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE )) xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix' g f a b c d e f A 94 69 100 59 80 87 B 89 57 65 90 75 88 C 85 92 86 95 97 62 D 47 73 72 74 99 96 Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with big data and parallel computing: 500, 000 x 4 linear models
On Mon, 8 Aug 2016, Ellis, Alicia M wrote: I have a large dataset with ~500,000 columns and 1264 rows. Each column represents the percent methylation at a given location in the genome. I need to run 500,000 linear models for each of 4 predictors of interest in the form of: Methylation.stie1 ~ predictor1 + covariate1+ covariate2 + ... covariate9 ...and save only the pvalue for the predictor The original methylation data file had methylation sites as row labels and the individuals as columns so I read the data in chunks and transposed it so I now have 5 csv files (chunks) with columns representing methylation sites and rows as individuals. I was able to get results for all of the regressions by running each chunk of methylation data separately on our supercomputer using the code below. This sounds like a problem for my old laptop, not a supercomputer. You might want to review the algebra and geometry of least squares. In particular, covariate1 ... covariate9 are the same 1264 x 9 matrix for every problem IIUC. So, you can compute the QR decomposition for that matrix (and the unit vector `intercept') *once* and use it in all the problems. Using that decomposition, find the residuals for the regressands and for `predictor1' (etc) regressors. The rest is simple least squares. You compute the correlation coefficient of the residuals of a regressand and those of a regressor, for each combination. Make a table of critical values for the p-value(s) you require - remember to get the degrees of freedom right (i.e. account for the covariates). These correlations of residuals are the partial correlations given the covariates, and a test on one of them is algebraically equal to the test on regression coefficient for corresponding regressand and regressor in a modelthat also includes those 9 covariates. See: ?qr ?lm.fit HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make the "apply" faster
On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote: I have 4-dimension array x(lat,lon,time,var) I am using "apply" to calculate over time new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))}) This is very slow. Is there anyway make it faster? If dim(x)[3] << prod(dim(x)[-3]), new <- Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70)) will be faster. However, if you can follow Peter Langfelder's suggestion to use rowSums, that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) and paying the price of aperm() might be better. Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to identify runs or clusters of events in time
See below On Fri, 1 Jul 2016, Mark Shanks wrote: Hi, Imagine the two problems: 1) You have an event that occurs repeatedly over time. You want to identify periods when the event occurs more frequently than the base rate of occurrence. Ideally, you don't want to have to specify the period (e.g., break into months), so the analysis can be sensitive to scenarios such as many events happening only between, e.g., June 10 and June 15 - even though the overall number of events for the month may not be much greater than usual. Similarly, there may be a cluster of events that occur from March 28 to April 3. Ideally, you want to pull out the base rate of occurrence and highlight only the periods when the frequency is less or greater than the base rate. A good place to start is: Siegmund, D. O., N. R. Zhang, and B. Yakir. "False discovery rate for scanning statistics." Biometrika 98.4 (2011): 979-985. and Aldous, David. Probability approximations via the Poisson clumping heuristic. Vol. 77. Springer Science & Business Media, 2013. --- A nice illustration of how scan statistcis can be used is: Aberdein, Jody, and David Spiegelhalter. "Have London's roads become more dangerous for cyclists?." Significance 10.6 (2013): 46-48. 2) Events again occur repeatedly over time in an inconsistent way. However, this time, the event has positive or negative outcomes - such as a spot check of conformity to regulations. You again want to know whether there is a group of negative outcomes close together in time. This analysis should take into account the negative outcomes as well though. E.g., if from June 10 to June 15 you get 5 negative outcomes and no positive outcomes it should be flagged. On the other hand, if from June 10 to June 15 you get 5 negative outcomes interspersed between many positive outcomes it should be ignored. I'm guessing that there is some statistical approach designed to look at these types of issues. What is it called? `Scan statistic' is a good search term. `Poisson clumping', too. What package in R implements it? I basically just need to know where to start. There are some R packages. CRAN has packages SNscan and graphscan, which sound like they might interest you. My BioConductor package geneRxCluster: http://bioconductor.org/packages/release/bioc/html/geneRxCluster.html seeks clusters in a binary sequence as described in detail at http://bioinformatics.oxfordjournals.org/content/30/11/1493 HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Order of formula terms in model.matrix
On Sun, 17 Jan 2016, Lars Bishop wrote: I’d appreciate your help on understanding the following. It is not very clear to me from the model.matrix documentation, why simply changing the order of terms in the formula may change the number of resulting columns. Please note I’m purposely not including main effects in the model formula in this case. IIRC, there are some heuristics involved harking back to the White Book. I recall there have been discussions of whether and how this could be fixed before on this list and or R-devel, but I cannot seem to lay my browser on them right now. set.seed(1) x1 <- rnorm(100) f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) trt <- sample(c(-1,1), 100, replace = TRUE) df <- data.frame(x1=x1, f1=f1, trt=trt) dim(model.matrix( ~ x1:trt + f1:trt, data = df)) [1] 100 4 dim(model.matrix(~ f1:trt + x1:trt, data = df)) [1] 100 5 By `x1:trt' I guess you mean the same thing as `I(x1*trt)'. If you use the latter form, the issue you raise goes away. Note that `I(some.expr)' gives you the ability to force the behavior of model.matrix to be exactly what you want by suitably crafting `some.expr', heuristics notwithstanding. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?
On Sat, 24 Oct 2015, Bert Gunter wrote: Rolf's solution works for the situation where all duplicated values are contiguous, which may be what you need. However, I wondered how it could be done if this were not the case. Below is an answer. It is not as efficient or elegant as Rolf's solution for the contiguous case I think; maybe someone will come up with something better. The often underappreciated `ave' comes to mind. viz., ave(w,w,FUN=seq_along) and ave(ID,ID,FUN=seq_along) agree with the results below. Of course, ave(...) is just split/unsplit in guise, further our discussion of a month or two back. Best, Chuck But I think it works. Here's an example with code: w <- c(1:5,3,1,2,7,8,5,5,5,2,3) w [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3 d <- 0+duplicated(w) for(x in unique(w)){ + i <- w==x + d[i]<-1+ cumsum(d[i]) + + } d [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3 As always, corrections and/or improvements welcome. Cheers, Bert Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner <r.tur...@auckland.ac.nz> wrote: On 25/10/15 11:28, John Sorkin wrote: I have a file that has (1) Line numbers, (2) IDs. A given ID number can appear in more than one row. For each row with a repeated ID, I want to add a number that gives the sequence number of the repeated ID number. The R code below demonstrates what I want to have, without any attempt to produce the result, as I have no idea how to accomplish my goal. line <- c(1,2,3,4,5,6,7,8,9,10) ID<-c(1,1,2,3,4,5,6,7,8,8) cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8") cbind(line,ID) Seq <- c(1,2,1,1,1,1,1,1,1,2) cat("Sequence numbers within ID added to the data") cbind(line,ID,Seq) I *think* that unlist(lapply(rle(ID)$lengths,seq_len)) gives what you want. At least it does for the given example. cheers, Rolf Turner -- Technical Editor ANZJS Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry Dept of Family Medicine & Public Health cberry at ucsd edu UC San Diego / La Jolla, CA 92093-0901 http://famprevmed.ucsd.edu/faculty/cberry/ __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear regression with a rounded response variable
On Wed, 21 Oct 2015, Ravi Varadhan wrote: Hi, I am dealing with a regression problem where the response variable, time (second) to walk 15 ft, is rounded to the nearest integer. I do not care for the regression coefficients per se, but my main interest is in getting the prediction equation for walking speed, given the predictors (age, height, sex, etc.), where the predictions will be real numbers, and not integers. The hope is that these predictions should provide unbiased estimates of the "unrounded" walking speed. These sounds like a measurement error problem, where the measurement error is due to rounding and hence would be uniformly distributed (-0.5, 0.5). Not the usual "measurement error model" problem, though, where the errors are in X and not independent of XB. Look back at the proof of the unbiasedness of least squares under the Gauss-Markov setup. The errors in Y need to have expectation zero. From your description (but see caveat below) this is true of walking *time*, but not not exactly true of walking *speed* (modulo the usual assumptions if they apply to time). In fact if E(epsilon) = 0 were true of unrounded time, it would not be true of unrounded speed (and vice versa). Are there any canonical approaches for handling this type of a problem? Work out the bias analytically? Parametric bootstrap? Data augmentation and friends? What is wrong with just doing the standard linear regression? Well, what do the actual values look like? If half the subjects have a value of 5 seconds and the rest are split between 4 and 6, your assertion that rounding induces an error of dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the 6 second group and more negative errors in the 4 second group under any plausible model). HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Compare two normal to one normal
On Tue, 22 Sep 2015, John Sorkin wrote: Charles, I am not sure the answer to me question, given a dataset, how can one compare the fit of a model of the fits the data to a mixture of two normal distributions to the fit of a model that uses a single normal distribution, can be based on the glm model you suggest. Well you *did* ask how to calculate the log-likelihood of a fitted normal density, didn't you? That is what I responded to. You can check that result longhand as sum( dnorm( y, y.mean, y.std , log=TRUE ) ) and get the same result (as long as you used ML estimates of the mean and standard deviation). I have used normalmixEM to fit the data to a mixture of two normal curves. The model estimates four (perhaps five) parameters: mu1, sd^2 1, mu2, sd^2, (and perhaps lambda, the mixing proportion. The mixing proportion may not need to be estimated, it may be determined once once specifies mu1, sd^2 1, mu2, and sd^2.) Your model fits the data to a model that contains only the mean, and estimates 2 parameters mu0 and sd0^2. I am not sure that your model and mine can be considered to be nested. If I am correct I can't compare the log likelihood values from the two models. I may be wrong. If I am, I should be able to perform a log likelihood test with 2 (or 3, I am not sure which) DFs. Are you suggesting the models are nested? If so, should I use 3 or 2 DFs? As Rolf points out there is a literature on such tests (and Googling 'test finite mixture' covers much of it). Do you really want a test? If you merely want to pick a winner from two candidate models there are other procedures. k-fold crossvalidation of the loglikelihood ratio statistic seems like an easy, natural approach. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Compare two normal to one normal
On Tue, 22 Sep 2015, John Sorkin wrote: In any event, I still don't know how to fit a single normal distribution and get a measure of fit e.g. log likelihood. Gotta love R: y <- rnorm(10) logLik(glm(y~1)) 'log Lik.' -17.36071 (df=2) HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple if function
On Thu, 17 Sep 2015, Berend Hasselman wrote: On 17 Sep 2015, at 01:42, Dénes Tóthwrote: On 09/16/2015 04:41 PM, Bert Gunter wrote: Yes! Chuck's use of mapply is exactly the split/combine strategy I was looking for. In retrospect, exactly how one should think about it. Many thanks to all for a constructive discussion . -- Bert Bert Gunter Use mapply like this on large problems: unsplit( mapply( function(x,z) eval( x, list( y=z )), expression( A=y*2, B=y+3, C=sqrt(y) ), split( dat$Flow, dat$ASB ), SIMPLIFY=FALSE), dat$ASB) Chuck Is there any reason not to use data.table for this purpose, especially if efficiency is of concern? --- # load data.table and microbenchmark library(data.table) library(microbenchmark) # # prepare data DF <- data.frame( ASB = rep_len(factor(LETTERS[1:3]), 3e5), Flow = rnorm(3e5)^2) DT <- as.data.table(DF) DT[, ASB := as.character(ASB)] # # define functions # # Chuck's version fnSplit <- function(dat) { unsplit( mapply( function(x,z) eval( x, list( y=z )), expression( A=y*2, B=y+3, C=sqrt(y) ), split( dat$Flow, dat$ASB ), SIMPLIFY=FALSE), dat$ASB) } # # data.table-way (IMHO, much easier to read) fnDataTable <- function(dat) { dat[, result := if (.BY == "A") { 2 * Flow } else if (.BY == "B") { 3 + Flow } else if (.BY == "C") { sqrt(Flow) }, by = ASB] } # # benchmark # microbenchmark(fnSplit(DF), fnDataTable(DT)) identical(fnSplit(DF), fnDataTable(DT)[, result]) --- Actually, in Chuck's version the unsplit() part is slow. If the order is not of concern (e.g., DF is reordered before calling fnSplit), fnSplit is comparable to the DT-version. But David’s version is faster than Chuck’s fnSplit. I modified David’s solution slightly to get a result that is identical to fnSplit. # David's version # my modification to return a vector just like fnSplit fnDavid <- function(dat) { z <- mapply( function(x,z) eval( x, list( y=z )), expression(A= y*2, B=y+3, C=sqrt(y) ), split( dat$Flow, dat$ASB ), USE.NAMES=FALSE, SIMPLIFY=TRUE ) as.vector(t(z)) } Added this to Dénes's code. Benchmarking with R package rbenchmark and testing result like this library(rbenchmark) benchmark(fnSplit(DF), fnDataTable(DT),fnDavid(DF)) identical(fnSplit(DF), fnDataTable(DT)[, result]) identical(fnSplit(DF), fnDavid(DF)) gave this: test replications elapsed relative user.self sys.self user.child 2 fnDataTable(DT) 100 0.8291.000 0.7620.066 0 3 fnDavid(DF) 100 1.6151.948 1.5150.098 0 1 fnSplit(DF) 100 2.8783.472 2.6850.190 0 sys.child 2 0 3 0 1 0 identical(fnSplit(DF), fnDataTable(DT)[, result]) [1] TRUE identical(fnSplit(DF), fnDavid(DF)) [1] TRUE The above `TRUE' depends on the structure of ASB here. identical(...) is often FALSE in the general case. A permutation of ASB is enough to show this: DF$ASB <- sample(DF$ASB) identical(fnSplit(DF), fnDavid(DF)) [1] FALSE unsplit() is the price you pay to cope with general orderings. Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple if function
On Tue, 15 Sep 2015, Bert Gunter wrote: Thanks to both Davids. I realize that these things are often a matter of aesthetics -- and hence have little rational justification -- but I agree with The Other David: eval(parse) seems to me to violate R's soul( it makes R a macro language instead of a functional one). However, mapply(... switch) effectively loops through the frame row by row. Aesthetically, I like it; but it seems inefficient. If there are e.g. 1e6 rows in say 10 categories, I think Jeff's approach should do much better. I'll try to generate some actual data to see unless someone else beats me to it. Use mapply like this on large problems: unsplit( mapply( function(x,z) eval( x, list( y=z )), expression( A=y*2, B=y+3, C=sqrt(y) ), split( dat$Flow, dat$ASB ), SIMPLIFY=FALSE), dat$ASB) Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple Integrals
On Fri, 28 Aug 2015, Shant Ch via R-help wrote: Hello all, For a study I want to find E|X1/3+X2/3+X3/3-X4| for variables following lognormal distribution. To get the value we need to use four integrals. So far, so good. This is the code which I is used fx-function(x){ dlnorm(x,meanlog=2.185,sdlog=0.562) } U31-integrate(function(y1) { sapply(y1, function(y1) { + integrate(function(y2){ sapply(y2, function(y2) { + integrate(function(x1){ sapply(x1, function(x1) { + integrate(function(x2) + abs(y1/3+y2/3+x1/3-x2)*fx(y1)*fx(y2)*fx(x1)*fx(x2),0, Inf)$value + })},0, Inf)$value })},0, Inf)$value})},0,Inf)$value The error I received is the following: Error in integrate(function(y2) { : maximum number of subdivisions reached I can understand the problem, This is NOT a problem for which a numerical solution (apart from evaluating exp(x) and then doing some arithmetic) is required. You are calculating the expectation of a sum of random variables. And from your code, these are independent random variables. There is a well known calculus of expectations. Consult a book on probability theory. The expectation of a lognormal distribution is both well known and easy to work out. So is the expectation of a constant times a random variable. The expectation of a sum of random variables is also well known. So write down the expectation of the lognormal. Render that expression as an R function. Write down the expectation of a random variable times a constant. Render that expression as an R function. Write down the expression for expectation of a sum of independent random variables as a function of the expectations of the random variables. Lastly, write an R function that calls all of the above to yield the expectation of a sum of lognormal variables times fixed constants. You do not need to use (and should not use!) the integrate() function to accomplish your aim. Chuck p.s. Nesting calls to integrate() is almost certainly a very bad approach to any multiple integration problem. If you ever do need to solve a multiple integration problem numerically, consult an expert before trying to write the solution as R code. but I am unable to figure out what can be done.. It would be great if you can let me know a solution to the problem so as to find a value for the integral. Shant [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry Dept of Family Medicine Public Health cberry at ucsd edu UC San Diego / La Jolla, CA 92093-0901 http://famprevmed.ucsd.edu/faculty/cberry/ __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple Integrals
On Sat, 29 Aug 2015, Shant Ch wrote: Hello Dr. Berry, I know the theoretical side but note we are not talking about expectation of sums rather expectation of ABSOLUTE value of the function (X1/3+X2/3+X3/3-X4), i.e. E|X1/3+X2/3+X3/3-X4| , I don#39;t think this can be handled for log normal distribution by integrals by hand. Sorry! My tired eyes missed the absolute value. FWIW, there are some quadrature packages on CRAN. Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Do grep() and strsplit() use different regex engines?
On Sat, 11 Jul 2015, Bert Gunter wrote: David/Jeff: Thank you both. You seem to confirm that my observation of an infelicity in strsplit() is real. That is most helpful. I found nothing in David's message 2 code that was surprising. That is, the splits shown conform to what I would expect from \\b . But not to what I originally showed and David enlarged upon in his first message. I still don't really get why a split should occur at every letter. Jeff may very well have found the explanation, but I have not gone through his code. If the infelicities noted (are there more?) by David and me are not really bugs -- and I would be frankly surprised if they were -- I would suggest that perhaps they deserve mention in the strsplit() man page. Something to the effect that \b and \ should not be used as split characters... . Bert et al, ?strsplit already says: If empty matches occur, in particular if split has length 0, x is split into single characters. And there are various ways that empty matches can happen besides using \\b as the split arg. But there would be no harm in adding your cases to 'in particular ...' The comment in the code (src/main/grep.c: line 493) suggests this was a deliberate decision. However, similar functions in other languages do not do this. For example, emacs `(split-string red green \\b)' gives ( red green ) as the result. Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Programming R to avoid loops
On Sat, 18 Apr 2015, Brant Inman wrote: I have two large data frames with the following structure: df1 id date test1.result 1 a 2009-08-28 1 2 a 2009-09-16 1 3 b 2008-08-06 0 4 c 2012-02-02 1 5 c 2010-08-03 1 6 c 2012-08-02 0 df2 id date test2.result 1 a 2011-02-03 1 2 b 2011-09-27 0 3 b 2011-09-01 1 4 c 2009-07-16 0 5 c 2009-04-15 0 6 c 2010-08-10 1 I need to match items in df2 to those in df1 with specific matching criteria. I have written a looped matching algorithm that works, but it is very slow with my large datasets. I am requesting help on making a version of this code that is faster and “vectorized so to speak. As I see in your posted code, you match id's exactly, dates according to a range, and count the number of positive test result in the second data.frame. For this, the countOverlaps() function of the GenomicRanges package will do the trick with suitably defined GRanges objects. Something like: require(GenomicRanges) date1 - as.integer( as.Date( df1$date, %Y-%m-%d )) date2 - as.integer( as.Date( df2$date, %Y-%m-%d )) lagdays - 30L predays - -30L gr1 - GRanges(seqnames=df1$id, IRanges(start=date1,width=1),strand=*) gr2 - GRanges(seqnames=df2$id, IRanges(start=date2+predays,end=date2+lagdays), strand=*)[ df2$test2.result==1,] df1$test2.count - countOverlaps(gr1,gr2) For the example data.frames (as rendered by Jim Lemon's code), this yields df1 id date test1.result test2.count 1 a 2009-08-281 0 2 a 2009-09-161 0 3 b 2008-08-060 0 4 c 2012-02-021 0 5 c 2010-08-031 1 6 c 2012-08-020 0 The GenomicRanges package is at http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html where you will find installation instructions and links to vignettes. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Requesting function for A/B testing
On Sat, 7 Mar 2015, Rolf Turner wrote: On 06/03/15 22:34, Namratha K wrote: Dear Sir/Madam, I am a student pursuing MCA .As i am doing an project using R language .I want to implement A/B testing using R language.I am searching in google from past few days and not able to implement .So i request u to kindly help me by sending function or code on A/B testing method using R language. talk soon Huh? Or, to put it another way, WTF? Rolf, Old wine, new bottle. According to http://en.wikipedia.org/wiki/A/B_testing, A/B testing has been marketed by some as a change in philosophy and business strategy in certain niches, though the approach is identical to a between-subjects design, which is commonly used in a variety of research traditions[refs] A friend who is a management consultant tells me that the core ideas in his discipline never change, but every five years or so a new set of buzzwords is introduced to describe them. It makes it seem like a change in philosophy and business strategy has occurred. And of course the clients will need to hire consultants who know those buzzwords to stay current. And the consultants will need to sign up for continuing education courses to learn those buzzwords. === Namratha, Read R-intro. Either from your R installation or at http://cran.r-project.org/doc/manuals/r-release/R-intro.html Then start R and enter ls(package:stats,pat=test) at the prompt and push ENTER. Browse the help pages for the functions listed. Run the examples. That should get you started. Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Not finding superclass in library
On Mon, 23 Feb 2015, Ramiro Barrantes wrote: Thank you for pointing this out. I had no idea about the distinction but there are some good references on the matter (http://www.r-bloggers.com/packages-v-libraries-in-r/). I am pasting the corrected version below, any suggestions appreciated: I have a package that I created that defines a parent class, assayObject. I created other classes that inherit from it, say assayObjectDemo. Each one of those classes I wanted to make in its own directory separate from where the assayObject is defined (there are reasons for that). But now if I do: library(assayObject) #--- where parent object is defined source(assayObjectDemo.R) where assayObjectDemo.R is just: setClass(assayObjectDemo,contains=assayObject) createDemoAssayObject - function() { df - data.frame() assay-new(Class=assayObjectDemo) assay } I get: Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses, : no definition was found for superclass “assayObject” in the specification of class “assayObjectDemo” What can I do? Start with a reproducible example. Here is one: library(Matrix) tmpf - tempfile(fileext=.R) cat('setClass(MatrixDemo,contains=Matrix)',file=tmpf) source(tmpf) slotNames(MatrixDemo) And it produces the expected output without error: [1] Dim Dimnames Since this works fine for a widely used package and fails for your (unspecified) package, I suspect there is a problem with your package. If I had to guess, I'd say it is a NAMESPACE issue. Be sure your exportClasses directive is correctly formed per Section 1.5.6 Namespaces with S4 classes and methods of R-exts. r-devel might be a better venue for this discussion. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsetting a list of lists using lapply
On Fri, 20 Feb 2015, Aron Lindberg wrote: Hmm…Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha”, but that should not be included, such as: input[[67]]$content[[1]]$commit$tree$sha and input[[67]]$content[[1]]$parents[[1]]$sha it’s only the “sha” that fit the following subsetting pattern that should be included: input[[i]]$content[[1]]$sha[1] This should be straightforward. Look at what grepl() is doing. And look at what names(unlist(input)) yields. You can either write a regular expression to handle this (perhaps content.sha$) or write other grepl() expressions to select (or get rid of) the desired (or unwanted) pattern. See ?grepl and the page on regular expression referenced there. HTH, Chuck __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with expression
On Tue, 25 Jan 2011, David Scott wrote: I have a problem with expressions. I am trying to create a title where the parameter of interest is displayed as a Greek character. Which parameter is being considered is stored in a character variable. As an example, if I have param - alpha param - as.name(alpha) HTH, Chuck and then do plot(0, 0, main = bquote(Parameter==.(param))) then in the title I get Parameter = alpha, whereas I want the Greek character alpha. David Scott -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function of probability for normal distribution
On Wed, 19 Jan 2011, JClark wrote: Dear Greg Snow, I'm a biologist trying to write a mathematical formula for a doubly truncated normal distribution to be used in the language R. I realise this is simple stuff for a mathematician but I'm stumped. Wikipedia gives what seems a fairly simple formula - with function = maths with mean and standard deviation - but also phi - WHAT IS PHI !!?? - On http://en.wikipedia.org/wiki/Truncated_normal_distribution it says (by copy-and-paste) The density function involves \scriptstyle{\phi(\cdot)} \ , which is the probability density function of the standard normal distribution and \scriptstyle{\Phi(\cdot)} \ , its cumulative distribution function. There are some links there that you can follow to get up to speed. especially how do I write this in R and why is the top phi in italics ?? Hmmm. Try the posting guide's suggestions. This seems to help: ?distribution [stuff deleted] For the normal distribution see dnorm. and ?dnorm [stuff deleted] dnorm gives the density, pnorm gives the distribution function, qnorm gives the quantile function, and rnorm generates random deviates. so match up 'density' and 'distribution function' in the ?dnorm page and the wiki page and you should be able to put it together. (FWIW, ?density and ??density are not so helpful.) BTW \frac{\frac{1}{\sigma}\phi(\frac{x - \mu}{\sigma})} (copy and pasted from the wiki page) can be rendered as dnorm(x , mu, sigma ) in R. HTH, Chuck Hoping you can help. Yours sincerely, Jeremy Clark -- View this message in context: http://r.789695.n4.nabble.com/Re-R-function-for-Probabilities-for-the-standard-normal-distribution-tp903639p3225457.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Parameters/data that live globally
On Tue, 18 Jan 2011, analys...@hotmail.com wrote: I am coming to R from Fortran and I used to use fixed size arrays in named common. common /name1/array(100) The contents of array can be accessed/modified if and only if this line occurs in the function. Very helpful if different functions need different global data (can have name2, name3 etc. for common data blocks). Is there a way to do this in R? Sure. But probably better to work in R as the developers intended, rather than trying to simulate COMMON blocks per se. See ?list ?environment ?search 10.7 Scope in Intro to R ?get ?assign You can collect objects in an environment and pass it to a function or put it in the search path. Then you can get/assign objects. Probably it is easier to work with a list. Here is a toy example foo - function(object){ object$x - object$y + 1; object} my.list - list( y = 4:5 ) my.list - foo(my.list) my.list $y [1] 4 5 $x [1] 5 6 HTH, Chuck Thanks for any help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] list concatenation
On Tue, 11 Jan 2011, Georg Otto wrote: Dear R gurus, first let me apologize for a question that might hve been answered before. I was not able to find the solution yet. I want to concatenate two lists of lists at their lowest level. Suppose I have two lists of lists: list.1 - list(I=list(A=c(a, b, c), B=c(d, e, f)), II=list(A=c(g, h, i), B=c(j, k, l))) list.2 - list(I=list(A=c(A, B, C), B=c(D, E, F)), II=list(A=c(G, H, I), B=c(J, K, L))) list.1 $I $I$A [1] a b c $I$B [1] d e f $II $II$A [1] g h i $II$B [1] j k l list.2 $I $I$A [1] A B C $I$B [1] D E F $II $II$A [1] G H I $II$B [1] J K L Now I want to concatenate list elements of the lowest levels, so the result looks like this: $I $I$A [1] a b c A B C $I$B [1] d e f D E F $II $II$A [1] g h i G H I $II$B [1] j k l J K L Has anybody a good solution for that? mapply( function(x,y) mapply(c, x, y, SIMPLIFY=FALSE), + list.1, list.2, SIMPLIFY=FALSE ) $I $I$A [1] a b c A B C $I$B [1] d e f D E F $II $II$A [1] g h i G H I $II$B [1] j k l J K L HTH, Chuck Best, Georg __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory Needed for Regression
On Mon, 10 Jan 2011, efreeman wrote: I'm looking for a formula for memory usage in standard regression; that is, if I have X rows with Y predictors, how much memory is needed? I'm speccing out a system, and I'd like to be able to get enough memory that we can do some fairly large regressions. install.packages(biglm) require(biglm) Then see ?biglm biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory. If you want to get serious about this look in Golub and Van Loan* (Sorry, my copy is not at hand so I cannot be more specific. Maybe there is a section like Updating Matrix Factorizations that says what is needed.) Also, see Algorithm AS274 Applied Statistics (1992) Vol.41, No. 2 which is what biglm() refers to. And maybe read the source code of biglm() if you are planning on using that package. HTH, Chuck * @book{golub1996matrix, title={{Matrix computations}}, author={Golub, G.H. and Van Loan, C.F.}, isbn={0801854148}, year={1996}, publisher={Johns Hopkins Univ Pr} } ==Ed Freeman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on list objects
On Sat, 8 Jan 2011, Ron Michael wrote: Hi, I have 2 questions on list object: ? 1. Suppose I have a matrix like: dat - matrix(1:9,3) ? Now I want to replicate this entire matrix 3 times and put entire result in a list object. Means, if res is the resulting list then I should have: ? res[[1]]=dat, res[[2]]=dat, res[[3]]=dat ? How can I do that in the easilest manner? See ?rep ... Examples ... ## replicate a list ... ? 2. Suppose I have 2 list objects: list1 - list2 - vector(list, length=2) for(i in 1:2) { ??list1[[i]] - matrix(rnorm(15), 3) ??list2[[i]] - matrix(rnorm(15), 3) ?} How can I add these 2 list objects? I have tried with just list1+list2, however it is generating some error. ? See ?mapply take note of the SIMPLIFY arg, which you want as FALSE HTH, Chuck Would be grateful for any help. ? Thanks, [[alternative HTML version deleted]] Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Normal Distribution Quantiles
On Sat, 8 Jan 2011, Rainer Schuermann wrote: Sounds like homework, which is not an encouraged use of the Rhelp list. You can either do it in theory... It is _from_ a homework but I have the solution already (explicitly got that done first!) - this was the pasted Latex code (apologies for that, but in plain text it looks unreadable[1], and I thought everybody here has his / her favorite Latrex editor open all the time anyway...). I'm just looking, for my own advancement and programming training, for a way of doing that in R - which, from your and Dennis' reply, doesn't seem to exist. Your question was 'how do I find the smallest integer $n$ such that...', right? Using uniroot and pnorm, you could solve for real $n$ and then round up. Doing this, I find that in greater than 95% of trials, your bushwalker would be done in 105 days or less. Or you could use findInterval, sapply, and pnorm to get all of the $n$s in one expression. HTH, Chuck I would _not_ misuse the list for getting homework done easily, I will not ask learning statistics questions here, and I will always try to find the solution myself before posting something here, I promise! Thanks anyway for the simulation advice, Rainer (4000 - (40*n)) -329 [1] --- = 1200 (10*(n^-)) 2 On Saturday 08 January 2011 14:56:20 you wrote: On Jan 8, 2011, at 6:56 AM, Rainer Schuermann wrote: This is probably embarrassingly basic, but I have spent quite a few hours in Google and RSeek without getting a clue - probably I'm asking the wrong questions... There is this guy who has decided to walk through Australia, a total distance of 4000 km. His daily portion (mean) is 40km with an sd of 10 km. I want to calculate the number of days it takes to arrive with 80, 90, 95, 99% probability. I know how to do this manually, eg. for 95% $\Phi \left( \frac{4000-40n}{10 \sqrt{n}} \right) \leq 0.05$ find the z score... but how would I do this in R? Not qnorm(), but what is it? Sounds like homework, which is not an encouraged use of the Rhelp list. You can either do it in theory or you can simulate it. Here's a small step toward a simulation approach. cumsum(rnorm(100, mean=40, sd=10)) [1] 41.90617 71.09148 120.55569 159.56063 229.73167 255.35290 300.74655 snipped [92] 3627.25753 3683.24696 3714.11421 3729.41203 3764.54192 3809.15159 3881.71016 [99] 3917.16512 3932.00861 cumsum(rnorm(100, mean=40, sd=10)) [1] 38.59288 53.82815 111.30052 156.58190 188.15454 207.90584 240.64078 snipped [92] 3776.25476 3821.90626 3876.64512 3921.16797 3958.83472 3992.33155 4045.96649 [99] 4091.66277 4134.45867 The first realization did not make it in the expected 100 days so further efforts should extend the simulation runs to maybe 120 days. The second realization had him making it on the 98th day. There is an R replicate() function available once you get a function running that will return a specific value for an instance. This one might work: min(which(cumsum(rnorm(120, mean=40, sd=10)) = 4000) ) [1] 97 If you wanted a forum that does not explicitly discourage homework and would be a better place to ask theory and probability questions, there is CrossValidated: http://stats.stackexchange.com/faq Thanks in advance, and apologies for the level of question... Rainer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pattern recognition with paths
On Wed, 5 Jan 2011, Benjamin Polidore wrote: I'm trying to identify patterns among various paths like the following: http://i.imgur.com/bQPI3.png If I plot these, I can observe intuitively two different patterns: a front loaded (1 and 3) and a backloaded (2,4) progress path: http://i.imgur.com/L5qwZ.png I have thousands of observations like the above table, and I want to use R to identify clusters of these paths. I looked at spatstat, but it seems more relevant to points than paths. Hmmm. Is this what you are after? http://en.wikipedia.org/wiki/Functional_data_analysis It is a hefty topic. There is a substantial literature on characterizing curves. Just Google Functional Data Analysis for a start and look at the 'fda' and 'MFDA' packages. HTH, Chuck Thanks, bp [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Repeated Indexing / Sequence Operation
On Fri, 31 Dec 2010, Paolo Rossi wrote: Hi Everyone, quick question before the end of the year. I have soem indices to select data from a bigger sample. I want to select n days before each index and n days after the index. Any clever way to do it. For heavy duty applications involving intervals - many intervals, finding overlapping intervals, set operations, et cetera, you may want to use the IRanges package: http://www.bioconductor.org/help/bioc-views/release/bioc/html/IRanges.html A for loop would do but I wanted to know if there is a moreR-friendly way to approach this Example # InitialIndices i2 = (90, 190, 290) Like this: i2 - c(90, 190, 290) as.vector( IRanges( start= i2 - 5, end = i2 + 5 ) ) [1] 85 86 87 88 89 90 91 92 93 94 95 185 186 187 188 189 190 191 192 [20] 193 194 195 285 286 287 288 289 290 291 292 293 294 295 HTH, Chuck # Indices I want to end up with i3 = c(85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295) # A way to get Final Indices SampleWidth i3 = c(i2) for (j in seq(1, SampleWidth )) { i3 = c(i3, i2 + j ) i3 = c(i3, i2 - j ) } I tried to tackle this with seq and the apply families but got nowhere Thanks and Happy New Year Paolo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] forcing evaluation of a char string argument
On Wed, 22 Dec 2010, rballen wrote: Why does x in assign(x) correctly evaluate to rank where UseMethod(func) does not get correctly evaluated? Because it is the body of a function definition. If you want to plug in the value of 'func' in the body of that function, you need to do something like this: toGeneric - function(func) { env-environment(get(func)) # default method of new generic = the original function assign(paste(func,.default,sep=),get(func),pos=env) foo - function(x,...) {} lf - list(x=func) body.foo - substitute(UseMethod(x),lf) body(foo)-body.foo assign(func,foo,pos=env) } BTW, are you sure you know what 'env' evaluates to?? (It is NOT the environment of the object named by the value of func in the parent.frame of toGeneric.) HTH, Chuck Can we use as.call(list(UseMethod,func))? assign(paste(func,.default,sep=),get(func),pos=env) assign(func,function(x,...) UseMethod(func),pos=env) On Wed, Dec 22, 2010 at 9:03 PM, William Dunlap [via R] ml-node+3161542-1898476570-206...@n4.nabble.com wrote: Try the following, which I haven't tested much and needs more error checking (e.g., to see that the function is not already generic and that its argument list is compatible with (x,...)). I put in the print statements to show what the calls to substitute() do. toGeneric - function (funcName) { ?? ?? stopifnot(is.character(funcName)) ?? ?? funcItself - get(funcName) ?? ?? stopifnot(is.function(funcItself)) ?? ?? envir - environment(funcItself) ?? ?? tmp - substitute(funcSymbol - function(x, ...) UseMethod(funcName), ?? ?? ?? ?? list(funcSymbol = as.symbol(funcName), funcName = funcName)) ?? ?? print(tmp) ?? ?? eval(tmp, envir = envir) ?? ?? tmp - substitute(defaultSymbol - funcItself, list(defaultSymbol = as.symbol(paste(sep = ., ?? ?? ?? ?? funcName, default)), funcItself = funcItself)) ?? ?? print(tmp) ?? ?? eval(tmp, envir = envir) } E.g., ?? ?? wsx - function(x, base=2)log(x, base=base) ?? ?? toGeneric(wsx) ?? ??wsx - function(x, ...) UseMethod(wsx) ?? ??wsx.default - function (x, base = 2) ?? ??log(x, base = base) ?? ?? wsx ?? ??function (x, ...) ?? ??UseMethod(wsx) ?? ?? wsx.default ?? ??function (x, base = 2) ?? ??log(x, base = base) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of rballen Sent: Wednesday, December 22, 2010 2:42 PM To: [hidden email] Subject: [R] forcing evaluation of a char string argument I'm trying to make a function to turn a regular function into an S3 generic one. I want myMethod to be: function(x,...) UseMethod(myMethod) But I keep getting: function(x,...) UseMethod(func) Here's the function: toGeneric-function(func) { env-environment(get(func)) # default method of new generic = the original function assign(paste(func,.default,sep=),get(func),pos=env) assign(func,function(x,...) UseMethod(func),pos=env) } toGeneric(myMethod) I messed around with force, substitute, and deparse, but I can't get any of those to help. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-str ing-argument-tp3161365p3161365.html Sent from the R help mailing list archive at Nabble.com. __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. View message @ http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-string-argument-tp3161365p3161542.html To unsubscribe from forcing evaluation of a char string argument, click here. -- View this message in context: http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-string-argument-tp3161365p3161666.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with function
On Fri, 17 Dec 2010, Iain Gallagher wrote: Hello List I'm moving this over from the bioC list as, although the problem I'm working on is biological, the current bottle neck is my poor understanding of R. I wonder if someone would help me with the following function. Here is how I'd take it apart. Either 1) put browser() as the first line of the function,then feed lines to the browser one -by-one to see where it hangs, 2) trace(cumulMetric) , then try to run it (much like 1, but it will handle feeding the lines of the function more easily) 3) options( error = recover ), then run it till it hits the error, then browser thru the frames to see what is where See ?browser ?trace ?recover as background. HTH, Chuck cumulMetric - function(deMirPresGenes, deMirs){ #need to match position of each miR in deMirPresGenes with its FC to form a vector of FC in correct order fc - deMirs fcVector - as.numeric(with (fc, FC[match(deMirPresGenes[,4], Probe)] ) ) #multiply fc by context score for each interaction metric - fcVector * as.numeric(deMirPresGenes[,11]) geneMetric - cbind(deMirPresGenes[,2], as.numeric(metric)) #make cumul weighted score listMetric - unstack(geneMetric, as.numeric(geneMetric[,2])~geneMetric[,1]) listMetric - as.data.frame(sapply(listMetric,sum)) #returns a dataframe colnames(listMetric) - c('cumulMetric') #return whole list return(listMetric) } deMirPresGenes looks like this: Gene.ID Gene.Symbol Species.ID miRNA Site.type UTR_start UTR_end X3pairing_contr local_AU_contr position_contr context_score context_percentile 22848 AAK1 9606 hsa-miR-183 2 1546 1552 -0.026 -0.047 0.099 -0.135 47 19 ABCA1 9606 hsa-miR-183 2 1366 1372 -0.011 -0.048 0.087 -0.133 46 20 ABCA2 9606 hsa-miR-495 2 666 672 -0.042 -0.092 -0.035 -0.33 93 23456 ABCB10 9606 hsa-miR-183 3 1475 1481 0.003 -0.109 -0.05 -0.466 98 6059 ABCE1 9606 hsa-miR-495 2 1474 1480 0.005 -0.046 0.006 -0.196 58 55324 ABCF3 9606 hsa-miR-1275 3 90 96 0.007 0.042 -0.055 -0.316 94 although it is much longer in 'real life'. The aim of the function is to extract a dataframe of gene symbols along with a weighted score from the above data. The weighted score is the FC column of deMirs * the context_score column of deMirPresGenes. This is easy peasy! Where I'm falling down is that if I run this function it complains that 'geneMetric' can't be found. Hmm - I've run it all line by line (i.e. not as a function) and it works but wrapped up like this it fails! e.g. testF2 - cumulMetric(testF1, deMirs$up) Error in eval(expr, envir, enclos) : object 'geneMetric' not found deMirs$up looks like this: Probe FC hsa-miR-183 2.63 hsa-miR-1275 2.74 hsa-miR-495 3.13 hsa-miR-886-3p 3.73 hsa-miR-886-5p 3.97 hsa-miR-144* 6.62 hsa-miR-451 7.94 In an effort to debug this I have examined each object using 'print' statements (as suggested by cstrato on the bioC list). All the objects in the function up until listMetric look ok in terms of structure and length. But the error persists and listMetric is not made?!?! Odd. I have added some comments to the output below. tf2-cumulMetric(tf1, deMirs$up)#deMirs$up is a dataframe (see above - it is the same as deMirs) [1] 2.63 2.63 3.13 2.63 3.13 2.74 # print fcVector - looks ok [1] -0.35505 -0.34979 -1.03290 -1.22558 -0.61348 -0.86584 # print metric - looks ok [1] 1045 # length of metric - is correct sym metric # print geneMetric - looks ok [1,] AAK1 -0.35505 [2,] ABCA1 -0.34979 [3,] ABCA2 -1.0329 [4,] ABCB10 -1.22558 [5,] ABCE1 -0.61348 [6,] ABCF3 -0.86584 [1] 1045 # nrow of geneMetric - is correct Error in eval(expr, envir, enclos) : object 'geneMetric' not found Could someone possibly point out where I falling down. Thanks iain sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8 [7] LC_PAPER=en_GB.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.12.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry
Re: [R] Confidence Intervals for Odds Ratios in multivariate logistic regression
On Wed, 8 Dec 2010, S.M. Raghavan wrote: Hi all, I am trying to fit a logistic regression for a bivariate response using five independent variables in a stepwise procedure. My outputs look okay but does any one know (or is there any literature on) how the confidence intervals are calculated for the reported odds ratios..? Bert and David have wanred you about the misleading results that confidence intervals can give with stepwise procedures. There are a number of approaches that are not so misleading. For one such and perhaps some insights into the problems that David and Bert were pointing out, try this: install.packages( BMA ) library( BMA ) ?bic.glm example( bic.glm ) HTH, Chuck Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grep for strings
On Sun, 5 Dec 2010, Santosh Srinivas wrote: I am trying to find the function where I can search for a pattern in a text string (I thought I could use grep for this but no :(). Correct, but reading ?grep and running example( grep ) should show you how to do that search. x [1] abcdefghijkl I want to find the positions (i.e. equivalent of nchar) for cd and in case there are multiple hits .. then the results as a array You will need to deal with the possibility that there are more 'cd's in some elements of 'text' than in others. So, it is not obvious that an _array_ will work without some special tooling. HTH, Chuck Thank you. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to pass selection criteria in a function
On Wed, 1 Dec 2010, cmccar...@bmcc.cuny.edu wrote: Hi, Suppose I have the following data name score Abel??? 88 Baker? 54 Charlie??? 77 stored a? table called myData. I want to write a function that will create a table which is a subset of myData containing those have a score 75. I know I can do this with the following command: subset(myData, score 75) But I would like to do this via a function, something like: newTable - function( data,? criteria){ ??? subset( data, criteria) } and then calling: newTable(myData, score 75) But this doesn't work. I am sure there is a simple way to do this, but I am stuck! Please help. Thanks! Simple? Maybe not so much! You are trying to pass objects without evaluating them. subset is rather special in the way it works. Here is one way: foo - function(x,...){ + mc - match.call() + mc[[1]] - as.name(subset) + eval(mc) + } foo(iris, Petal.Width2.4 ) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 101 6.3 3.3 6.0 2.5 virginica 110 7.2 3.6 6.1 2.5 virginica 145 6.7 3.3 5.7 2.5 virginica Reading the code at the top of lm shows how this kind of strategy can be used. You might look at ?bquote for another way to skin this cat. HTH, Chuck Chris?McCarthy, PhD Department?of?Mathematics?BMCC Office:?N522???Extension:?5235 cmccar...@bmcc.cuny.edu [[alternative HTML version deleted]] Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] more flexible ave
On Tue, 30 Nov 2010, Adaikalavan Ramasamy wrote: Here is a possible solution using sweep instead of ave: df - data.frame(site = c(a, a, a, b, b, b), gr = c(total, x1, x2, x1, total,x2), value1 = c(212, 56, 87, 33, 456, 213), value2 = c(1546, 560, 543, 234, 654, 312) ) lm() and friends provide a simple approach: cbind( df, percent = + df[,-(1:2)] / + predict( lm( cbind(value1,value2) ~ gr*site, df), + new=data.frame(site=df$site,gr='total' )) + ) sitegr value1 value2 percent.value1 percent.value2 1a total212 1546 1. 1.000 2ax1 56560 0.26415094 0.3622251 3ax2 87543 0.41037736 0.3512290 4bx1 33234 0.07236842 0.3577982 5b total456654 1. 1.000 6bx2213312 0.46710526 0.4770642 HTH, Chuck sdf - split(df, df$site) out - lapply( sdf, function(mat){ small.mat - mat[ , -c(1,2)] totals- mat[ which( mat[ , gr] == total ), -c(1,2) ] totals- as.numeric(totals) percent=sweep( small.mat, MARGIN=2, STATS=totals, FUN=/ ) colnames(percent) - paste(percent_, colnames(percent), sep=) return( cbind(mat, percent) ) } ) do.call(rbind, out) sitegr value1 value2 percent_value1 percent_value2 a.1a total212 1546 1. 1.000 a.2ax1 56560 0.26415094 0.3622251 a.3ax2 87543 0.41037736 0.3512290 b.4bx1 33234 0.07236842 0.3577982 b.5b total456654 1. 1.000 b.6bx2213312 0.46710526 0.4770642 Also I think it might be more efficient to replace your gr variable with a binary 0,1 where 1 indicates the total. That way you don't have to generate x1, x2, x3, Regards, Adai On 30/11/2010 14:42, Patrick Hausmann wrote: Hi all, I would like to calculate the percent of the total per group for this data.frame: df- data.frame(site = c(a, a, a, b, b, b), gr = c(total, x1, x2, x1, total,x2), value1 = c(212, 56, 87, 33, 456, 213)) df calcPercent- function(df) { df- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr, FUN = function(x) x/df[df$gr == total, value1]) ) } # This works as intended... w- lapply(split(df, df$site), calcPercent) w- do.call(rbind, w) w # ... but when I add a new column df$value2- c(1546, 560, 543, 234, 654, 312) # the result is not what I want... w- lapply(split(df, df$site), calcPercent) w- do.call(rbind, w) w Clearly I have to change the function, (particularly value1) - but how... I've also played around with apply but without any success. Thanks for any help! Patrick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the first. from SAS in R
On Tue, 23 Nov 2010, Joel wrote: Is there any similar function in R to the first. in SAS? What it dose is: Lets say we have this table: a b c 1 1 5 1 0 2 2 0 2 2 0 NA 2 9 2 3 1 3 and then I want do to do one thing the first time the number 1 appers in a and something else the secund time 1 appers in a and so on. so something similar to: if first.a { a$d-1 }else{ a$d-0 } This would give me a b c b 1 1 5 1 1 0 2 0 2 0 2 1 2 0 NA 0 2 9 2 0 3 1 3 1 Is there such a function in R or anything similar? See ?duplicated then try a$d - ifelse( duplicated( a$a ), 0 , 1 ) and a$d.2 - as.numeric( !duplicated( a$a ) ) HTH, Chuck thx //Joel -- View this message in context: http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the first. from SAS in R
On Tue, 23 Nov 2010, Dennis Murphy wrote: Interesting. Check this out: u - sample(c(TRUE, FALSE), 10, replace = TRUE) u [1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE class(u) [1] logical u + 0 [1] 0 0 1 0 0 1 0 0 0 0 0 + u [1] 0 0 1 0 0 1 0 0 0 0 v - rpois(10, 3) !duplicated(v) [1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE class(!duplicated(v)) [1] logical !duplicated(v) + 0 [1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE 0 + !duplicated(v) [1] 1 0 1 1 1 1 0 1 0 1 # Now assign !duplicated(v) to an object: w - !duplicated(v) class(w) [1] logical 0 + w [1] 1 0 1 1 1 1 0 1 0 1 w + 0 [1] 1 0 1 1 1 1 0 1 0 1 I can see *what* is going on, but what is the reason for it? I see another notebook entry coming :) See ?Arithmetic and read the paragraph under Details starting 'Logical vectors' Chuck Dennis On Tue, Nov 23, 2010 at 6:12 AM, David Winsemius dwinsem...@comcast.netwrote: On Nov 23, 2010, at 8:33 AM, Joel wrote: Is there any similar function in R to the first. in SAS? What it dose is: Lets say we have this table: a b c 1 1 5 1 0 2 2 0 2 2 0 NA 2 9 2 3 1 3 and then I want do to do one thing the first time the number 1 appers in a and something else the secund time 1 appers in a and so on. so something similar to: if first.a { a$d-1 }else{ a$d-0 } The duplicated function which returns a logical vector with those features can easily be coerced to numeric. df$d - as.numeric(!duplicated(df$a)) I was a bit puzzled about my failure to get coercion by the method which I thought was supposed to work, namely adding 0. df$e - !duplicated(df$a)+0 # does not coerce df$e - 0 + !duplicated(df$a) # pre-adding 0 does coerce Maybe the rules on coercion were amended. -- David This would give me a b c b 1 1 5 1 1 0 2 0 2 0 2 1 2 0 NA 0 2 9 2 0 3 1 3 1 Is there such a function in R or anything similar? thx //Joel -- View this message in context: http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Slow update(insert) on Data.frame
On Tue, 23 Nov 2010, Joel wrote: Hi When I try to update an number in a large data.frame by its pos It's really slow it take almost a sec to do this and I wonder why and if where is any faster way to update a number in a data.frame ive tried DF$col[POS]-number DF[xPOS,yPOS]-number See ?tracemem then try tracemem( DF ) DF$col[POS]-number DF[xPOS,yPOS]-number then mat - as.matrix( DF ) tracemem( mat ) mat[ 2,3 ] - 4 mat[ 15 ] - 5 If your data.frame can be represented by a matrix, a single update will likely be faster as it can be done without making any copies. If you are doing more than one update, it will help to do them all at once. Reread ?Subscript and section 5.2 and 5.3 of Intro to R if you are unsure how to do that. HTH, Chuck Thx //Joel -- View this message in context: http://r.789695.n4.nabble.com/Slow-update-insert-on-Data-frame-tp3055707p3055707.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the first. from SAS in R
On Tue, 23 Nov 2010, David Winsemius wrote: On Nov 23, 2010, at 11:04 AM, Charles C. Berry wrote: On Tue, 23 Nov 2010, Dennis Murphy wrote: Interesting. Check this out: u - sample(c(TRUE, FALSE), 10, replace = TRUE) u [1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE class(u) [1] logical u + 0 [1] 0 0 1 0 0 1 0 0 0 0 0 + u [1] 0 0 1 0 0 1 0 0 0 0 v - rpois(10, 3) !duplicated(v) [1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE class(!duplicated(v)) [1] logical !duplicated(v) + 0 [1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE 0 + !duplicated(v) [1] 1 0 1 1 1 1 0 1 0 1 # Now assign !duplicated(v) to an object: w - !duplicated(v) class(w) [1] logical 0 + w [1] 1 0 1 1 1 1 0 1 0 1 w + 0 [1] 1 0 1 1 1 1 0 1 0 1 I can see *what* is going on, but what is the reason for it? I see another notebook entry coming :) See ?Arithmetic and read the paragraph under Details starting 'Logical vectors' Chuck; Compare these three, all of which are using binary operators on logical vectors which is what is being discussed in ?Arithmetic: duplicated(c(a, a, b) ) + 0 [1] 0 1 0 !duplicated(c(a, a, b) ) + 0 [1] TRUE FALSE TRUE 0 + !duplicated(c(a, a, b) ) [1] 1 0 1 I believe the proper place to go is ?Syntax where operator precedence is discussed. I think the precendence rules implicitly do this in the second instance, because + has higher precendence than negation: ! ( duplicated(c(a, a, b) ) + 0 ) David, Thanks. Both you and David Lorenz are correct in pointing to operator precedence as the answer to Dennis' question. Mea culpa for my not reading Dennis' question carefully enough to understand what his question really was! Best, Chuck -- David. Chuck Dennis On Tue, Nov 23, 2010 at 6:12 AM, David Winsemius dwinsem...@comcast.netwrote: On Nov 23, 2010, at 8:33 AM, Joel wrote: Is there any similar function in R to the first. in SAS? What it dose is: Lets say we have this table: a b c 1 1 5 1 0 2 2 0 2 2 0 NA 2 9 2 3 1 3 and then I want do to do one thing the first time the number 1 appers in a and something else the secund time 1 appers in a and so on. so something similar to: if first.a { a$d-1 }else{ a$d-0 } The duplicated function which returns a logical vector with those features can easily be coerced to numeric. df$d - as.numeric(!duplicated(df$a)) I was a bit puzzled about my failure to get coercion by the method which I thought was supposed to work, namely adding 0. df$e - !duplicated(df$a)+0 # does not coerce df$e - 0 + !duplicated(df$a) # pre-adding 0 does coerce Maybe the rules on coercion were amended. -- David This would give me a b c b 1 1 5 1 1 0 2 0 2 0 2 1 2 0 NA 0 2 9 2 0 3 1 3 1 Is there such a function in R or anything similar? thx //Joel -- View this message in context: http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 David Winsemius, MD West Hartford, CT Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the first. from SAS in R
On Tue, 23 Nov 2010, seeliger.c...@epamail.epa.gov wrote: Is there any similar function in R to the first. in SAS? ?duplicated a$d - ifelse( duplicated( a$a ), 0 , 1 ) a$d.2 - as.numeric( !duplicated( a$a ) ) Actually, duplicated does not duplicate SAS' first. operator, though it may suffice for the OP's needs. To illustrate, let's start with a dataframe of 3 key columns and some data in x: tt - data.frame(k1 = rep(1:3, each=10), k2 = rep(1:5, each=2, times=3), k3=rep(1:2, times=15), x = 1:30) # Try to mimic what the following SAS datastep would do, # assuming 'tt' is already sorted: # data foo; # set tt; # by k1, k2; # put first.k1=, first.k2=; # run; # SAS' first. operations would result in these values: tt$sas.first.k1 - rep(c(1, rep(0,9)), 3) tt$sas.first.k2 - rep(1:0, 15) # R duplicated() returns these values. You can see they # are the same for k1, but dissimilar after row 10 for k2. tt$duplicated.k1 - 0+!duplicated(tt$k1) tt$duplicated.k2 - 0+!duplicated(tt$k2) It depends on how you use duplicated() all.equal( tt$sas.first.k2, 0+!duplicated( tt[, c(k1,k2) ] ) ) [1] TRUE Chuck # I've found I need to lag a column to mimic SAS' first. # operator, thusly, though perhaps someone else knows # differently. Note this does not work on unordered # dataframes! lag.k1 - c(NA, tt$k1[1:(nrow(tt) - 1)]) tt$r.first.k1 - ifelse(is.na(lag.k1), 1, tt$k1 != lag.k1) lag.k2 - c(NA, tt$k2[1:(nrow(tt) - 1)]) tt$r.first.k2 - ifelse(is.na(lag.k2), 1, tt$k2 != lag.k2) Mimicking SAS' last. operation can be done in a similar manner, by anti-laging the column of interest and changing the comparisons somewhat. Enjoy the days, cur -- Curt Seeliger, Data Ranger Raytheon Information Services - Contractor to ORD seeliger.c...@epa.gov 541/754-4638 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] efficient conversion of matrix column rows to list elements
On Wed, 17 Nov 2010, Chris Carleton wrote: Hi List, I'm hoping to get opinions for enhancing the efficiency of the following code designed to take a vector of probabilities (outcomes) and calculate a union of the probability space. As part of the union calculation, combn() must be used, which returns a matrix, and the parallelized version of lapply() provided in the multicore package requires a list. I've found that parallelization is very necessary for vectors of outcomes greater in length than about 10 or 15 elements, which is why I need to make use of multicore (and, therefore, convert the combn() matrix to a list). It would speed the process up if there was a more direct way to convert the columns of combn() to elements of a single list. I think you are mistaken. Is this what Rprof() tells you? On my system, combn() is the culprit Rprof() outcomes - 1:25 nada - replicate(200, {apply(combn(outcomes,2),2,column2list);NULL}) Rprof(NULL) summaryRprof() $by.self self.time self.pct total.time total.pct combn0.6461.54 0.70 67.31 apply0.2019.23 1.04100.00 FUN 0.10 9.62 1.04100.00 != 0.04 3.85 0.04 3.85 0.02 1.92 0.02 1.92 -0.02 1.92 0.02 1.92 is.null 0.02 1.92 0.02 1.92 And it hardly takes any time at that! HTH, Chuck p.s. Isn't as.data.frame( combn( outcomes, 2 ) ) or combn(outcomes, 2, list ) good enough? Any constructive suggestions will be greatly appreciated. Thanks for your consideration, C code: unionIndependant - function(outcomes) { intsctn - c() column2list - function(x){list(x)} pb - ProgressBar(max=length(outcomes),stepLength=1,newlineWhenDone=TRUE) for (i in 2:length(outcomes)){ increase(pb) outcomes_ - apply(combn(outcomes,i),2,column2list) for (j in 1:length(outcomes_)){outcomes_[[j]] - outcomes_[[j]][[1]]} outcomes_container - mclapply(outcomes_,prod,mc.cores=3) intsctn[i] - sum(unlist(outcomes_container)) } intsctn - intsctn[-1] return(sum(outcomes) - sum(intsctn[which(which((intsctn %in% intsctn)) %% 2 == 1)]) + sum(intsctn[which(which((intsctn %in% intsctn)) %% 2 == 0)]) + ((-1)^length(intsctn) * prod(outcomes))) } PS This code has been tested on vectors of up to length(outcomes) == 25 and it should be noted that ProgressBar() requires the R.utils package. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] as.matrix behaves weird
On Sun, 14 Nov 2010, Feng Mai wrote: ncol is not an argument for as.matrix() Perhaps this is more than Alexx wanted to know, but ... Things can get a wee bit tricky when the function is a 'generic'. More precisely: 'ncol' is not an argument for any method for as.matrix() that is on your search path. as.matrix(x,...) is an S3 generic function. try methods( as.matrix ) and you will see what methods are available for it. And some do have arguments other than 'x': args( as.matrix.data.frame ) function (x, rownames.force = NA, ...) NULL If you wanted to (but you don't, this is just didactic) you could make a method that accepts the ncol arg for a particular class as.matrix.columned - function(x,ncol=1) matrix(x,ncol=ncol) x - 1:3 class(x) - 'columned' as.matrix(x, ncol=3) [,1] [,2] [,3] [1,]123 HTH, Chuck Alexx Hardt mikrowelle1234 at gmx.de writes: Hi, can someone tell me why x is still a 2x1-matrix in the last step? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count occurrence and distance of characters in string
On Thu, 4 Nov 2010, Immanuel wrote: Hello all, I want to know how often one character occurs in a given string and the distance from between every two occurences. (distance = other characters between them). You should provide commented, minimal, self-contained, reproducible code as asked. And especially for a question like this one with many simple answers that RespondeRs will shower you with if only you give them a starting point. Use tapply, strsplit, seq, nchar, unlist, diff, -, and table for one way. Chuck thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count occurrence and distance of characters in string
On Fri, 5 Nov 2010, Immanuel wrote: Hey, thanks for the answer, actually I already typed an example but deleted it since I thought it's superfluous. regards - string - kjokllokkoadddo # f1(string, o) should return that o was found 4 times # f2(string, o) should return that the distances between the o's found is 3 , 2, 4 - In that case, I'd use split: res - split(seq(nchar(string)),unlist(strsplit(string,''))) length(res[['o']]) [1] 4 ## or sapply(res,length) a d j k l o 1 3 1 4 2 4 diff(res[['o']])-1 [1] 3 2 4 # or sapply(sapply(res,diff),-,1) $a numeric(0) $d [1] 0 0 $j numeric(0) $k [1] 2 3 0 $l [1] 0 $o [1] 3 2 4 Chuck On 11/05/2010 12:28 AM, Charles C. Berry wrote: On Thu, 4 Nov 2010, Immanuel wrote: Hello all, I want to know how often one character occurs in a given string and the distance from between every two occurences. (distance = other characters between them). You should provide commented, minimal, self-contained, reproducible code as asked. And especially for a question like this one with many simple answers that RespondeRs will shower you with if only you give them a starting point. Use tapply, strsplit, seq, nchar, unlist, diff, -, and table for one way. Chuck thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merging nested lists
On Wed, 27 Oct 2010, Charles C. Berry wrote: On Wed, 27 Oct 2010, Alex P. wrote: Hello All, I have multiple list of lists in the form of Mylist1[[N]][[K]]$Name_i, with N=1..6, K=1..3, and i=1..7. Each Name_i is a matrix. I have 30 of these objects Mylist1, Mylist2, ... I would like to merge these lists by each Name_i using rbind, but I couldn't figure out how to do it. What I want at the end is a single list of lists, again in the form of Mylist[[N]][[K]]$Name_i. Manually doing it is not feasible given the large number of Mylist objects. Turn them into a single array of mode 'list', then do the rbind'ing, and save the result as an array (or see ?relist for imposing the nested structure of the orignal lists) Something like: I see a few hiccups below. Maybe Alex P. should include a minimal, self-contained example that RespondeRs could try out... objs - paste('MyList',1:30,sep='') big.list - lapply( objs, get ) for (i in 1:4) big.list - unlist(big.list,recursive=FALSE) for (i in 1:3) ... dim( big.list ) - c(30, 6, 3, 7 ) dim( big.list ) - rev( c(30, 6, 3, 7 ) ) res - apply( big.list, 2:4, rbind ) dim(res) - c( 30, 3, 7 ) res - apply( big.list, 1:3, function(x) list( do.call( rbind, x ) )) Chuck HTH, Chuck Thanks in advance, Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merging nested lists
On Wed, 27 Oct 2010, Alex P. wrote: Hello All, I have multiple list of lists in the form of Mylist1[[N]][[K]]$Name_i, with N=1..6, K=1..3, and i=1..7. Each Name_i is a matrix. I have 30 of these objects Mylist1, Mylist2, ... I would like to merge these lists by each Name_i using rbind, but I couldn't figure out how to do it. What I want at the end is a single list of lists, again in the form of Mylist[[N]][[K]]$Name_i. Manually doing it is not feasible given the large number of Mylist objects. Turn them into a single array of mode 'list', then do the rbind'ing, and save the result as an array (or see ?relist for imposing the nested structure of the orignal lists) Something like: objs - paste('MyList',1:30,sep='') big.list - lapply( objs, get ) for (i in 1:4) big.list - unlist(big.list,recursive=FALSE) dim( big.list ) - c(30, 6, 3, 7 ) res - apply( big.list, 2:4, rbind ) dim(res) - c( 30, 3, 7 ) HTH, Chuck Thanks in advance, Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on passing the subset argument to an lm wrapper
On Sun, 24 Oct 2010, Erik Iverson wrote: Hello, How would you go about handling the following situation? This is on R 2.12.0 on Ubuntu 32-bit. I have a wrapper function to lm. I want to pass in a subset argument. First, I just thought I'd use The subset arg needs to be unevaluated until lm() - or whatever function - is called in your function. The canonical advice for this kind of question about passing unevaluated args is to study the first lines of the function lm noting what it does with objects cl - match.call() and mf - match.call( expand.dots = FALSE ). Something like this might be what you want: testlm2 - function(formula, ...) { mc - match.call() mc[[1]] - as.name('lm') eval(mc) } testlm2(bmi ~ age, data= df1, subset = age 50) HTH, Chuck ## make example reproducible set.seed(123) df1 - data.frame(age = rnorm(100, 50, 10), bmi = rnorm(100, 30, sd = 2)) ## create a wrapper using ... testlm - function(formula, ...) { lm(formula, data = df1, ...) } testlm(bmi ~ age, subset = age 50) Error in eval(expr, envir, enclos) : ..1 used in an incorrect context, no ... to look in I found some other examples of this error message, but couldn't piece together how it fits in with this example. Next, I tried specifying a subset argument. testlm2 - function(formula, subset) { lm(formula, data = df1, subset = subset) } testlm2(bmi ~ age, subset = age 50) Error in xj[i] : invalid subscript type 'closure' I also don't understand this one. Any pointers on if I'm just missing the easy solution to do what I want? Any explanations as to the above behavior (I know it has to do with model.frame, but not sure how) would also be greatly appreciated! Thanks! --Erik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Putting the same array into a matrix
On Thu, 14 Oct 2010, Desmond Lim wrote: Hi, I have an array and I want to put in into a matrix x number of times. Currently I doing this matrix - cbind(array, array, array). Is there a more elegant way of doing this? Fortunately! If 'array' really is a matrix (bad choice of names here, Bub!), then a.matrix - matrix( rep( array, 3 ), nc = ncol( array )* 3 ) But this will work too: a.matrix - do.call( cbind, rep( list( array ), 3) ) even if 'array' is a data.frame HTH, Chuck I've tried matrix - cbind(rep(array, times=x)) and matrix - rep(cbind(array), times = 5) but it didn't work. Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT] (slightly) - OpenOffice Calc and text files
On Wed, 13 Oct 2010, Schwab,Wilhelm K wrote: It will get a good look, as will gnumeric - thanks to all! emacs org-mode can convert your tab delimited file to a 'table' that you can edit either using org-mode functions OR as plain text by switching to fundamental mode. In emacs speak, just put the cursor at the top of a buffer holding your file and do M-x replace-string RET TAB RET | RET I think, then move your cursor to a line that has a '|' in it and hit TAB, and you have a neatly formatted table. See, http://orgmode.org/worg/org-tutorials/tables.php for an intro. A big advantage in using an org-mode table is you can place an R source code block further down in the same file, and it can read in the data in the table. Then you can go back to the table to edit, then rerun R, ... I append an example below. There is a load of tutorial info at http://orgmode.org/worg/org-tutorials/index.php HTH, Chuck #+begin_example #+tblname: simpleDF | a | b | c | |---+---+---| | 1 | 2 | 3 | | 5 | 4 | 2 | |---+---+---| #+end_example #+begin_src R :var df=simpleDF :results output :colnames yes summary( df ) #+end_src #+results: :a b c : Min. :1 Min. :2.0 Min. :2.00 : 1st Qu.:2 1st Qu.:2.5 1st Qu.:2.25 : Median :3 Median :3.0 Median :2.50 : Mean :3 Mean :3.0 Mean :2.50 : 3rd Qu.:4 3rd Qu.:3.5 3rd Qu.:2.75 : Max. :5 Max. :4.0 Max. :3.00 Bill From: Albyn Jones [jo...@reed.edu] Sent: Wednesday, October 13, 2010 2:14 PM To: Schwab,Wilhelm K Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files emacs shows you exactly what is there, nothing more nor less. it isn't a spreadsheet, but tabs will align columns. albyn On Wed, Oct 13, 2010 at 01:53:46PM -0400, Schwab,Wilhelm K wrote: Albyn, I'll look into it. In fact, I have a small book on it that I bought in my very early days of using Linux. I quickly found TeX Maker (for the obvious), Code::Blocks for C/C++ and I would not have started the move without a working Smalltalk (http://pharo-project.org/home). For editing data files, I really just want something that shows data in an understandable grid and does not do weird stuff thinking it's being helpful. Bill From: Albyn Jones [jo...@reed.edu] Sent: Wednesday, October 13, 2010 1:39 PM To: Schwab,Wilhelm K Cc: r-help@r-project.org Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files How about emacs? albyn On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote: Hello all, . Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. Bill __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Albyn Jones Reed College jo...@reed.edu -- Albyn Jones Reed College jo...@reed.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Poisson Regression
On Wed, 13 Oct 2010, David Winsemius wrote: On Oct 13, 2010, at 4:50 PM, Antonio Paredes wrote: Hello everyone, I wanted to ask if there is an R-package to fit the following Poisson regression model log(\lambda_{ijk}) = \phi_{i} + \alpha_{j} + \beta_{k} i=1,\cdots,N (subjects) j=0,1 (two levels) k=0,1 (two levels) treating the \phi_{i} as nuinsance parameters. If I am reading this piece correctly there should be no difference between a conditional treatment of phi_i in that model and results from the unconditional model one would get from fitting with glm(lambda ~ phi + alpha + beta ,family=poisson). Right. But if N is large, the model.matrix will be huge and there may be problems with memory and elapsed time. loglin() and loglm() will fit the same model without need for a model.matrix (modulo having enough data to actually fit that model), and large values of N are no big deal. HTH, Chuck http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.9679rep=rep1type=pdf (But I am always looking for corrections to my errors.) -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to convert a list to a ... argument for a function
On Tue, 5 Oct 2010, Richard R. Liu wrote: Hi, I have a function f - function(..., func){ something }, where func is a function of the form function(...). ??I would like to pass func all the arguments passed to f except the last. ??I know that I can manipulate the variable number of arguments passed to f by converting ... to a list, i.e., arglist - list(...). ??But how do I pass func the first n-1 list items of arglist (n - length(arglist)), as n-1 arguments, not as one list of n-1 items? One way: foo - function(...,func){ mc - match.call() mc[[1]] -mc$func mc$func - NULL mc[[ length(mc) ]] - NULL eval(mc) } foo(1,2,3,4,func=c) foo(1,2,4,3,func=sum) or perhaps you meant like this: foo - function(...,func){ mc - match.call() mc[[1]] -mc$func mc$func - NULL eval(mc) } foo(1,2,3,4,func=c) foo(1,2,4,3,func=sum) HTH, Chuck Regards, Richard Richard R. Liu richard@pueo-owl.ch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about Reduce
On Fri, 1 Oct 2010, Dimitri Liakhovitski wrote: Hello! In the example below the Reduce() function by default assigns a to be the last accumulated value and b to be the current value in x. I could not find this documented anywhere as the default settings for the Reduce() function. Does any sort of documentation for this behavior exist? Yes. Under 'Details' on the Reduce help page: ... a left reduce computes l_1 = f(v_1, v_2), l_2 = f(l_1, v_3), etc... HTH, Chuck x - c(0,0,0,0,0,1,0,0,0,5,0,0,0,7,0,0,0,8,5,10) Reduce(function(a,b) b + (a * 0.5),x,accumulate=T,init=0) [1] 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.500 0.250 [10] 0.125 5.0625000 2.5312500 1.2656250 0.6328125 7.3164062 3.6582031 1.8291016 0.9145508 [19] 8.4572754 9.2286377 14.6143188 Thanks a lot! -- Dimitri Liakhovitski Ninah Consulting www.ninah.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to apply vector value function to a multidimensional array indexed by the remaining dimensions?
On Fri, 1 Oct 2010, yunjiangster wrote: Hi, I am looking for some generalization of colSums and rowSums for general vector valued functions, and for arrays of more than 2 dimensions. So as a concrete example, suppose I have a 3 dimensional array, given by x = array(1:100,c(3,4,5)). and I want to sum the 3rd index of x to obain a 3 by 4 matrix. Using rowSums would return a vector of length 3 because it treats the last two indices as a single index. see the help page for colSums, esp. the 'dims' arg Besides summation, let's say if I define a vector valued function f-function(x){min(x[1],min(x[2],x[3]))}, and I want to apply f to the last index of x, meaning for each 1= i = 3, 1=j =4, I want to compute f(x[i,j,]), and then put them in a 3 by 4 matrix. How would I be able to do that without using a for loop? ?apply thanks. Sincerely, John -- View this message in context: http://r.789695.n4.nabble.com/How-to-apply-vector-value-function-to-a-multidimensional-array-indexed-by-the-remaining-dimensions-tp2937368p2937368.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sweave and LaTeX beamer class
On Thu, 30 Sep 2010, Johannes Huesing wrote: I am failing to uncover Sweave chunks step by step using the LaTeX beamer class. The following minimal example: \documentclass{beamer} \usepackage{Sweave} \begin{document} \begin{frame}[fragile] In the year \uncover2-{25}\uncover3-{\Sexpr{5*5}} \uncover4-{ echo=TRUE, print=TRUE= 5*5*101 @ } \end{frame} \end{document} leads to an error message when running pdflatex over the *.tex file: [...] ! FancyVerb Error: Extraneous input ` 2525 \end {Soutput} \end {Schunk} \bea...@endcovered ' bet ween \begin{Soutput}[key=value] and line end . \...@error ... {FancyVerb Error: \space \space #1 } I do not have enough of an understanding of LaTeX to explain this, but it seems \uncover can be tricky. This works for me: \documentclass{beamer} \usepackage{Sweave} \begin{document} \begin{frame}[fragile] In the year \uncover2-{25}\uncover3-{\Sexpr{5*5}} \begin{uncoverenv}4- echo=TRUE, print=TRUE= 5*5*101 @ %def \end{uncoverenv} \end{frame} \end{document} -- Johannes H??sing There is something fascinating about science. One gets such wholesale returns of conjecture mailto:johan...@huesing.name from such a trifling investment of fact. http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] drawing samples based on a matching variable
On Tue, 28 Sep 2010, L Brown wrote: Hi, everyone. I have what I hope will be a simple coding question. It seems this is a common job, but so far I've had trouble finding the answer in searches. I have two matrices (x and y) with a different number of observations in each. I need to draw a random sample without replacement of observations from x, and then, using a matching variable, draw a sample of equal size from y. It is the matching variable that is hanging me up. For example-- # example matrices. lets assume seed always equals 1. (lets also assume I have assigned variable names A and B to my columns..) set.seed(1) x-cbind(1:10,sample(1:5,10,rep=T)) x [A] [B] [1,]12 [2,]22 [3,]33 [4,]45 [5,]52 [6,]65 [7,]75 [8,]84 [9,]94 [10,] 101 Looks like set.seed(1) was invoked here, too. y-cbind(1:14,sample(1:5,14,rep=T)) y [A] [B] [1,]12 [2,]22 [3,]33 [4,]45 [5,]52 [6,]65 [7,]75 [8,]84 [9,]94 [10,] 101 [11,] 112 [12,] 121 [13,] 134 [14,] 142 #draw random sample of n=4 without replacement from matrix x. x.samp-x[sample(10,4,replace=F),] x.samp [A] [B] [1,]33 [2,]45 [3,]52 [4,]75 Next, I would need to draw four observations from matrix y (without replacement) so that the distribution of y$B is identical to x.samp$B. I'd appreciate any help, and sorry to post such a basic question! Break it down like this: x.levels - sort( unique(x[,2]) ) x.samp.tab - table( factor( x.samp[,2], x.levels ) ) y.rows - split(1:nrow(y),factor( y[,2], x.levels ) ) unlist( mapply( sample, y.rows, x.samp.tab ),use.names=FALSE ) In some cases sample might fail if length( y.rows[[i]] ) x.samp.tab[ i ] you can trace which element that was using 'traceback()' or write a wrapper for sample() that checks that condition. HTH, Chuck LB [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sample from very large distribution
On Thu, 30 Sep 2010, Matthew Finkbeiner wrote: I don't have enough RAM for this problem, so I need a work around. This is what I want to do: y- sample(2^32, 10, replace=FALSE) y - trunc(runif( 10, 1, 2^32+1)) while( any( dup.y -duplicated(y) ) ) y[dup.y] - trunc(runif( sum(dup.y), 1, 2^32+1)) HTH, Chuck but my machine won't let me do that. so I now do this: x- seq(1,2^32, by=100) y- sample(x, 10, replace=FALSE) this works fine, but by selecting every 100th item, it introduces a systematicity that may be problematic. I've tried this: x- seq(1,2^32, by=sample(1:200, 1)) but that yields some unpredictable behavior so, any suggestions? Thank you kindly, Matthew [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] OT: Is randomization for targeted cancer therapies ethical?
On Mon, 20 Sep 2010, Bert Gunter wrote: Hi Folks: **Off Topic** Those interested in clinical trials may find the following of interest: http://www.nytimes.com/2010/09/19/health/research/19trial.html It concerns the ethicality of randomizing those with life-threatening disease to relatively ineffective SOC when new biologically targeted therapies appear to be more effective. While the context may be new, the debate, itself, is not: Tukey wrote (or maybe it was talked -- I can't remember for sure) about this about 30 years ago. I'm sure many other also have done so. Anscombe's remarkable (and influential) review of Armitage's 'Sequential Medical Trials' back in 1963 http://www.jstor.org/stable/2283272 is worth a look by any statistician who is interested in this topic. It makes explicit several factors that weigh in the ethical assessment of a particular trial design. He discusses in formal terms the weighing of outcomes for patients in the trial at hand aginst those of future patients and the impact that this might have on design decisions. HTH, Chuck Cheers, Bert -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] break the long R code lines automatically
On Tue, 24 Aug 2010, heyi xiao wrote: Dear all, I have written some R source program with many thousands of lines. I didn???t insert line breaks automatically or manually for the long lines. But now I would like to edit the source code in Emacs/ESS to make it more formal as a package. One of the major problems here is how to break the long lines automatically. Emacs auto-fill-mode only works for the lines you are typing in currently, and fill commands like M-q (fill-paragraph) or M-x fill-region (fill-region) mess up the R code lines as they take a whole function/paragraph as a long line, and remove the original line breaks. I find simple solutions for indenting code regions in Emacs/ESS, but no good ones for breaking code lines. However, I saw the nice multi-line codes in all R/Bioconductor packages. Please let me know if you have any ideas on how people usually break the existent long R code lines automatically. I will really appreciate your kind help! Not particualrly elegant, but a combination of parse and print will break long lines: cat(y - ,paste( 1:20,collapse= + ),\n,y2 - , + paste( 1:20,collapse=+),\n,file=testwrap.R) for (iexpr in parse(testwrap.R)) print(iexpr) y - 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 y2 - 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 nchar(readLines(testwrap.R)) [1] 95 59 and of course you will want 'sink' or some such to save the lines. HTH, Chuck Heyi [[alternative HTML version deleted]] Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] on abort error, always show call stack?
On Sun, 22 Aug 2010, Matt Shotwell wrote: On Sun, 2010-08-22 at 11:41 -0400, ivo welch wrote: Dear R Wizards---is it possible to get R to show its current call stack (sys.calls()) upon an error abort? I don't use ESS for execution, and it is often not obvious how to locate how I triggered an error in an R internal function. Seeing the call stack would make this easier. (right now, I sprinkle cat statements everywhere, just to locate the line where the error appears.) Of course, I would really love to see the line in my program that triggered this, but I have asked this before, and I understand this is too difficult to get into the R language. The traceback() function will print out the call stack after an error. However, you may find the debug() family of functions more useful for debugging. Also see the browser() function. Further, you might set options( error = recover ) which prints the list of current calls, and prompts the user to select one of them. And then allows browser()-ing. Chuck -Matt regards, /iaw Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Matthew S. Shotwell Graduate Student Division of Biostatistics and Epidemiology Medical University of South Carolina __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] spare matrix replacing values efficiently
On Sun, 22 Aug 2010, david h shanabrook wrote: I am working with a large sparse matrix trying replace all 1 values with 0. The normal method doesn't work. Here is a small example: x - Matrix(0,nrow=10,ncol=10,sparse=TRUE) x[,1] - 1:2 x 10 x 10 sparse Matrix of class dgCMatrix [1,] 1 . . . . . . . . . [2,] 2 . . . . . . . . . [3,] 1 . . . . . . . . . [4,] 2 . . . . . . . . . [5,] 1 . . . . . . . . . [6,] 2 . . . . . . . . . [7,] 1 . . . . . . . . . [8,] 2 . . . . . . . . . [9,] 1 . . . . . . . . . [10,] 2 . . . . . . . . . x[x==1] - 0 Error in .local(x, i, j, ..., value) : not-yet-implemented 'Matrix[-' method Suggestions? Any solution must be memory efficient as my sparse matrix is large. Looks like attr( x, x )[ attr( x, x ) == 1 ] - 0 should do it, but does not reduce the density of the matrix. In order to get that you'll need to do x - drop0(x) HTH, Chuck dhs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making a series of similar, but modified .r files - suggested method(s)?
On Sat, 21 Aug 2010, Laura S wrote: Dear all: Any suggestions are much appreciated. I am looking for a way to make a series of similar, but slightly modified, .r files. My issue is automating making 320 .r files that change the for(i in 1:x) in my base .r file (as well as other elements, e.g., the load(...), setwd(...)). For smaller jobs running on a single computer with batch files, I have been manually changing the for(i in 1:x) line, etc.. R is a scripting language (among other things!). You can read in a template .R file and then substitute pieces of it inside a loop that writes the revised peices to a directory from which you can later source or BATCH them. Here is a simple example with just one substitution: Here is the template file where the pieces to be substituted is #for i#: ,[ template.R ] | #for i#{ | toupper(i) | } | ` template - readLines(template.R) # read it in! dir.create(temp) # make a place to save new .Rs # modify the template for (i in letters[1:5]){ + cat( sub( #for i#, +paste(i - ',i,'\n,sep=''), +template), sep='\n', +file=file.path(temp,paste(i,R,sep='.'))) + } # look at a.R readLines(file.path(temp,a.R)) [1] i - 'a' { toupper(i) } sapply(Sys.glob(temp/*.R),source) # Run them all temp/a.R temp/b.R temp/c.R temp/d.R temp/e.R value A B C D E visible TRUE TRUE TRUE TRUE TRUE HTH, Chuck Why does this matter to me? I am planning on running a simulation experiment on a linux cluster as a serial job. Although not elegant, it has been suggested I make 320 .r files so qsub runs one .r file and then selects other jobs. Thus, the manual route I am currently using would take a very long time (given multiple runs of 320 .r files, given experimental replication). Thank you, Laura -- Genius is the summed production of the many with the names of the few attached for easy recall, unfairly so to other scientists - E. O. Wilson (The Diversity of Life) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] U value from wilcox.test
Issues like that in this thread can often be resolved by reading the help page for the relevant function. From: ?wilcox.test Note The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.) R's value can also be computed as the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i], the most common definition of the Mann-Whitney test. HTH, Chuck On Fri, 20 Aug 2010, Cedric Laczny wrote: Hi Chloe, first of all, I want to note, that you should be careful using the WMW-test. Even though it is often reported to be some sort of a swiss-army-knife for comparing two distributions, recent research on this test has revelaed that it is crucial what hypotheses you consider. Also the assumptions imposed to the test are critical. For the assumptions, the test basically is a test on identical distributions. For your sample sizes, this is in my opinion quite problematic, as you can not really know what the population distributions look like. Furthermore, the test has shown to be quite strongly influenced by differing variances in the two groups. All this is more or less valid for not necessarily small sample sizes, therefore I am not sure how much it might affect your results. Therefore, caution should be adressed to the interpretation of the results. On Friday, 20. August 2010 19:41:55 Chloe wrote: Dear all, I want to compare the efficiency of 2 methods in extracting proteins from algal samples. I collected 6 independant algal samples and I extracted 3 by the method 1 and 3 others by the method 2. So I have 2 groups of 3 samples, that are not paired. I would like to know if the results obtained by these 2 methods are significantly different, I hope method 2 to be more efficient than method 1. As I have few data I went for the Mann-whitney test: method1=c(35,40,56) method2=c(90,110,115) wilcox.test(method1,method2,paired=FALSE,alternative=less) Wilcoxon rank sum test data: method1 and method2 W = 0, p-value = 0.05 alternative hypothesis: true location shift is less than 0 As I have a small number of samples, I would prefer to compare the U value of the Mann-Whitney test to critical value for table rather than to rely on the p-value. Is W value correspond to U value ? From the help I understand that W=U+m*(m+1)/2, is this true ? In the case it is, my U values would be U=W-6=-6!! I thought that a U value could not be neagtive. Im a little bit puzzled on this one... I would agree with you. I can't really help you with this one right now, but doing the calculation of U manually is not really hard for your problem. All the values from method 2 are higher than the ones from method 1. So the ranking would be: method1 : 1,2,3 method2: 4,5,6 = W(rank sum)_m,n = 1 + 2 + 3 = 6 If I use the definition of U from http://de.wikipedia.org/wiki/Mann-Whitney-U- Test I would calculate U = 0 , which goes with your formula: U = W - 6 = 6 - 6 = 0, what makes sense because the values of X are never greater than the ones of Y. (s. link: the formula for U with the two summations ) Thinking of that, the usage of W in R might simply be misleading and it could indeed represent U. I would be happy to have any information about how to obtain the U value from the Mann-Withney test (wilcox.test()) in order to be able to compare it with table of critical U value commonly found. Thanks a lot for your time and help Have a nice day, Chlo?? For your sample sizes you can nicely use the critical value tables that can be found easily on the net. I hope I could help with your problem, if not, please feel free to ask further. Best, Cedric __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] functions and multiple levels
On Wed, 18 Aug 2010, chris20 wrote: Hi, I am trying to write a function; I want to subtract the mean of each class in level 2 from the mean of each class in level 1 and square the answer, eg. level.1 level.2 observation 1 1 0.5 1 1 0.2 1 2 0.6 1 2 0.4 2 3 0.8 2 3 0.7 2 4 0.6 2 4 0.4 (mean(1$level.1) - mean(1$level.2))^2 (mean(1$level.1) - mean(2$level.2))^2 etc. Chris, Almost always best to break things into little pieces, like this: # read data (copy above to clipboard) dat - read.table(clipboard,head=T, +colClasses=c('factor','factor','numeric')) # means of level.1 m1 - coef(lm(observation~0+level.1,dat)) # means of level.2 m2 - coef(lm(observation~0+level.2,dat)) # all differences outer( m1, m2, '-') level.21 level.22 level.23 level.24 level.110.075 -0.075 -0.325 -0.075 level.120.2750.125 -0.1250.125 # all differences squared outer( m1, m2, '-')^2 level.21 level.22 level.23 level.24 level.11 0.005625 0.005625 0.105625 0.005625 level.12 0.075625 0.015625 0.015625 0.015625 HTH, Chuck I want to write a function because I have lots of levels and lots of different observations. I thought this should be easy (it's my first attempt at writing a function) but I just can't work it out! Thanks Chris -- View this message in context: http://r.789695.n4.nabble.com/functions-and-multiple-levels-tp2329935p2329935.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plot in cartesian plane
On Wed, 18 Aug 2010, Pablo Cerdeira wrote: Hi all, I'm trying to plot this two curves in a single cartesian plane, but when I plot the first one, the plot appears with no negative y value. When I plot the second curve, it almost does not apear in the graph. I was trying the plot.window but with no success. Can someone help me with this? The answer is in ?par but it requires a bit of deduction. Start with the description of yaxs, which refers to xaxs. Notice what the xaxs paragraph says about xlim. If you can figure out that there is also a ylim arg, then you are pretty much finished. Something like plot( f, -10, 10, ylim=c(-100,100) ) will reveal both curves. xlim and ylim also appear briefly in R-Intro ni Appendix A. FWIW, one can use xlim and ylim to 'zoom in' on a section of a plot. HTH, Chuck If possible, I'd like to plot this curves in a perfect cartesian plane. f = function(x) { x^2 } f2 = function(x) { -x^2 } plot(f,-10,10) abline(h=0, v=0, col = gray60) curve(f2,col=orange, add=T) Thanks in advanced! *pablo de camargo cerdeira* pablo.cerde...@gmail.com | pa...@fgv.br Phone: +55-(21)-3799-6065 [image: Facebook] http://www.facebook.com/pablo.cerdeira[image: LinkedIn]http://br.linkedin.com/in/pablocerdeira[image: Google] http://www.google.com/profiles/pablo.cerdeira [image: Google Talk/]pablo.cerdeira [image: Skype/]pablocerdeira [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function in R?
On Fri, 13 Aug 2010, fishkbob wrote: So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example: sample startend A 5 10 B 7 18 C 14 D 16 20 I'd want the function to return the two distinct sets (1,4) and (5,20) Is there an R function that already does this? Yes. See the reduce() function in the IRanges package on BioConductor See pages 11-12 of http://www.bioconductor.org/packages/2.6/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf HTH, Chuck or should I write my own? (how would I go about that?) -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading fixed width format data with 2 types of lines
On Thu, 12 Aug 2010, Tim Gruene wrote: I don't know if it's elegant enough for you, but you could split the file into two files with 'grep ^3 file file_3' and 'grep ^4 file file_4' and then read them in separately. along the same lines, but all in R (untested) original.lines - readLines( filename ) tcon.3 - textConnection( grep( ^3, original.lines, value=T )) res.3 - read.fwf( tcon.3, etc ) close(tcon.3) tcon.4 - textConnection( grep( ^4, original.lines, value=T )) res.4 - read.fwf( tcon.4, etc ) close(tcon.4) rm( original.lines ) Or skip the readLines() step and use tcon.3 - pipe(paste(grep '^3',filename)) ... I think you can use 'findstr.exe' on windows in lieu of grep. HTH, Chuck Tim On Thu, Aug 12, 2010 at 01:57:19PM -0400, Denis Chabot wrote: Hi, I know how to read fixed width format data with read.fwf, but suddenly I need to read in a large number of old fwf files with 2 types of lines. Lines that begin with 3 in first column carry one set of variables, and lines that begin with 4 carry another set, like this: ??? 3A00206546L07004901609004599 1015002 001001008010004002004007003 001 3A00206546L07004900609003099 1029001002001001006014002 3A00206546L07004900229000499 1015001001 3A00206546L070049001692559049033 1015 018036024 3A00206546L07004900229000499 1001 002 4A00176546L06804709001011100060651640015001001501063 065914 4A00176546L068047090010111000407616 1092 095614 4A00196546L098000100010111001706214450151062 065914 4A00176546L068047090010111000505913 1062 065914 4A00196546L09800010001011100260472140002001000201042 046114 4A00196546L0980001000101110025042214501200051042 046114 4A00196546L09800010001011100290372140005001220501032 036214 ??? I have searched for tricks to do this but I must not have used the right keywords, I found nothing. I suppose I could read the entire file as a single character variable for each line, then subset for lines that begin with 3 and save this in an ascii file that will then be reopened with a read.fwf call, and do the same with lines that begin with 4. But this does not appear to me to be very elegant nor efficient??? Is there a better method? Thanks in advance, Denis Chabot __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- -- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question regarding updating formulas with coefficients
On Wed, 11 Aug 2010, David Winsemius wrote: On Aug 11, 2010, at 6:03 PM, Jarrett Byrnes wrote: I have formulae with coefficents that I would like to update. However, I get some strange results. For example, see the following: For the formula y ~ d+ 3*r+t Did you really get meaningful results from that formula? Care to provide an example? Maybe there is something more for me to learn. I want to add a variable p, so update(y~d+0*r+t, .~.+p) In formulas the * operator is an interaction creator. so you told R to make 0 + r + 0:r. Probably not what you thought you were doing. So what were you trying to do anyway? produces y ~ d + t + p - 1 Which at least explains why you got the -1 (which in R formulas is that same as +0). If the coefficient is not 0, but rather, something else - say, 3, What do you think you are accomplishing when you put a scalar coefficient in the formula? Maybe he wants ?I ?? Chuck I get the following: update(y~d+3*r+t, .~.+p) Error in terms.formula(tmp, simplify = TRUE) : invalid model formula in ExtractVars Is there a way to do this, or a different call I should be trying? What you should be doing depends on what you want to happen. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question regarding updating formulas with coefficients
On Wed, 11 Aug 2010, David Winsemius wrote: On Aug 11, 2010, at 6:45 PM, Charles C. Berry wrote: On Wed, 11 Aug 2010, David Winsemius wrote: On Aug 11, 2010, at 6:03 PM, Jarrett Byrnes wrote: I have formulae with coefficents that I would like to update. However, I get some strange results. For example, see the following: For the formula y ~ d+ 3*r+t Did you really get meaningful results from that formula? Care to provide an example? Maybe there is something more for me to learn. I want to add a variable p, so update(y~d+0*r+t, .~.+p) In formulas the * operator is an interaction creator. so you told R to make 0 + r + 0:r. Probably not what you thought you were doing. So what were you trying to do anyway? produces y ~ d + t + p - 1 Which at least explains why you got the -1 (which in R formulas is that same as +0). If the coefficient is not 0, but rather, something else - say, 3, What do you think you are accomplishing when you put a scalar coefficient in the formula? Maybe he wants ?I Possibly, but wouldn't that just result in a deflation by a factor of 3 of the estimated coefficient if you wrappedI around that rem ... I(3*r) ? Quite so. It is just one way (and I am not touting it) of rescaling a variable. I also wondered if he might need to be referred to: ?offset ?? ?? back 'atcha. My '??' was intended for the OP, but your guess seems good to me. Chuck -- David. Chuck I get the following: update(y~d+3*r+t, .~.+p) Error in terms.formula(tmp, simplify = TRUE) : invalid model formula in ExtractVars Is there a way to do this, or a different call I should be trying? What you should be doing depends on what you want to happen. -- David David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] package for measurement error models
On Sat, 7 Aug 2010, Carrie Li wrote: Hi,all, I posted this question couple of days again, but haven't gotten any answers back. I would like to post it again, and if you have any ideas, please let me know. Any helps and suggestions are very much appreciated. Start by reading the Posting Guide. It will tell you to *) Do RSiteSearch(keyword) with different keywords (at the R prompt) to search R functions, contributed packages and R-Help postings. and RSiteSearch(measurement error) gives a number of hits that might be of interest. If these hits do not pan out, you should let us know why none of the available packages that have 'measurement error' in their descriptions satisfy your needs. HTH, Chuck The problem is about linear regression with both y and x have measurement, and the variance of errors are heterogeneous. The estimated regression coefficient and its variance are of interest. Is any R package doing this task ? Thank you Carrie-- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] discrete ECDF
On Wed, 4 Aug 2010, David Winsemius wrote: Dear list; I just created a utility function that replicates what I have done in the past with Excel or OO.org by putting a formula of the form =sum($A1:A$1) in an upper-corner of a section and then doing a fill procedure by dragging the lower-rt corner down and to the right. When divided by the grand sum of the entries this function then calculates a 2D-discrete-ECDF. I keep thinking I am missing the obvious, but I did try searching. Here is my effort at creating that functionality: ecdf.tbl - function (.dat) { .dat - data.matrix(.dat) #speeds up calculations .sdat - matrix(0, nrow(.dat), ncol(.dat) ) .sdat[] - sapply(1:ncol(.dat), function(x) sapply(1:nrow(.dat), function(y) sum(.dat[1:y, 1:x], na.rm=TRUE ) ) ) return(.sdat) } ecdf.tbl2 - function(mat) { mat[is.na(mat)] - 0 t( apply( apply( mat,2, cumsum ), 1, cumsum ))} HTH, Chuck tst - read.table(textConnection(NA 5 6 4 5 7 5 6 8 6 7 9 NA 8 NA) ) tst V1 V2 V3 1 NA 5 6 2 4 5 7 3 5 6 8 4 6 7 9 5 NA 8 NA ecdf.tbl(tst) [,1] [,2] [,3] [1,]05 11 [2,]4 14 27 [3,]9 25 46 [4,] 15 38 68 [5,] 15 46 76 ecdf.tbl(tst)/sum(tst, na.rm=TRUE) [,1] [,2] [,3] [1,] 0. 0.06578947 0.1447368 [2,] 0.05263158 0.18421053 0.3552632 [3,] 0.11842105 0.32894737 0.6052632 [4,] 0.19736842 0.5000 0.8947368 [5,] 0.19736842 0.60526316 1.000 Did I miss a more compact vectorized or sweep()-ed solution? (I realize this is not really a function in the sense that ecdf() is.) I have seen prop.table and margin.table, but could not see how they would address this problem. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with indicators for switch
On Mon, 2 Aug 2010, Roy Davy wrote: Hi, I am having some problems setting up some indicators and would appreciate any help. I have some data called 'lights' with 3 variables called x, a and b. x - is the date a - equals 1 to indicate an 'on' button is activated b - equals 1 to indicate an 'off' button is activated Essentially i wannt to create 2 new variables c and d c - will reflect the current state of the light (1 being on) d - will be a count for how many days the light has been on here's some data with the date omitted to illustrate what i have and what is required. a b c d 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 2 0 0 1 3 0 0 1 4 1 0 1 5 0 0 1 6 0 0 1 7 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 2 0 0 1 3 0 0 1 4 0 1 0 0 0 0 0 0 After some considerable time i have managed to create variable c with a loop but it's really slow with the volume of data i have. Could anyone please show me how to do this efficiently? Two solutions: 1) inline - if you are configured to compile source, use the inline package to render your loop as C or Fortran 2) findInterval - like this: on.pos - which(a==1) off.pos - which(b==1) last.on - c(0,on.pos)[ 1 + findInterval(1:20,on.pos)] last.off - c(0,off.pos)[ 1 + findInterval(1:20,off.pos)] cee - as.integer(last.onlast.off) Don't use 'c' as a variable name as it is a common function HTH, Chuck I hope this is clear... Thanks Roy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lognormal distribution - Range Factor
On Sun, 1 Aug 2010, ted.hard...@manchester.ac.uk wrote: On 01-Aug-10 10:59:03, Tims Corbett wrote: Hi, What does it mean to say Lognormal distribution with a mean of 1.03E-6 with a range factor of 100 ? How can I find the lognormal distribution paramters from this information? Thanks, Tims If you can, please say what is meant by range factor. I have seen it used to denote the ratio maximum/minimum for a sample, but it might be intended to mean something else in the context of your question. Tim, you should learn to use Google to answer this question instead of sending postings like this to lots of differnt mailing lists. Googling lognormal range factor easily turns up the definition of range factor as a function of the quantiles of the distribution. FWIW, Range factor looks like a term of art used in certain circles. RSiteSearch'ing (or just ??lognormal, for crying out loud!) turns up everything else needed to answer this, viz the R functions to find the quantiles, and even the relation between the mean of the lognormal and its parameters. I think a little bit of algebra makes it possible to solve this directly from tables of the normal (or lognormal) even without having to resort to uniroot(). Is this homework? HTH, Chuck p.s. can the posting guide advise that OT questions like this be preceeded by a suitable amount of Googling before coming here?? If it does mean that, then the context is a sample of values, deemed to be log-normally distributed, with (presumably) the sample mean equal to 1.03E-6 (0.0103) and max/min=100. However this would be really inadequate information to determine what the parameters of the log-normal distribution might be. At the very least one needs also the sample size, and even then the determination would not be reliable. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-Aug-10 Time: 12:33:45 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help splitting a data frame
On Thu, 29 Jul 2010, kayj wrote: Hi All, I have a dataset that I would like to split based on : or – ( the data file is tab delimited) for example: Ny:23-45AC BA:88-91DB KJ:21-13PA And I would like the data to be splitted and the final results look like NY 23 45 AC BA 88 91 DB KJ 21 13 PA read.table(textConnection(gsub([-:],\t,readLines(your.data.file.tab or read.table(textConnection(chartr(-:,\t\t,readLines(your.data.file.tab HTH, Chuck I would like to have the resulting data as a data frame so each column is a variable. Thanks, -- View this message in context: http://r.789695.n4.nabble.com/help-splitting-a-data-frame-tp2306832p2306832.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis MLE gives NA or enormous standard errors
On Tue, 27 Jul 2010, Christopher David Desjardins wrote: Hi Charles, On Fri, 2010-07-23 at 14:40 -0700, Charles C. Berry wrote: On Fri, 23 Jul 2010, Christopher David Desjardins wrote: Sorry. I should have included some data. I've attached a subset of my data (50/192) cases in a Rdata file and have pasted it below. Running anova I get the following: anova(sr.reg.s4.nore) Df Deviance Resid. Df-2*LL P(|Chi|) NULL NA NA45 33.89752NA as.factor(lifedxm) 2 2.43821143 31.45931 0.2954943 That would just be an omnibus test right and should that first NULL NA line be worrisome? What if I want to test specifically that CONTROL and BIPOLAR were different and that MAJOR DEPRESSION and BIPOLAR were different? You wrote: Construct a likelikehood ratio test for each hypothesis by fitting three models - two containing each term and one containing both - and comparing each simpler model to the fuller model. Would that be recoding lifedxm (presently a dummy variable where 0 - BIPOLAR, 1 - CONTROL, and 2 - MAJOR DEPRESSED) as three seperate variables: CONTROL (0 - No, 1 - Yes), BIPOLAR (0 - N0, 1 - Yes), and MAJOR DEPRESSED (0 - No, 1 - Yes) and then comparing the following models with anova()? CONTROL + BIPOLAR to MAJOR CONTROL + MAJOR to BIPOLAR I am sorry I am just a little confused. Basically I want to know if BIPOLAR is a higher risk than MAJOR and CONTROL and if MAJOR is a higher risk than CONTROL. It would help communication if you would use a standard notation like R. The meaning of pseudocodes tends to be a bit fuzzy. And from far below, it seems you have a factor called lifedxm not a 'dummy variable' in the sense that that term is often used. Here are three models defined using the formula language: frm1 - Surv(age_sym4, sym4) ~ I(lifedxm==MAJOR) frm2 - Surv(age_sym4, sym4) ~ I(lifedxm==BIPOLAR) frm3 - Surv(age_sym4, sym4) ~ I(lifedxm==MAJOR) + I(lifedxm==BIPOLAR) The model of frm1 is nested under that of frm3. The model of frm2 is nested under that of frm3. anova( frm1, frm3 ) will report on the effect of adding I(lifedxm==BIPOLAR) to frm1. i.e. it will give the increase in likelihood associated with that term. anova( frm2, frm3 ) will report on the effect of adding I(lifedxm==MAJOR) to frm2. Chuck Thank you very much for all your help, Chris I'll look at Hauck-Donner effect. Thanks, Chris bip.surv.s age_sym4 sym4 lifedxm 1 16.128680 MAJOR 2 19.326490 MAJOR 3 16.550310 CONTROL 4 19.367560 CONTROL 5 16.090350 MAJOR 6 21.505820 MAJOR 7 16.361400 MAJOR 8 20.572210 MAJOR 9 16.457220 CONTROL 10 19.945240 CONTROL 11 15.791920 MAJOR 12 20.766600 MAJOR 13 16.150580 BIPOLAR 14 19.258040 BIPOLAR 15 17.363450 MAJOR 16 21.180010 MAJOR 17 NA0 BIPOLAR 18 NA0 BIPOLAR 19 16.317591 MAJOR 20 18.297060 MAJOR 21 16.407940 MAJOR 22 19.137580 MAJOR 23 16.194390 CONTROL 24 21.368930 CONTROL 25 15.890490 CONTROL 26 18.997950 CONTROL 27 NA0 BIPOLAR 28 18.904860 BIPOLAR 29 16.364130 MAJOR 30 20.427100 MAJOR 31 16.659820 MAJOR 32 19.457910 MAJOR 33 16.643390 CONTROL 34 19.400410 CONTROL 35 15.373031 BIPOLAR 36 19.838470 BIPOLAR 37 15.422311 MAJOR 38 19.370290 MAJOR 39 15.069130 MAJOR 40 17.815200 MAJOR 41 15.504450 BIPOLAR 42 17.921970 BIPOLAR 43 15.345650 CONTROL 44 18.075290 CONTROL 45 15.594800 CONTROL 46 19.674200 CONTROL 47 14.789870 MAJOR 48 20.054760 MAJOR 49 14.787130 MAJOR 50 19.868580 MAJOR On Fri, 2010-07-23 at 11:52 -0700, Charles C. Berry wrote: On Fri, 23 Jul 2010, Christopher David Desjardins wrote: Hi, I am trying to fit the following model: sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm), data=bip.surv) Next time include a reproducible example. i.e. something we can run. Now, Google Hauck Donner Effect to understand why anova(sr.reg.s4.nore) is preferred. Chuck Where age_sym4 is the age that a subject develops clinical thought problems; sym4 is whether they develop clinical thoughts problems (0 or 1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or CONTROL. I am interested in whether or not survival differs by this covariate. When I run my model, I am getting the following output: summary(sr.reg.s4.nore) Call: survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm), data = bip.surv) Value Std. Error z p (Intercept)4.037 0.455 8.86643 0.00755 as.factor(lifedxm)CONTROL 14.844 4707.383 0.00315 0.997484052845082791450 as.factor(lifedxm)MAJOR0.706 0.447 1.58037 0.114022774867277756905 Log(scale)-0.290 0.267 -1.08493 0.277952437474223823521 Scale
Re: [R] Fwd: Questions about templates for R
On Mon, 26 Jul 2010, Alon Friedman wrote: Hi I am looking for R templates to introduce the R to my students at Seton hall university. The templates are predefined scripts in R that will retain its primary intent when individually customized with their own variable data or text. Well the help pages for many (most?) R functions contain runnable examples. example( image ) is one such. If you want you students to perform a linear regression (say), having them read a bit about it, then start R, run example( lm ) then read through the input and output and display the help pages for functions that are called is not a bad way to start. There are BTW numerous books listed at http://www.r-project.org/doc/bib/R-publications.html which have plenty of runnable code in them. And there are as well as online resources accessible under 'Documnetation' at http://www.r-project.org/ HTH, Chuck In this case, my students at Seton Hall University. For example, PSPad editor provides new users predefined templates before writing their own scripts. Please let me know if it makes more sense. Yours AF Alon Friedman, PhD New York, NY 10014 Phone: 212-645-1538 -- Yours AF Alon Friedman, PhD New York, NY 10014 Phone: 212-645-1538 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sweave and scan()
On Tue, 27 Jul 2010, Murray Jorgensen wrote: I am introducing the scan() function to my class. Consider the following file (Scanexamp.Rnw ) \documentclass[12pt]{article} \begin{document} = height = scan() 64 62 66 65 62 69 72 72 70 part = scan(what = character(0)) Soprano Soprano Soprano AltoAltoTenor Tenor BassBass sh = data.frame(height, part) sh @ \end{document} Now what happens when I attempt to Sweave this is Sweave(scanexamp.Rnw) Writing to file scanexamp.tex Processing code chunks ... 1 : echo term verbatim Error: chunk 1 Error in parse(text = chunk) : unexpected numeric constant in: height = scan() 64 62 Right. Sweave is trying to parse the whole chunk. It cannot parse 64 62 66 65 62. (And the command line cannot parse it either - try typing it at the R prompt.) If you put each number on a separate line, Sweave will parse it, but when scan() runs, it will prompt for input and accept it from STDIN just as when run from the command line. Which probably isn't what you want. I'd guess the path of least resistance is to have a bit of deception. Use two chunks - one like that above but with eval=F and another with eval=T,echo=F with code like this tcon - textConnection( 64 62 66 65 62 69 72 72 70 ) height = scan(tcon) close(tcon) ... If the deception doesn't please you, then use a file as in example( scan ) to illustrate scan() HTH, Chuck Comments would be appreciated. (And thanks to Ross Darnell for a lot of help on another list.) Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: m...@waikato.ac.nzFax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441 Mobile 021 0200 8350 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading Rdata files in a Package
On Sun, 25 Jul 2010, Andrew Leeser wrote: I have an .Rdata file saved in my data folder in a package I created. However, when I use the data() function it doesn't recognize that there any dataset exists within the package. Is there anything specific I need to do such that the data() function will load my dataset? No reproducible example was provided. Did you provide a name for the .Rdata file? e.g. mydat.RData Run example( package.skeleton ) then edit the 'mypkg' Rd files to put add dummy \title{} entries. Then INSTALL, then run R, library(mypkg) data(d) str(d) . . . Emulate what you see in mypkg/* Or use package.skeleton() to get you started. HTH, Chuck Thanks in advance, Andrew [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] , Updating Table
On Fri, 23 Jul 2010, Marcus Liu wrote: Hi everyone, Is there any command for updating table withing a loop?? Loops? We don't need no stinking loops! (From 'The Good, the Bad, and the Rgly') tab - table(data.raw, findInterval(seq(along=data.raw), ind+1 ) ) tab %*% upper.tri(tab,diag=T) or tab2 - tapply( factor(data.raw), findInterval(seq(along=data.raw), ind+1 ), table) Reduce( +, tab2, accum=TRUE ) HTH, Chuck p.s. See the posting guide re including a reproducible example with requests like yours. For instance, at i, I have a table as ZZ = table(data.raw[1:ind[i]]) where ind = c(10, 20, 30, ...).?Then , ZZ will be as follow A B C ?3??? 10?? 2 At (i + 1), ZZ = table(data.raw[(ind[i]+1):ind[i+1]]) A B D ?4 ?? 7??? 8 Is there any command that can update the table ZZ for each time so that in the above example, ZZ will be A B C D ?7??? 17?? 2??? 8 Thanks. liu [[alternative HTML version deleted]] Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis MLE gives NA or enormous standard errors
On Fri, 23 Jul 2010, Christopher David Desjardins wrote: Hi, I am trying to fit the following model: sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm), data=bip.surv) Next time include a reproducible example. i.e. something we can run. Now, Google Hauck Donner Effect to understand why anova(sr.reg.s4.nore) is preferred. Chuck Where age_sym4 is the age that a subject develops clinical thought problems; sym4 is whether they develop clinical thoughts problems (0 or 1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or CONTROL. I am interested in whether or not survival differs by this covariate. When I run my model, I am getting the following output: summary(sr.reg.s4.nore) Call: survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm), data = bip.surv) Value Std. Error z p (Intercept)4.037 0.455 8.86643 0.00755 as.factor(lifedxm)CONTROL 14.844 4707.383 0.00315 0.997484052845082791450 as.factor(lifedxm)MAJOR0.706 0.447 1.58037 0.114022774867277756905 Log(scale)-0.290 0.267 -1.08493 0.277952437474223823521 Scale= 0.748 Weibull distribution Loglik(model)= -76.3 Loglik(intercept only)= -82.6 Chisq= 12.73 on 2 degrees of freedom, p= 0.0017 Number of Newton-Raphson Iterations: 21 n=186 (6 observations deleted due to missingness) I am concerned about the p-value of 0.997 and the SE of 4707. I am curious if it has to do with the fact that the CONTROL group doesn't have a mixed response, meaning that all my subjects do not develop clinical levels of thought problems and subsequently 'survive'. table(bip.surv$sym4,bip.surv$lifedxm) BIPOLAR CONTROL MAJOR 0 41 6078 1 7 0 6 Is there some sort of way that I can overcome this? Is my model misspecified? Is this better suited to be run as a Bayesian model using priors to overcome the lack of a mixed response? Also, please cc me on an email as I am a digest subscriber. Thanks, Chris -- Christopher David Desjardins PhD student, Quantitative Methods in Education MS student, Statistics University of Minnesota 192 Education Sciences Building http://cddesjardins.wordpress.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis MLE gives NA or enormous standard errors
On Fri, 23 Jul 2010, Christopher David Desjardins wrote: Sorry. I should have included some data. I've attached a subset of my data (50/192) cases in a Rdata file and have pasted it below. Running anova I get the following: anova(sr.reg.s4.nore) Df Deviance Resid. Df-2*LL P(|Chi|) NULL NA NA45 33.89752NA as.factor(lifedxm) 2 2.43821143 31.45931 0.2954943 That would just be an omnibus test right and should that first NULL NA line be worrisome? What if I want to test specifically that CONTROL and BIPOLAR were different and that MAJOR DEPRESSION and BIPOLAR were different? Construct a likelikehood ratio test for each hypothesis by fitting three models - two containing each term and one containing both - and comparing each simpler model to the fuller model. I'll look at Hauck-Donner effect. Thanks, Chris bip.surv.s age_sym4 sym4 lifedxm 1 16.128680 MAJOR 2 19.326490 MAJOR 3 16.550310 CONTROL 4 19.367560 CONTROL 5 16.090350 MAJOR 6 21.505820 MAJOR 7 16.361400 MAJOR 8 20.572210 MAJOR 9 16.457220 CONTROL 10 19.945240 CONTROL 11 15.791920 MAJOR 12 20.766600 MAJOR 13 16.150580 BIPOLAR 14 19.258040 BIPOLAR 15 17.363450 MAJOR 16 21.180010 MAJOR 17 NA0 BIPOLAR 18 NA0 BIPOLAR 19 16.317591 MAJOR 20 18.297060 MAJOR 21 16.407940 MAJOR 22 19.137580 MAJOR 23 16.194390 CONTROL 24 21.368930 CONTROL 25 15.890490 CONTROL 26 18.997950 CONTROL 27 NA0 BIPOLAR 28 18.904860 BIPOLAR 29 16.364130 MAJOR 30 20.427100 MAJOR 31 16.659820 MAJOR 32 19.457910 MAJOR 33 16.643390 CONTROL 34 19.400410 CONTROL 35 15.373031 BIPOLAR 36 19.838470 BIPOLAR 37 15.422311 MAJOR 38 19.370290 MAJOR 39 15.069130 MAJOR 40 17.815200 MAJOR 41 15.504450 BIPOLAR 42 17.921970 BIPOLAR 43 15.345650 CONTROL 44 18.075290 CONTROL 45 15.594800 CONTROL 46 19.674200 CONTROL 47 14.789870 MAJOR 48 20.054760 MAJOR 49 14.787130 MAJOR 50 19.868580 MAJOR On Fri, 2010-07-23 at 11:52 -0700, Charles C. Berry wrote: On Fri, 23 Jul 2010, Christopher David Desjardins wrote: Hi, I am trying to fit the following model: sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm), data=bip.surv) Next time include a reproducible example. i.e. something we can run. Now, Google Hauck Donner Effect to understand why anova(sr.reg.s4.nore) is preferred. Chuck Where age_sym4 is the age that a subject develops clinical thought problems; sym4 is whether they develop clinical thoughts problems (0 or 1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or CONTROL. I am interested in whether or not survival differs by this covariate. When I run my model, I am getting the following output: summary(sr.reg.s4.nore) Call: survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm), data = bip.surv) Value Std. Error z p (Intercept)4.037 0.455 8.86643 0.00755 as.factor(lifedxm)CONTROL 14.844 4707.383 0.00315 0.997484052845082791450 as.factor(lifedxm)MAJOR0.706 0.447 1.58037 0.114022774867277756905 Log(scale)-0.290 0.267 -1.08493 0.277952437474223823521 Scale= 0.748 Weibull distribution Loglik(model)= -76.3 Loglik(intercept only)= -82.6 Chisq= 12.73 on 2 degrees of freedom, p= 0.0017 Number of Newton-Raphson Iterations: 21 n=186 (6 observations deleted due to missingness) I am concerned about the p-value of 0.997 and the SE of 4707. I am curious if it has to do with the fact that the CONTROL group doesn't have a mixed response, meaning that all my subjects do not develop clinical levels of thought problems and subsequently 'survive'. table(bip.surv$sym4,bip.surv$lifedxm) BIPOLAR CONTROL MAJOR 0 41 6078 1 7 0 6 Is there some sort of way that I can overcome this? Is my model misspecified? Is this better suited to be run as a Bayesian model using priors to overcome the lack of a mixed response? Also, please cc me on an email as I am a digest subscriber. Thanks, Chris -- Christopher David Desjardins PhD student, Quantitative Methods in Education MS student, Statistics University of Minnesota 192 Education Sciences Building http://cddesjardins.wordpress.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http
Re: [R] (no subject)
On Sun, 11 Jul 2010, li li wrote: Hi all, I want to add the red color under the standard normal curve to the right of 1.96. Can anyone give me a hand? Please see the code below. Thank you. x - seq(-4, 4, length=100) hx - dnorm(x) par(pty=s) plot(x, hx, type=l, xlab=z value, ylab=Density, main=density of N(0,1)) abline(v=1.96, col=red) ?polygon example(polygon) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weired problem when passing arguments using ...?
On Wed, 7 Jul 2010, thmsfuller...@gmail.com wrote: Hello All, I'm trying to pass the argument col.names to write.csv using '...'. But I got the following warnings. Maybe it is very simple. But I'm not sure what I am wrong. Could you please help point to me what the problem is? Its not a _problem_, its a _feature_. read ?write.csv and use write.table() # fun=function(x, ...) { fr=parent.frame() tmp=get(x, envir=fr) write.csv( tmp , file=paste(x, '.csv', sep='') , ... ) } f=data.frame(x=1:10,y=letters[1:10]) fun('f', col.names=F) fun('f', col.names=F) Warning message: In write.csv(tmp, file = paste(x, .csv, sep = ), ...) : attempt to set 'col.names' ignored -- Tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] forcing a zero level in contr.sum
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I test against a simple null that two regressions coefficients are equal?
On Wed, 7 Jul 2010, chen jia wrote: Hi there, I run two regressions: y = a1 + b1 * x + e1 y = a2 + b2 * z + e2 I want to test against the null hypothesis: b1 = b2. How do I design the test? You are testing a non-nested hypothesis, which requires special handling. The classical test is due to Hotelling, but see the references (and R code snippets) in this posting: http://markmail.org/message/egnowmdzpzjtahy7 (it is the merest coincidence that the above thread was initiated by Mark Leeds and that the URL is 'markmail' :-) ) HTH, Chuck I think I can add two equations together and divide both sides by 2: y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2). or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3 If I run this new regression, I can test against the null b1 = b2 in this regression. Is it an equivalent test as the original one? If yes, how do I do that in R? Alternatively, I think I can just test against the null: correlation(y, x) = correlation(y, z), where correlation(. , .) is the correlation between two random variables. Is this equivalent too? If yes, how do I do it in R? Thanks. Best, Jia -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotmath vector problem; full program enclosed
On Tue, 6 Jul 2010, David Winsemius wrote: On Jul 6, 2010, at 1:41 PM, Duncan Murdoch wrote: On 06/07/2010 10:54 AM, Paul Johnson wrote: Here's another example of my plotmath whipping boy, the Normal distribution. A colleague asks for a Normal plotted above a series of axes that represent various other distributions (T, etc). I want to use vectors of equations in plotmath to do this, but have run into trouble. Now I've isolated the problem down to a relatively small piece of working example code (below). If you would please run this, I think you will see the problem. When plotmath meets one vector of expressions, it converts all but one to math, so in the figure output i get, in LaTeX speak b1 $\mu-1.0 \sigma$$\mu$ All values except the first come out correctly. This happens only when I try to use bquote or substitute to get variables to fill in where the 1.96, 1.0, and so forth should be. In the figure output, you should see a second axis where all of the symbols are resolved correctly. As usual, thanks in advance for your help, sorry if I've made an obvious mistake or overlooked a manual. ### Filename: plotMathProblem.R ### Paul Johnson July 5, 2010 ### email me paulj...@ku.edu sigma - 10.0 mu - 4.0 myx - seq( mu - 3.5*sigma, mu+ 3.5*sigma, length.out=500) myDensity - dnorm(myx,mean=mu,sd=sigma) ### xpd needed to allow writing outside strict box of graph ### Need big bottom margin to add several x axes par(xpd=TRUE, ps=10, mar=c(18,2,2,2)) plot(myx, myDensity, type=l, xlab=, ylab=Probability Density , main=myTitle1, axes=FALSE) axis(2, pos= mu - 3.6*sigma) axis(1, pos=0) lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes addInteriorLine - function(x, m, sd){ for (i in 1:(length(x))){ lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty= 14, lwd=.2) } } dividers - c(qnorm(0.025), -1, 0, 1, qnorm(0.975)) addInteriorLine(mu+sigma*dividers, mu,sigma) # bquote creates an expression that text plotters can use t1 - bquote( mu== .(mu)) mtext(bquote( mu == .(mu)), 1, at=mu, line=-1) addInteriorLabel - function(pos1, pos2, m, s){ area - abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s))) mid - m+0.5*(pos1+pos2)*s text(mid, 0.5*dnorm(mid,m,s),label=paste(round(area,2),%)) } addInteriorLabel(dividers[1],dividers[2], mu, sigma) addInteriorLabel(dividers[2],dividers[3], mu, sigma) addInteriorLabel(dividers[3],dividers[4], mu, sigma) addInteriorLabel(dividers[4],dividers[5], mu, sigma) ### Following is problem point: axis will ### end up with correct labels, except for first point, ### where we end up with b1 instead of mu - 1.96*sigma. b1 - substitute( mu - d*sigma, list(d=round(dividers[1],2)) ) b2 - substitute( mu - sigma ) b3 - substitute( mu ) b4 - substitute( mu + sigma ) b5 - substitute( mu + d*sigma, list(d=round(dividers[5],2)) ) ## plot(-20:50,-20:50,type=n,axes=F) axis(1, line=4,at=mu+dividers*sigma, labels=c(expression(b1),b2,b3,b4,b5), padj=-1) You want as.expression(b1), not expression(b1). The latter means the expression consisting of the symbol b1. The former means take the object stored in b1, and convert it to an expression.. It's not perfect, because you'll end up with mu - -1.96sigma (i.e. two minus signs), but it's closer than what you had. Easily addressed in this case with ~ instead of -. The value of d provides the minus: b1 - substitute( mu ~ d*sigma, list(d=round(dividers[1],2)) ) But if d = 0, there will be no sign so maybe use b1 - substitute( mu ~ d*sigma, list(d=sprintf(%+.2f, dividers[1]))) b5 - substitute( mu ~ d*sigma, list(d=sprintf(%+.2f, dividers[5]))) etc. Chuck Duncan Murdoch ### This gets right result but have to hard code the dividers b1 - expression( mu - 1.96*sigma ) b2 - expression( mu - sigma ) b3 - expression( mu ) b4 - expression( mu + sigma ) b5 - expression( mu + 1.96*sigma ) axis(1, line=8,at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901