Re: [R] mlogit and model-based recursive partitioning
Tudor: Has anyone tried to model-based recursive partition (using mob from package party; thanks Achim and colleagues) a data set based on a multinomial logit model (using mlogit from package mlogit; thanks Yves)? I attempted to do so, but there are at least two reasons why I could not. First, I am not quite sure that a model of class StatModel, as mob requires, exists for mlogit models. Second, as mlogit uses the pipe character | to specify the model, I wonder how this would interact with mob, which uses the pipe to differentiate between explanatory and segmentation variables.

Interesting question: in principle this is possible, but I wouldn't know of anyone who has tried it. The pipe issue is one, but not the only, complication when trying to actually combine mlogit and mob. I think the building blocks would have to be:

- Set up the data plus formula handling. As you point out, that would need a three-part formula separating alternative-specific regressors, subject-specific regressors, and partitioning variables. Furthermore, you would probably need to translate from the long format used by mlogit (subjects x alternatives) to the wide format, because mob would want to partition the subjects.

- A StatModel object would be required. Personally, if I wanted to do it, I would try to set up the StatModel object on the fly (rather than predefine it in a package) so that the StatModel creator can depend on the formula/data. The formula/data processing described above can be done inside the StatModel object.

- Finally, the required methods for the fitted model object would have to be defined. In particular, subject-specific gradients would be required; I think mlogit currently provides only the overall gradient.

So, in summary: it can be done, but it would likely need more than just an hour of coding...

hth, Z

An example (not working) of what I would like to accomplish follows below. Thanks a lot.
Tudor

library(party)
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
# Fit an mlogit model
m1 <- mlogit(mode ~ price + catch, data = Fish)
# The desired end result: segment m1 based on income and/or other possible covariates

--
View this message in context: http://r.789695.n4.nabble.com/mlogit-and-model-based-recursive-partitioning-tp4644743.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
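As a rough, self-contained illustration of the long-to-wide reshaping step Z describes (toy data with hypothetical variable names, not the Fishing set):

```r
# mlogit's "long" format has one row per subject x alternative; mob would
# want one row per subject, so alternative-specific variables must become
# separate columns.  Toy data with hypothetical names:
long <- data.frame(
  subject = rep(1:3, each = 2),
  alt     = rep(c("beach", "boat"), 3),
  price   = c(10, 4, 12, 5, 9, 3),
  choice  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)
# One row per subject, with columns such as price.beach and choice.boat:
wide <- reshape(long, idvar = "subject", timevar = "alt", direction = "wide")
names(wide)
```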
Re: [R] Basic question about: <- and method names starting with a dot.
On 02/10/12 17:00, Fabrice Tourre wrote:
> Dear list, when I read some source code, I find a lot of places that use the symbol <- , e.g.
> lastTime <- newTime
> What is the meaning here?

?"<-"

See also:

require(fortunes)
fortune("<-")

> Also, I find some methods whose names start with a dot, e.g.
> .RowStandardizeCentered = function(x) {
>   div = sqrt(rowSums(x^2))
>   div[div == 0] = 1
>   return(x / div)
> }
> What is the special meaning of a method name starting with a dot?

?ls

Note the argument all.names.

cheers, Rolf Turner
Re: [R] Transform pairwise observations into a table
HI David,

Thanks for your input. The third and fourth solutions look very interesting.

with(dat1, tapply(coef, list(ind1, ind2), function(x) x)) # also gave the same result

# I tried with aggregate() and ddply(), but maybe not as elegant a solution as yours:
matrix(with(dat1, aggregate(coef, list(ind1, ind2), FUN = function(x) x)[, 3]), nc = 4)
library(plyr)
matrix(ddply(dat1, .(ind1, ind2), FUN = function(x) x$coef)[, 3], nc = 4)
#      [,1]  [,2]  [,3] [,4]
# [1,] 1.000 0.250 0.125  0.5
# [2,] 0.250 1.000 0.125  0.5
# [3,] 0.125 0.125 1.000  0.5
# [4,] 0.500 0.500 0.500  1.0

A.K.

- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: arun smartpink...@yahoo.com
Cc: Hadjixenofontos, Athena ahadjixenofon...@med.miami.edu; R help r-help@r-project.org; Bert Gunter gunter.ber...@gene.com
Sent: Monday, October 1, 2012 10:33 PM
Subject: Re: [R] Transform pairwise observations into a table

On Oct 1, 2012, at 2:30 PM, arun wrote:

> HI AHJ,
> No problem. One more way, in addition to reshape() (Rui's suggestion), to get the same result:
> library(reshape)
> as.matrix(cast(melt(dat1, id = c("ind1", "ind2")), ind1 ~ ind2, value = "value"))
> #       1     2     3   4
> # 1 1.000 0.250 0.125 0.5
> # 2 0.250 1.000 0.125 0.5
> # 3 0.125 0.125 1.000 0.5
> # 4 0.500 0.500 0.500 1.0
> A.K.

That looks a tad ... well, ... complicated. So perhaps these base-only solutions with tapply might be more accessible. Some of them do border on the whimsical, I will admit:

with(dat1, tapply(coef, list(ind1, ind2), I))
with(dat1, tapply(coef, list(ind1, ind2), c))
with(dat1, tapply(coef, list(ind1, ind2), "^", 1))
with(dat1, tapply(coef, list(ind1, ind2), "+", 0))

It is a specific response to the request for a `table`-like function that would allow the application of other functions. Change the `1` to `2` in the third instance and you get the tabulated squares. And do not forget the availability of `ftable` to flatten the output of `tapply` returned values.

-- David.
- Original Message -
From: Hadjixenofontos, Athena ahadjixenofon...@med.miami.edu
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Monday, October 1, 2012 12:59 PM
Subject: Re: [R] Transform pairwise observations into a table

Thank you. I had looked at xtabs but misunderstood the syntax. This is great. :)

AHJ

On Oct 1, 2012, at 12:53 PM, arun smartpink...@yahoo.com wrote:

> Hi,
> Try this:
> dat1 <- read.table(text = "
> ind1 ind2 coef
> 1 1 1
> 1 2 0.25
> 1 3 0.125
> 1 4 0.5
> 2 2 1
> 2 1 0.25
> 2 3 0.125
> 2 4 0.5
> 3 3 1
> 3 1 0.125
> 3 2 0.125
> 3 4 0.5
> 4 4 1
> 4 1 0.5
> 4 2 0.5
> 4 3 0.5
> ", sep = "", header = TRUE)
> mat1 <- as.matrix(xtabs(coef ~ ind1 + ind2, data = dat1))
> #     ind2
> # ind1     1     2     3     4
> #    1 1.000 0.250 0.125 0.500
> #    2 0.250 1.000 0.125 0.500
> #    3 0.125 0.125 1.000 0.500
> #    4 0.500 0.500 0.500 1.000
> A.K.

- Original Message -
From: AHJ ahadjixenofon...@med.miami.edu
To: r-help@r-project.org
Sent: Monday, October 1, 2012 12:17 PM
Subject: [R] Transform pairwise observations into a table

Hi,

I have a table of pairs of individuals and a coefficient that belongs to the pair:

ind1 ind2 coef
1 1 1
1 2 0.25
1 3 0.125
1 4 0.5
2 2 1
2 1 0.25
2 3 0.125
2 4 0.5
3 3 1
3 1 0.125
3 2 0.125
3 4 0.5
4 4 1
4 1 0.5
4 2 0.5
4 3 0.5

And I want to convert it to a matrix where each individual is both a row and a column, and at the intersection of each pair is the coefficient that belongs to that pair:

      1     2     3   4
1 1     0.25  0.125 0.5
2 0.25  1     0.125 0.5
3 0.125 0.125 1     0.5
4 0.5   0.5   0.5   1

If table() would allow me to specify something other than frequencies to fill the table with, it would be what I need. I tried a few different combinations of t() and unique(), but none of it made enough sense to post as my starting code... I am just lost. Any help would be greatly appreciated.

Thank you,
AHJ
David Winsemius, MD
Alameda, CA, USA
Re: [R] Basic question about: <- and method names starting with a dot.
On Oct 2, 2012, at 06:00, Fabrice Tourre wrote:
> Dear list, when I read some source code, I find a lot of places that use the symbol <- , e.g.
> lastTime <- newTime
> What is the meaning here?

Did you check help("<-")? The explanation there seems at least as clear as anything I could cook up in a quick mail...

> Also, I find some methods whose names start with a dot, e.g.
> .RowStandardizeCentered = function(x) {
>   div = sqrt(rowSums(x^2))
>   div[div == 0] = 1
>   return(x / div)
> }
> What is the special meaning of a method name starting with a dot?

It means nothing in particular, except that such objects don't show up in ls() by default. The _intention_ is usually that the function is only to be used internally and not for end-user use.

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
[R] smoothScatter plot
Hi,

I want to make a plot similar to sm1 (attached). The code I tried is:

dcols <- densCols(x, y)
smoothScatter(x, y, col = dcols, pch = 20, xlab = "A", ylab = "B")
abline(h = 0, col = "red")

But it turned out to be s1 (attached), with big dots. I was wondering if anything is wrong with my code.

Thanks, Zhengyu

(attachments: sm1.png, s1.png)
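A guess at what is happening (my own diagnosis, not from the thread): smoothScatter() draws a smoothed density image and then overplots the `nrpoints` sparsest observations using the `pch` and `col` passed in, so pch = 20 makes those overplotted points large. A minimal sketch of one way to avoid the big dots:

```r
# Sketch, assuming the big dots are smoothScatter()'s overplotted "sparse"
# points: keep the default pch = "." or suppress them with nrpoints = 0.
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)
smoothScatter(x, y, nrpoints = 0, xlab = "A", ylab = "B")
abline(h = 0, col = "red")
```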
[R] problem about R
How do I make a pie chart for each categorical variable in a dataset?
[R] Is there any R function for data normalization?
Hello,

I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?

Thanks,
Rui
Re: [R] Is there any R function for data normalization?
On 12-10-02 5:51 AM, Rui Esteves wrote:
> Hello, I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?

No, but it is easy to construct one using sweep. First subtract the column mins, then divide by the column maxes:

normalize <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))
  sweep(x, 2, apply(x, 2, max), "/")
}

Duncan Murdoch
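A quick check of Duncan's function on a small matrix (my own toy example):

```r
# After min-max normalizing, every column should span exactly [0, 1].
normalize <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))   # subtract column minima
  sweep(x, 2, apply(x, 2, max), "/")   # divide by column maxima of the shifted x
}
m <- matrix(c(1, 3, 5, 2, 4, 6), ncol = 2)
n <- normalize(m)
apply(n, 2, range)  # first row all 0, second row all 1
```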
Re: [R] Is there any R function for data normalization?
Hi Rui,

It doesn't really need one...

doit <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
# use lapply to apply doit() to every column in a data frame
# mtcars is built into R
normed <- as.data.frame(lapply(mtcars, doit))
# verify that the range of all columns is [0, 1]
lapply(normed, range)

Cheers,
Josh

On Tue, Oct 2, 2012 at 2:51 AM, Rui Esteves ruimax...@gmail.com wrote:
> Hello, I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?
> Thanks, Rui

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
Re: [R] Is there any R function for data normalization?
Hello,

?scale in the base package.

Best Regards,
Pascal

On 12/10/02 18:51, Rui Esteves wrote:
> Hello, I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?
> Thanks, Rui
[R] Release plans: R-2.15.2 on October 26
Just a note that we intend to have a second patch release version on October 26. The nickname will be "Trick or Treat". Details of current changes can be found at http://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html (scroll past the R-devel stuff and look at 2.15.1 patched).

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com

___
r-annou...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-announce
Re: [R] Is there any R function for data normalization?
Hello,

Try the following.

fun <- function(x) {
  a <- min(x)
  b <- max(x)
  (x - a) / (b - a)
}
mat <- matrix(rnorm(12), ncol = 3)
apply(mat, 2, fun)

Hope this helps,

Rui Barradas

On 02-10-2012 10:51, Rui Esteves wrote:
> Hello, I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?
> Thanks, Rui
Re: [R] problem about R
Hello,

What is a "dataset"? In R there are lots of types, including 'data.frame' and 'list'. And R's type 'factor' corresponds to categorical variables. You must be more specific; show us an example dataset using dput():

dput(head(x, 30)) # paste the output of this in a post

Hope this helps,

Rui Barradas

On 02-10-2012 05:50, Leung Chen wrote:
> How to run a pie chart for each categorical variable in a dataset?
[R] lattice xyplot, get current level
Hi,

xyplot(y ~ x | subject) plots a separate graph of y against x for each level of subject. But I would like to have my own function for each level. Something like

xyplot(y ~ x | subject,
  panel = function(x, y) {
    panel.xyplot(x, y)
    panel.curve(x, y)
    # something that depends on the current subject ...
  })

How do I get the current subject? (The current subject is the title of the graph, too.)

thx
Christof
Re: [R] Basic question about: <- and method names starting with a dot.
>> What is the special meaning of a method name starting with a dot?
>
> It means nothing in particular, except that such objects don't show up in ls() by default. The _intention_ is usually that the function is only to be used internally and not for end-user use.

But these days, if you're writing a package, you're better off using namespaces.

Hadley

--
RStudio / Rice University
http://had.co.nz/
Re: [R] Reading labels for very large heatmaps
Jacob,

Try increasing the size of the pdf. For example, I can read all 919 labels in this plot:

pdf(width = 200, height = 200)
plot(1:919, 1:919, axes = FALSE)
axis(1)
axis(2, at = 1:919, las = 1, cex = 0.01)
box()
graphics.off()

Jean

JIMonroe jim...@virginia.edu wrote on 10/01/2012 03:42:24 PM:
> Hello, I have a large (919 x 919), hierarchically clustered heatmap that I would like to read the labels off of. I have tried saving the figure in pdf format and enlarging it until I can see the labels, but if I make the labels small enough to read (so that they don't overlap) using cexRow and cexCol, they do not appear in the pdf. The limit seems to be anything below cexRow or cexCol = 0.06. Is there a way to have smaller labels visible in the pdf? Is this an issue with resolution? I could get by with just a quarter of the heatmap (i.e. half of a row and half of a column) so that the labels don't have to be so small, but the heatmap must be clustered before this is done. I tried to cut the dendrogram at just the halfway point of the heatmap, but was not successful. Any suggestions?
> Thanks, Jacob
Re: [R] Is there any R function for data normalization?
In this case you could use the apply function. Suppose your k x l matrix is named y. Then, in order to standardize the values within each column, use the following:

aver <- apply(y, 2, mean) # calculate the mean within each column
std <- apply(y, 2, sd)    # calculate the standard deviation within each column
z <- matrix(0, nrow = k, ncol = l)
for (i in 1:k) {
  for (j in 1:l) {
    z[i, j] <- (y[i, j] - aver[j]) / std[j]
  }
}
z

On Tue, Oct 2, 2012 at 12:51 PM, Rui Esteves ruimax...@gmail.com wrote:
> Hello, I have a matrix with values, with columns c1..cn. I need the values to be normalized between 0 and 1 by column. Therefore, 0 should correspond to the minimum value in column c1 and 1 should correspond to the maximum value in column c1. The remaining columns should be organized in the same way. Does a function in R exist for this purpose?
> Thanks, Rui
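Note that the double loop above performs z-standardization (mean 0, sd 1), not min-max scaling to [0, 1], and it is equivalent to base R's scale(). A small check (my own example):

```r
# The nested loop is the elementwise form of scale(): subtract each column
# mean and divide by each column sd.  sweep() does the same without loops.
set.seed(1)
y <- matrix(rnorm(12), nrow = 4)
aver <- apply(y, 2, mean)
std  <- apply(y, 2, sd)
z_loop  <- sweep(sweep(y, 2, aver), 2, std, "/")
z_scale <- scale(y)
all.equal(z_loop, z_scale, check.attributes = FALSE)  # TRUE
```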
Re: [R] mlogit and model-based recursive partitioning
Hi Achim: Excellent points. Thank you so much for your prompt reply.

Tudor
Re: [R] problem about R
On 10/02/2012 02:50 PM, Leung Chen wrote:
> How to run a pie chart for each categorical variable in a dataset?

Hi Leung,

It depends upon your definition of categorical. If a character variable has 100 possible values, you won't get a very informative pie chart out of it. Therefore I suggest the following function as a start:

pie_shopping <- function(x, maxval = 5) {
  xname <- deparse(substitute(x))
  for (var in 1:length(x)) {
    if (length(unique(x[, var])) <= maxval) {
      png(paste(xname, var, ".png", sep = ""))
      pie(table(x[, var]))
      dev.off()
    }
  }
}

Jim
Re: [R] Generating an autocorrelated binary variable
Thank you very much for your answer, Rolf. It helped.

I am trying to simulate a trade indicator model from market microstructure, where 1 or -1 indicates a buyer- or seller-initiated trade respectively. I use a Gaussian copula for the simulation, so I can put in some correlation if I want to. I generate my multivariate Gaussian sample and transform it to a uniform sample so I can use whatever marginal distribution I want. Then I generate the autocorrelated binary variable by beginning with a transformation of the first observation and then comparing the uniform sample to 0.6. Afterwards I have a 0.6-autocorrelated binary variable:

sampleCop <- function(n = 1000, rho = 0.6) {
  require(splus2R)
  mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
  pmvrs <- pnorm(mvrs, 0, 1)
  var1 <- matrix(0, nrow = n + 1, ncol = 1)
  var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
  if (var1[1] == 0) var1[1] <- -1
  for (i in 1:(nrow(pmvrs) - 1)) {
    if (pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
    else var1[i + 1] <- var1[i] * (-1)
  }
  sample <- matrix(0, nrow = n, ncol = 4)
  sample[, 2] <- var1[1:(nrow(var1) - 1)]
  sample[, 3] <- var1[2:nrow(var1)]
  sample[, 4] <- qnorm(pmvrs[1:(nrow(var1) - 1), 2], 0, 1, 1, 0) +
    qnorm(pmvrs[1:(nrow(var1) - 1), 3], 0, 1, 1, 0) -
    qnorm(pmvrs[2:nrow(var1), 3], 0, 1, 1, 0)
  sample[, 1] <- sample[, 2] * (0.015 + 0.03) - sample[, 3] * (0.015 + 0.2 * 0.03) + sample[, 4]
  sample
}

It is interesting, though, that if I run my GMM model on it, it needs very high numbers of observations to converge to the true parameters.
Using the gmm function

gmmfun <- function(theta, data) {
  res <- data[, 1] - (theta[1] + theta[2]) * data[, 2] + (theta[1] + theta[3] * theta[2]) * data[, 3]
  moments <- matrix(0, nrow = nrow(data), ncol = 3)
  moments[, 1] <- data[, 2] * data[, 3] - data[, 2]^2 * theta[3]
  moments[, 2] <- res * data[, 2]
  moments[, 3] <- res * data[, 3]
  return(moments)
}

and the objective function

obj_fun <- function(theta, data) {
  moments <- gmmfun(theta, data)
  moments_sum <- colSums(moments)
  val <- moments_sum %*% diag(3) %*% moments_sum * 1.0 / nrow(data)
  val
}

running the command

optim(fn = obj_fun, par = c(0.05, 0.05, 0.1), method = "BFGS")

gives a result with the first parameter fairly far from its true value (the true values for the three parameters were (0.015, 0.03, 0.2)). I wonder if this is due to the autocorrelation in the binary variable (btw. I also tried lm here, as the third parameter can be estimated via the correlation, but it remains highly dependent on the number of observations). Any suggestions?

Best regards
Simon

On Sep 28, 2012, at 5:02 AM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

> I have no idea what your code is doing, nor why you want correlated binary variables. Correlation makes little or no sense in the context of binary random variables --- or more generally in the context of discrete random variables. Be that as it may, it is an easy calculation to show that if X and Y are binary random variables, both with success probability 0.5, then cor(X,Y) = 0.2 if and only if Pr(X=1 | Y=1) = 0.6. So just generate X and Y using that fact:
>
> set.seed(42)
> X <- numeric(1000)
> Y <- numeric(1000)
> for (i in 1:1000) {
>   Y[i] <- rbinom(1, 1, 0.5)
>   X[i] <- if (Y[i] == 1) rbinom(1, 1, 0.6) else rbinom(1, 1, 0.4)
> }
> # Check:
> cor(X, Y) # Get 0.2012336
>
> Looks about right. Note that the sample proportions are 0.484 and 0.485 for X and Y respectively. These values do not differ significantly from 0.5.
> cheers, Rolf Turner

On 28/09/12 08:26, Simon Zehnder wrote:
> Hi R-fellows,
>
> I am trying to simulate a multivariate correlated sample via the Gaussian copula method. One variable is a binary variable that should be autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the overall probability of either outcome of the binary variable should be 0.5. Below you can see the R code (for simplicity I use a diagonal matrix in rmvnorm even if it produces no correlated sample):
>
> sampleCop <- function(n = 1000, rho = 0.2) {
>   require(splus2R)
>   mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
>   pmvrs <- pnorm(mvrs, 0, 1)
>   var1 <- matrix(0, nrow = n + 1, ncol = 1)
>   var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
>   if (var1[1] == 0) var1[nrow(mvrs)] <- -1
>   for (i in 1:(nrow(pmvrs) - 1)) {
>     if (pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
>     else var1[i + 1] <- var1[i] * (-1)
>   }
>   sample <- matrix(0, nrow = n, ncol = 4)
>   sample[, 1] <- var1[1:(nrow(var1) - 1)]
>   sample[, 2] <- var1[2:nrow(var1)]
>   sample[, 3] <- qnorm(pmvrs[1:(nrow(var1) - 1), 2], 0, 1, 1, 0)
>   sample[, 4] <- qnorm(pmvrs[1:(nrow(var1) - 1), 3], 0, 1, 1, 0)
>   sample
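Rolf's loop-based construction quoted above can also be vectorized; a minimal sketch of the same conditional-probability idea (my own illustration, not from the thread):

```r
# Pr(X = 1 | Y = 1) = 0.6 and Pr(X = 1 | Y = 0) = 0.4 with Pr(Y = 1) = 0.5
# gives cor(X, Y) close to 0.2, without an explicit loop.
set.seed(42)
n <- 100000
Y <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, ifelse(Y == 1, 0.6, 0.4))
cor(X, Y)  # approximately 0.2
```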
[R] glpk package missing?
I have a piece of code (from Xie et al. 2009 Autophagy 5:217) that runs in R and requires the glpk package. A year or so ago, I was able to download and install the glpk package directly from inside the R program (for Windows), and everything worked fine. Now I have installed R for Windows on a new computer, and I cannot find the glpk package on the list of available packages on my local CRAN mirror. I do see two packages labeled Rglpk and glpkAPI, but as far as I can understand, these just interface with an outside installation of glpk, which I don't have. I also found an old link to R glpk on a Wikibooks page, but it now says "Package 'glpk' was removed from the CRAN repository."

Is there a reason why the glpk package is no longer available? Can anyone suggest the easiest workaround? I tried but failed to install an outside version of glpk for Windows. I am working on putting together a protocol paper that includes the use of this code, so I need an option simple enough that anyone can follow it without first needing to be a computer expert.

Thanks in advance for your help.

Steven Backues
Re: [R] smoothScatter plot
It's hard to know what's wrong with your code since you did not supply it. Please supply a small working example and some data. To supply data use the dput() function; see ?dput for details.

John Kane
Kingston ON Canada

-Original Message-
From: zhyjiang2...@hotmail.com
Sent: Tue, 2 Oct 2012 11:38:31 +0800
To: r-help@r-project.org
Subject: [R] smoothScatter plot

> Hi, I want to make a plot similar to sm1 (attached). The code I tried is:
>
> dcols <- densCols(x, y)
> smoothScatter(x, y, col = dcols, pch = 20, xlab = "A", ylab = "B")
> abline(h = 0, col = "red")
>
> But it turned out to be s1 (attached), with big dots. I was wondering if anything is wrong with my code.
> Thanks, Zhengyu
Re: [R] ffbase, help with %in%
It doesn't seem possible to index an ff vector using a logical ff vector. You can use subset (also in ffbase) or first convert 'a' to a normal logical vector:

library(ff)
library(ffbase)
data1 <- as.ffdf(data.frame(a = letters[1:10], b = 1:10))
data2 <- as.ffdf(data.frame(a = letters[5:26], b = 5:26))
a <- data1[[1]] %in% data2$a
subset(data1, a)
data1[a[], ]

HTH,
Jan

Lucas Chaparro lpchaparro...@gmail.com wrote:

> Hello to everyone. I'm trying to use %in% to match two vectors in ff format.
>
> a <- as.ff(data[, 1]) %in% fire$fecha
> a
> ff (open) logical length=3653 (3653)
>    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]        [3646]
>  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE    :    FALSE
> [3647] [3648] [3649] [3650] [3651] [3652] [3653]
>  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
>
> Here you see a part of the data (just a sample; data has 3653 obs):
>
> data[1:20,]
>         fecha juliano altura  UTM.E   UTM.N
> 1  1990-07-01     182     15 248500 6239500
> 2  1990-07-02     183     15 248500 6239500
> 3  1990-07-03     184     15 248500 6239500
> 4  1990-07-04     185     15 248500 6239500
> 5  1990-07-05     186     15 248500 6239500
> 6  1990-07-06     187     15 248500 6239500
> 7  1990-07-07     188     15 248500 6239500
> 8  1990-07-08     189     15 248500 6239500
> 9  1990-07-09     190     15 248500 6239500
> 10 1990-07-10     191     15 248500 6239500
> 11 1990-07-11     192     15 248500 6239500
> 12 1990-07-12     193     15 248500 6239500
> 13 1990-07-13     194     15 248500 6239500
> 14 1990-07-14     195     15 248500 6239500
> 15 1990-07-15     196     15 248500 6239500
> 16 1990-07-16     197     15 248500 6239500
> 17 1990-07-17     198     15 248500 6239500
> 18 1990-07-18     199     15 248500 6239500
> 19 1990-07-19     200     15 248500 6239500
> 20 1990-07-20     201     15 248500 6239500
>
> fire$fecha[1:20,]
>  [1] 1984-11-08 1984-11-08 1984-11-09 1984-11-09 1984-11-09
>  [6] 1984-11-10 1984-11-10 1984-11-11 1984-11-11 1984-11-11
> [11] 1984-11-11 1984-11-11 1984-11-11 1984-11-12 1984-11-12
> [16] 1984-11-13 1984-11-13 1984-11-13 1984-11-14 1984-11-14
>
> To see if I got any match:
>
> table.ff(a)
> FALSE  TRUE
>  1687  1966
> Warning message:
> In if (useNA == "no") c(NA, NaN) :
>   the condition has length > 1 and only the first element will be used
>
> In a regular data.frame I use data[a, ] to extract the rows where a == TRUE, but when I do this on an ffdf I get this error:
>
> data[a, ]
> Error: vmode(index) == "integer" is not TRUE
>
> I'm just learning how to use the ff package, so obviously I'm missing something. If any of you knows how to solve this, please teach me. Thank you so much.
> Lucas.
Re: [R] (no subject)
It's hard to know what's wrong since you did not supply your code. Please supply a small working example and some data. To supply data, use the dput() function; see ?dput for details. Welcome to R.

John Kane
Kingston ON Canada

-----Original Message-----
From: mbhpat...@gmail.com
Sent: Mon, 1 Oct 2012 14:25:20 -0700
To: r-help@r-project.org
Subject: [R] (no subject)

Hello, I am a new R user and request your help with the following problem. I need to merge two data sets from a longitudinal study which have two columns (id and response) in common. When I used merge to join the data side by side, because of the repeated subject ids I got a larger data set which is not accurate. I would like to connect the two data sets by id and response in such a way that rows are matched on the same id and response type, and if the same subject has fewer data points in one data set, NA is produced. A sample data set is attached. Bibek
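A minimal sketch of the kind of merge the poster seems to want, using made-up data (the original attachment did not come through, so the column names score1/score2 are assumptions):

```r
# Hypothetical example: two longitudinal data sets sharing 'id' and 'response'.
# all = TRUE keeps every id/response combination from both sides and fills
# unmatched cells with NA instead of duplicating rows.
d1 <- data.frame(id = c(1, 1, 2), response = c("a", "b", "a"), score1 = c(10, 11, 12))
d2 <- data.frame(id = c(1, 2, 2), response = c("a", "a", "b"), score2 = c(5, 6, 7))

merged <- merge(d1, d2, by = c("id", "response"), all = TRUE)
merged
#   id response score1 score2
# 1  1        a     10      5
# 2  1        b     11     NA
# 3  2        a     12      6
# 4  2        b     NA      7
```

With all = TRUE this is a full outer join on the two key columns, which avoids the row-multiplication the poster saw when joining on id alone.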
Re: [R] lattice xyplot, get current level
subj <- levels(subject)
subj[panel.number()]

seems to be a good solution. Is there something like panel.legend (instead of panel.text)?

On 02-10-2012 12:59, Christof Kluß wrote:

Hi, xyplot(y ~ x | subject) plots a separate graph of y against x for each level of subject. But I would like to have my own function for each level. Something like

xyplot(y ~ x | subject, panel = function(x, y) {
  panel.xyplot(x, y)
  panel.curve(x, y)
  # something that depends on the current subject ...
})

How do I get the current subject? (The current subject is the title of the graph, too.) thx Christof
[R] Problem with multi-dimensional array
I want to make a multi-dimensional array. To be specific, I want to fill the following array:

results <- array(0, dim = c(2,2,64,7))

This is the code I have created, but it gives no result due to the error "subscript out of bounds".

x <- rep(7,7) # Missingness in intervention
y <- rep(7,7) # Missingness in control
arraynames <- list(Group = c("Success","Failure"),
                   Outcome = c("Intervention","Control"),
                   Trial = c(1:7))
mat.stat <- array(c(9,16,10,15,66,12,44,23,102,88,66,104,277,60,247,119,23,43,20,41,201,162,122,263,14,41,4,41),
                  dim = c(2,2,7), dimnames = arraynames); mat.stat
Nx <- rep(0, length(x))
Ny <- rep(0, length(y))
n <- (x+1)*(y+1)
results <- array(0, dim = c(2,2,64,7))
l <- 1
for (i in 1:length(x)) {
  Nx[i] <- length(1:(x[i]+1))
  Ny[i] <- length(1:(y[i]+1))
  for (j in 1:(Nx[i])) {
    for (k in 1:(Ny[i])) {
      results[,,l,i] <- mat.stat[,,i] + matrix(c(c(0:x[i])[j], c(0:y[i])[k],
                                                 -c(0:x[i])[j], -c(0:y[i])[k]),
                                               nrow=2, ncol=2, byrow=TRUE)
      l <- l + 1
    }
  }
}
results

Any suggestion would be really welcome! Thank you very much! Loukia
Re: [R] lattice xyplot, get current level
Christof: You are aware, I assume, that the subject level name can be incorporated into the strip label via the strip function argument; e.g.

xyplot(..., strip = strip.custom(style = 1, strip.levels = c(TRUE, TRUE)), ...)

Cheers, Bert

On Tue, Oct 2, 2012 at 6:45 AM, Christof Kluß ckl...@email.uni-kiel.de wrote:

subj <- levels(subject)
subj[panel.number()]

seems to be a good solution. Is there something like panel.legend (instead of panel.text)?

On 02-10-2012 12:59, Christof Kluß wrote:

Hi, xyplot(y ~ x | subject) plots a separate graph of y against x for each level of subject. But I would like to have my own function for each level. Something like

xyplot(y ~ x | subject, panel = function(x, y) {
  panel.xyplot(x, y)
  panel.curve(x, y)
  # something that depends on the current subject ...
})

How do I get the current subject? (The current subject is the title of the graph, too.) thx Christof

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
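To round out the thread, here is a hedged, self-contained sketch of the subj[panel.number()] idea discussed above (the data frame and its values are invented for illustration):

```r
library(lattice)

# Toy data, assumed for illustration only
dat <- data.frame(x = rep(1:10, 3),
                  y = rnorm(30),
                  subject = factor(rep(c("s1", "s2", "s3"), each = 10)))

subj <- levels(dat$subject)

# panel.number() returns the index of the packet currently being drawn;
# with a single conditioning factor this matches the order of its levels,
# so subj[panel.number()] recovers the current subject inside the panel.
xyplot(y ~ x | subject, data = dat,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         current <- subj[panel.number()]
         panel.text(min(x), max(y), labels = current, pos = 4)
       })
```

For panel-level annotation that behaves like a legend, panel.text placed at a fixed corner (as above) is a common workaround; lattice has no panel.legend function as far as I know.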
Re: [R] glpk package missing?
On 10/2/2012 6:08 AM, Steven Backues wrote:

I have a piece of code (from Xie et al. 2009 Autophagy 5:217) that runs in R and requires the glpk package. A year or so ago, I was able to download and install the glpk package directly from inside the R program (for Windows), and everything worked fine. Now I have installed R for Windows on a new computer, and I cannot find the glpk package on the list of available packages on my local CRAN mirror. I do see two packages labeled Rglpk and glpkAPI, but as far as I can understand these just interface with an outside version of glpk, which I don't have. I also found an old link to R glpk on a Wikibooks page, but it now says "Package 'glpk' was removed from the CRAN repository." Is there a reason why the glpk package is no longer available? Can anyone suggest the easiest workaround? I tried but failed to install an outside version of glpk for Windows. I am working on putting together a protocol paper that includes the use of this code, so I need some option that is simple enough that anyone can do it, without first needing to be a computer expert.

Have you tried to contact the author(s) and maintainer(s) of Rglpk and glpkAPI to see what they recommend? If it were my project, I think I might try that first. In parallel with that, I might download glpk_4.8-0.5.tar.gz from CRAN -> Packages -> Archived. Then I'd do R CMD check and R CMD INSTALL on that. Then I'd look for the author(s) and maintainer(s) listed there, and try to contact them (or him or her). I don't know, but I'd guess that glpk was removed from CRAN after it failed to pass some new CRAN test and the maintainer failed to reply to requests from the CRAN maintainers for updates. What other package(s) will be referenced in your protocol paper? Is there one you control? If yes, and you only wanted a few functions from glpk, you might just copy those into your package and run with it.
If no, I think you could name yourself the maintainer of glpk, give it a newer version number, get it to pass R CMD check, and submit your new version to CRAN. Along the way, you could add other things to make the protocol even easier. Hope this helps. Spencer

Thanks in advance for your help. Steven Backues

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
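For readers following Spencer's suggestion from within R rather than the command line, an untested sketch of installing the archived tarball he names (the archive URL is an assumption; check https://cran.r-project.org/src/contrib/Archive/glpk/ for the actual file, and note that building source packages on Windows requires Rtools):

```r
# Sketch: fetch the archived glpk source package and install it from the
# local tarball. The URL/file name glpk_4.8-0.5.tar.gz comes from the thread;
# verify it against the CRAN Archive before relying on it.
url <- "https://cran.r-project.org/src/contrib/Archive/glpk/glpk_4.8-0.5.tar.gz"
download.file(url, destfile = "glpk_4.8-0.5.tar.gz")

# repos = NULL tells install.packages to treat the argument as a local file
install.packages("glpk_4.8-0.5.tar.gz", repos = NULL, type = "source")
```

This is the in-R equivalent of R CMD INSTALL on the downloaded tarball; R CMD check would still be needed before resubmitting a revived package to CRAN.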
Re: [R] Transform pairwise observations into a table
Another base-R-only solution uses a 2-column matrix of subscripts to fill a matrix. E.g.,

f <- function(data) {
  mat <- matrix(NA_real_, nrow = max(data[[1]]), ncol = max(data[[2]]))
  mat[cbind(data[[1]], data[[2]])] <- data[[3]]
  mat
}
f(dat1)
      [,1]  [,2]  [,3] [,4]
[1,] 1.000 0.250 0.125  0.5
[2,] 0.250 1.000 0.125  0.5
[3,] 0.125 0.125 1.000  0.5
[4,] 0.500 0.500 0.500  1.0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Monday, October 01, 2012 7:33 PM
To: arun
Cc: R help; Bert Gunter; Hadjixenofontos, Athena
Subject: Re: [R] Transform pairwise observations into a table

On Oct 1, 2012, at 2:30 PM, arun wrote:

Hi AHJ, No problem. One more way in addition to reshape() (Rui's suggestion) to get the same result.

library(reshape)
as.matrix(cast(melt(dat1, id=c("ind1","ind2")), ind1~ind2, value="value"))
#      1     2     3   4
# 1 1.000 0.250 0.125 0.5
# 2 0.250 1.000 0.125 0.5
# 3 0.125 0.125 1.000 0.5
# 4 0.500 0.500 0.500 1.0
A.K.

That looks a tad ... well, ... complicated. So perhaps these base-only solutions with tapply might be more accessible. Some of them do border on the whimsical, I will admit:

with(dat1, tapply(coef, list(ind1,ind2), I))
with(dat1, tapply(coef, list(ind1,ind2), c))
with(dat1, tapply(coef, list(ind1,ind2), "^", 1))
with(dat1, tapply(coef, list(ind1,ind2), "+", 0))

It is a specific response to the request for a table-like function that would allow the application of other functions. Change the `1` to `2` in the third instance and you get the tabulated squares. And do not forget the availability of `ftable` to flatten the output of values returned by `tapply`. -- David.

- Original Message -
From: Hadjixenofontos, Athena ahadjixenofon...@med.miami.edu
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Monday, October 1, 2012 12:59 PM
Subject: Re: [R] Transform pairwise observations into a table

Thank you.
I had looked at xtabs but misunderstood the syntax. This is great. :) AHJ

On Oct 1, 2012, at 12:53 PM, arun smartpink...@yahoo.com wrote:

Hi, Try this:

dat1 <- read.table(text="
ind1 ind2 coef
1 1 1
1 2 0.25
1 3 0.125
1 4 0.5
2 2 1
2 1 0.25
2 3 0.125
2 4 0.5
3 3 1
3 1 0.125
3 2 0.125
3 4 0.5
4 4 1
4 1 0.5
4 2 0.5
4 3 0.5
", sep="", header=TRUE)
mat1 <- as.matrix(xtabs(coef ~ ind1 + ind2, data = dat1))
#    ind2
#ind1     1     2     3     4
#   1 1.000 0.250 0.125 0.500
#   2 0.250 1.000 0.125 0.500
#   3 0.125 0.125 1.000 0.500
#   4 0.500 0.500 0.500 1.000
A.K.

- Original Message -
From: AHJ ahadjixenofon...@med.miami.edu
To: r-help@r-project.org
Cc:
Sent: Monday, October 1, 2012 12:17 PM
Subject: [R] Transform pairwise observations into a table

Hi, I have a table of pairs of individuals and a coefficient that belongs to the pair:

ind1 ind2 coef
1    1    1
1    2    0.25
1    3    0.125
1    4    0.5
2    2    1
2    1    0.25
2    3    0.125
2    4    0.5
3    3    1
3    1    0.125
3    2    0.125
3    4    0.5
4    4    1
4    1    0.5
4    2    0.5
4    3    0.5

And I want to convert it to a matrix where each individual is both a row and a column and at the intersection of each pair is the coefficient that belongs to that pair:

      1     2     3     4
1 1.000 0.250 0.125 0.500
2 0.250 1.000 0.125 0.500
3 0.125 0.125 1.000 0.500
4 0.500 0.500 0.500 1.000

If table() would allow me to specify something other than frequencies to fill the table with, it would be what I need. I tried a few different combinations of t() and unique() but none of it made enough sense to post as my starting code... I am just lost. Any help would be greatly appreciated. Thank you, AHJ

-- View this message in context: http://r.789695.n4.nabble.com/Transform-pairwise-observations-into-a-table-tp4644706.html Sent from the R help mailing list archive at Nabble.com.
David Winsemius, MD
Alameda, CA, USA
Re: [R] Basic question about: <<- and method start with dot.
On Oct 2, 2012, at 13:35, Hadley Wickham wrote:

What is the special meaning of a method name starting with a dot?

It means nothing in particular, except that such objects don't show up in ls() by default. The _intention_ is usually that the function is only to be used internally and not for end-user use. But these days, if you're writing a package, you're better off using namespaces.

Sure, but that doesn't keep package writers from using that kind of naming convention for non-exported objects. (Notice that this started as a question about reading 3rd party source code.)

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
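A quick illustration of the point made above, with an invented helper name (.hidden_helper is hypothetical, not from the thread):

```r
# Dot-prefixed objects are ordinary objects; they are merely hidden
# from ls() unless all.names = TRUE is given.
.hidden_helper <- function(x) x + 1   # hypothetical internal-style function

ls()                  # ".hidden_helper" is not listed by default
ls(all.names = TRUE)  # now it appears

.hidden_helper(1)     # callable like any other function: returns 2
```

In packages, actual hiding is done by not exporting the object from the namespace; the leading dot is only a naming convention signalling "internal".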
Re: [R] Problem with multi-dimensional array
On 02-10-2012, at 16:20, Loukia Spineli spinelilouki...@gmail.com wrote:

I want to make a multi-dimensional array. To be specific, I want to fill the following array:

results <- array(0, dim = c(2,2,64,7))

This is the code I have created, but it gives no result due to the error "subscript out of bounds".

x <- rep(7,7) # Missingness in intervention
y <- rep(7,7) # Missingness in control
arraynames <- list(Group = c("Success","Failure"),
                   Outcome = c("Intervention","Control"),
                   Trial = c(1:7))
mat.stat <- array(c(9,16,10,15,66,12,44,23,102,88,66,104,277,60,247,119,23,43,20,41,201,162,122,263,14,41,4,41),
                  dim = c(2,2,7), dimnames = arraynames); mat.stat
Nx <- rep(0, length(x))
Ny <- rep(0, length(y))
n <- (x+1)*(y+1)
results <- array(0, dim = c(2,2,64,7))
l <- 1
for (i in 1:length(x)) {
  Nx[i] <- length(1:(x[i]+1))
  Ny[i] <- length(1:(y[i]+1))
  for (j in 1:(Nx[i])) {
    for (k in 1:(Ny[i])) {
      results[,,l,i] <- mat.stat[,,i] + matrix(c(c(0:x[i])[j], c(0:y[i])[k],
                                                 -c(0:x[i])[j], -c(0:y[i])[k]),
                                               nrow=2, ncol=2, byrow=TRUE)
      l <- l + 1
    }
  }
}
results

If you had inserted this line

cat("i=", i, " l=", l, " j=", j, " k=", k, "\n")

before the assignment to results[,,l,i] in the inner loop, you could have quite easily seen what the problem is. The counter l (letter el) keeps on increasing in your code, and when it reaches 65 R will complain. So at an appropriate place in the code you need to reset l to 1. I suggest you move the assignment l <- 1 to just before the line with for (j in 1:Nx[i]) ... Whether the results are what you desire is up to you to decide.

Please use some more whitespace and do some indenting in your code. As presented it is unreadable.

Berend
Re: [R] Problem with multi-dimensional array
Hello, See if this is it.

Nx <- rep(0, length(x))
Ny <- rep(0, length(y))
n <- (x+1)*(y+1)
results <- array(0, dim = c(2,2,64,7))
# l <- 1                  # --- This changed place
for (i in 1:length(x)) {
  Nx[i] <- length(1:(x[i]+1))
  Ny[i] <- length(1:(y[i]+1))
  l <- 1                  # --- To here
  for (j in 1:(Nx[i])) {
    for (k in 1:(Ny[i])) {

Hope this helps,
Rui Barradas

On 02-10-2012 15:20, Loukia Spineli wrote:

I want to make a multi-dimensional array. To be specific, I want to fill the following array:

results <- array(0, dim = c(2,2,64,7))

This is the code I have created, but it gives no result due to the error "subscript out of bounds".

x <- rep(7,7) # Missingness in intervention
y <- rep(7,7) # Missingness in control
arraynames <- list(Group = c("Success","Failure"),
                   Outcome = c("Intervention","Control"),
                   Trial = c(1:7))
mat.stat <- array(c(9,16,10,15,66,12,44,23,102,88,66,104,277,60,247,119,23,43,20,41,201,162,122,263,14,41,4,41),
                  dim = c(2,2,7), dimnames = arraynames); mat.stat
Nx <- rep(0, length(x))
Ny <- rep(0, length(y))
n <- (x+1)*(y+1)
results <- array(0, dim = c(2,2,64,7))
l <- 1
for (i in 1:length(x)) {
  Nx[i] <- length(1:(x[i]+1))
  Ny[i] <- length(1:(y[i]+1))
  for (j in 1:(Nx[i])) {
    for (k in 1:(Ny[i])) {
      results[,,l,i] <- mat.stat[,,i] + matrix(c(c(0:x[i])[j], c(0:y[i])[k],
                                                 -c(0:x[i])[j], -c(0:y[i])[k]),
                                               nrow=2, ncol=2, byrow=TRUE)
      l <- l + 1
    }
  }
}
results

Any suggestion would be really welcome! Thank you very much! Loukia
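Putting the relocated l <- 1 back into the full program gives the sketch below (same data as in the question; the j/k loops are rewritten to iterate over the offset values 0..7 directly, which is an equivalent but simpler indexing than c(0:x[i])[j]):

```r
x <- rep(7, 7)   # missingness in intervention
y <- rep(7, 7)   # missingness in control
arraynames <- list(Group   = c("Success", "Failure"),
                   Outcome = c("Intervention", "Control"),
                   Trial   = 1:7)
mat.stat <- array(c(9,16,10,15, 66,12,44,23, 102,88,66,104, 277,60,247,119,
                    23,43,20,41, 201,162,122,263, 14,41,4,41),
                  dim = c(2, 2, 7), dimnames = arraynames)

results <- array(0, dim = c(2, 2, 64, 7))
for (i in seq_along(x)) {
  l <- 1                               # reset the slice counter per trial
  for (j in 0:x[i]) {                  # (x[i]+1) * (y[i]+1) = 64 combinations
    for (k in 0:y[i]) {
      shift <- matrix(c(j, k, -j, -k), nrow = 2, ncol = 2, byrow = TRUE)
      results[, , l, i] <- mat.stat[, , i] + shift
      l <- l + 1
    }
  }
}
dim(results)   # 2 2 64 7
```

With l reset inside the outer loop, l runs 1..64 for each trial i and the "subscript out of bounds" error disappears.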
[R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
Windows XP, R 2.15

I am running a cluster analysis in which I ask for three clusters (see code below). The analysis nicely tells me what cluster each of the subjects in my input dataset belongs to. I would like two pieces of information: (1) for every subject in my input data set, what is the probability of the subject belonging to each of the three clusters? (2) given a new subject, someone who was not in my original dataset, how can I determine their cluster assignment?

Thanks, John

# K-Means Cluster Analysis
jclusters <- 3
fit <- kmeans(datascaled, jclusters) # 3 cluster solution

and fit$cluster tells me what cluster each observation in my input dataset belongs to (output truncated for brevity):

fit$cluster
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 . . . .
1 1 1 1 3 1 1 1 1  2  1  2  1  1  1  1  1 . . . .

How do I get the probability of being in cluster 1, cluster 2, and cluster 3 for a given subject, e.g. datascaled[1,]? How do I get the cluster assignment for a new subject?

Thanks, John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center, 10 North Greene Street GRECC (BT/18/GR), Baltimore, MD 21201-1524
(Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
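No reply appears in this chunk, so the following is an editor's sketch, not advice from the thread. kmeans is a hard-clustering method and has no true membership probabilities, but a new subject can be assigned to the nearest fitted centre; the stand-in data below are invented, since 'datascaled' was not posted:

```r
# Stand-in for the poster's scaled data; in practice use their 'datascaled'.
set.seed(1)
datascaled <- scale(matrix(rnorm(200), ncol = 2))
fit <- kmeans(datascaled, centers = 3)

# A "new" subject must be scaled with the SAME centering/scaling as the
# training data before comparing it to the centres.
new.subject <- datascaled[1, ]

# Euclidean distance from the new subject to each of the 3 cluster centres
d <- apply(fit$centers, 1, function(ctr) sqrt(sum((new.subject - ctr)^2)))
which.min(d)   # hard assignment: index of the nearest centre
```

For genuine membership probabilities, a model-based clustering method is the usual suggestion (e.g. mclust, where predict() on a fitted model returns posterior probabilities per cluster); kmeans itself cannot provide them.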
[R] Integration in R
Dear R-users, I am facing a problem with integrating in R a likelihood function of four parameters. It gives me the result at the end but takes more than half an hour to run. I'm wondering whether there is a more efficient way to deal with this. The following is my code. I am ready to provide any other description of my function if you need it to move forward.

library(cubature)
dose <- c(2,3,5)
y0 <- c(2,1,0)
y1 <- c(1,1,1)
y2 <- c(0,1,2)
lf <- function(x) {
  v <- 1
  for (i in 1:length(dose)) {
    psi0 <- 1/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    psi1 <- exp(x[1]+x[2]*dose[i])/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  return(v)
}
adaptIntegrate(lf, lowerLimit = c(-20, 0, -20, 0), upperLimit = c(0, 10, 0, 10))

Thanks for your attention. Kind regards, Jamil.
Re: [R] Basic question about: <<- and method start with dot.
require(fortunes)
Loading required package: fortunes
fortune("<<-")

I wish `<<-` had never been invented, as it makes an esoteric and dangerous feature of the language *seem* normal and reasonable. If you want to dumb down R/S into a macro language, this is the operator for you.
-- Bill Venables, R-help (July 2001)

On Tue, Oct 2, 2012 at 11:14 AM, peter dalgaard pda...@gmail.com wrote:

On Oct 2, 2012, at 13:35, Hadley Wickham wrote:

What is the special meaning of a method name starting with a dot?

It means nothing in particular, except that such objects don't show up in ls() by default. The _intention_ is usually that the function is only to be used internally and not for end-user use. But these days, if you're writing a package, you're better off using namespaces.

Sure, but that doesn't keep package writers from using that kind of naming convention for non-exported objects. (Notice that this started as a question about reading 3rd party source code.)

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[R] count data as independent variable in logistic regression
Dear R users, I would like to employ count data as covariates while fitting a logistic regression model. My question is: do I violate any assumption of the logistic (and, more generally, of the generalized linear) models by employing count, non-negative integer variables as independent variables? I found a lot of references in the literature regarding how to use count data as the outcome, but not as covariates; see for example the very clear paper: N E Breslow (1996) Generalized Linear Models: Checking Assumptions and Strengthening Conclusions, Congresso Nazionale Societa Italiana di Biometria, Cortona June 1995, available at http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf. Loosely speaking, it seems that GLM assumptions may be expressed as follows: iid residuals; the link function must correctly represent the relationship among dependent and independent variables; absence of outliers. Does anybody know whether there exists any other assumption/technical problem that may suggest using some other type of model for dealing with count covariates? Finally, please notice that my data contain relatively few samples (100) and that the count variables' ranges can vary within 3-4 orders of magnitude (i.e. some variables have values in the range 0-10, while other variables may have values within 0-1).
A simple example code follows:

###
# generating simulated data
var1 = sample(0:10, 100, replace = TRUE)
var2 = sample(0:1000, 100, replace = TRUE)
var3 = sample(0:10, 100, replace = TRUE)
outcome = sample(0:1, 100, replace = TRUE)
dataset = data.frame(outcome, var1, var2, var3)

# fitting the model
model = glm(outcome ~ ., family = binomial, data = dataset)

# inspecting the model
print(model)
###

Regards,
--
Vincenzo Lagani
Research Fellow
BioInformatics Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas
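One practical concern the poster raises, covariate ranges differing by orders of magnitude, is often handled by transforming the counts before fitting. This is an editor's sketch under that assumption, not advice given in the thread:

```r
# Sketch: taming widely ranging count covariates with a log transform
# before logistic regression. log1p handles zero counts; scale() puts
# the covariates on comparable scales (useful for interpretation and
# numerical stability, not a GLM requirement).
set.seed(42)
var1 <- sample(0:10, 100, replace = TRUE)
var2 <- sample(0:1000, 100, replace = TRUE)
outcome <- sample(0:1, 100, replace = TRUE)

dataset <- data.frame(outcome,
                      lvar1 = as.numeric(scale(log1p(var1))),
                      lvar2 = as.numeric(scale(log1p(var2))))

model <- glm(outcome ~ ., family = binomial, data = dataset)
summary(model)
```

Whether a log or other transform is appropriate is a modelling question (as the reply below notes, better suited to a statistics forum); the code only shows the mechanics.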
Re: [R] count data as independent variable in logistic regression
This is not primarily an R question, although I grant you that it might intersect packages in R that do what you want. Nevertheless, I think you would do better posting on a statistics list, like stats.stackexchange.com. Maybe once you've figured out there what you want, you can come back to R to find an implementation.

Cheers, Bert

On Tue, Oct 2, 2012 at 9:10 AM, vlag...@ics.forth.gr wrote:

Dear R users, I would like to employ count data as covariates while fitting a logistic regression model. My question is: do I violate any assumption of the logistic (and, more generally, of the generalized linear) models by employing count, non-negative integer variables as independent variables? I found a lot of references in the literature regarding how to use count data as the outcome, but not as covariates; see for example the very clear paper: N E Breslow (1996) Generalized Linear Models: Checking Assumptions and Strengthening Conclusions, Congresso Nazionale Societa Italiana di Biometria, Cortona June 1995, available at http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf. Loosely speaking, it seems that GLM assumptions may be expressed as follows: iid residuals; the link function must correctly represent the relationship among dependent and independent variables; absence of outliers. Does anybody know whether there exists any other assumption/technical problem that may suggest using some other type of model for dealing with count covariates? Finally, please notice that my data contain relatively few samples (100) and that the count variables' ranges can vary within 3-4 orders of magnitude (i.e. some variables have values in the range 0-10, while other variables may have values within 0-1).
A simple example code follows:

###
# generating simulated data
var1 = sample(0:10, 100, replace = TRUE)
var2 = sample(0:1000, 100, replace = TRUE)
var3 = sample(0:10, 100, replace = TRUE)
outcome = sample(0:1, 100, replace = TRUE)
dataset = data.frame(outcome, var1, var2, var3)

# fitting the model
model = glm(outcome ~ ., family = binomial, data = dataset)

# inspecting the model
print(model)
###

Regards,
--
Vincenzo Lagani
Research Fellow
BioInformatics Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Error messages when attempting to calculate polychoric correlation matrices using the psych package
Dear Professor Fox, Apologies for my oversight relating to the polychor command, and thank you for your advice. I turned to the polychor command when trying to find an equivalent for the polychoric command found in the psych package (I am following a procedure outlined in Gadermann, Guhn & Zumbo, 2012 that uses this command rather than the polycor package). The polychoric command is returning an error:

polychoric(data.matrix18)
Error in max(x, na.rm = TRUE) - min(x, na.rm = TRUE) :
  non-numeric argument to binary operator

I have now computed a polychoric correlation matrix using your hetcor command and the two-step estimation method (ML returned an error that I have read about in other posts); I didn't have to modify the numeric data. I have framed the output into a matrix using crude copy-and-paste methods with Excel, as I was not able to frame the output values using R.

pol.matrix <- structure(list(V1 = c(1, 0.1810841, 0.380326, 0.2882147, 0.3106325, 0.2296951, 0.2055215, 0.262325, 0.6560471, 0.3579354, 0.4106915, 0.4857118, 0.3846493, 0.3773653, 0.3806744, 0.3021153, 0.56, 0.2136178), V2 = c(0.1810841, 1, 0.4008101, 0.4456874, 0.5201655, 0.2299313, 0.1537878, 0.6107596, 0.1961909, 0.245016, 0.261282, 0.232519, 0.3282176, 0.3853363, 0.3923762, 0.2678565, 0.594987, 0.5488453), V3 = c(0.380326, 0.4008101, 1, 0.5972196, 0.5136961, 0.2760067, 0.2929657, 0.3148125, 0.4218976, 0.4259719, 0.4035297, 0.4345366, 0.5482426, 0.6126576, 0.5080505, 0.3579256, 0.4045489, 0.2919147), V4 = c(0.2882147, 0.4456874, 0.5972196, 1, 0.4262034, 0.2351873, 0.278928, 0.434891, 0.3305545, 0.3283345, 0.2996847, 0.3660209, 0.4479252, 0.4864451, 0.4788135, 0.2796844, 0.3950568, 0.3319467), V5 = c(0.3106325, 0.5201655, 0.5136961, 0.4262034, 1, 0.2102538, 0.3045312, 0.5690656, 0.2422597, 0.3044538, 0.2995296, 0.217496, 0.3671066, 0.421044, 0.4349045, 0.2683132, 0.6243316, 0.551007), V6 = c(0.2296951, 0.2299313, 0.2760067, 0.2351873, 0.2102538, 1, 0.428175, 0.2127635, 0.2863833, 0.3223868,
0.5212511, 0.3909867, 0.3676703, 0.3744179, 0.3025505, 0.7096538, 0.308744, 0.2504025), V7 = c(0.2055215, 0.1537878, 0.2929657, 0.278928, 0.3045312, 0.428175, 1, 0.2071432, 0.2756563, 0.1874141, 0.4163661, 0.3852869, 0.336576, 0.3489307, 0.3197317, 0.5118747, 0.3065872, 0.2786799), V8 = c(0.262325, 0.6107596, 0.3148125, 0.434891, 0.5690656, 0.2127635, 0.2071432, 1, 0.3062833, 0.3587556, 0.2742062, 0.2397737, 0.3471734, 0.3757533, 0.4882055, 0.2490953, 0.7115149, 0.635405), V9 = c(0.6560471, 0.1961909, 0.4218976, 0.3305545, 0.2422597, 0.2863833, 0.2756563, 0.3062833, 1, 0.4575778, 0.433267, 0.6138503, 0.3967928, 0.4067687, 0.4040989, 0.3540081, 0.2188774, 0.2753278), V10 = c(0.3579354, 0.245016, 0.4259719, 0.3283345, 0.3044538, 0.3223868, 0.1874141, 0.3587556, 0.4575778, 1, 0.3523538, 0.4246141, 0.3921508, 0.4832027, 0.4696606, 0.3012235, 0.3051279, 0.2754337), V11 = c(0.4106915, 0.261282, 0.4035297, 0.2996847, 0.2995296, 0.5212511, 0.4163661, 0.2742062, 0.433267, 0.3523538, 1, 0.5791717, 0.4740649, 0.4200025, 0.4236943, 0.6216296, 0.3404673, 0.2766948), V12 = c(0.4857118, 0.232519, 0.4345366, 0.3660209, 0.217496, 0.3909867, 0.3852869, 0.2397737, 0.6138503, 0.4246141, 0.5791717, 1, 0.6038785, 0.5410787, 0.5420467, 0.4370163, 0.2656908, 0.1940703), V13 = c(0.3846493, 0.3282176, 0.5482426, 0.4479252, 0.3671066, 0.3676703, 0.336576, 0.3471734, 0.3967928, 0.3921508, 0.4740649, 0.6038785, 1, 0.6883049, 0.553011, 0.3793382, 0.3710298, 0.3068081), V14 = c(0.3773653, 0.3853363, 0.6126576, 0.4864451, 0.421044, 0.3744179, 0.3489307, 0.3757533, 0.4067687, 0.4832027, 0.4200025, 0.5410787, 0.6883049, 1, 0.6113808, 0.3660114, 0.3863373, 0.3410903), V15 = c(0.3806744, 0.3923762, 0.5080505, 0.4788135, 0.4349045, 0.3025505, 0.3197317, 0.4882055, 0.4040989, 0.4696606, 0.4236943, 0.5420467, 0.553011, 0.6113808, 1, 0.407302, 0.5054361, 0.4577031), V16 = c(0.3021153, 0.2678565, 0.3579256, 0.2796844, 0.2683132, 0.7096538, 0.5118747, 0.2490953, 0.3540081, 0.3012235, 0.6216296, 0.4370163, 
0.3793382, 0.3660114, 0.407302, 1, 0.3743952, 0.3585206), V17 = c(0.56, 0.594987, 0.4045489, 0.3950568, 0.6243316, 0.308744, 0.3065872, 0.7115149, 0.2188774, 0.3051279, 0.3404673, 0.2656908, 0.3710298, 0.3863373, 0.5054361, 0.3743952, 1, 0.7651242), V18 = c(0.2136178, 0.5488453, 0.2919147, 0.3319467, 0.551007, 0.2504025, 0.2786799, 0.635405, 0.2753278, 0.2754337, 0.2766948, 0.1940703, 0.3068081, 0.3410903, 0.4577031, 0.3585206, 0.7651242, 1)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18"), class = "data.frame", row.names = c(NA, -18L))

I have then used this matrix with the alpha command in psych:

alpha(pol.matrix)
Reliability analysis
Call: alpha(x = pol.matrix)
  raw_alpha std.alpha G6(smc) average_r
       0.92      0.92    0.94      0.39

Reliability if an item is dropped:
    raw_alpha std.alpha G6(smc) average_r
V1       0.92      0.92    0.94      0.39
V2       0.92      0.92    0.94      0.39
V3       0.91
Re: [R] R for commercial use
cool, im none the wiser now, but thanks anyway

That would be because your IT questionnaire requires telepathy to understand what the writer wanted to know. My take on these, in case it helps:

1. Is R client software / server software / system software?
R can run both on a client and on a server. However, it is normally installed on a client machine in the same way as Word, Excel, Gimp etc. If you intend to install it on a desktop, I suggest the answer 'client'.

2. Does R need a challenge-response treatment for activation?
No.

3. Is R proxy-able?
R can access the internet through a proxy. Usually installing with the Internet2 option lets R work through Windows' existing proxy setup.

4. Is personalised WINDOWS-NTLM authorization at an ISA proxy possible?
I have no idea why any sensible IT staff would expect an end user to be able to answer this question. I deduce your IT staff are not sensible. Further, this refers to a LAN security protocol that authenticates client-server interactions (badly, I assume, as even Microsoft no longer recommends it). I suggest you answer 'N/A', because that authentication should be handled by client-server networking protocols and not by R.

5. Are any exceptions for virus scanners needed? If yes, which ones?
Not usually; R does not trigger antivirus alerts unless the antivirus software is making mistakes. That happened once to my knowledge.

6. Does R change system files or folders (e.g. system32, registry)? If yes, which ones?
R's installation process adds the R version to the registry and associates .RData files with R via the usual registry mechanism. You can tell it not to do that. Further, if it fails to do so for permission reasons, R will still run normally. I prefer to get our admin staff to install R as administrators to get the registry properly updated, and I also ask them to grant write permission to the R subdirectory in the Program Files directory so that R can install contributed packages there.
But that is not essential; R can install packages in user space instead if permission is not granted for program files space. Of course, if a user uses R to read and write from the system areas, then it can change files there; but that is equally true of Word, Excel, etc. *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
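For the package point above, a minimal sketch of pointing R at a writable per-user library when the Program Files library is locked down (the package name in the commented line is a placeholder, not from the thread):

```r
# Sketch: use R's default per-user library when the system library is read-only.
userlib <- Sys.getenv("R_LIBS_USER")            # R's standard per-user library path
dir.create(userlib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(userlib, .libPaths()))              # search the user library first

# install.packages("somepackage", lib = userlib)  # "somepackage" is a placeholder
.libPaths()                                      # user library now listed first
```

install.packages() falls back to offering a per-user library on its own when the default location is not writable; the sketch just makes that choice explicit.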
[R] R process must die - can I save history?
I connected from my desktop Linux box to a Linux server using ssh in an xterm, but that xterm was running in Xvnc. I'm running R on the server in that xterm (over ssh). Something went wrong with Xvnc that has caused it to hang, probably this bug: https://bugs.launchpad.net/ubuntu/+source/vnc4/+bug/819473 So I can't get back to that ssh session or to R. I had done a bunch of work in R but the command history hasn't been written out. If I kill R, I assume the command history is gone. I wish I could somehow cause R to dump the command history. Is there any way to tell the running R process to write the history somewhere? Thanks in advance. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota
Re: [R] Integration in R
On 02-10-2012, at 17:23, Naser Jamil jamilnase...@gmail.com wrote:

Dear R-users, I am facing a problem with integrating in R a likelihood function which is a function of four parameters. It gives me the result at the end but takes more than half an hour to run. I'm wondering whether there is any more efficient way to deal with it. The following is my code. I am ready to provide any other description of my function if you need it to move forward.

library(cubature)
dose <- c(2,3,5)
y0 <- c(2,1,0)
y1 <- c(1,1,1)
y2 <- c(0,1,2)

lf <- function (x) {
  v <- 1
  for (i in 1:length(dose)) {
    psi0 <- 1/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    psi1 <- exp(x[1]+x[2]*dose[i])/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  return(v)
}

adaptIntegrate(lf, lowerLimit = c(-20, 0, -20, 0), upperLimit = c(0, 10, 0, 10))

There are several things you could do.

1. Use the compiler package to make a compiled version of your function.
2. Rewrite your function by putting x[1] etc. in separate variables x1, x2, ..., so avoiding the [..] indexing. You can do the same for dose[i]. Then also compile this version of the function.
3. Do not recompute expressions such as exp(x1+x2*dose.i) several times. Store the result in a temporary variable and use that variable.

With these functions:

library(compiler)
lf.c <- cmpfun(lf)

lf1 <- function (x) {
  v <- 1
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  for (i in 1:length(dose)) {
    dose.i <- dose[i]
    z1 <- exp(x1+x2*dose.i)
    z2 <- exp(x3+x4*dose.i)
    psi0 <- 1/((1+z1)*(1+z2))
    psi1 <- z1*psi0
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  return(v)
}
lf1.c <- cmpfun(lf1)

I tested relative speeds with this code (small tolerance and max. function evaluations):

library(rbenchmark)
f1 <- function() adaptIntegrate(lf,    lowerLimit = c(-20,0,-20,0), upperLimit = c(0,10,0,10), tol=1e-3, maxEval=5)
f2 <- function() adaptIntegrate(lf.c,  lowerLimit = c(-20,0,-20,0), upperLimit = c(0,10,0,10), tol=1e-3, maxEval=5)
f3 <- function() adaptIntegrate(lf1,   lowerLimit = c(-20,0,-20,0), upperLimit = c(0,10,0,10), tol=1e-3, maxEval=5)
f4 <- function() adaptIntegrate(lf1.c, lowerLimit = c(-20,0,-20,0), upperLimit = c(0,10,0,10), tol=1e-3, maxEval=5)
benchmark(z1 <- f1(), z2 <- f2(), z3 <- f3(), z4 <- f4(), replications=1)

Result:

        test replications elapsed relative user.self sys.self user.child
1 z1 <- f1()            1   3.197    4.535     3.177    0.008          0
2 z2 <- f2()            1   1.240    1.759     1.235    0.003          0
3 z3 <- f3()            1   2.171    3.079     2.167    0.002          0
4 z4 <- f4()            1   0.705    1.000     0.700    0.004          0

As you can see, you should be able to get at least a fourfold speedup by using the compiled version of the optimized function. I would certainly set tol and maxEval to something reasonable initially. Checking that z1, z2, z3, and z4 are equal is left to you. Finally, it may even be possible to eliminate the for loop in your function, but I'll leave that for someone else.

Berend
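Berend leaves checking that the rewritten function returns the same values to the reader; with the data from the thread, that check might be sketched as:

```r
# Sketch: confirm the original and rewritten likelihoods agree at random points.
dose <- c(2, 3, 5)
y0 <- c(2, 1, 0); y1 <- c(1, 1, 1); y2 <- c(0, 1, 2)

lf <- function(x) {             # original, with repeated [..] indexing
  v <- 1
  for (i in seq_along(dose)) {
    psi0 <- 1/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    psi1 <- exp(x[1]+x[2]*dose[i])/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  v
}

lf1 <- function(x) {            # rewritten with cached subexpressions
  v <- 1
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  for (i in seq_along(dose)) {
    z1 <- exp(x1 + x2*dose[i]); z2 <- exp(x3 + x4*dose[i])
    psi0 <- 1/((1+z1)*(1+z2)); psi1 <- z1*psi0
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  v
}

set.seed(1)
for (k in 1:10) {
  x <- c(runif(1, -20, 0), runif(1, 0, 10), runif(1, -20, 0), runif(1, 0, 10))
  stopifnot(isTRUE(all.equal(lf(x), lf1(x))))
}
```

The same check applies to the compiled versions, since cmpfun() does not change results, only speed.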
Re: [R] R process must die - can I save history?
?history in a fresh R session, to see what might be possible. I'll bet the answer is, No, you're screwed, though. Nevertheless, maybe Linux experts can save you. May the Force be with you. -- Bert

On Tue, Oct 2, 2012 at 10:17 AM, Mike Miller mbmille...@gmail.com wrote: Is there any way to tell the running R process to write the history somewhere?

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
John,

On Tue, 2 Oct 2012 11:35:12 -0400 John Sorkin jsor...@grecc.umaryland.edu wrote: Window XP, R 2.15. I am running a cluster analysis in which I ask for three clusters (see code below). The analysis nicely tells me what cluster each of the subjects in my input dataset belongs to. I would like two pieces of information: (1) for every subject in my input data set, what is the probability of the subject belonging to each of the three clusters?

K-means provides hard clustering: whichever cluster has the closest mean gets the assignment.

(2) Given a new subject, someone who was not in my original dataset, how can I determine their cluster assignment?

Look at the distances between the subject and the cluster means: the subject is assigned to the cluster whose mean is closest. If you are looking for probabilistic clustering (under Gaussian mixture model assumptions), you could use model-based clustering: one R package is mclust. Btw, note that kmeans is very sensitive to initialization (as is mclust): you may want to try several random starts (for kmeans), at the very least. Use the argument nstart with a huge number.

HTH,
Ranjan

Thanks, John

# K-Means Cluster Analysis
jclusters <- 3
fit <- kmeans(datascaled, jclusters)   # 3 cluster solution

and fit$cluster tells me what cluster each observation in my input dataset belongs to (output truncated for brevity):

fit$cluster
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 . . . .
 1 1 1 1 3 1 1 1 1  2  1  2  1  1  1  1  1 . . . .

How do I get the probability of being in cluster 1, cluster 2, and cluster 3 for a given subject, e.g. datascaled[1,]? How do I get the cluster assignment for a new subject? Thanks, John

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for ...{{dropped:16}}
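Ranjan's mclust suggestion might be sketched as follows; this assumes the mclust package is installed, and the data here is a random stand-in for the poster's datascaled:

```r
# Hedged sketch: model-based clustering yields the posterior membership
# probabilities that k-means cannot provide.
library(mclust)

set.seed(42)
datascaled <- scale(matrix(rnorm(300), ncol = 3))  # stand-in for the poster's data

fit <- Mclust(datascaled, G = 3)    # fit a 3-component Gaussian mixture
head(fit$z)                         # fit$z: per-subject membership probabilities
fit$classification[1]               # hard assignment for subject 1

# A new subject can be classified, with probabilities, via predict():
newsub <- matrix(rnorm(3), nrow = 1)
predict(fit, newdata = newsub)$z
```

Each row of fit$z sums to 1, giving exactly the "probability of cluster membership" the original question asked for.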
Re: [R] Integration in R
Hello,

Yes, it's possible to remove the loop. Since the loop is used to compute a running product and all we want is the final result, use the vectorized behavior of R and a final ?prod(). Speedup: another 2x. And 4 x 2 == 8 == roughly 1 [decimal] order of magnitude.

lf2 <- function (x) {
  v <- 1
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  z1 <- exp(x1+x2*dose)
  z2 <- exp(x3+x4*dose)
  psi0 <- 1/((1+z1)*(1+z2))
  psi1 <- z1*psi0
  v <- (psi0^y0)*(psi1^y1)*((1-psi0-psi1)^y2)
  return( prod(v) )
}
lf2.c <- cmpfun(lf2)

Hope this helps,

Rui Barradas
Re: [R] lattice xyplot, get current level
On Oct 2, 2012, at 3:59 AM, Christof Kluß wrote:

Hi, xyplot(y ~ x | subject) plots a separate graph of y against x for each level of subject. But I would like to have a separate function for each level. Something like

xyplot(y ~ x | subject, panel = function(x,y) {
  panel.xyplot(x,y)
  panel.curve(x,y) {
    # something that depends on the current subject ...
  }
})

This seems to be a different question from what you later asked. This question would seem to be answered in the 'panel' section of help(xyplot), in particular the paragraph beginning with this sentence: "One can also use functions called panel.number and packet.number, representing panel order and packet order respectively, inside the panel function (as well as the strip function or while interacting with a lattice display using trellis.focus etc)."

-- David Winsemius, MD Alameda, CA, USA
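A minimal sketch of that documented approach, with toy data standing in for the poster's y, x, and subject:

```r
# Sketch: packet.number() identifies the conditioning level inside a panel
# function, so per-subject behavior can key off it.
library(lattice)

set.seed(1)
dat <- data.frame(
  x = rep(1:20, 3),
  y = rnorm(60),
  subject = factor(rep(c("a", "b", "c"), each = 20))
)

p <- xyplot(y ~ x | subject, data = dat,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    i <- packet.number()        # index of the current subject's packet
    panel.abline(h = i * 0.5)   # placeholder for subject-dependent drawing
  })
print(p)   # the panel function runs when the trellis object is printed
```

To draw a genuinely different curve per level, i can index into a list of functions or parameter vectors prepared outside the call.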
Re: [R] Integration in R
On 02-10-2012, at 20:01, Rui Barradas ruipbarra...@sapo.pt wrote:

Hope this helps,

Wonderful. It certainly does help. A single nitpick: the v <- 1 at the start of the function can now be removed. I got a speedup of 7.5 compared to the very first version lf1.

Berend
Re: [R] Integration in R
Hello,

On 02-10-2012 19:18, Berend Hasselman wrote: A single nitpick: the v <- 1 at the start of the function can now be removed.

Yes, I thought about removing it but in the end forgot to.

I got a speedup of 7.5 compared to the very first version lf1.

My system is Windows 7, R 2.15.1.

Rui Barradas

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] compiler  stats  graphics  grDevices  utils  datasets  methods
[8] base

other attached packages:
[1] rbenchmark_1.0.0 cubature_1.1-1

loaded via a namespace (and not attached):
[1] fortunes_1.5-0 tools_2.15.1
Re: [R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
Ranjan,

Thank you for your help. What eludes me is how one computes the distance from each cluster for each subject. For my first subject, datascaled[1,], I have tried to use the following:

v1 <- sum(fit$centers[1,]*datascaled[1,])
v2 <- sum(fit$centers[2,]*datascaled[1,])
v3 <- sum(fit$centers[3,]*datascaled[1,])

hoping that max(v1,v2,v3) would reproduce the group assignment, i.e. simply assign the subject to the group that gives the largest value, but it does not. How is the distance to the three clusters computed for each subject?

Thanks,
John
Re: [R] Integration in R
Another nitpick: don't use return() in the last statement. It isn't needed, it looks like some other language, and dropping it saves 8% of the time for the uncompiled code (the compiler seems to get rid of it).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Berend Hasselman
Sent: Tuesday, October 02, 2012 11:19 AM
Subject: Re: [R] Integration in R
Re: [R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
On Tue, 2 Oct 2012 14:32:12 -0400 John Sorkin jsor...@grecc.umaryland.edu wrote: How is the distance to the three clusters computed for each subject?

Well, it should be:

v <- vector(length = 3)
for (i in 1:3) v[i] <- sum((fit$centers[i, ] - datascaled[1, ])^2)
which.min(v)

should provide the cluster assignment. Btw, there is a better, more efficient and automated way to do this, i.e. avoid the loop using matrices and arrays and apply, but I have not bothered with that here.

Ranjan

-- Important Notice: This mailbox is ignored: e-mails are set to be deleted on receipt. For those needing to send personal or professional e-mail, please use appropriate addresses.
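The apply-based version Ranjan alludes to might look like this; a sketch with toy data standing in for the poster's fit and datascaled, using algorithm = "Lloyd" so that at convergence every point sits in the cluster with the nearest center:

```r
# Sketch: vectorized nearest-center assignment for kmeans results.
set.seed(1)
datascaled <- scale(matrix(rnorm(100), ncol = 2))
fit <- kmeans(datascaled, centers = 3, nstart = 25, iter.max = 100,
              algorithm = "Lloyd")

# Squared Euclidean distances from one subject to every center, via apply:
nearest_center <- function(x, centers) {
  which.min(apply(centers, 1, function(ctr) sum((x - ctr)^2)))
}

# Recovers kmeans' own assignments for the training data...
reassigned <- apply(datascaled, 1, nearest_center, centers = fit$centers)
stopifnot(all(reassigned == fit$cluster))

# ...and classifies a new subject the same way:
nearest_center(c(0.5, -0.2), fit$centers)
```

The same nearest_center() call answers John's second question: cluster assignment for a subject who was not in the original data.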
Re: [R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
Thank you! I just wanted to know how one goes from the values returned by kmeans to a distance metric. You have shown me that it is simply the squared distance from the centers! Thanks again.

John
Re: [R] R process must die - can I save history?
On 02/10/2012 18:29, Bert Gunter wrote: Nevertheless, maybe Linux experts can save you.

Maybe not. On a Unix-alike see ?Signals. If you can find the pid of the R process and it is still running (and not e.g. suspended), kill -USR1 pid will save the workspace and history.

-- Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
Re: [R] Unexpected behavior with weights in binomial glm()
Thank you all for your help and advice. This wasn't quite the answer I was looking for, but these concepts make more sense to me now and I think I should be able to resolve the issues I've been having. Thanks again! On Sun, Sep 30, 2012 at 6:26 PM, David Winsemius dwinsem...@comcast.net wrote: On Sep 30, 2012, at 4:47 AM, Josh Browning wrote: Hi David, Yes, I agree that the results are very similar but I don't understand why they are not exactly equal given that the data sets are identical. And yes, this 1% numerical difference is hugely important to me. I believe you will find it is not important at all. I have another data set (much larger than this toy example) that works on the aggregated data (returning a coefficient of about 1) but returns the warning about perfect separation on the non-aggregated data (and a coefficient of about 1e15). So, I'd at least like to be able to understand where this numerical difference is coming from and, preferably, a way to tweak my glm() runs (possibly adjusting the numerical precision somehow???) so that this doesn't happen. Here. This shows how you can ratchet up your desired precision: dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) colnames(dAgg) = c(RESP,INDEP,WT) glm( RESP ~ INDEP, family=binomial, data=dAgg, weights=WT, control = glm.control(epsilon = 1e-10, maxit = 50) ) glm( RESP ~ INDEP, family=binomial, data=dAgg, weights=WT, control = glm.control(epsilon = 1e-11, maxit = 30) ) glm( RESP ~ INDEP, family=binomial, data=dAgg, weights=WT, control = glm.control(epsilon = 1e-12, maxit = 50) ) glm( RESP ~ INDEP, family=binomial, data=dAgg, weights=WT, control = glm.control(epsilon = 1e-13, maxit = 50) ) ... and finally get a message that _should_ demonstrate you why I thought asking for greater numerical precision in an extreme case such as this was a Fool's errand. 
The true value is positive infinity, but numerical precision is only 8 bytes, so we could only get to an estimated odds ratio of exp(32.975) = 2.09344e+14, which corresponds to odds of about 200 trillion to 1. Further increases in 'maxit' and decreases in 'epsilon' are vigorously ignored, and rightfully so. After you do such analyses long enough you learn that coefficients above 5 are suspicious and those above 10 are usually a sign of an error in data preparation. (I do not think this is an example of complete separation, since the ranges of the predictors overlap for the two outcomes. But who knows; I have made (many) errors before.)

-- David

Josh

On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius dwinsem...@comcast.net wrote:

On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:

Hi useRs, I'm experiencing something quite weird with glm() and weights, and maybe someone can explain what I'm doing wrong. I have a dataset where each row represents a single case, and I run glm(..., family=binomial) and get my coefficients. However, some of my cases have exactly the same values for the predictor variables, so I should be able to aggregate my data frame and run glm(..., family=binomial, weights=wts) and get the same coefficients (maybe this is my incorrect assumption, but I can't see why it would be). Anyway, here's a minimal working example:

d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
glm( RESP ~ INDEP, family=binomial, data=d )

Call: glm(formula = RESP ~ INDEP, family = binomial, data = d)

Coefficients:
(Intercept)        INDEP
     -1.609       21.176

Degrees of Freedom: 9 Total (i.e. Null); 8 Residual
Null Deviance: 13.86
Residual Deviance: 5.407    AIC: 9.407

dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
colnames(dAgg) = c("RESP", "INDEP", "WT")
glm( RESP ~ INDEP, family=binomial, data=dAgg, weights=WT )

Call: glm(formula = RESP ~ INDEP, family = binomial, data = dAgg, weights = WT)

Coefficients:
(Intercept)        INDEP
     -1.609       20.975

Degrees of Freedom: 2 Total (i.e. Null); 1 Residual
Null Deviance: 13.86
Residual Deviance: 5.407    AIC: 9.407

Those two results look very similar, and it is with a data situation that seems somewhat extreme. The concern is for the 1% numerical difference in the regression coefficient? Am I reading you correctly?

-- David Winsemius, MD Alameda, CA, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
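For reference, when the data are not separated, the row-level and aggregated weighted fits do agree essentially exactly, since they maximize the same likelihood. A small self-contained sketch (simulated data, not the thread's data):

```r
# Row-level vs aggregated weighted logistic regression.
# With no (quasi-)separation, both fits converge to the same MLE.
d <- data.frame(
  RESP  = c(1, 1, 1, 0, 0, 1, 0, 0, 0, 0),
  INDEP = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)
fit_rows <- glm(RESP ~ INDEP, family = binomial, data = d)

# Aggregate to one row per (RESP, INDEP) combination with a count column WT
dAgg <- aggregate(list(WT = d$RESP),
                  by = list(RESP = d$RESP, INDEP = d$INDEP), FUN = length)
fit_agg <- glm(RESP ~ INDEP, family = binomial, data = dAgg, weights = WT)

all.equal(coef(fit_rows), coef(fit_agg), tolerance = 1e-6)  # TRUE
```

With the thread's quasi-separated data, by contrast, the two runs stop at slightly different points on an essentially flat likelihood, which is where the 1% difference in the huge coefficient comes from.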
[R] Possible error in BCa method for confidence intervals in package 'boot'
I'm using R 2.15.1 on a 64-bit machine with Windows 7 Home Premium. Sample problem (the screwy subscripted syntax is a relic of editing down a more complex script):

N <- 25
s <- rlnorm(N, 0, 1)
require(boot)
Loading required package: boot
v <- NULL # hold sample variance estimates
i <- 1
v[i] <- var(s) # get sample variance
nReal <- 10
varf <- function (x, i) { var(x[i]) }
fabc <- function (x, w) { # weighted average (biased) variance
  sum(x^2 * w) / sum(w) - (sum(x * w) / sum(w))^2
}
p <- c(.25, .75, .2, .8, .15, .85, .1, .9, .05, .95, .025, .975, .005, .995)
cl <- c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)
b <- boot(s, varf, R = nReal) # bootstrap
bv <- NULL # hold bootstrap mean variance estimates
bias <- NULL # hold bias estimates
bv[i] <- mean(b$t) # bootstrap mean variance
bias[i] <- bv[i] - v[i] # bias estimate
bCI90 <- boot.ci(b, conf = 0.90)
Error in bca.ci(boot.out, conf, index[1L], L = L, t = t.o, t0 = t0.o, :
  estimated adjustment 'a' is NA
In addition: Warning messages:
1: In norm.inter(t, (1 + c(conf, -conf))/2) : extreme order statistics used as endpoints
2: In boot.ci(b, conf = 0.9) : bootstrap variances needed for studentized intervals
3: In norm.inter(t, alpha) : extreme order statistics used as endpoints

nReal <- 25
b <- boot(s, varf, R = nReal) # bootstrap
bv[i] <- mean(b$t) # bootstrap mean variance
bias[i] <- bv[i] - v[i] # bias estimate
bCI90 <- boot.ci(b, conf = 0.90)
Warning messages:
1: In boot.ci(b, conf = 0.9) : bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints

The problem is that doing 10 resamples generates an NA in the estimation of the 'acceleration' in the function abc.ci(), but doing 25 resamples does not. This implies a connection between the number of resamples and the 'acceleration', which should not exist. ('Acceleration' should be obtained from the original sample via jackknife as 1/6 the coefficient of skewness.)
Looking at the script for abc.ci(), there is an anomalous reference to 'n' in the invocation line, yet 'n' is not an argument, so it must be defined more globally before the call. Yet 'n' is defined within the script as the length of 'data', which is referred to as the 'bootstrap' vector in the comments but should be the original sample data. This confusion, plus the use of an argument 'eps' with default 0.001/n in the calculations, makes me suspect the programming in the script. The script apparently works correctly if the number of resamples equals or exceeds the number of original data points, but not otherwise.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.  URL: http://lcfltd.com/
824 Timberlake Drive  Tel: 757-467-0954
Virginia Beach, VA 23464-3239  Fax: 757-467-2947
Vere scire est per causas scire
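For anyone reproducing this, the failure can be triggered deliberately with a small R and then avoided by raising it. A sketch (the exact error text may vary by boot version, and the observation that R must equal or exceed n is the poster's, not documented behaviour):

```r
# BCa intervals from boot.ci() with too few vs enough resamples
library(boot)  # 'boot' ships with R as a recommended package
set.seed(1)
n <- 25
s <- rlnorm(n, 0, 1)
varf <- function(x, i) var(x[i])

b_small <- boot(s, varf, R = 10)
small <- tryCatch(boot.ci(b_small, conf = 0.90, type = "bca"),
                  error = function(e) conditionMessage(e))
# With R = 10 this typically fails ("estimated adjustment 'a' is NA")

b_large <- boot(s, varf, R = 2000)
large <- boot.ci(b_large, conf = 0.90, type = "bca")  # succeeds
large$bca  # the 90% BCa interval
```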
Re: [R] Zero Inflated Models - Plotting Results
Dear SamiC, I am also attempting to plot my zero inflated model on my data. Did you find a solution? Does anyone else on this list have a solution? Thanks, Liesl

Message from SamiC, Jun 30, 2011: I am fitting a zero inflated negative binomial model to my data. I have pretty much got my selected model and now I am wanting to plot the model against the original data. Example:

f8 <- formula(Birds ~ Tide + Fish + Zooplankton + Chlorophylla | StratificationIndex)
Nb8 <- zeroinfl(f8, dist = "negbin", link = "logit", data = ocean)

Tide is a factor, so I assume I have to plot a different graph for each level of the factor. What I essentially want to do is plot a graph of each variable against Birds with the fitted line of the model. I have been using the predict function, but I get the same trend for every graph and variable. I was reading that the predict function gives a mean across all the values (or something to this effect). Does anyone know how to code for this in R? From the above model I want to plot birds ~ fish (the original data) and then fit the line of the relationship from the model.
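A common way to get one fitted curve per predictor is a prediction grid: vary the focal variable over its observed range, hold the others fixed (e.g. at their means or reference levels), and call predict() on that grid. A minimal sketch with a base-R Poisson glm so it runs without pscl (the same pattern works with zeroinfl; the names Birds/Fish echo the post, and the data are simulated):

```r
# Prediction-grid approach: fitted line for one predictor at a time
set.seed(42)
ocean <- data.frame(Fish = runif(100, 0, 10))
ocean$Birds <- rpois(100, exp(0.2 + 0.15 * ocean$Fish))
m <- glm(Birds ~ Fish, family = poisson, data = ocean)

# Vary the focal predictor over its range; any other predictors in the
# model would be held at their means (or a chosen factor level) here.
grid <- data.frame(Fish = seq(min(ocean$Fish), max(ocean$Fish),
                              length.out = 50))
grid$fit <- predict(m, newdata = grid, type = "response")

plot(Birds ~ Fish, data = ocean)
lines(fit ~ Fish, data = grid, col = "blue")
```

Using the same newdata for every panel is what produces "the same trend for every graph"; the grid must vary the plotted variable and fix the rest.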
[R] Proposal: Package update log
I'm relatively new to R and would first like to sincerely thank all those who contribute to its development. Thank you. I would humbly like to propose a rule which creates a standard (i.e., strongly encouraged, mandatory, etc.) for authors to include a `change log' documenting specific changes for each update made to their package(s). The more detailed the description, the better; and it would be exceptionally useful if the document were available from the R-console (similar to the help function). In other words, I am suggesting that the `change log' file be included in the package(s) and preferably be accessible from the R-console. I am aware that many packages available on CRAN have a `change log' or `news' page. However, not all packages have something like that, and many which do are not terribly detailed in conveying what has been updated or changed. I am also aware that package authors are not a particularly lazy group, sitting around with nothing to do. My proposal would likely add a not-insignificant amount of work to the already very generous (and appreciated) work performed by package authors, maintainers, etc. I do, however, believe it would be greatly appreciated and beneficial to have more detailed update information available from the R-console, as some of us (users) update packages daily and are often left wondering what exactly has been updated. I did not post this to the R-devel list because I consider this proposal more of a documentation issue than a development issue. Also, I would like to see some discussion of this proposal from a varied pool of stakeholders (e.g., users and not just developers, package authors, package maintainers, etc.).
Respectfully, Jon Starkweather, PhD University of North Texas Research and Statistical Support http://www.unt.edu/rss/
Re: [R] Integration in R
Hi, I am facing a problem of restricting an intercept in a system of equations:

Y1 = f(X1, X2, X3)
Y2 = f(X1, X2, X4)

I want to restrict the intercept of equation 2 to equal the coefficient of X2 in equation 1. Please help. Dereje

From: Naser Jamil jamilnase...@gmail.com To: r-help@r-project.org Sent: Tuesday, October 2, 2012 10:23 AM Subject: [R] Integration in R

Dear R-users, I am facing a problem with integrating in R a likelihood function which is a function of four parameters. It gives me the result at the end but takes more than half an hour to run. I'm wondering whether there is a more efficient way to deal with this. The following is my code. I am ready to provide any other description of my function if you need it to move forward.

library(cubature)
dose <- c(2, 3, 5)
y0 <- c(2, 1, 0)
y1 <- c(1, 1, 1)
y2 <- c(0, 1, 2)
lf <- function (x) {
  v <- 1
  for (i in 1:length(dose)) {
    psi0 <- 1/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    psi1 <- exp(x[1]+x[2]*dose[i])/((1+exp(x[1]+x[2]*dose[i]))*(1+exp(x[3]+x[4]*dose[i])))
    v <- v*(psi0^y0[i])*(psi1^y1[i])*((1-psi0-psi1)^y2[i])
  }
  return(v)
}
adaptIntegrate(lf, lowerLimit = c(-20, 0, -20, 0), upperLimit = c(0, 10, 0, 10))

Thanks for your attention. Kind regards, Jamil.
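Most of the running time goes into repeated evaluations of the integrand, so a cheap first step is to vectorize lf over dose (dropping the loop) and reuse the shared factor psi0. A sketch, numerically identical to the original integrand (speed-up of the full integration is not guaranteed, but per-call cost drops):

```r
dose <- c(2, 3, 5); y0 <- c(2, 1, 0); y1 <- c(1, 1, 1); y2 <- c(0, 1, 2)

# Original looped integrand from the post
lf <- function(x) {
  v <- 1
  for (i in seq_along(dose)) {
    psi0 <- 1 / ((1 + exp(x[1] + x[2]*dose[i])) * (1 + exp(x[3] + x[4]*dose[i])))
    psi1 <- exp(x[1] + x[2]*dose[i]) /
      ((1 + exp(x[1] + x[2]*dose[i])) * (1 + exp(x[3] + x[4]*dose[i])))
    v <- v * psi0^y0[i] * psi1^y1[i] * (1 - psi0 - psi1)^y2[i]
  }
  v
}

# Vectorized version: one pass over all doses; psi1 reuses psi0
lf_vec <- function(x) {
  e1 <- exp(x[1] + x[2] * dose)
  e2 <- exp(x[3] + x[4] * dose)
  psi0 <- 1 / ((1 + e1) * (1 + e2))
  psi1 <- e1 * psi0
  prod(psi0^y0 * psi1^y1 * (1 - psi0 - psi1)^y2)
}

x <- c(-1, 0.5, -2, 0.3)
all.equal(lf(x), lf_vec(x))  # identical results
```

Loosening adaptIntegrate's tolerance (its 'tol' argument) is the other obvious lever when half-hour runs are unacceptable.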
[R] Parametric effects in GAM
Hello! Can anyone give a tip on how to plot the parametric effects in a Generalized Additive Model from the mgcv package? Thanks, PM
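The two usual routes in mgcv are plot.gam(..., all.terms = TRUE) and termplot(). A sketch on simulated data (the model below is an illustration, not from the post):

```r
library(mgcv)  # 'mgcv' ships with R as a recommended package

set.seed(1)
dat <- data.frame(x = runif(200),
                  f = factor(sample(letters[1:3], 200, replace = TRUE)))
dat$y <- sin(2 * pi * dat$x) + as.numeric(dat$f) + rnorm(200, sd = 0.3)

m <- gam(y ~ s(x) + f, data = dat)

# plot.gam draws only the smooth terms by default;
# all.terms = TRUE adds the parametric terms as well
plot(m, pages = 1, all.terms = TRUE)

# or plot just the parametric effect(s) via termplot
termplot(m, terms = "f", se = TRUE)
```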
Re: [R] Integration in R
On 02-10-2012, at 20:50, Dereje Bacha d_ba...@yahoo.com wrote: Hi I am facing a problem of restricting an intercept of systems of equations. Y1=f(X1,X2,X3) Y2=(X1,X2,X4) I want to restrict an intercept of equation 2 equal to coefficient of X2 of equation 1.

Please do not hijack a thread/conversation. Do not reply to a thread with a completely different subject. Start a new thread for a new subject. It is completely unclear what you want. Berend
Re: [R] Proposal: Package update log
There is already support in the packaging system for a NEWS file, which can be accessed within R using the 'news' function. What would the changelog that you are proposing contribute or contain beyond what the NEWS file already does? Creating and updating NEWS is not mandatory, but it is encouraged. Making the update of NEWS mandatory would probably not change anything for the authors already using it; for those that don't, it may make the author less likely to share or update their package rather than bother with logging the updates (though whether that is a bad thing or a good thing is a whole different discussion). How would you mandate the update log? Just requiring a change could result in a lazy author adding a single line of gibberish or a statement like "updated on this date". To enforce something more meaningful would require some form of moderation and place additional burden where there is already a lot of volunteer work happening. If a package is on R-forge (or other similar locations) then you can get a detailed view of the changes between versions. Packages on CRAN have previous versions available, and there are tools that will let you compare the differences in detail.

On Tue, Oct 2, 2012 at 11:01 AM, Starkweather, Jonathan jonathan.starkweat...@unt.edu wrote: I'm relatively new to R and would first like to sincerely thank all those who contribute to its development. Thank you. I would humbly like to propose a rule which creates a standard (i.e., strongly encouraged, mandatory, etc.) for authors to include a `change log' documenting specific changes for each update made to their package(s). The more detailed the description, the better; and it would be exceptionally useful if the document were available from the R-console (similar to the help function). In other words, I am suggesting that the `change log' file be included in the package(s) and preferably accessible from the R-console.
I am aware that many packages available on CRAN have a `change log' or `news' page. However, not all packages have something like that, and many which do are not terribly detailed in conveying what has been updated or changed. I am also aware that package authors are not a particularly lazy group, sitting around with nothing to do. My proposal would likely add a not-insignificant amount of work to the already very generous (and appreciated) work performed by package authors, maintainers, etc. I do, however, believe it would be greatly appreciated and beneficial to have more detailed update information available from the R-console as some of us (users) update packages daily and are often left wondering what exactly has been updated. I did not post this to the R-devel list because I consider this proposal more of a documentation issue than a development issue. Also, I would like to see some discussion of this proposal from a varied pool of stakeholders (e.g., users and not just developers, package authors, package maintainers, etc.). Respectfully, Jon Starkweather, PhD University of North Texas Research and Statistical Support http://www.unt.edu/rss/

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
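For reference, the console side of this already works via utils::news(). A quick sketch against R's own NEWS database (field names such as Version, Category and Text can be used in the query expression):

```r
db <- news()   # R's own NEWS, as an object inheriting from data.frame
nrow(db) > 0   # entries are available for querying

# Filter entries whose text mentions "deprecated"
dep <- news(grepl("deprecated", Text, ignore.case = TRUE))

# For an installed package that ships a structured NEWS file
# (the package name here is just a placeholder):
# news(package = "somePackage")
```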
[R] ffsave problems
Dear R friends. After having some trouble learning how to create a ffdf object, I now find myself having problems saving it. This is the data I'd like to save:

str(DATA)
List of 3
 $ virtual : 'data.frame': 6 obs. of 7 variables:
  ..$ VirtualVmode     : chr "double" "short" "integer" "integer" ...
  ..$ AsIs             : logi FALSE FALSE FALSE FALSE FALSE FALSE
  ..$ VirtualIsMatrix  : logi FALSE FALSE FALSE FALSE FALSE FALSE
  ..$ PhysicalIsMatrix : logi FALSE FALSE FALSE FALSE FALSE FALSE
  ..$ PhysicalElementNo: int 1 2 3 4 5 6
  ..$ PhysicalFirstCol : int 1 1 1 1 1 1
  ..$ PhysicalLastCol  : int 1 1 1 1 1 1
  ..- attr(*, "Dim")= int 65640757 6
  ..- attr(*, "Dimorder")= int 1 2
 $ physical: List of 6 ff_vector entries -- fecha (vmode "double", ramclass "Date"), juliano ("short"), altura ("integer"), UTM.E ("integer"), UTM.N ("integer") and fire_ocur ("byte") -- each of maxlength 65640757, backed by temporary .ff files under C:/Users/ADMIN-~1/AppData/Local/Temp/RtmpqmcfGU/ with finalizer "delete". (The six near-identical attribute blocks are abridged; the message breaks off here in the archive.)
Re: [R] Proposal: Package update log
On 10/2/2012 10:01 AM, Starkweather, Jonathan wrote: I'm relatively new to R and would first like to sincerely thank all those who contribute to its development. Thank you. I would humbly like to propose a rule which creates a standard (i.e., strongly encouraged, mandatory, etc.) for authors to include a `change log' documenting specific changes for each update made to their package(s). The more detailed the description, the better; and it would be exceptionally useful if the document were available from the R-console (similar to the help function). In other words, I am suggesting that the `change log' file be included in the package(s) and preferably accessible from the R-console.

Let me just address the "accessible from the R-console" part of this. The infrastructure to support this already exists; check out the news() function. It does require a structured NEWS file, but it will make some attempts to figure out the structure (see the documentation).

I am aware that many packages available on CRAN have a `change log' or `news' page. However, not all packages have something like that, and many which do are not terribly detailed in conveying what has been updated or changed.

More detail is generally nice. However, I'm not sure what CRAN could enforce. A simple check that could be automated is that news() for the package and the version declared in the DESCRIPTION file returns a non-empty result. But I don't know if that would be a net positive (adding another burden for package authors vs giving more detail). Those authors that care probably already do so; those that don't will just have to work around the check without necessarily adding anything useful.

I am also aware that package authors are not a particularly lazy group, sitting around with nothing to do. My proposal would likely add a not-insignificant amount of work to the already very generous (and appreciated) work performed by package authors, maintainers, etc.
I do, however, believe it would be greatly appreciated and beneficial to have more detailed update information available from the R-console as some of us (users) update packages daily and are often left wondering what exactly has been updated. I did not post this to the R-devel list because I consider this proposal more of a documentation issue than a development issue. Also, I would like to see some discussion of this proposal from a varied pool of stakeholders (e.g., users and not just developers, package authors, package maintainers, etc.). Respectfully, Jon Starkweather, PhD University of North Texas Research and Statistical Support http://www.unt.edu/rss/

-- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health Science University
[R] Having two different versions of a package in the same R installation
Dear list, I am making some comparisons of two versions of the lme4 package: the CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However, it would be convenient if I could have both versions within the same R installation, such that I can run a code chunk on one lme4 version and immediately afterwards run the same code chunk on the other lme4 version. Does anyone know if this is possible? Best regards Søren
Re: [R] Having two different versions of a package in the same R installation
So if you have both loaded in the same instance of R, how will R know which version of lmer or the other functions you want to run? It seems cleanest to me to keep the two different instances of R running like you do now. The other option would be to change all the names (the exported ones anyway) in one version and change that package's name as well. Then you would have two different package names with different exported function names.

On Tue, Oct 2, 2012 at 2:18 PM, Søren Højsgaard sor...@math.aau.dk wrote: Dear list, I am making some comparisons of two versions of the lme4 package: The CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However it would be convenient if I could have both versions within the same R installation such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on another lme4 version. Does anyone know if this is possible? Best regards Søren

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
Re: [R] Having two different versions of a package in the same R installation
I don't know if it would work, but a kludgy attempt would be to install lme4 from CRAN and rename the lme4 directory in the library to lme4cran; then install lme4 from R-Forge and rename that lme4 directory to lme4forge. Then create a flexible script that would copy one of the directories to a directory called lme4 on the fly. I don't know if it would work, but I just wonder if there would possibly be a more elegant way. Regards Søren

-Original Message- From: Greg Snow [mailto:538...@gmail.com] Sent: 2. oktober 2012 22:27 To: Søren Højsgaard Cc: r-help@r-project.org Subject: Re: [R] Having two different versions of a package in the same R installation

So if you have both loaded in the same instance of R, how will R know which version of lmer or other functions that you want to run? It seems cleanest to me to have the 2 different instances of R running like you do now. The other option would be to change all the names (exported ones anyways) in one version and change that package name as well. Then you would have 2 different package names with different exported function names. On Tue, Oct 2, 2012 at 2:18 PM, Søren Højsgaard sor...@math.aau.dk wrote: Dear list, I am making some comparisons of two versions of the lme4 package: The CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However it would be convenient if I could have both versions within the same R installation such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on another lme4 version. Does anyone know if this is possible? Best regards Søren

-- Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
Re: [R] Having two different versions of a package in the same R installation
Install the two versions in two different *libraries* and update .libPaths() to prioritize one over the other. /Henrik

On Tue, Oct 2, 2012 at 1:18 PM, Søren Højsgaard sor...@math.aau.dk wrote: Dear list, I am making some comparisons of two versions of the lme4 package: The CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However it would be convenient if I could have both versions within the same R installation such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on another lme4 version. Does anyone know if this is possible? Best regards Søren
Re: [R] Install of R-2.15.1 for Windows (64 bit) on application server
Thanks, will go ahead.
Re: [R] Having two different versions of a package in the same R installation
On 02/10/2012 21:38, Søren Højsgaard wrote: I don't know if it would work but a kludgy attempt would be to install lme4 from CRAN, rename the lme4 directory in library to lme4cran; then install lme4 from R-forge and rename the lme4 directory to lme4forge. Then create a flexible script that would copy one of the directories to a directory called lme4 on the fly. I don't know if it would work, but I just wonder if there would possibly be a more elegant way.

Do you want them installed or loaded? For installation, simply use different library directories, and adjust .libPaths. You won't succeed in loading them together: the insuperable problem is not the R functions (they could be in different environments) but the DLLs (you can only load one DLL of a given name). We've been here with versioned installs, which were abandoned a long while ago (not least as we never had versioned namespaces).

Regards Søren -Original Message- From: Greg Snow [mailto:538...@gmail.com] Sent: 2. oktober 2012 22:27 To: Søren Højsgaard Cc: r-help@r-project.org Subject: Re: [R] Having two different versions of a package in the same R installation So if you have both loaded in the same instance of R, how will R know which version of lmer or other functions that you want to run? It seems cleanest to me to have the 2 different instances of R running like you do now. The other option would be to change all the names (exported ones anyways) in one version and change that package name as well. Then you would have 2 different package names with different exported function names. On Tue, Oct 2, 2012 at 2:18 PM, Søren Højsgaard sor...@math.aau.dk wrote: Dear list, I am making some comparisons of two versions of the lme4 package: The CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4.
However it would be convenient if I could have both versions within the same R installation such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on another lme4 version. Does anyone know if this is possible? Best regards Søren

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Proposal: Package update log
One other similar location is Github, where you can watch a package; this is how I keep track of changes in the packages that I'm interested in. Just for the interest of other R package developers: the NEWS file can be written in Markdown, and I have a Makefile (https://github.com/yihui/knitr/blob/master/Makefile) to convert Markdown to R's standard NEWS format, e.g. https://github.com/yihui/knitr/blob/master/NEWS.md -> https://github.com/yihui/knitr/blob/master/NEWS (via make news). If you do not belong to the plain-text-is-everything party, you may consider this format. Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: 515-294-2465 Web: http://yihui.name Department of Statistics, Iowa State University 2215 Snedecor Hall, Ames, IA

On Tue, Oct 2, 2012 at 2:47 PM, Greg Snow 538...@gmail.com wrote: There is already support in the packaging system for a NEWS file which can be accessed within R using the 'news' function. What would the changelog that you are proposing contribute or contain beyond what the NEWS file already does? Creating and updating NEWS is not mandatory, but is encouraged. Making the update of NEWS mandatory would probably not change anything for the authors already using it, for those that don't it may cause the author to be less likely to share or update their package rather than bother with logging the updates (though whether that is a bad thing or a good thing is a whole different discussion). How would you mandate the update log? Just requiring a change could result in a lazy author adding a single line of gibberish or a statement like "updated on this date". To enforce something more meaningful would require some form of moderation and place additional burden where there is already a lot of volunteer work happening. If a package is on R-forge (or other similar locations) then you can get a detailed view of the changes between versions.
Packages on CRAN have previous versions available and there are tools that will let you compare the differences in detail. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Having two different versions of a package in the same R installation
On Tue, Oct 2, 2012 at 4:18 PM, Søren Højsgaard sor...@math.aau.dk wrote: Dear list, I am making some comparisons of two versions of the lme4 package: the CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However it would be convenient if I could have both versions within the same R installation, such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on the other lme4 version. Does anyone know if this is possible? Install the two versions into two different R libraries:

install.packages("my.package", repos = ..., lib = ...)  # 1st repo/lib
install.packages("my.package", repos = ..., lib = ...)  # 2nd repo/lib
library(my.package, lib.loc = ...)  # 1st lib
# use it
detach("package:my.package", unload = TRUE)
# next line may or may not be needed
library.dynam.unload("my.package", system.file(package = "my.package"))
library(my.package, lib.loc = ...)  # 2nd lib

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Having two different versions of a package in the same R installation
On 2 October 2012 at 20:18, Søren Højsgaard wrote: | I am making some comparisons of two versions of the lme4 package: The CRAN version and the R-Forge version. For the moment I have two different R installations, each with a different version of lme4. However it would be convenient if I could have both versions within the same R installation such that I can run a code chunk on one lme4 version and immediately after run the same code chunk on the other lme4 version. Does anyone know if this is possible? If you use different directories, both versions can be installed at the same time. The one found first when going down .libPaths() is the default; the other can be loaded by supplying its path to library(). Dirk -- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
Re: [R] R process must die - can I save history?
thanks Dr. R. this will come in handy in the future as I have a knack for hanging R. On Oct 2, 2012 12:01 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 02/10/2012 18:29, Bert Gunter wrote: ?history in a fresh R session, to see what might be possible. I'll bet the answer is, No, you're screwed, though. Nevertheless, maybe Linux experts can save you. Maybe not. On a Unix-alike see ?Signals. If you can find the pid of the R process and it is still running (and not e.g. suspended), kill -USR1 pid will save the workspace and history. May the Force be with you. -- Bert On Tue, Oct 2, 2012 at 10:17 AM, Mike Miller mbmille...@gmail.com wrote: I connected from my desktop Linux box to a Linux server using ssh in an xterm, but that xterm was running in Xvnc. I'm running R on the server in that xterm (over ssh). Something went wrong with Xvnc that has caused it to hang, probably this bug: https://bugs.launchpad.net/ubuntu/+source/vnc4/+bug/819473 So I can't get back to that ssh session or to R. I had done a bunch of work in R but the command history hasn't been written out. If I kill R, I assume the command history is gone. I wish I could somehow cause R to dump the command history. Is there any way to tell the running R process to write the history somewhere? Thanks in advance. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Annotate a segmented linear regression plot
On 28 September 2012 16:38, David Winsemius dwinsem...@comcast.net wrote: ?text # should be fairly clear. Thank you. I was stupid to ask such a trivial question along with a not-so-trivial one. The second part of the question was probably more important: is there a way to obtain the location of segments produced by the segmented package, so that I can annotate them easily? I have since asked this question of Vito, the package's author. Here is his response: # dear Ben, here is a possible solution; o is the segmented fit

r <- o$rangeZ[, 1]
est.psi <- o$psi[, 2]
v <- sort(c(r, est.psi))
xCoord <- rowMeans(cbind(v[-length(v)], v[-1]))
Z <- o$model[, o$nameUV$Z]
id <- sapply(xCoord, function(x) which.min(abs(x - Z)))
yCoord <- broken.line(o)[id]
plot(o, col = 2:4, res = TRUE)
text(xCoord, yCoord, labels = formatC(slope(o)[[1]][, 1], digits = 2), pos = 4, cex = 1.3)

Play with pos, cex and digits to modify the appearance of the labels. vito # Please learn to post in plain text. I hope this is better. Ben.
[R] add values in one column getting range from other column
Hi, My dataframe has two columns, one with area and the other with percent. How can I add the areas that are within a range of percentage? My dataframe looks like

Area Percent
456 0
3400 10
79 25
56 18
467 0
67 67
839 85
1120 0
3482 85

I want to add the area for values whose percent is <=0, 0-25, 25-50, 50-75, >75. How can I do that?
Re: [R] Proposal: Package update log
Thanks to all for the responses and suggestions. I was primarily proposing a more detailed change log for packages on CRAN. To my mind, repositories like R-forge host packages more 'raw' than those on CRAN (i.e. CRAN seems to me to contain more 'finished' packages which occasionally are updated or added to). Also, some packages on R-forge do not contain any information regarding changes/updates [I'm hesitant to offer an example because I'm really just a lemming in terms of R community stature...]. I guess what I'm saying is, the news(package = "yourPackageHere") function is not particularly useful currently because (in my *limited* experience) very few packages contain the news file, and those which do, do not contain much in the way of description. Perhaps I'm being a bit too ambitious here, but I would just like to be able to see what has been changed (and why, if possible) each time a package is updated. It would seem, from my rather rudimentary understanding, that using current TeX/LaTeX based tools for the basis of package documentation lends itself to having a better, more organized change log or news file based on the package manual table of contents (toc). For instance, it would be great if we had something like: news(package = "yourPackageHere", function = "functionOfInterest") which could display a log of changes/updates sequentially for the named function of interest. Admittedly, I have not created a package myself, but I do have some experience with LaTeX, and it may be as simple as changing the preamble to existing TeX file templates or style files. In terms of enforcement: yes, I agree it would require more work from the package authors, as well as managers/moderators of CRAN; but if we expect each package to have a working help file, then why not a (meaningfully) working 'news' file.
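Something close to the per-function query proposed above can already be approximated by filtering the data frame that news() returns. This is only a sketch: the helper name news_for_function is hypothetical, and it only works for packages that actually ship a NEWS file that R can parse.

```r
# Sketch: filter a package's NEWS entries for mentions of one function.
# news() returns a "news_db" object (a data frame with columns such as
# Version, Category, Date and Text); we simply grep the Text column.
news_for_function <- function(pkg, fn) {
  db <- news(package = pkg)
  subset(as.data.frame(db), grepl(fn, Text, fixed = TRUE))
}

# e.g. news_for_function("Matrix", "sparseMatrix")
```

This gives a rough changes-per-function view without any new infrastructure, though it obviously depends on authors mentioning function names in their NEWS entries.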
Respectfully, Jon Starkweather, PhD University of North Texas Research and Statistical Support http://www.unt.edu/rss/
[R] Efficient Way to gather data from various files
Hello, Sorry if this process is too simple for this list. I know I can do it, but I always read online that when using R one should try to avoid loops and use vectors. I am wondering if there exists a more R-friendly way to do this than to use for loops. I have a dataset that has a list of IDs. Let's call this dataset Master. Each of these IDs has an associated DBF file. The DBF files each have the same title, and they are each located in a directory path that includes, as one of the folder names, the ID. These DBF files have 2 columns of interest. One is the run number, the other is the statistic. I'm interested in the median and 90th percentile of the statistic as well as their corresponding run numbers. Ultimately, I want a table that consists of

ID Run_50th Stat_50 Run_90 Stat_90
1AB 5102010 3 144376
1AC 399 6 9
etc.

Where I currently have a dataset that has

ID
1AB
1AC
etc.

And there are several DBF files that are in folders, i.e. folder1/1AC/folder2/blah.dbf. This dbf looks like

run Stat
1 10
2 10
3 99
4 1000
5 1
6 99
7 1
8 10
9 10
10 10
11 100

I know I could do this with a loop, but I can't see the efficient, R way. I was hoping that you experienced R programmers could give me some pointers on the most efficient way to achieve this result. Sam
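A loop-free sketch of the task described above: summarise_runs is a hypothetical helper, the folder1/&lt;ID&gt;/folder2/blah.dbf layout is taken from the question, and the choice of quantile type (and of the first matching run when Stat values tie) is an assumption, not something the thread settles.

```r
# read.dbf() from the recommended 'foreign' package reads the DBF files:
# library(foreign)

# For one run table, find the runs whose Stat equals the 50th and 90th
# percentile (type = 1 picks actual data points, so a match always exists).
summarise_runs <- function(d) {
  q <- quantile(d$Stat, c(0.5, 0.9), type = 1)
  i50 <- which(d$Stat == q[[1]])[1]
  i90 <- which(d$Stat == q[[2]])[1]
  data.frame(Run_50th = d$run[i50], Stat_50 = d$Stat[i50],
             Run_90th = d$run[i90], Stat_90 = d$Stat[i90])
}

# Apply over all IDs without an explicit loop, then bind the rows:
# res <- do.call(rbind, lapply(master$ID, function(id) {
#   d <- foreign::read.dbf(file.path("folder1", id, "folder2", "blah.dbf"))
#   cbind(ID = id, summarise_runs(d))
# }))
```

lapply plus do.call(rbind, ...) is the usual R idiom here; it is not faster than a well-written for loop, but it keeps the per-file logic in one small function.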
Re: [R] R process must die - can I save history?
Agreed -- very cool trick. Thanks Prof Ripley Michael On Oct 2, 2012, at 9:59 PM, steven mosher mosherste...@gmail.com wrote: thanks Dr. R. this will come in handy in the future as I have a knack for hanging R.
[R] svyby and make.formula
Hello, Although my R code for the svymean() and svyquantile() functions works fine, I am stuck with the svyby() and make.formula() functions. I got the following error messages:

- Error: object of type 'closure' is not subsettable # svyby()
- Error in xx[[1]] : subscript out of bounds # make.formula()

A reproducible example is appended below. I would appreciate it if someone could help me. Thank you in advance. Pradip Muhuri

## Below is a reproducible example ##
setwd("E:/RDATA")
library(survey)
xd1 <- "dthage ypll_ler ypll_75 xspd2 psu stratum wt8 mortstat NA NANA 2 1 1 1683.73870 NA NANA 2 1 1 640.89500 NA NANA 2 1 1 714.06620 NA NANA 2 1 1 714.06620 NA NANA 2 1 1 530.52630 NA NANA 2 1 1 2205.28630 NA NANA 2 1 339 1683.73870 NA NANA 2 1 339 640.89500 NA NANA 2 1 339 714.06620 NA NANA 2 1 339 714.06620 NA NANA 2 1 339 530.52630 NA NANA 2 1 339 2205.28630 788.817926 0 2 2 1 592.3100 1 809.291881 0 2 2 1 1014.7387 1 875.001076 0 2 2 1 853.4763 1 875.001076 0 2 2 1 505.1475 1 885.510514 0 2 2 1 1429.5963 1 788.817926 0 2 2 339 592.31001 809.291881 0 2 2 339 1014.73871 875.001076 0 2 2 339 853.47631 875.001076 0 2 2 339 505.14751 885.510514 0 2 2 339 1429.59631 788.817926 0 2 2 339 592.31001 809.291881 0 2 2 339 1014.73871 875.001076 0 2 2 339 853.47631 875.001076 0 2 2 339 505.14751 885.510514 0 2 2 339 1429.59631"
newdata <- read.table(textConnection(xd1), header = TRUE, as.is = TRUE)
dim(newdata)
# make the grouping variable xspd2
newdata$xspd2 <- factor(newdata$xspd2, levels = c(1, 2), labels = c('SPD', 'No SPD'), ordered = TRUE)
nhis <- svydesign(id = ~psu, strat = ~stratum, weights = ~wt8, data = newdata, nest = TRUE)
# mean age at death - nationwide
svymean(~dthage, data = nhis, subset(nhis, mortstat == 1))
# mean by SPD status
svyby(~dthage, ~xspd2, design = nhis, svymean)
# percentile
svyquantile(~dthage, data = nhis, subset(nhis, mortstat == 1), c(0, .25, .5, .75, 1))
# percentile by SPD status
svyby(~dthage, ~xspd2, desin = nhis, svyquantile, c(0, .25, .5, .75, 1), keep.var = F)
# mean for each of the 3 variables
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

Pradip K. Muhuri Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov
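For comparison, here is a minimal sketch of how make.formula() and svyby() are usually called, using the api example data shipped with the survey package rather than the data above: make.formula() expects a character vector of variable names (not the logical vector that %in% produces), and svyby() receives the design through its design argument.

```r
library(survey)
data(api)

# Example stratified design from the survey package's own documentation
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# make.formula() turns c("api00", "api99") into the formula ~api00 + api99
f <- make.formula(c("api00", "api99"))
svymean(f, dstrat)

# svyby(): the design goes in the 'design' argument; the design carries
# the data, so there is no 'data=' argument to svymean()/svyby()
svyby(~api00, ~stype, design = dstrat, svymean)
```

This sketch does not diagnose the exact errors above, but it shows the argument shapes the two functions expect.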
Re: [R] svyboxplot - library (survey)
Hi Thomas, Thank you so much for your help. Pradip From: Thomas Lumley [tlum...@uw.edu] Sent: Monday, October 01, 2012 6:45 PM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: Anthony Damico; R help Subject: Re: [R] svyboxplot - library (survey) The documentation says "The grouping variable in svyboxplot, if present, must be a factor". -thomas On Tue, Oct 2, 2012 at 4:28 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Dear Anthony, Yes, I can follow the example code you have given. But do you know, from the code shown below (following Thomas Lumley's Complex Surveys), why I am getting the boxplot of dthage for just xspd2=1, not xspd2=2? My intent is to make this code work so that I can generate similar plots for other continuous variables. Any help will be appreciated. Thanks, Pradip

nhis <- svydesign(id = ~psu, strat = ~stratum, weights = ~wt8, data = tor, nest = TRUE)
svyboxplot(dthage ~ xspd2, subset(nhis, mortstat == 1), col = "gray80", varwidth = TRUE, ylab = "Age at Death", xlab = "SPD Status: 1=SPD, 2=No SPD")

From: Anthony Damico [mailto:ajdam...@gmail.com] Sent: Monday, October 01, 2012 10:07 AM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: R help Subject: Re: [R] svyboxplot - library (survey) using a slight modification of the example shown in ?svyboxplot

# load survey library
library(survey)
# load example data
data(api)
# create an example svydesign
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)
# set the plot window to display 1 row x 2 plots
par(mfrow = c(1, 2))
# generate two example boxplots
svyboxplot(enroll ~ stype, dstrat, all.outliers = TRUE)
svyboxplot(enroll ~ 1, dstrat)
# done

# alternative: not as nice
# set the plot window to display 2 rows x 1 plot
par(mfrow = c(2, 1))
# generate two example boxplots
svyboxplot(enroll ~ stype, dstrat, all.outliers = TRUE)
svyboxplot(enroll ~ 1, dstrat)
# done

On Mon, Oct 1, 2012 at 9:50 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Hello, I have used the survey package for boxplots using the following code. Could anyone please tell me why I am getting only 1 boxplot instead of 2 boxplots (1-SPD, 2-No SPD)? What changes in the following code would be required to get 2 boxplots in the same plot frame? Thanks, Pradip

nhis <- svydesign(id = ~psu, strat = ~stratum, weights = ~wt8, data = tor, nest = TRUE)
svyboxplot(dthage ~ xspd2, subset(nhis, mortstat == 1), col = "gray80", varwidth = TRUE, ylab = "Age at Death", xlab = "SPD Status: 1=SPD, 2=No SPD")

-- Thomas Lumley Professor of Biostatistics University of Auckland
Re: [R] R process must die - can I save history? [SOLVED]
This is a solution for UNIX/Linux-type OS users, and a lot of it is only related to R in the sense that it can allow you to reconnect to an R session. I managed to do it and to save history, thanks to helpful messages from Ista Zahn and Brian Ripley. To explain this, I will call my two Linux boxes Desktop and Server. I logged into Desktop, an Ubuntu box, and ran this command:

ps aux | grep ssh

That showed me these ssh processes and a few extraneous things:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mbmiller 3520 0.0 0.0 46944 4668 pts/5 S+ Jul25 0:23 ssh -X Server
mbmiller 4602 0.0 0.0 47720 5544 pts/9 S+ Aug09 1:07 ssh -X Server
mbmiller 25614 0.0 0.0 45584 3344 pts/11 S+ Sep24 0:00 ssh -X Server

I launched an xterm from which I would try to recapture those ssh sessions. (Spoiler: This worked for every ssh process.) I was missing the reptyr binary, so I installed it like so:

sudo apt-get install reptyr

The reptyr man page told me that I had to do this to allow reptyr to work (I could change the 0 back to 1 after finishing):

$ sudo su
root# echo 0 > /proc/sys/kernel/yama/ptrace_scope
root# exit

After that I just ran reptyr for each process, for example:

reptyr 3520

A few lines of text would appear (the last saying "Set the controlling tty"), I'd hit enter, and it would give me my command prompt from my old shell, or the R prompt if R was running in that shell. I was then able to save command histories, etc. When I tried to exit from R using q("yes") it accepted the command, but it did not return my bash prompt. To deal with that, I tried Brian Ripley's recommendation: I logged into Server via ssh and ran this command:

ps aux | grep R

which returned this along with some irrelevant stuff:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mbmiller 5156 0.0 0.1 213188 9244 pts/1 S+ Aug06 1:00 /share/apps/R-2.15.1/lib64/R/bin/exec/R -q

I tried the kill command...

kill -USR1 5156

...and that returned my bash prompt immediately in the other xterm. R was done, the .Rhistory looked perfect, and .RData was also there. I logged into Server, went to the appropriate directory, ran R, and found that all of the objects were there and working correctly. So that was amazing. I could reattach to the R session and also kill it without losing history and data. This is a big deal for me because I get stuck like that about once a year and it's always a huge pain. Mike On Tue, 2 Oct 2012, Ista Zahn wrote (off-list): If you can find the process ID you can try connecting the process to another terminal with reptyr (https://github.com/nelhage/reptyr), and then just use savehistory() as usual. You don't say what flavor of Linux you're dealing with, but it looks like there are packaged versions for at least Ubuntu, Fedora, and Arch. On Tue, 2 Oct 2012, Prof Brian Ripley wrote: Maybe not. On a Unix-alike see ?Signals. If you can find the pid of the R process and it is still running (and not e.g. suspended), kill -USR1 pid will save the workspace and history. Original query: On Tue, 2 Oct 2012, Mike Miller wrote: I connected from my desktop Linux box to a Linux server using ssh in an xterm, but that xterm was running in Xvnc. I'm running R on the server in that xterm (over ssh). Something went wrong with Xvnc that has caused it to hang, probably this bug: https://bugs.launchpad.net/ubuntu/+source/vnc4/+bug/819473 So I can't get back to that ssh session or to R. I had done a bunch of work in R but the command history hasn't been written out. If I kill R, I assume the command history is gone. I wish I could somehow cause R to dump the command history. Is there any way to tell the running R process to write the history somewhere?
Re: [R] Annotate a segmented linear regression plot
On Oct 2, 2012, at 3:38 PM, Ben Harrison wrote: On 28 September 2012 16:38, David Winsemius dwinsem...@comcast.net wrote: ?text # should be fairly clear. Thank you. I was stupid to ask such a trivial question along with a not-so-trivial one. The second part of the question was probably more important: is there a way to obtain the location of segments produced by the segmented package, so that I can annotate them easily? I guess it depends on what you mean by the location of segments. It's easy enough to return the x-values for the knots: o$psi[, "Est."] I do not see a `predicted.segmented` function, so you might need to construct separate lm.predict estimates for values between the pairs of x-values if you meant you wanted the y-values for the knots. -- David. I have since asked this question of Vito, the package's author. Here is his response: # dear Ben, here is a possible solution; o is the segmented fit

r <- o$rangeZ[, 1]
est.psi <- o$psi[, 2]
v <- sort(c(r, est.psi))
xCoord <- rowMeans(cbind(v[-length(v)], v[-1]))
Z <- o$model[, o$nameUV$Z]
id <- sapply(xCoord, function(x) which.min(abs(x - Z)))
yCoord <- broken.line(o)[id]
plot(o, col = 2:4, res = TRUE)
text(xCoord, yCoord, labels = formatC(slope(o)[[1]][, 1], digits = 2), pos = 4, cex = 1.3)

Play with pos, cex and digits to modify the appearance of the labels. vito # Please learn to post in plain text. I hope this is better. Ben. David Winsemius, MD Alameda, CA, USA
Re: [R] add values in one column getting range from other column
Hi, I guess this is what you wanted:

dat1 <- read.table(text = "
Area Percent
456 0
3400 10
79 25
56 18
467 0
67 67
839 85
1120 0
3482 85
", sep = "", header = TRUE)

aggregate(dat1$Percent, list(Area = dat1[, "Area"], Range = cut(dat1$Percent, breaks = c(-Inf, 0, 25, 50, 75, 100), labels = c("<=0", "0-25", "25-50", "50-75", ">75"))), function(x) x)
#   Area Range  x
# 1  456   <=0  0
# 2  467   <=0  0
# 3 1120   <=0  0
# 4   56  0-25 18
# 5   79  0-25 25
# 6 3400  0-25 10
# 7   67 50-75 67
# 8  839   >75 85
# 9 3482   >75 85

A.K. - Original Message - From: Sapana Lohani lohani.sap...@ymail.com To: R help r-help@r-project.org Sent: Tuesday, October 2, 2012 5:25 PM Subject: [R] add values in one column getting range from other column Hi, My dataframe has two columns, one with area and the other with percent. How can I add the areas that are within a range of percentage? My dataframe looks like Area Percent 456 0 3400 10 79 25 56 18 467 0 67 67 839 85 1120 0 3482 85 I want to add the area for values whose percent is <=0, 0-25, 25-50, 50-75, >75. How can I do that?
Re: [R] Efficient Way to gather data from various files
Hello, There are more R-friendly ways to do what you want; it seems to me easy to avoid loops, but you need to tell us how you know which rows correspond to the 50th and 90th quantiles. Maybe this comes from the value in some other column? Give a bit more complete a description and we'll see what can be done. Hope this helps, Rui Barradas On 03-10-2012 00:33, Sam Asin wrote: Hello, Sorry if this process is too simple for this list. I know I can do it, but I always read online about how when using R one should always try to avoid loops and use vectors. I am wondering if there exists a more R-friendly way to do this than to use for loops. I have a dataset that has a list of IDs. Let's call this dataset Master. Each of these IDs has an associated DBF file. The DBF files each have the same title, and they are each located in a directory path that includes, as one of the folder names, the ID. These DBF files have 2 columns of interest. One is the run number, the other is the statistic. I'm interested in the median and 90th percentile of the statistic as well as their corresponding run numbers. Ultimately, I want a table that consists of ID Run_50th Stat_50 Run_90 Stat_90 1AB 5102010 3 144376 1AC 399 6 9 etc. Where I currently have a dataset that has ID 1AB 1AC etc. And there are several DBF files that are in folders, i.e. folder1/1AC/folder2/blah.dbf. This dbf looks like run Stat 1 10 2 10 3 99 4 1000 5 1 6 99 7 1 8 10 9 10 10 10 11 100 I know I could do this with a loop, but I can't see the efficient, R way. I was hoping that you experienced R programmers could give me some pointers on the most efficient way to achieve this result. Sam
Re: [R] add values in one column getting range from other column
Hi, In fact, there is no need for aggregate():

dat1$range <- cut(dat1$Percent, breaks = c(-Inf, 0, 25, 50, 75, 100), labels = c("<=0", "0-25", "25-50", "50-75", ">75"))
dat1
#   Area Percent range
# 1  456       0   <=0
# 2 3400      10  0-25
# 3   79      25  0-25
# 4   56      18  0-25
# 5  467       0   <=0
# 6   67      67 50-75
# 7  839      85   >75
# 8 1120       0   <=0
# 9 3482      85   >75

A.K. - Original Message - From: Sapana Lohani lohani.sap...@ymail.com To: R help r-help@r-project.org Sent: Tuesday, October 2, 2012 5:25 PM Subject: [R] add values in one column getting range from other column Hi, My dataframe has two columns, one with area and the other with percent. How can I add the areas that are within a range of percentage? My dataframe looks like Area Percent 456 0 3400 10 79 25 56 18 467 0 67 67 839 85 1120 0 3482 85 I want to add the area for values whose percent is <=0, 0-25, 25-50, 50-75, >75. How can I do that?
Re: [R] add values in one column getting range from other column
Hello,

Maybe this time you've got it wrong, Arun. The OP wants to sum the areas, not just label them. Using your code:

dat1$Range <- cut(dat1$Percent, breaks = c(-Inf, 0, 25, 50, 75, 100),
                  labels = c("<=0", "0-25", "25-50", "50-75", ">75"))
aggregate(Area ~ Range, data = dat1, FUN = sum)
#   Range Area
# 1   <=0 2043
# 2  0-25 3535
# 3 50-75   67
# 4   >75 4321

Hope this helps,
Rui Barradas

Em 03-10-2012 01:43, arun escreveu:

Hi, I guess this is what you wanted:

dat1 <- read.table(text = "
Area Percent
456 0
3400 10
79 25
56 18
467 0
67 67
839 85
1120 0
3482 85
", sep = "", header = TRUE)

aggregate(dat1$Percent,
          list(Area = dat1[, "Area"],
               Range = cut(dat1$Percent, breaks = c(-Inf, 0, 25, 50, 75, 100),
                           labels = c("<=0", "0-25", "25-50", "50-75", ">75"))),
          function(x) x)
#   Area Range  x
# 1  456   <=0  0
# 2  467   <=0  0
# 3 1120   <=0  0
# 4   56  0-25 18
# 5   79  0-25 25
# 6 3400  0-25 10
# 7   67 50-75 67
# 8  839   >75 85
# 9 3482   >75 85

A.K.

- Original Message -
From: Sapana Lohani lohani.sap...@ymail.com
To: R help r-help@r-project.org
Cc:
Sent: Tuesday, October 2, 2012 5:25 PM
Subject: [R] add values in one column getting range from other column

Hi, My dataframe has two columns, one with area and the other with percent. How can I add the areas that are within a range of percentage? My dataframe looks like

Area Percent
456 0
3400 10
79 25
56 18
467 0
67 67
839 85
1120 0
3482 85

I want to add the area for values whose percent is <=0, 0-25, 25-50, 50-75, >75. How can I do that?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
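For the record, the same sums can also be had without aggregate(), using tapply() on the original example data. The range labels are written here with explicit <= and > signs, which the archive stripped; note that tapply() reports NA for the empty 25-50 level rather than dropping it.

```r
dat1 <- data.frame(Area    = c(456, 3400, 79, 56, 467, 67, 839, 1120, 3482),
                   Percent = c(0, 10, 25, 18, 0, 67, 85, 0, 85))
Range <- cut(dat1$Percent, breaks = c(-Inf, 0, 25, 50, 75, 100),
             labels = c("<=0", "0-25", "25-50", "50-75", ">75"))
tapply(dat1$Area, Range, sum)
#   <=0  0-25 25-50 50-75   >75
#  2043  3535    NA    67  4321
```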
Re: [R] svyby and make.formula
please double-check that you've got all of your parameters correct by typing ?svymean, ?svyby, and ?make.formula before you send questions to r-help :)

# you spelled 'design' wrong and probably need to throw out your NA values. try this
# percentile by SPD status
svyby(~dthage, ~xspd2, design = nhis, svyquantile,
      c(0, .25, .5, .75, 1), keep.var = F, na.rm = TRUE)

# mean for each of the 3 variables
# this returns a logical vector, but make.formula requires a character vector
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

# create a character vector instead
# note you also spelled the third variable wrong -- it will break unless you correct that
vars <- c("dthage", "ypll_ler", "ypll_75")

# this statement has two survey design parameters, which won't work. which one do you want to use?
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

# pick one
svymean(make.formula(vars), nhis, na.rm = TRUE)
svymean(make.formula(vars), subset(nhis, mortstat == 1), na.rm = TRUE)
# all of the variables in vars are NA whenever mortstat isn't 1, so they give the same results

On Tue, Oct 2, 2012 at 7:51 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, Although my R code for the svymean() and svyquantile() functions works fine, I am stuck with the svyby() and make.formula() functions. I got the following error messages:

- Error: object of type 'closure' is not subsettable  # svyby()
- Error in xx[[1]] : subscript out of bounds          # make.formula()

A reproducible example is appended below. I would appreciate it if someone could help me. Thank you in advance.
Pradip Muhuri

Below is a reproducible example:

##
setwd("E:/RDATA")
library(survey)

xd1 <- "dthage ypll_ler ypll_75 xspd2 psu stratum wt8 mortstat
NA NA NA 2 1 1 1683.7387 0
NA NA NA 2 1 1 640.8950 0
NA NA NA 2 1 1 714.0662 0
NA NA NA 2 1 1 714.0662 0
NA NA NA 2 1 1 530.5263 0
NA NA NA 2 1 1 2205.2863 0
NA NA NA 2 1 339 1683.7387 0
NA NA NA 2 1 339 640.8950 0
NA NA NA 2 1 339 714.0662 0
NA NA NA 2 1 339 714.0662 0
NA NA NA 2 1 339 530.5263 0
NA NA NA 2 1 339 2205.2863 0
788.817926 0 2 2 1 592.3100 1
809.291881 0 2 2 1 1014.7387 1
875.001076 0 2 2 1 853.4763 1
875.001076 0 2 2 1 505.1475 1
885.510514 0 2 2 1 1429.5963 1
788.817926 0 2 2 339 592.3100 1
809.291881 0 2 2 339 1014.7387 1
875.001076 0 2 2 339 853.4763 1
875.001076 0 2 2 339 505.1475 1
885.510514 0 2 2 339 1429.5963 1
788.817926 0 2 2 339 592.3100 1
809.291881 0 2 2 339 1014.7387 1
875.001076 0 2 2 339 853.4763 1
875.001076 0 2 2 339 505.1475 1
885.510514 0 2 2 339 1429.5963 1"

newdata <- read.table(textConnection(xd1), header = TRUE, as.is = TRUE)
dim(newdata)

# make the grouping variable xspd2
newdata$xspd2 <- factor(newdata$xspd2, levels = c(1, 2),
                        labels = c('SPD', 'No SPD'), ordered = TRUE)

nhis <- svydesign(id = ~psu, strat = ~stratum, weights = ~wt8,
                  data = newdata, nest = TRUE)

# mean age at death - nationwide
svymean(~dthage, data = nhis, subset(nhis, mortstat == 1))

# mean by SPD status
svyby(~dthage, ~xspd2, design = nhis, svymean)

# percentile
svyquantile(~dthage, data = nhis, subset(nhis, mortstat == 1),
            c(0, .25, .5, .75, 1))

# percentile by SPD status
svyby(~dthage, ~xspd2, desin = nhis, svyquantile,
      c(0, .25, .5, .75, 1), keep.var = F)

# mean for each of the 3 variables
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

# Pradip K. Muhuri, Statistician
Substance Abuse and Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070  Fax: 240-276-1260
e-mail:
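For anyone landing on this thread later, here is a minimal self-contained illustration of make.formula() with svymean() and svyby(), using the survey package's built-in apistrat data rather than the poster's NHIS extract (which does not survive the archive intact):

```r
library(survey)
data(api)   # loads apistrat, among other example datasets

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

vars <- c("api00", "api99")     # a character vector, as make.formula requires
f <- make.formula(vars)         # builds ~api00 + api99
svymean(f, dstrat, na.rm = TRUE)
svyby(~api00, ~stype, design = dstrat, svymean)
```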
Re: [R] Efficient Way to gather data from various files
File operations are not vectorizable. About the only thing you can do for the iterating-through-files part might be to use lapply instead of a for loop, but that is mostly a style change. Once you have read the dbf files there will probably be vector functions you can use (quantile). Off the top of my head I don't know a function that tells you which value corresponds to a particular quantile, but you can probably sort the data with order(), find the value whose ecdf is just below your target with which.max, and look at the row number of that value.

x <- rnorm(11)
names(x) <- seq(x)
xs <- x[order(x)]
# row number of the first sorted value whose ecdf reaches 0.9
Row90 <- as.numeric(names(xs)[which.max(0.9 <= seq(xs)/length(xs))])

---
Jeff Newmiller, Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.
---

Sam Asin asin@gmail.com wrote:

Hello, Sorry if this process is too simple for this list. I know I can do it, but I always read online that when using R one should try to avoid loops and use vectors. I am wondering if there is a more R-friendly way to do this than for loops.

I have a dataset that has a list of IDs; let's call this dataset Master. Each of these IDs has an associated DBF file. The DBF files each have the same title, and they are each located in a directory path that includes the ID as one of the folder names. These DBF files have 2 columns of interest: one is the run number, the other is the statistic. I'm interested in the median and 90th percentile of the statistic as well as their corresponding run numbers. Ultimately, I want a table that consists of

ID Run_50th Stat_50 Run_90 Stat_90
1AB 5102010 3 144376
1AC 399 6 9
etc.

where I currently have a dataset that has

ID
1AB
1AC
etc.

And there are several DBF files that are in folders, i.e.
folder1/1AC/folder2/blah.dbf

This dbf looks like:

run Stat
1 10
2 10
3 99
4 1000
5 1
6 99
7 1
8 10
9 10
10 10
11 100

I know I could do this with a loop, but I can't see the efficient R way. I was hoping that you experienced R programmers could give me some pointers on the most efficient way to achieve this result.

Sam

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
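A small worked version of Jeff's idea, using the numbers from Sam's example dbf. Passing type = 1 to quantile() makes it return a value that actually occurs in the data, so there is always a matching run number:

```r
d <- data.frame(run  = 1:11,
                Stat = c(10, 10, 99, 1000, 1, 99, 1, 10, 10, 10, 100))
q90 <- quantile(d$Stat, 0.9, type = 1)  # an actual observed value
d[d$Stat == q90, ]                      # run 11, Stat 100
```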
Re: [R] svyby and make.formula
Dear Anthony,

Thank you very much for helping me resolve the issues. I now have all the results I intended to generate.

Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Tuesday, October 02, 2012 9:50 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: svyby and make.formula

please double-check that you've got all of your parameters correct by typing ?svymean, ?svyby, and ?make.formula before you send questions to r-help :)

# you spelled 'design' wrong and probably need to throw out your NA values. try this
# percentile by SPD status
svyby(~dthage, ~xspd2, design = nhis, svyquantile,
      c(0, .25, .5, .75, 1), keep.var = F, na.rm = TRUE)

# mean for each of the 3 variables
# this returns a logical vector, but make.formula requires a character vector
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

# create a character vector instead
# note you also spelled the third variable wrong -- it will break unless you correct that
vars <- c("dthage", "ypll_ler", "ypll_75")

# this statement has two survey design parameters, which won't work. which one do you want to use?
svymean(make.formula(vars), nhis, subset(nhis, mortstat == 1), na.rm = TRUE)

# pick one
svymean(make.formula(vars), nhis, na.rm = TRUE)
svymean(make.formula(vars), subset(nhis, mortstat == 1), na.rm = TRUE)
# all of the variables in vars are NA whenever mortstat isn't 1, so they give the same results

On Tue, Oct 2, 2012 at 7:51 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, Although my R code for the svymean() and svyquantile() functions works fine, I am stuck with the svyby() and make.formula() functions. I got the following error messages:

- Error: object of type 'closure' is not subsettable  # svyby()
- Error in xx[[1]] : subscript out of bounds          # make.formula()

A reproducible example is appended below. I would appreciate it if someone could help me.
Thank you in advance.

Pradip Muhuri

Below is a reproducible example:

##
setwd("E:/RDATA")
library(survey)

xd1 <- "dthage ypll_ler ypll_75 xspd2 psu stratum wt8 mortstat
NA NA NA 2 1 1 1683.7387 0
NA NA NA 2 1 1 640.8950 0
NA NA NA 2 1 1 714.0662 0
NA NA NA 2 1 1 714.0662 0
NA NA NA 2 1 1 530.5263 0
NA NA NA 2 1 1 2205.2863 0
NA NA NA 2 1 339 1683.7387 0
NA NA NA 2 1 339 640.8950 0
NA NA NA 2 1 339 714.0662 0
NA NA NA 2 1 339 714.0662 0
NA NA NA 2 1 339 530.5263 0
NA NA NA 2 1 339 2205.2863 0
788.817926 0 2 2 1 592.3100 1
809.291881 0 2 2 1 1014.7387 1
875.001076 0 2 2 1 853.4763 1
875.001076 0 2 2 1 505.1475 1
885.510514 0 2 2 1 1429.5963 1
788.817926 0 2 2 339 592.3100 1
809.291881 0 2 2 339 1014.7387 1
875.001076 0 2 2 339 853.4763 1
875.001076 0 2 2 339 505.1475 1
885.510514 0 2 2 339 1429.5963 1
788.817926 0 2 2 339 592.3100 1
809.291881 0 2 2 339 1014.7387 1
875.001076 0 2 2 339 853.4763 1
875.001076 0 2 2 339 505.1475 1
885.510514 0 2 2 339 1429.5963 1"

newdata <- read.table(textConnection(xd1), header = TRUE, as.is = TRUE)
dim(newdata)

# make the grouping variable xspd2
newdata$xspd2 <- factor(newdata$xspd2, levels = c(1, 2),
                        labels = c('SPD', 'No SPD'), ordered = TRUE)

nhis <- svydesign(id = ~psu, strat = ~stratum, weights = ~wt8,
                  data = newdata, nest = TRUE)

# mean age at death - nationwide
svymean(~dthage, data = nhis, subset(nhis, mortstat == 1))

# mean by SPD status
svyby(~dthage, ~xspd2, design = nhis, svymean)

# percentile
svyquantile(~dthage, data = nhis, subset(nhis, mortstat == 1),
            c(0, .25, .5, .75, 1))

# percentile by SPD status
svyby(~dthage, ~xspd2,
Re: [R] help: ks test fit Poisson-ness (D and p) with one sample data
For a silly question: I am wondering how to test fit with the one sample that follows. I have read "Fitting distributions with R", but that doesn't answer my specific question. I am inclined to use the Kolmogorov-Smirnov D and its associated p value. Much appreciation!

X20.001 232 93 84 185 336 417 228 199 2110 1411 612 1913 1314 3015 2516 3517 3518 2219 3420 1721 3022 5023 6024 9825 6226 6227 6728 5429 5830 583110732 7933 5634 6535 8936111371063811339 8840 9841 9242 4243 1944 545 1146 1547 1548 2449 350 251 052 453 154 9

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
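No answer to this question appears in the archive. As a general note, ks.test() assumes a continuous, fully specified distribution, so for count data with a lambda estimated from the sample a chi-squared goodness-of-fit test is usually more appropriate. A sketch with simulated stand-in data (the poster's sample is garbled in the archive, so the data here are invented):

```r
set.seed(1)
x <- rpois(200, lambda = 5)     # stand-in for the posted counts
lambda.hat <- mean(x)

k    <- 0:max(x)
obs  <- tabulate(x + 1, nbins = max(x) + 1)   # counts of 0, 1, ..., max(x)
prob <- dpois(k, lambda.hat)
prob[length(prob)] <- prob[length(prob)] +
  ppois(max(x), lambda.hat, lower.tail = FALSE)  # fold the upper tail in

chisq.test(obs, p = prob)
```

Note that the reported degrees of freedom do not account for estimating lambda from the data, and cells with small expected counts should be pooled, so the p value is approximate.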
[R] calculating gelman diagnostic for mice object
I am using -mice- for multiple imputation and would like to use the Gelman diagnostic in -coda- to assess the convergence of my imputations. However, gelman.diag requires an mcmc list as input. van Buuren and Groothuis-Oudshoorn (2011) recommend running mice step by step to assess convergence (e.g. imp2 <- mice.mids(imp1, maxit = 3, print = FALSE)), but this creates mids objects. How can I convert the mids objects into an mcmc list? Or is the step-by-step approach wrong here?

thanks,
Rachel

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
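No reply to this question is archived here. One possible route, assuming the mids object stores its convergence traces in chainMean as a (variable x iteration x chain) array as documented for mice, is to build an mcmc.list by hand for one variable at a time:

```r
library(mice)
library(coda)

# built-in example data; one chain per imputation stream
imp <- mice(nhanes, maxit = 20, print = FALSE, seed = 1)

# chain means for "bmi": an iterations x chains matrix
ch <- imp$chainMean["bmi", , ]
mc <- do.call(mcmc.list, lapply(seq_len(ncol(ch)), function(i) mcmc(ch[, i])))
gelman.diag(mc, autoburnin = FALSE)
```

The same construction would apply to chainVar for the chain variances; whether the Gelman-Rubin diagnostic is the right convergence check for mice streams is a judgment call the thread leaves open.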