Re: [R] lme: subject-specific slopes.
John:  I think you want the output from coef(fitRIRT). ranef(fitRIRT) will give you the subject-specific random-effect deviations from the fixed-effect means. coef(fitRIRT) will give you the combination of the fixed-effect means with the subject-specific random-effect deviations and is therefore on the scale you are expecting.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_c...@usgs.gov
tel: 970 226-9326

From: John Sorkin jsor...@grecc.umaryland.edu
To: r-help@r-project.org, Kenneth Frost kfr...@wisc.edu
Date: 12/04/2012 09:10 AM
Subject: Re: [R] lme: subject-specific slopes.

Ken,
Thank you for your help. ranef(fitRIRT) does not give me what I expect. The subject-specific slopes and subject-specific intercepts are not anywhere close to what I would expect them to be; the means of the subject-specific values should be close to those reported by summary(fitRIRT), and they are not. As you will see by examining the material below, the subject-specific slopes are off by many orders of magnitude. The intercepts are also far from the value reported in summary(fitRIRT). Do you have any thoughts?
Thanks,
John

fitRIRT <- lme(echogen ~ time, random = ~ 1 + time | subject,
               data = repeatdata, na.action = na.omit)
summary(fitRIRT)

Linear mixed-effects model fit by REML
 Data: repeatdata
      AIC      BIC    logLik
  495.097 507.2491 -241.5485

Random effects:
 Formula: ~1 + time | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr
(Intercept) 1.917511e+01 (Intr)
time        2.032276e-04 0
Residual    1.044601e+01

Fixed effects: echogen ~ time
               Value Std.Error DF   t-value p-value
(Intercept) 64.54864  4.258235 32 15.158543  0.0000
time         0.35795  0.227080 32  1.576307  0.1248
 Correlation:
     (Intr)
time -0.242

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.61362755 -0.52710871  0.02948008  0.41793322  1.77340082

Number of Observations: 58
Number of Groups: 25

ranef(fitRIRT)
   (Intercept)          time
1    -3.278112  2.221016e-09
2   -35.400618  4.314995e-08
3    11.493110 -6.797543e-09
4   -16.209586 -7.070834e-08
5     3.585227 -2.389705e-08
6     1.614320 -1.967700e-09
7     8.346905  5.827094e-08
8    30.917812 -3.768584e-08
9    -0.394101 -9.158251e-09
10    4.437509 -4.057971e-08
11   31.956597 -2.126275e-08
12   41.567402 -4.853942e-08
13  -10.723993  1.307152e-08
14   -4.554837  5.551908e-09
15   -4.554501  4.815086e-08
16   13.296985 -3.743967e-08
17   -8.255439  1.733238e-08
18  -21.317239  2.203885e-08
19  -13.480159  2.194016e-08
20  -13.044766  2.269168e-08
21   11.639198 -1.418706e-08
22  -27.457388 -1.154099e-08
23    2.194001 -5.509119e-09
24   -3.992646  7.682188e-08
25    1.614320 -1.967700e-09

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Kenneth Frost kfr...@wisc.edu 12/4/2012 10:44 AM >>>
I think this might be close to what you want.

ranef(fitRIRT)

On 12/04/12, John Sorkin wrote:
I am running a random intercept, random slope regression:

fitRIRT <- lme(echogen ~ time, random = ~ 1 + time | subject,
               data = repeatdata, na.action = na.omit)
summary(fitRIRT)

I would like to get the subject-specific slopes, i.e. the slope that the model computes for each subject. If I have 10 subjects, I should have 10 slopes. I don't see the slopes when I look at the items listed in names(summary(fitRIRT)) nor when I look at the items listed in names(fitRIRT).
Thanks,
John

John David Sorkin M.D., Ph.D.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Kenneth Frost
Graduate Research Assistant - Dept. of Plant Pathology
University of Wisconsin - Madison Lab
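Brian's relationship between coef(), fixef(), and ranef() can be verified directly. A minimal sketch, using nlme's built-in Orthodont data as a stand-in (John's repeatdata is not available):

```r
library(nlme)

# Random-intercept, random-slope fit, analogous to the poster's model.
fit <- lme(distance ~ age, random = ~ 1 + age | Subject, data = Orthodont)

# coef() is fixef() plus the per-subject deviations returned by ranef():
recon <- sweep(as.matrix(ranef(fit)), 2, fixef(fit), "+")
all.equal(as.matrix(coef(fit)), recon, check.attributes = FALSE)  # TRUE
```

The subject-specific slopes John wanted are thus the "age" (here) or "time" (in his model) column of coef(fit).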
[R] quantreg installation and conflicts with R 2.15.2
I recently lost the partitions on my hard drive (second time in 6 months), so I had to have our IT folks image all my files over to a new drive. I completely reinstalled R (now 2.15.2) and all my libraries on my computer (Dell Latitude running Windows 7). A few of my previous workspaces (created with R 2.14.1) can't be restored, reporting an error similar to the one I get when I try to load the quantreg package, which requires SparseM (see below). So not only will quantreg not load, but some of my workspaces can't be restored when being loaded (see below). Not sure what this is about. I asked Roger Koenker, the package maintainer, but he is on travel and won't have a chance to seriously investigate this for a while. So I thought I would put it out to the R community and see if anyone has any suggestions about why this conflict might be occurring.

> library(quantreg)
Loading required package: SparseM
Error : object 'kronecker' is not exported by 'namespace:methods'
Error: package 'SparseM' could not be loaded

or

Error: object 'kronecker' is not exported by 'namespace:methods'
During startup - Warning message:
unable to restore saved data in C:\CADESTUFF\DATA\BarryNoon\.RData

Brian
[R] Fw: quantreg installation and conflicts with R 2.15.2
Just noticed that I get a similar error about object 'kronecker' in the Matrix package when trying to load lme4. So this is a more pervasive problem.

Brian

----- Forwarded by Brian S Cade/BRD/USGS/DOI on 11/30/2012 02:07 PM -----
From: Brian S Cade/BRD/USGS/DOI
To: R-help@r-project.org
Date: 11/30/2012 01:16 PM
Subject: quantreg installation and conflicts with R 2.15.2
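A plausible diagnostic for this thread (my suggestion, not something the posters confirmed): the "not exported by 'namespace:methods'" error is typical of a binary package built under an older R being loaded into a newer one, and installed packages record their build version in the Built field:

```r
# Compare the R version each package was built under with the running R;
# a mismatch suggests reinstalling the package under the current version.
packageDescription("SparseM")$Built
packageDescription("Matrix")$Built
R.version.string

# Reinstalling the affected packages usually clears the error.
install.packages(c("SparseM", "quantreg", "Matrix", "lme4"))
```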
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves
Another alternative is to put the data in a linear model structure (one column for the response, another column for an indicator variable identifying group) and estimate all possible quantile regressions with rq() in the quantreg package, using a model of the form y ~ intercept + indicator (0,1) variable for group. The estimated quantiles for the intercept will be the quantiles of the ecdf for one group, and the estimated quantiles for the indicator grouping variable will be the differences in quantiles (ecdfs) between the two groups. There is useful built-in graphing capability in quantreg with the plot.rqs() function.

Brian

From: user1234 mehenderso...@gmail.com
To: r-help@r-project.org
Date: 10/05/2012 06:46 AM
Subject: Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

Rui,
Your response nearly answered a similar question of mine, except that I also have ecdfs of different lengths. Do you know how I can adjust

x <- seq(min(loga, logb), max(loga, logb), length.out = length(loga))

to account for this? It must be the length.out argument, but I'm unsure how to proceed. Any advice is much appreciated.
-L

Rui Barradas wrote:
Hello,
Try the following. (I've changed the color of the first ecdf.)
loga <- log10(a + 1)  # do this
logb <- log10(b + 1)  # only once
f.a <- ecdf(loga)
f.b <- ecdf(logb)

# (2) max distance D
x <- seq(min(loga, logb), max(loga, logb), length.out = length(loga))
x0 <- x[which(abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))))]
y0 <- f.a(x0)
y1 <- f.b(x0)

plot(f.a, verticals = TRUE, do.points = FALSE, col = "blue")
plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)
## alternative: use standard R plot of ecdf
# plot(f.a, col = "blue")
# lines(f.b, col = "green")
points(c(x0, x0), c(y0, y1), pch = 16, col = "red")
segments(x0, y0, x0, y1, col = "red", lty = "dotted")
## alternative: down to x axis
# segments(x0, 0, x0, y1, col = "red", lty = "dotted")

Hope this helps,
Rui Barradas

maxbre wrote:
Hi all,
given this example:

# start
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
length(a)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)
length(b)
out <- ks.test(log10(a + 1), log10(b + 1))
# max distance D
out$statistic
f.a <- ecdf(log10(a + 1))
f.b <- ecdf(log10(b + 1))
plot(f.a, verticals = TRUE, do.points = FALSE, col = "red")
plot(f.b, verticals = TRUE, do.points = FALSE, col = "green", add = TRUE)
# inverse of ecdf a
x.a <- get("x", environment(f.a))
y.a <- get("y", environment(f.a))
# inverse of ecdf b
x.b <- get("x", environment(f.b))
y.b <- get("y", environment(f.b))
# end

I want to plot the max distance between the two ecdf curves as in the above given chart. Is that possible, and how? Thanks for your help.

PS: this is an amended version of a previous thread (but no reply followed) that I've deleted from the Nabble repository because I realised it was not clear enough (now I hope it's a little better, sorry for that).

--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html
Sent from the R help mailing list archive at Nabble.com.
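For user1234's question about ecdfs of different lengths, one adjustment (my own sketch, assuming loga, logb, f.a, and f.b as defined in Rui's code above) is to drop the grid tied to length(loga) and evaluate both ecdfs on the pooled observed values:

```r
# Evaluate both ecdfs on the pooled sorted values; the grid no longer
# depends on either sample's length.
x  <- sort(unique(c(loga, logb)))
d  <- abs(f.a(x) - f.b(x))
x0 <- x[d == max(d)]   # location(s) of the maximum distance D
```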
Re: [R] Imputing data below detection limit
The below-detection-limit issue is similar to survival analysis with censoring (but left rather than right censoring). So many survival estimation approaches are appropriate for analyses with values below detection limits (see the NADA package, also censored quantile regression in the quantreg package, etc.).

Brian

From: Bert Gunter gunter.ber...@gene.com
To: Jessica Streicher j.streic...@micromata.de
Cc: r-help@r-project.org
Date: 08/13/2012 09:28 AM
Subject: Re: [R] Imputing data below detection limit

Yes, Jessica, the practice -- of which I also have been and continue to be guilty -- does not really make a lot of sense. It usually doesn't affect estimation all that much, but it can certainly mess up inference. The proper approach is to use the proper approach: model it as left-censored data. The problem with that is:

1. It's complicated, and is beyond the statistical background of most folks who deal with such data -- it's a ubiquitous issue in science and engineering, after all.

2. Typically, the LOD isn't: that is, there often is not a well-defined value, and that which is chosen is both arbitrary and inaccurate. What one often sees is an increasing loss of relative precision as one approaches the designated value. Modeling this effectively gets even more complicated. David Rocke and colleagues have published methodology on this, mostly in Technometrics if memory serves.

3. So, as in other situations, we muddle along with rather crude statistical approaches and hope that they are adequate. Probably in most circumstances they are, but ...

Cheers,
Bert

On Mon, Aug 13, 2012 at 1:15 AM, Jessica Streicher j.streic...@micromata.de wrote:
Tempting a use of "let me google that for you"...
Anyway, there's a package called Imputation. I myself used the zoo package.
There are probably lots of others, since it's a really common problem. They usually fill in places in your data that are designated as NA. I do not completely understand what you mean by detection limit. If you do not have NAs, but rather some kind of threshold, I'd suggest going over the data and filling any applicable values with NAs, then using the library of your choice. I find that kind of weird, though: if you haven't detected much, you haven't detected much. It's part of the data, why impute?

On 11.08.2012, at 23:01, aynumazi wrote:
Hello,
I'm trying to impute data below the detection limit (with multiple detection limits), so I need just a method or a code for imputation and then to extract the complete dataset to do the analyses. Is there any package which could do that simply, as I'm a beginner in R?
Thank you

--
View this message in context: http://r.789695.n4.nabble.com/Imputing-data-below-detection-limit-tp4640057.html
Sent from the R help mailing list archive at Nabble.com.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
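As a concrete sketch of the left-censoring approach Brian and Bert describe (my own illustration; y, lod, and x are hypothetical names, and the lognormal choice is an assumption): the survival package encodes left-censored observations with Surv(..., type = "left"), where the event indicator marks values actually detected:

```r
library(survival)

# Hypothetical data: y holds measurements, with non-detects recorded at
# the detection limit 'lod'; observed = 0 flags left-censored values.
observed <- as.numeric(y > lod)
fit <- survreg(Surv(y, observed, type = "left") ~ x, dist = "lognormal")
summary(fit)
```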
Re: [R] quantreg Wald-Test
Stefan:  Your comparison of two models, one with a nested reduced set of parameters obtained by fixing the others at zero, with anova() can be used to make inferences based either on a likelihood-ratio form of test or on the rankscore test for a given quantile (see ?anova.rq and the package vignette for literature citations with details on the test statistics). So this comparison corresponds to the typical linear-model hypothesis of a set of parameters equaling zero. I prefer the rankscore test if sample sizes are not too large (e.g., < 5,000).

Brian

From: stefan23 stefan.vo...@uni-konstanz.de
To: r-help@r-project.org
Date: 07/28/2012 10:32 PM
Subject: [R] quantreg Wald-Test

Dear all,
I know that my question is somewhat special, but I have tried several times to solve the problems on my own and am unfortunately not able to compute the following test statistic using the quantreg package. I appreciate every little comment or help, as I really do not know how to tell R what I want it to do.

My situation is as follows: I have a data set containing a (dependent) vector Y and the regressor X. My aim is to check whether the two variables do not Granger-cause each other in quantiles. I started to compute via quantreg, for a single tau := q:

rq(Y_t ~ Y_(t-1) + Y_(t-2) + ... + X_(t-1) + X_(t-2) + ..., tau = q)

This gives me the quantile regression coefficients. Now I want to check whether all the coefficients of X are equal to zero (for this specific tau). Can I do this by applying anova.rq? I have already asked a similar question, but I am not sure if anova is really calculating this for me.
Currently I am calculating:

fitunrestricted <- rq(Y_t ~ Y_(t-1) + Y_(t-2) + ... + X_(t-1) + X_(t-2) + ..., tau = q)
fitrestricted   <- rq(Y_t ~ Y_(t-1) + Y_(t-2) + ..., tau = q)
anova(fitrestricted, fitunrestricted)

If this is correct, can you tell me how the test value is calculated in this case? In other words: my next step is going to be replicating this procedure for a whole range of quantiles (say for tau in [a,b]). To apply a sup-Wald test, I am wondering if it is correct to choose the maximum of the different test values and to simulate the critical values by using the data tabulated in Andrews (1993) (or by simulating the vectors of independent Brownian motions). Please feel free to comment, I am really looking forward to your help!
Thank you very much
Cheers
Stefan

--
View this message in context: http://r.789695.n4.nabble.com/quantreg-Wald-Test-tp4638198.html
Sent from the R help mailing list archive at Nabble.com.
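Brian's suggested comparison, written out as a sketch (variable and model names are placeholders, not from Stefan's data); test = "rank" requests the rankscore form described in ?anova.rq:

```r
library(quantreg)

# Nested median-regression fits: the restricted model fixes the X
# coefficients at zero by omitting them.
fitunrestricted <- rq(y ~ ylag1 + ylag2 + xlag1 + xlag2, tau = 0.5)
fitrestricted   <- rq(y ~ ylag1 + ylag2,                 tau = 0.5)

# Joint test that the X coefficients are zero at this tau.
anova(fitrestricted, fitunrestricted, test = "rank")
```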
Re: [R] Higher log-likelihood in null vs. fitted model
I have had several instances of similar issues with weighted least squares and weighted logistic regression recently. It is important to remember that the log-likelihoods for the weighted models include a term that is a function of the n weights, and that these are not incorporated in deviances (or sums of squares for LS regression).

Brian

From: Andrew Miles rstuff.mi...@gmail.com
To: Mark Leeds marklee...@gmail.com
Cc: r-help@r-project.org
Date: 05/31/2012 04:09 PM
Subject: Re: [R] Higher log-likelihood in null vs. fitted model

Interesting. If you use the deviances printed out by the fitted model (including the null deviance) and back-engineer them to log-likelihoods, you get results identical to Stata.

> m$deviance * (-0.5)
[1] -390.9304
> m$null.deviance * (-0.5)
[1] -393.064

However, using the deviance to calculate the AIC by hand does not get you the AIC that the model returns.

> m$deviance + 2 * 2
[1] 785.8608
> m$aic
[1] 764.3816

The difference seems to be in how the function glm.fit() calculates the AIC that it returns, and which the function logLik() uses to return the log-likelihood, as Mark indicated. The function is:

> binomial()$aic
function (y, n, mu, wt, dev)
{
    m <- if (any(n > 1)) n else wt
    -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y), round(m), mu, log = TRUE))
}

On the other hand, glm() calculates the deviance based on the sum of the deviance residuals. This gives me a sense of how log-likelihoods calculated using the deviance and the model AIC could differ (i.e., different ways of calculating them), but it is still not clear to me *why* the two approaches differ. And they only differ when I use weights with the data.
More importantly, it makes me wonder which is more reliable. In my case, it seems the deviance-residuals approach is more reliable in that (a) it gives a sensible result (i.e., a model with a predictor has a higher log-likelihood than the null model), and (b) it matches an independent estimation performed in Stata. But is that always the case? And if so, why use the other approach at all?

On Thu, May 31, 2012 at 9:55 AM, Mark Leeds marklee...@gmail.com wrote:
Hi Duncan: I don't know if the following can help, but I checked the code, and logLik defines the log-likelihood as (p - glmobject$aic/2), where p is glmobject$rank. So the reason for the likelihood being less is that, in the null, it ends up being (1 - glmobject$aic/2), and in the other one it ends up being (2 - glmobject$aic/2). So 2 - 764.4/2 = -380.2 and 1 - 761.9/2 = -379.95 (close enough for govt work).

So that's where the numbers are coming from, but it really depends on how AIC is defined. Likelihoods should not involve degrees of freedom (at least not where they make the likelihood less, as in the above example), so maybe backing the likelihood out using AIC is the issue? (AIC = -2 * likelihood + 2p, so p - AIC/2 = likelihood.) AIC is a function of the likelihood but, as far as I know, the likelihood is not a function of the AIC. Thanks for any insight.

On Thu, May 31, 2012 at 9:26 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
On 12-05-31 8:53 AM, Andrew Miles wrote:
Two related questions. First, I am fitting a model with a single predictor, and then a null model with only the intercept. In theory, the fitted model should have a higher log-likelihood than the null model, but that does not happen. See the output below. My first question is, how can this happen?

I suspect you'll need to give sample data before anyone can really help with this.
> m

Call:  glm(formula = school ~ sv_conform, family = binomial, data = dat, weights = weight)

Coefficients:
(Intercept)   sv_conform
    -2.5430       0.2122

Degrees of Freedom: 1488 Total (i.e. Null);  1487 Residual
Null Deviance:     786.1
Residual Deviance: 781.9        AIC: 764.4

> null

Call:  glm(formula = school ~ 1, family = binomial, data = dat, weights = weight)

Coefficients:
(Intercept)
     -2.532

Degrees of Freedom: 1488 Total (i.e. Null);  1488 Residual
Null Deviance:     786.1
Residual Deviance: 786.1        AIC: 761.9

> logLik(m); logLik(null)
'log Lik.' -380.1908 (df=2)
'log Lik.' -379.9327 (df=1)

My second question grows out of the first. I ran the same two models on the same data in Stata and got identical coefficients. However, the log-likelihoods were different from the ones I got in R, and followed my expectations -- that is, the null model has a lower log-likelihood than the fitted model. See the Stata model comparison below. So my question is, why do identical models fit in R and Stata have different log-likelihoods?

That's easier: they use different base measures. The likelihood is only
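The weight-dependence Brian describes can be reproduced with a small sketch (my own construction, not data from the thread): with non-integer case weights, the deviance-based back-calculation and logLik(), which goes through binomial()$aic, diverge:

```r
set.seed(1)
# Hypothetical weighted logistic regression; R will warn about
# non-integer #successes, which is exactly the situation at issue.
d <- data.frame(y = rbinom(100, 1, 0.3),
                x = rnorm(100),
                w = runif(100, 0.5, 2))
m <- glm(y ~ x, family = binomial, data = d, weights = w)

-0.5 * m$deviance   # log-likelihood back-engineered from the deviance
logLik(m)           # computed via binomial()$aic; carries a weight term
```

With unit weights the two numbers agree (for 0/1 binomial data, the saturated log-likelihood is zero), which is why the discrepancy only appears in the weighted fits.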
Re: [R] Function to compute multi-response, multi-rater kappa?
Luk:  I don't know if this solves your desire for an implementation in R, but the most general extension of Cohen's kappa for testing agreement that I'm aware of is the extension made by using multi-response randomized block permutation procedures (MRBP), developed by Paul Mielke and Ken Berry. They calculate a generalized measure of agreement that can be applied to nominal, ordinal, or continuous data. I know it is used for computing and testing Cohen's kappa for multiple raters with nominal data. I'm unsure whether it is easily applied to multiple responses at the same time as multiple raters, but it might be. You might check out Mielke and Berry (2007, Permutation Methods, 2nd ed., pp. 150-166). We distribute a package of permutation software (Blossom) that computes all of the multiresponse permutation procedure family of statistics, including MRBP (we're in the process of porting it to an R package, but it will be several months before it's ready to go). There are (less complete) versions of MRPP available in the vegan package for R, but I don't know whether they will do the randomized-block variant required for Cohen's kappa.

Brian

From: Luk Arbuckle luk.arbuc...@gmail.com
To: David Winsemius dwinsem...@comcast.net
Cc: r-help@r-project.org
Date: 02/01/2012 08:37 PM
Subject: Re: [R] Function to compute multi-response, multi-rater kappa?

Although interesting, Dave, this doesn't fit my problem. I want to measure the percentage agreement corrected for chance using an extension of kappa. The example data presented in the paper you linked to considers an ordinal measure (ranked preference), whereas I'm looking to measure correlation between nominal measures (agreement between non-ordered categories).
The paper by Kraemer is cited over 100 times in Google Scholar, mostly in the health sciences, so I'm surprised it's not implemented in R. But I suppose this is a niche problem (a multi-response version of kappa), or there is some other extension to kappa, maybe in the social sciences, that I'm not aware of.

Cheers,
Luk Arbuckle

On Wed, Feb 1, 2012 at 17:13, David Winsemius wrote:
Searching on "multiple raters" attributes at the same site brings up http://finzi.psych.upenn.edu/R/library/smacof/doc/smacof.pdf (by Jan De Leeuw), which has as one example multiple raters scoring different breads.

On Feb 1, 2012, at 4:41 PM, Luk Arbuckle wrote:
Thanks David, but those are not multi-response versions of kappa. Extensions to multiple raters are common and well known. I am hoping someone familiar with multiple-response extensions of kappa might see my post and be able to help. As I said, my search on CRAN has failed. I tried all the expected keywords and looked through several kappa functions, but I don't see any that deal with the multi-response case as I've described it. Either it isn't available in R, or I'm looking in the wrong place.

I did not intentionally double post, nor try to deceive your efforts to block double posting. I am not receiving my posts, contrary to my settings, so I rewrote the first one. I thought maybe it was blocked the first time because someone thought it wasn't an R question, so I changed the subject.

Cheers,
Luk Arbuckle

On Wed, Feb 1, 2012 at 16:25, David Winsemius wrote:
On Feb 1, 2012, at 3:13 PM, Luk Arbuckle wrote:
I'm very sorry for double posting! My r-help setting "Receive your own posts to the list?" is set to Yes, and "Mail delivery" is Enabled. Yet I did not get a copy of my post (this message is a reply from my sent mail). I only learned of the double posting when I found it copied in an r-help archive. Again, my apologies.

I actually had a chance to prevent that second posting.
It looked familiar when viewed in the moderation queue, and I took a quick look at what was in my inbox, but since you used a different subject line my search failed. Speaking of searching ... you are asked in the Posting Guide (that no one reads) to post the specifics of your own efforts. My first search with "multi-rater kappa" failed. My second search is here:

http://search.r-project.org/cgi-bin/namazu.cgi?query=multiple+raters+kappa&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=Rhelp10&idxname=Rhelp02

Having gotten more than one apparently on-target result with relatively minor effort, I see no point in expending even more of my time.

--
David.

Luk Arbuckle
On Wed, Feb 1, 2012 at 13:47, Luk Arbuckle wrote:
I'm looking for a function in R that extends kappa
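For the simpler multi-rater (single-response, nominal) case that Brian mentions, an off-the-shelf option -- my addition, not something the thread endorses -- is Fleiss' kappa in the irr package:

```r
library(irr)

# Hypothetical ratings: one row per subject, one column per rater,
# nominal category codes in the cells.
set.seed(1)
ratings <- matrix(sample(c("A", "B", "C"), 30, replace = TRUE), ncol = 3)
kappam.fleiss(ratings)
```

This does not handle the multi-response extension Luk is after, but it is a useful baseline for chance-corrected agreement among more than two raters.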
Re: [R] Quantreg-rq crashing trouble
Steve:  I would guess that the problem relates to the large number of tied values of -1 in your dependent y variable. You could randomly jitter these y = -1 values by adding a random uniform number between, say, [-0.01, 0.01] and see if rq() converges to a solution. Then you would know that was the numerical computing issue. The question then would be what to do next. It seems like a funny data problem, with a point mass of responses at -1. Perhaps only higher quantiles, say 0.80, are going to give usable estimates. Perhaps the jittered responses can yield a reasonably interpreted estimate.

Brian

From: Steven R Corsi srco...@usgs.gov
To: r-help@r-project.org
Date: 07/21/2011 04:04 PM
Subject: [R] Quantreg-rq crashing trouble

Hi,
I am using the quantreg package for median regression on a large series of subsets of data. It works fabulously for all but one subset. When it reaches this subset, R takes the command and never responds. I end up having to kill R and restart it. It appears to be something with the particular data subset, but I can't pinpoint the problem.
Here are some details:
Operating system: Windows 7
R version: 2.12.1

Here are the data and the rq command that give me trouble:

library(quantreg)
x <- c(-0.340778085786686, -0.573639751645382, -0.663932762810308,
       -0.438591328531796, 0.302202380883637, -0.675558868120683,
       -0.764547425063882, -0.751796238115147, -0.481835451050657,
       -0.588287304540034, -0.622315341312595, -0.542777491991884,
       -0.552343921339062, -0.587743299883, -0.758233854317935,
       -0.783134744819092, -0.97774093234124, 0.859832969267456,
       0.69037126308323, 0.185409334523753, -0.432951955490942,
       -0.988120972598647, 0.243223425575187)
y <- c(2.35531739878456, -1, 2.26484142532915, -1, 2.86895579641295,
       2.6655997506336, -1, 1.33021078457153, -1, -1, -1, 1.8263340056,
       1.60831204269733, -1, 2.45313479655685, -1, -1, -1, -1, -1, -1,
       -1, -1)
fit1 <- rq(y ~ x, tau = .5)

Any feedback would be greatly appreciated.
Thanks
Steve

--
===
Steven R. Corsi          Phone: (608) 821-3835
Research Hydrologist     email: srco...@usgs.gov
U.S. Geological Survey
Wisconsin Water Science Center
8505 Research Way
Middleton, WI 53562
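Brian's jitter check, sketched against Steve's x and y above (the jitter width and seed are arbitrary choices of mine):

```r
library(quantreg)

set.seed(42)
yj <- y
tied <- y == -1                      # the point mass of responses at -1
yj[tied] <- y[tied] + runif(sum(tied), -0.01, 0.01)

fit1 <- rq(yj ~ x, tau = 0.5)        # does it converge now?
summary(fit1)
```

If this converges, the hang is a degeneracy caused by the ties rather than a bug in rq() per se, and the follow-up questions Brian raises (higher taus, interpretability of jittered estimates) apply.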
Re: [R] quantile regression: out of memory error
Using tau = -1 causes rq() to try to estimate all possible quantiles and store the results. With 11253 observations this would be a formidable feat. Try estimating the model with, say, tau = 1:99/100 to give a more tractable number of estimates. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: Prew, Paul paul.p...@ecolab.com To: r-help@R-project.org r-help@r-project.org Date: 07/11/2011 11:42 AM Subject: [R] quantile regression: out of memory error Sent by: r-help-boun...@r-project.org Hello, I'm wondering if anyone can offer advice on the out-of-memory error I'm getting. I'm using R 2.12.2 on Windows XP, Platform: i386-pc-mingw32/i386 (32-bit). I am using the quantreg package, trying to perform a quantile regression on a dataframe that has 11,254 rows and 5 columns. object.size(subsetAudit.dat) 450832 bytes str(subsetAudit.dat) 'data.frame': 11253 obs. of 5 variables: $ Satisfaction : num 0.64 0.87 0.78 0.75 0.83 0.75 0.74 0.8 0.89 0.78 ... $ Return : num 0.84 0.92 0.91 0.89 0.95 0.81 0.9 0.87 0.95 0.88 ... $ Recommend : num 0.53 0.64 0.58 0.58 0.62 0.6 0.56 0.7 0.64 0.65 ... $ Cust.Clean : num 0.75 0.85 0.72 0.72 0.81 0.79 0.79 0.8 0.78 0.75 ... $ CleanScore.Cycle1: num 96.7 83.3 93.3 86.7 96.7 96.7 90 80 81.7 86.7 ... rq(subsetAudit.dat$Satisfaction ~ subsetAudit.dat$CleanScore.Cycle1, tau = -1) ERROR: cannot allocate vector of size 2.8 Gb I don't know much about computers (software, hardware, algorithms) but does this mean that the quantreg package is creating some massive intermediate vector as it performs the rq function? Quantile regression is something I'm just starting to explore, but I believe it involves ordering data prior to the regression, which could be a huge job when using 11,000 records. Does bigmemory have functionality to help me with this? Thank you, Paul Paul Prew, Statistician, 651-795-5942,
fax 651-204-7504, Ecolab Research Center, Mail Stop ESC-F4412-A, 655 Lone Oak Drive, Eagan, MN 55121-1560 CONFIDENTIALITY NOTICE: \ This e-mail communication an...{{dropped:20}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
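Brian's fix in code form, a sketch assuming the variable names from Paul's str() output: a fixed grid of quantiles replaces the full quantile process that tau = -1 requests.

```r
# tau = -1 asks rq() to compute the entire quantile process, which is what
# exhausts memory at n = 11253; a fixed grid of quantiles is far cheaper.
taus <- 1:99/100          # 99 quantiles: 0.01, 0.02, ..., 0.99
length(taus)
# fit <- quantreg::rq(Satisfaction ~ CleanScore.Cycle1, tau = taus,
#                     data = subsetAudit.dat)   # names taken from the post
```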
Re: [R] fisher exact for 2x2 table
Rob: Fisher's exact test is conceptually possible for any r x c contingency table problem and uses the observed multinomial table probability as the test statistic. Other tests for r x c contingency tables use a different test statistic (Chi-squared, likelihood ratio, Zelterman's). It is possible that the probabilities for any of these procedures may differ slightly for the same table configuration even if the probabilities for each test are calculated by enumerating all possible permutations (hypergeometric) under the null hypothesis. See Mielke and Berry 2007 (Permutation Methods: A distance function approach), Chps. 6 and 7. Mielke has provided efficient Fortran algorithms for enumerating the exact probabilities for 2x2, 3x2, 4x2, 5x2, 6x2, 3x3, and even 2x2x2 tables for Fisher's exact and Chi-square statistics. I don't remember whether Cyrus Mehta's algorithms for Fisher's exact can do more. But the important point to keep in mind is that it is possible to use different statistics for evaluating the same null hypothesis for r x c tables (Fisher's exact uses one form, Chi-square uses another, etc.), and the probabilities can be computed by exact enumeration of all permutations (what people expect Fisher's exact to do, but also possible for the Chi-square statistic) or by some approximation (asymptotic distribution, Monte Carlo resampling). The complete enumeration of test statistics under the null becomes computationally intractable for large-dimension r x c problems whether using the observed table probability (like Fisher's exact) as a test statistic or another like the Chi-square statistic. So in short, yes you can use Fisher's exact on your 4 x 2 problem, and the result might differ from using a Chi-square statistic even if you compute the P-value for the Chi-square test by complete enumeration.
Note that the minimum expected cell size for the Chi-square test relates to whether the Chi-square distributional approximation (an asymptotic argument) for evaluating the Chi-square statistic will be reasonable, and is irrelevant if you calculate your probabilities by exact enumeration of all permutations. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: viostorm rob.sch...@gmail.com To: r-help@r-project.org Date: 04/29/2011 01:23 PM Subject: Re: [R] fisher exact for 2x2 table Sent by: r-help-boun...@r-project.org After I shared comments from the forum yesterday with the biostatistician he indicated this: Fisher's exact test is the non-parametric analog for the Chi-square test for 2x2 comparisons. A version (or extension) of the Fisher's Exact test, known as the Freeman-Halton test, applies to comparisons for tables greater than 2x2. SAS can calculate both statistics using the following instructions. proc freq; tables a * b / fisher; Do people here still stand by the position that Fisher's exact test can be used for RxC contingency tables? Sorry to bother you all so much; it is just important for a paper I am writing and planning to submit soon. (I have a 4x2 table, but it does not meet the expected-frequencies requirements for chi-squared.) I guess people here have suggested R implements the following, which unfortunately are not easily available at my library, but at least the titles indicate the test extends to RxC: Mehta CR, Patel NR. A network algorithm for performing Fisher's exact test in r x c contingency tables. Journal of the American Statistical Association 1983;78:427-34. Mehta CR, Patel NR. Algorithm 643: FEXACT: A FORTRAN subroutine for Fisher's exact test on unordered r x c contingency tables. ACM Transactions on Mathematical Software 1986;12:154-61. The only reason I ask again is he is exceptionally clear on this point.
Thanks again, -Rob viostorm wrote: Thank you all very kindly for your help. -Rob Robert Schutt III, MD, MCS Resident - Department of Internal Medicine University of Virginia, Charlottesville, Virginia -- View this message in context: http://r.789695.n4.nabble.com/fisher-exact-for-2x2-table-tp3481979p3484009.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
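For Rob's situation, base R's fisher.test() already handles r x c tables via Mehta and Patel's network algorithm. A sketch with hypothetical counts (his actual 4x2 counts are not in the thread):

```r
# Fisher's exact test on a 4x2 table with base R's fisher.test(), which
# uses the Mehta-Patel network algorithm for tables larger than 2x2.
tab <- matrix(c(8, 2,
                5, 5,
                3, 7,
                1, 9), nrow = 4, byrow = TRUE)   # hypothetical counts
ft <- fisher.test(tab)                 # exact enumeration
ft$p.value
# For larger or sparser tables that exhaust memory, enlarge the workspace,
# or fall back to a Monte Carlo approximation of the exact p-value:
ft_big <- fisher.test(tab, workspace = 2e6)
ft_mc  <- fisher.test(tab, simulate.p.value = TRUE, B = 1e4)
```

The workspace argument only affects memory available to the network algorithm, not the p-value; the Monte Carlo route trades exactness for tractability on big tables.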
Re: [R] Comparing non nested models with correlation coefficients (inspired from Lorch and Myers )
As a follow-up to Greg's suggested graphical presentation, it seems like the Vuong test is sometimes used to compare fits of non nested models. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: Greg Snow greg.s...@imail.org To: Boris New new.bo...@gmail.com, r-help@r-project.org r-help@r-project.org Date: 03/23/2011 01:14 PM Subject: Re: [R] Comparing non nested models with correlation coefficients (inspired from Lorch and Myers ) Sent by: r-help-boun...@r-project.org If you are interested in the fits, then I would just plot the fits. Plot the fitted/predicted values from the 1st model as the x-values and the fitted/predicted values from the second model as the y-values. It is best to plot on a square plotting region and use asp=1, probably also doing abline(0,1) after. This gives a nice visual of how the predictions from the models compare. You could take it a step further by taking the 2 sets of predictions and doing a Tukey mean-difference plot, or a Bland-Altman plot. I am not sure that the test you suggest would be meaningful. What null hypothesis is it testing and what does it mean if it is not rejected? (what is the power to reject?). Do you really need a formal test? If not then the plot is probably more meaningful. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Boris New Sent: Wednesday, March 23, 2011 7:32 AM To: r-help@r-project.org Subject: [R] Comparing non nested models with correlation coefficients (inspired from Lorch and Myers ) Hi, I would like to compare two models in R with the same dependant variable but different predictors (two different types of frequency and reaction times as RT). An editor told me to have a look at Lorch and Myers 1990. 
Lorch and Myers use the following technique: 1) they perform regressions on individual subjects' data, 2) they extract the beta weights, 3) they run t-tests on these beta weights. The point is that I don't want to compare the effect sizes from the different models but how well the two models fit. So my idea was to extract the correlation coefficients instead of the betas and do t-tests on these. I checked, and my correlation coefficients are normally distributed... Is it ok to do that? Best regards, [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
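Greg's graphical comparison can be sketched as follows, with two hypothetical linear models of the same response (RT) on different frequency predictors (f1, f2); all names and data here are illustrative, not Boris's.

```r
# Plot the predictions of one non-nested model against the other's.
set.seed(1)
d <- data.frame(f1 = rnorm(100), f2 = rnorm(100))
d$RT <- 600 + 20 * d$f1 + 15 * d$f2 + rnorm(100, sd = 10)
m1 <- lm(RT ~ f1, data = d)
m2 <- lm(RT ~ f2, data = d)
par(pty = "s")                         # square plotting region
plot(fitted(m1), fitted(m2), asp = 1,
     xlab = "Model 1 predictions", ylab = "Model 2 predictions")
abline(0, 1)                           # line of equal predictions
# Tukey mean-difference (Bland-Altman) version of the same comparison:
plot((fitted(m1) + fitted(m2)) / 2, fitted(m1) - fitted(m2),
     xlab = "Mean of predictions", ylab = "Difference of predictions")
abline(h = 0)
```

Points hugging the reference lines mean the two models make nearly interchangeable predictions, which answers the "how well do the fits agree" question without a formal test.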
[R] standard error for predicted mean count from ZIP
Does anyone have code for computing the standard error of the predicted mean count from a zero-inflated Poisson regression model estimated by the zeroinfl() function from the pscl package (and yes, we've checked with A. Z. already)? Thank you Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] weighted.median function from package R.basic
While perhaps not the solution you were looking for, you might consider estimating weighted medians with linear quantile regression (just specify an intercept for single-sample analysis, tau=0.50, and weights = your weights) in the quantreg package. Quantile regression does not require sorting to estimate medians (it minimizes an objective function) and thus might require less computing time on a large data set. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: Joris Meys jorism...@gmail.com To: R mailing list r-help@r-project.org Date: 03/30/2010 10:39 AM Subject: [R] weighted.median function from package R.basic Sent by: r-help-boun...@r-project.org Dear all, I want to apply a weighted median on a huge dataset, and I remember a function from the package R.basic that could do this using an internal sorting algorithm qsort. This sped things up quite a bit. Alas, I can't find that package anywhere anymore. There is a weighted.median function in the package limma too, but I didn't use that before. Anybody who knows what happened to R.basic? Cheers Joris -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
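The weighted median Brian describes minimizes sum(w * |y - m|), which is exactly what rq(y ~ 1, tau = 0.5, weights = w) from quantreg solves. A small base-R cross-check of that definition (the w_median helper is ours, not from any package):

```r
# Lower weighted median: the first ordered y whose cumulative weight
# reaches half the total weight. Equivalent to the minimizer of
# sum(w * |y - m|), i.e. what rq(y ~ 1, tau = 0.5, weights = w) computes.
w_median <- function(y, w) {
  o <- order(y)
  y <- y[o]; w <- w[o]
  y[which(cumsum(w) >= sum(w) / 2)[1]]
}
w_median(c(1, 2, 3), c(1, 1, 10))   # 3: nearly all the weight sits on y = 3
```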
Re: [R] opposite estimates from zeroinfl() and hurdle()
Tord: The logistic zero-inflation portion of the zeroinfl() implementation of ZIP or ZINB predicts the probability of a zero count rather than the probability of a nonzero count, so the signs of the coefficients are often reversed from what you would expect if you had just performed a logistic regression. I'm guessing that the hurdle model, as a two-stage model, uses a logistic regression predicting the probability of a nonzero count, hence the reversed signs of the estimates in the logistic regression portion of the model. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: Tord Snäll tord.sn...@ekol.slu.se To: r-help@r-project.org Date: 10/23/2009 07:40 AM Subject: [R] opposite estimates from zeroinfl() and hurdle() Sent by: r-help-boun...@r-project.org Dear all, A question related to the following has been asked on R-help before, but I could not find any answer to it. Input will be much appreciated. I got an unexpected sign of the slope parameter associated with a covariate (diam) using zeroinfl(). It led me to compare the estimates given by zeroinfl() and hurdle(): The (significant) negative estimate here is surprising, given the biology of the species: summary(zeroinfl(bnl ~ 1 | diam, dist = "poisson", data = valdaekar, EM = TRUE)) Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 3.74604 0.02635 142.2 <2e-16 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) 21.7510 7.6525 2.842 0.00448 ** diam -1.1437 0.3941 -2.902 0.00371 ** Number of iterations in BFGS optimization: 1 Log-likelihood: -582.8 on 3 Df The hurdle model gives the same estimates, but with opposite (and expected) signs of the parameters: summary(hurdle(bnl ~ 1 | diam, dist = "poisson", data = valdaekar)) Count model coefficients (truncated poisson with log link): Estimate Std.
Error z value Pr(>|z|) (Intercept) 3.74604 0.02635 142.2 <2e-16 *** Zero hurdle model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -21.7510 7.6525 -2.842 0.00448 ** diam 1.1437 0.3941 2.902 0.00371 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 8 Log-likelihood: -582.8 on 3 Df Why is this so? thanks, Tord Windows NT, R 2.8.1, pscl 1.03 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
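The sign flip Brian explains is just the mirror symmetry of the logit link: a logistic model for P(zero) with linear predictor eta is the same model as one for P(nonzero) with predictor -eta. Using the zero-part coefficients from Tord's output at a hypothetical diam of 18:

```r
# zeroinfl()'s zero part models P(zero); hurdle()'s models P(nonzero).
# Same model, opposite signs on the linear predictor.
eta <- 21.7510 - 1.1437 * 18   # posted coefficients, hypothetical diam = 18
p_zero    <- plogis(eta)       # zeroinfl(): probability of an (excess) zero
p_nonzero <- plogis(-eta)      # hurdle(): probability of a nonzero count
all.equal(p_zero + p_nonzero, 1)   # TRUE: the two parameterizations agree
```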
Re: [R] Quantile GAM?
There are possibilities with rqss() as someone else mentioned. But you can also conduct a lot of useful modeling just by using B-splines within the rq() function - something like my.result <- rq(y ~ bs(x, degree=3)), where bs() is the B-spline function from the splines package. You get to specify the degree of the polynomial and the number and location of knots. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: Jonathan Greenberg greenb...@ucdavis.edu To: r-help@r-project.org r-help@r-project.org Date: 05/29/2009 05:55 PM Subject: [R] Quantile GAM? Sent by: r-help-boun...@r-project.org R-ers: I was wondering if anyone had suggestions on how to implement a GAM in a quantile fashion? I'm trying to derive a model of a hull of points which is likely to require higher-order polynomial fitting (e.g. splines) -- would quantreg be sufficient, if the response and predictors are all continuous? Thanks! --j __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
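What the bs() term hands to rq() is just a B-spline basis matrix; splines ships with R, so the basis itself can be inspected directly. A sketch (the rq() call is commented since it needs quantreg; x and the knot location are illustrative):

```r
# A cubic B-spline basis: degree 3 plus one interior knot gives
# length(knots) + degree = 4 basis columns.
library(splines)
x <- seq(0, 10, length.out = 50)
B <- bs(x, degree = 3, knots = 5)   # one knot at x = 5
dim(B)                              # 50 rows, 4 basis columns
# my.result <- quantreg::rq(y ~ bs(x, degree = 3, knots = 5), tau = 0.9)
```

Each column of B enters the quantile regression as an ordinary predictor, which is why rq() needs no special smoothing machinery to fit curved quantile functions.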
Re: [R] Goodness of fit in quantile regression
Laura: Part of the issue may depend on what you mean by goodness-of-fit. If you are looking for some global measure like a pseudo R1 or AIC to select among models, you ought to be able to make those calculations from the objective function that was minimized, as you recognized. If qr.fit.sfn() does not return the objective function like rq.fit.br(), the simplex routine, you still ought to be able to do the calculations by performing the asymmetric weighting of the residuals from the model (see Roger Koenker's 2005 book). Now, if by goodness-of-fit you mean how the model fits in local regions of the predictor space, then you might want to check out Stef van Buuren's work on worm plots to diagnose fit in quantile regression. I don't remember where he published this, but his email is stef.vanbuu...@tno.nl Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: brian_c...@usgs.gov tel: 970 226-9326 From: laura m. mayorala...@gmail.com To: r-help@r-project.org Date: 05/22/2009 03:29 AM Subject: [R] Goodness of fit in quantile regression Sent by: r-help-boun...@r-project.org Dear R users, I've used the function qr.fit.sfn to estimate a quantile regression on a panel data set. Now I would like to compute a statistic to measure the goodness of fit of this model. Does someone know how I could do that? I could compute a pseudo R2, but in order to do that I would need the value of the objective function at the optimum, and I don't see how to get this from the function qr.fit.sfn. If someone has any good idea about how to solve this problem I would be most grateful! Best Laura -- View this message in context: http://www.nabble.com/Goodness-of-fit-in-quantile-regression-tp23666962p23666962.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
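The "asymmetric weighting of the residuals" Brian mentions can be sketched directly: this is Koenker and Machado's R1 criterion, computed from the check loss when the fitting routine does not return the objective value. The fitted-value vectors here are toy inputs; in practice they would come from the full and intercept-only quantile fits.

```r
# Koenker-Machado R1 goodness-of-fit from asymmetrically weighted residuals.
rho <- function(u, tau) u * (tau - (u < 0))   # the quantile check loss
R1 <- function(y, fit_full, fit_null, tau) {
  V1 <- sum(rho(y - fit_full, tau))   # objective at the fitted model
  V0 <- sum(rho(y - fit_null, tau))   # objective at the intercept-only model
  1 - V1 / V0
}
y <- c(1, 2, 3, 4)
R1(y, fit_full = y, fit_null = rep(2.5, 4), tau = 0.5)   # 1: a perfect fit
```

Like R-squared, R1 lies between 0 and 1, but it is local to the chosen tau rather than a global measure of fit.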
Re: [R] weighted quantiles
Try the linear quantile regression function rq() in the quantreg package. For one-sample estimates, your model would have just an intercept term. There is a weights argument. quantiles.out <- rq(y ~ 1, data=mydata, tau=1:9/10, weights=myweights) would yield the 0.10, 0.20, ..., 0.80, 0.90 weighted quantile estimates. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 sj [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/07/2008 12:38 PM To r-help [EMAIL PROTECTED] cc Subject [R] weighted quantiles I have a set of values and their corresponding weights. I can use the function weighted.mean to calculate the weighted mean; I would like to be able to similarly calculate the weighted median and quantiles. Is there a function in R that can do this? thanks, Spencer [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Quantile regression with complex survey data
Just as a follow-up on T. Lumley's post, two citations that may be useful in reference to application of quantile regression with survey samples are: Bassett and Saleh. 1994. L_1 estimation of the median of a survey population. Nonparametric Statistics 3: 277-283. (L_1 estimation of the median is 0.50 quantile regression). Bassett et al. 2002. Quantile models and estimators for data analysis. Metrika 55: 17-26. (describes weighted QR for a survey of school characteristics and student achievement scores). Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 Stas Kolenikov [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 08/20/2008 01:14 PM To Cheng, Yiling (CDC/CCHP/NCCDPHP) [EMAIL PROTECTED] cc r-help@r-project.org Subject Re: [R] Quantile regression with complex survey data On Wed, Aug 20, 2008 at 8:12 AM, Cheng, Yiling (CDC/CCHP/NCCDPHP) [EMAIL PROTECTED] wrote: I am working on the NHANES survey data, and want to apply quantile regression on these complex survey data. Does anyone know how to do this? There are no references in the technical literature (thinking Annals, JASA, JRSS B, Survey Methodology). Absolutely none. Zero. You might be able to apply the procedure mechanically and then adjust the standard errors, but God only knows what the population equivalent is of whatever that model estimates. If there is a population analogue at all. In general, quantile regression is a heavily model-based concept: for each value of the explanatory variables, there is a well defined distribution of the response, and quantile regression puts additional structure on it -- linearity of quantiles with respect to some explanatory variables. That does not mesh well with the design paradigm according to which survey estimation is usually conducted.
With the latter, the finite population and characteristics of every unit are assumed fixed, and randomness comes only from the sampling procedure. Within that paradigm, you can define the marginal distribution of the response (or any other) variable, but the conditional distributions may simply be unavailable because there are no units in the population satisfying the conditions. -- Stas Kolenikov, also found at http://stas.kolenikov.name Small print: I use this email account for mailing lists only. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
[R] weights in lmer
I originally sent this to Doug Bates but have received no reply yet, so I thought I would expand to a wider audience. I've been trying to estimate linear mixed-effects models in lmer() from the lme4 package using the weights option. The help and code for lmer() suggest to me that this is implemented, but I can't seem to get it to do anything with weights =; no error message is reported, it just seems to ignore the weights option, as my weighted and unweighted models are identical in all summary statistics. Below is an example of my model code. The weights are lengths of observational bouts on wild horse behavior. So is weights not really implemented yet, or am I missing something? m2.wt <- lmer(HD_RATE ~ NUM_PAIR + (NUM_PAIR|HMA), data=out2.5, weights=HOURS/max(HOURS), contrasts=list(NUM_PAIR=contr.treatment), family=gaussian) Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
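A quick base-R check of what a working weights argument should do, using lm() as the fixed-effects analogue (data and weights below are synthetic; a mixed-model fit would be expected to respond to weights the same way, so identical weighted and unweighted summaries do suggest the option is being ignored):

```r
# Unequal observation weights must change the estimates if honored.
set.seed(2)
d <- data.frame(x = 1:20)
d$y <- d$x + rnorm(20, sd = 3)
w <- c(rep(10, 10), rep(0.1, 10))   # trust the first half of the bouts more
m_un <- lm(y ~ x, data = d)
m_wt <- lm(y ~ x, data = d, weights = w)
identical_fits <- isTRUE(all.equal(coef(m_un), coef(m_wt)))
identical_fits   # FALSE: a working weights argument alters the fit
```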
Re: [R] categorical data analysis - fisher.exact for 2x2 and greater
Jeff and others: Paul Mielke and Ken Berry at Colorado State University developed very fast Fortran routines for doing Fisher exact tests (as well as some others like Chi-square and Zelterman's statistics). Their book (Permutation Methods: A Distance Function Approach, 2nd ed. 2007. Springer) indicates that these algorithms were only efficient for <= 5 conditional loops, so 2x2, 3x2, 4x2, 5x2, 6x2, 3x3, and 2x2x2 tables were feasible back in the 1990's. I'm not sure whether more recent computing capability permits larger tables now. Brian Brian S. Cade U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 Dr. Jeff Miller [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 05/07/2008 09:56 AM To 'Greg Snow' [EMAIL PROTECTED], 'David Winsemius' [EMAIL PROTECTED], [EMAIL PROTECTED] cc Subject Re: [R] categorical data analysis - fisher.exact for 2x2 and greater Hi Simon and all, I'm pretty sure that you are correct about this. I think it is a misconception to say that the Fisher exact test is only for a 2 by 2 table. It is presented that way in textbooks because, for a 2x2 table, it is easy to perform. For larger tables, it becomes complex quickly due to the rate at which the permutations increase. When I use it for larger tables, it is hit or miss as to whether R will be able to do it. It's not uncommon to get an error implying that R is out of memory. Check out Agresti's Categorical Data Analysis on the use of the Fisher exact for larger than 2x2 tables, and check out the R archives (and Google search) for all the posts about the error message people run into sometimes. On April 29th, Marc Schwartz replied to my question about this as follows: Take a look at the 'workspace' argument in ?fisher.test and review the second paragraph in Details: For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution.
Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network algorithm developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be obtained from http://www.netlib.org/toms/643. Note this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so it has no more rows than columns. One constraint is that the product of the row marginals be less than 2^31 - 1.) Thank you, Jeff -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow Sent: Wednesday, May 07, 2008 9:55 AM To: David Winsemius; [EMAIL PROTECTED] Subject: Re: [R] categorical data analysis The last example in ?fisher.test is not a 2x2 table; in fact it uses levels with a natural ordering similar to the original question. Why would this not be applicable to the situation? From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of David Winsemius [EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 7:34 AM To: [EMAIL PROTECTED] Subject: Re: [R] categorical data analysis Simon Blomberg [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: But see these posts: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/119079.html http://finzi.psych.upenn.edu/R/Rhelp02a/archive/119080.html Simon. Interesting reading, but the OP specifically said he was not dealing with 2x2 tables, so neither fisher.test nor the suggested alternatives would be applicable to his data situation. -- David Winsemius __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] graphics - line resolution/pixelation going from R to windows metafile
I have a recurring graphics issue that I've not been able to resolve with R. If I make a series of regression estimates and then plot the estimated function for the regression lines over a scatter plot of the data, e.g., using a sequence of plot() and lines() calls similar to those below plot(dep12spp13ph1$DAYSWETm2, dep12spp13ph1$AFTERDECOMP, pch=dep12spp13ph1$spp, cex=0.75, ylab="Proportion of biomass after leaching", xlab="Number of days in wetland", xlim=c(0,250), ylim=c(0,1)) xplot <- 0:243 xplotbsk64deg1 <- bs(xplot, knots=62, degree=1, Boundary.knots=c(0,243)) lines(xplot, exp(xplotbsk64deg1 %*% ph1.decomp.75.bs$coef[c(1,2)]), col="blue") lines(xplot, exp(xplotbsk64deg1 %*% ph1.decomp.75.bs$coef[c(3,4)]), col="red") and then attempt to copy the resulting graph from the R graph window into a Windows metafile to paste into another application like PowerPoint or Word, the resulting regression lines have a lot of jagged edges due to resolution/pixelation issues. For a nonlinear function like the one plotted above, changing the number of pairs of points evaluated with the lines() function makes the plot look better or worse, but never as good as would be possible if there were some sort of smoothing done on the plotted line pixels. I do this type of graphing in SYSTAT all the time and it looks great (but I would prefer not to have to jump back and forth between R and SYSTAT). If I save (or print) the graph from R into an encapsulated PostScript file, then the resulting regression line pixels are smoothed out and look nice, but then the symbols have been decomposed into their constituent elements and are screwed up. I'm guessing somewhere in R's graphing parameters/controls there might be a solution, but I've yet to find it. Any suggestions would be welcome. Brian Brian S. Cade U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg.
C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
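One common workaround (a sketch, not necessarily the fix Brian eventually used): draw straight to a file device rather than copying from the screen device. Vector formats such as PDF (or postscript()/win.metafile() on Windows) have no pixelation at all, and a high-resolution raster via png(..., res = 300) also smooths the jaggies when pasting into PowerPoint or Word. File name and toy data below are illustrative.

```r
# Render directly to a vector device instead of copying the screen plot.
pdf("decomp_fit.pdf", width = 6, height = 4)
plot(0:243, exp(-0.01 * (0:243)), type = "n",
     ylab = "Proportion of biomass after leaching",
     xlab = "Number of days in wetland", ylim = c(0, 1))
lines(0:243, exp(-0.01 * (0:243)), col = "blue")   # smooth at any zoom level
dev.off()
file.exists("decomp_fit.pdf")
```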