Re: [R] package loading smooth.lf (LOCFIT), couldn't find function smooth.lf
Y Y wrote: After loading locfit, I am unable to access functions within locfit. following http://www.herine.net/locfit/start.html library(locfit) x - 10*runif(100) y - 5*sin(x)+rnorm(100) fit - smooth.lf(x,y) Error: couldn't find function smooth.lf So it is time to ask the maintainers of the package and the cited URL (CCing both) to update at least one of them (package or http://www.herine.net/locfit/start.html) While the web page states the last update is (December 16, 2004 version), the version on CRAN is locfit_1.1-9.tar.gz dated 14-Sep-2004. Uwe Ligges fit - locfit(y~lp(x)) Error in eval(expr, envir, enclos) : couldn't find function lp library() or package manager GUI tells me locfit is loaded Any ideas on how to fix this ? SS, running on R 2.1, MacOS 10.3.9 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Garch in a model with explanatory variables
The Ox interface in fSeries is quite an easy way to accomplish this, although it produces some garbage, both in your current environment within R, as well as in the directory in which you are running R. You have to be careful also if you're on a Linux or other UNIX system, as the function has Windows pathnames hard coded into it, which you need to alter to the ones where your Ox resides. Tobias __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Quantile normalization and NA
Ravi Murthy wrote: Hi, I am new to R, I am doing quantile normalization with a dat matix of 384X124 and I find that while computing the quantile normailzation it introduces 'NA' into some of the cells, can someone help me to overcome this problem ? This is the command that goes like upto g62 for 124 colomns g1 - normalize.quantiles(exprs(MSExpr[,1:2])) Do you mean the function normalize.quantiles() from package affy (please always tell the package, if the function is not in base R)? It's more appropriate to ask on the Bioconductor mailing list if Bioconductor packages are the subject of interest. And you might want to give a simple, reproducible, but non-bandwith-wasting example (perhaps by uploadiung data to some web site) in order to make the Bioconductor folks able help you. Uwe Ligges For a small set of data there is no problem, but for a large set of data, it introduces NA in the place where it is suppose to geneerate data . Ravi [EMAIL PROTECTED] Raviabi __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Using a string as a filter
Hi , I want to be able to filter out results using a string. I'm running an automated script that reads a list of filters I get from an external source and applys them to my data frame consecutively. For example I want to get : data[protocol==1], data[protocol==2] ... If I define filter1 - protocol==1 (as a string) filter2 - protocol==2 ... How can I use these variables to choose subsets ? I managed to find a very awkward method by using a function that calls a formula (and using as.formula() for the string I want to get), but I would love to find a more efficient way Thank you ! Yzhar Toren, Tel-Aviv university. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] package loading smooth.lf (LOCFIT), couldn't find functio n smooth.lf
The version of locfit on the web site mentioned apparently has been revised by Prof. Loader, and is newer than the CRAN version that I have been maintaining. If Prof. Loader is OK with it, I will take a look and see if I can get the new version into CRAN-conforming form and upload to CRAN. Meanwhile, make sure you're using the package from Prof. Loader's web page, instead of the one on CRAN, if you want the new features. Andy From: Uwe Ligges Y Y wrote: After loading locfit, I am unable to access functions within locfit. following http://www.herine.net/locfit/start.html library(locfit) x - 10*runif(100) y - 5*sin(x)+rnorm(100) fit - smooth.lf(x,y) Error: couldn't find function smooth.lf So it is time to ask the maintainers of the package and the cited URL (CCing both) to update at least one of them (package or http://www.herine.net/locfit/start.html) While the web page states the last update is (December 16, 2004 version), the version on CRAN is locfit_1.1-9.tar.gz dated 14-Sep-2004. Uwe Ligges fit - locfit(y~lp(x)) Error in eval(expr, envir, enclos) : couldn't find function lp library() or package manager GUI tells me locfit is loaded Any ideas on how to fix this ? SS, running on R 2.1, MacOS 10.3.9 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Quantile normalization and NA
I agree with Uwe on both points. Have you checked if your inputs, i.e. exprs(MSExpr[ ,1:2]), contain missing values to begin with ? Out of curiosity are you trying to apply this method to two colour arrays ? You might want to enquire the BioConductor mailing list about the merits of doing this over standard techniques (i.e. LOESS normalisation) as I do not think this is widely done. Regards, Adai On Sun, 2005-07-10 at 12:34 +0200, Uwe Ligges wrote: Ravi Murthy wrote: Hi, I am new to R, I am doing quantile normalization with a dat matix of 384X124 and I find that while computing the quantile normailzation it introduces 'NA' into some of the cells, can someone help me to overcome this problem ? This is the command that goes like upto g62 for 124 colomns g1 - normalize.quantiles(exprs(MSExpr[,1:2])) Do you mean the function normalize.quantiles() from package affy (please always tell the package, if the function is not in base R)? It's more appropriate to ask on the Bioconductor mailing list if Bioconductor packages are the subject of interest. And you might want to give a simple, reproducible, but non-bandwith-wasting example (perhaps by uploadiung data to some web site) in order to make the Bioconductor folks able help you. Uwe Ligges For a small set of data there is no problem, but for a large set of data, it introduces NA in the place where it is suppose to geneerate data . Ravi [EMAIL PROTECTED] Raviabi __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help to make a specific matrix
In addition to Gabor's solution, you might be interested in the combinations function from the gtools package. library(gtools) combinations(5,2) [,1] [,2] [1,]12 [2,]13 [3,]14 [4,]15 [5,]23 [6,]24 [7,]25 [8,]34 [9,]35 [10,]45 On Sat, 2005-07-09 at 21:42 -0300, Jose Claudio Faria wrote: Dear R users, The solution is probably simple but I need someone to point me to it. How can I to generate a matrix from a numeric sequence of 1:10 like 'A' or 'B' below: A) || | 1 2 3 4 5 | || | 1 | 0 | | 2 | 1 0 | | 3 | 2 5 0 | | 4 | 3 6 8 0| | 5 | 4 7 9 10 0 | || B) || | 1 2 3 4 5 | || | 1 | 0 1 2 3 4 | | 2 | 1 0 5 6 7 | | 3 | 2 5 0 8 9 | | 4 | 3 6 8 0 10 | | 5 | 4 7 9 10 0 | || This question is related with the possible combinations of five objects two the two, i.e, C(5,2). Any help would be greatly appreciated. Regards, __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Using a string as a filter
On 7/10/05, Yzhar Toren [EMAIL PROTECTED] wrote: Hi , I want to be able to filter out results using a string. I'm running an automated script that reads a list of filters I get from an external source and applys them to my data frame consecutively. For example I want to get : data[protocol==1], data[protocol==2] ... If I define filter1 - protocol==1 (as a string) filter2 - protocol==2 ... protocol - 1:5 data - 11:15 filter - protocol==1 data[eval(parse(text=filter))] # 11 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] comparing strength of association instead of strength of evidence?
Weiwei Shi wrote: Dear all: I still need some further help since I think the question itself might be very interesting (i hope so:) : the question is on chisq.test, my concern is which criteria should be used here to evaluate the independence. The reason i use this old subject of the email is, b/c I think the problem here is about how to look at p.value, which evaluate the strength of evidence instead of association. If i am wrong, please correct me. the result looks like this: index word.comb id in.class0 in.class1 p.value odds.ratio 1 1 TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433 2 2 THEFT|RECOV 52|53 40751 146 0.0004997501 4.17127643 3 3 BOLL|ACCID 10|21 36825 1202 0.0004997501 0.44178546 4 4 LAB|VANDAL 8|55 24192 429 0.0004997501 0.82876099 5 5 VANDAL|CAUS 55|59 80164 0.0004997501 0.18405918 6 6AI|TOTAL 9|54 194945 0.0034982509 0.63766766 7 7AI|RECOV 9|53 238561 0.0004997501 0.57547012 8 8 THEFT|TOTAL 52|54 33651 110 0.0004997501 4.56174408 the target is to look for best subset of word.comb to differentiate between class0 and class1. p.value is obtained via p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value) and B=2 (I increased B and it won't help. the margin here is class0=2162792 class1=31859 ) So, in conclusion, which one I should use first, odds.ratio or p.value to find the best subset. If your goal is to discriminate between two different classes, why not calculate directly a measure of discriminative ability, such as probability of correct classification? Kjetil I read some on feature selection in text categorization (A comparative study on feature selection in text categorization might be a good ref.). Anyone here has other suggestions? thanks, weiwei On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote: On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 So you can easily see what cell contributes most to the overall chisquared. Kjetil Can I do this decomposition in R for the following example including 2 contingency tables? tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) tab2 [,1][,2] [1,] 43571 2119221 [2,]52 31807 Here are a few more ways of doing this using chisq.test, glm and assocplot: ## chisq.test ### tab1.chisq - chisq.test(tab1) # decomposition of chisq resid(tab1.chisq)^2 [,1] [,2] [1,] 0.1451026 0.0007570624 [2,] 9.8504915 0.0513942218 # same with(tab1.chisq, (observed - expected)^2/expected) [,1] [,2] [1,] 0.1451026
Re: [R] aregImpute: beginner's question
Anders Schwartz Corr wrote: . . . podb2+propdemocracy+avetrade1984dollars+concentration+cycle+polarity+propmid+terrgainer+ demgainer+ fedgainer+ popdengainer+ urbpopgainer+ tradeopgainer+ gdppcgainer+ terrloser+ demloser+ fedloser+ popdenloser+ urbpoploser+ tradeoploser+ gdppcloser, n.impute=100, defaultLinear=TRUE, data=d) Iteration:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Error in lm.fit.qr.bare(f$tx, f$ty) : NA/NaN/Inf in foreign function call (arg 1) This is probably a singularity. Remove one variable at a time from the formula and re-run aregImpute. That may help you find a culprit. Also, you may not need 100 imputations. Frank par(mfrow=c(2,3)) plot(f, diagnostics=TRUE, maxn=2) 22 fmi - fit.mult.impute(y ~ podb2+propdemocracy+avetrade1984dollars+concentration+cycle+polarity+propmid+terrgainer+ demgainer+ fedgainer+ popdengainer+ urbpopgainer+ tradeopgainer+ gdppcgainer+ terrloser+ demloser+ fedloser+ popdenloser+ urbpoploser+ tradeoploser+ gdppcloser, lm, f, +data=d) Error in impute.transcan(xtrans, imputation = i, data = data, list.out = TRUE, : inconsistant naming of observations led to differing length vectors -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] missing data imputation
(Ted Harding) wrote: [...] In many cases people simply treat negative estimates of variables which are intrinsically non-negative very crudely: if it comes out negative, replaceit with zero. This too is often a quick fix where the fact that it is a lie simply has no practical importance. But, of course, it may matter! That depends ... (see above). That will result in a strange distribution of imputed values. . . . I've also noted Frank Harrel's comment about aregImpute, and will bear it in mind. Note, however, that this does not do multiple imputation on the same lines as NORM (or the other Shafer-derived MI packages). See ?aregImpute section Details. And, specifically, from the Description: It is different, but aregImpute approximates the full Bayesian procedure. MICE is another approach to approximating it, and aregImpute seems to agree well with MICE when you force linearity in aregImpute (because like NORM, MICE cannot handle nonlinearity). Frank The 'transcan' function creates flexible additive imputation models but provides only an approximation to true multiple imputation as the imputation models are fixed before all multiple imputations are drawn. This ignores variability caused by having to fit the imputation models. 'aregImpute' takes all aspects of uncertainty in the imputations into account by using the bootstrap to approximate the process of drawing predicted values from a full Bayesian predictive distribution. so that the Rubin/Shafer method described above (see paragraph about dispersion of imputed values) is not fully implemented. Best wishes, Ted. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] comparing strength of association instead of strength of evidence?
For this step, my purpose is feature construction and feature subsetting. In this project, I am using contrast association rule mining to build word-combinations; in the previous example, 2-itemset was created from CAR and tested for their dependency on class for feature subsetting/selection. Other methods like mutual information might be used too. Any methods that won't replicate mechanisms in my following step (see below) can be tried here. Could you detail what you meant by a discriminative measure? The whole datasets also contain many non-words variables. To combine both data mining and text categorization is the focus and interests of this project. Decision tree, Bayesian network or SVM/LSI might be candidates. Thanks for further suggestion, Weiwei On 7/10/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Dear all: I still need some further help since I think the question itself might be very interesting (i hope so:) : the question is on chisq.test, my concern is which criteria should be used here to evaluate the independence. The reason i use this old subject of the email is, b/c I think the problem here is about how to look at p.value, which evaluate the strength of evidence instead of association. If i am wrong, please correct me. the result looks like this: index word.comb id in.class0 in.class1 p.value odds.ratio 1 1 TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433 2 2 THEFT|RECOV 52|53 40751 146 0.0004997501 4.17127643 3 3 BOLL|ACCID 10|21 36825 1202 0.0004997501 0.44178546 4 4 LAB|VANDAL 8|55 24192 429 0.0004997501 0.82876099 5 5 VANDAL|CAUS 55|59 80164 0.0004997501 0.18405918 6 6AI|TOTAL 9|54 194945 0.0034982509 0.63766766 7 7AI|RECOV 9|53 238561 0.0004997501 0.57547012 8 8 THEFT|TOTAL 52|54 33651 110 0.0004997501 4.56174408 the target is to look for best subset of word.comb to differentiate between class0 and class1. p.value is obtained via p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value) and B=2 (I increased B and it won't help. the margin here is class0=2162792 class1=31859 ) So, in conclusion, which one I should use first, odds.ratio or p.value to find the best subset. If your goal is to discriminate between two different classes, why not calculate directly a measure of discriminative ability, such as probability of correct classification? Kjetil I read some on feature selection in text categorization (A comparative study on feature selection in text categorization might be a good ref.). Anyone here has other suggestions? thanks, weiwei On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote: On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I repharse it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association between X1/Y and X2/Y. I am not sure if comparing p-values IS the way even though the probability of seeing such weird observation under H0 defines p-value and it might relate to the strength of association somehow. But I read the following statement from Alan Agresti's An Introduction to Categorical Data Analysis : Chi-squared tests simply indicate the degree of EVIDENCE for an associationIt is sensible to decompose chi-squared into components, study residuals, and estimate parameters such as odds ratios that describe the STRENGTH OF ASSOCIATION. Here are some things you can do: tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2)) tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2)) library(epitools) # on CRAN ?odds.ratio Help for 'odds.ratio' is shown in the browser library(help=epitools) # on CRAN tab1 [,1][,2] [1,] 11266 2151526 [2,] 125 31734 odds.ratio(11266, 125, 2151526, 31734) Error in fisher.test(tab) : FEXACT error 40. Out of workspace. # so this are evidently for tables with smaller counts library(vcd) # on CRAN ?oddsratio Help for 'oddsratio' is shown in the browser oddsratio( tab1) # really is logodds ratio [1] 0.2807548 plot(oddsratio( tab1) ) library(help=vcd) # on CRAN Read this for many nice functions. fourfoldplot(tab1) mosaicplot(tab1) # not really usefull for this table Also has a look at function Crosstable in package gmodels. To decompose the chisqure you can program yourselves: decomp.chi - function(tab) { rows - rowSums(tab) cols - colSums(tab) N - sum(rows) E - rows %o% cols / N contrib - (tab-E)^2/E contrib } decomp.chi(tab1) [,1] [,2] [1,]
Re: [R] Help with Mahalanobis
Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with Mahalanobis
I think you could simplify this by replacing everything after the nObjects = nrow(mds) line with just these two statements. f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) This also eliminates dependence on gtools and the complexity of dealing with triangular matrices. Regards. Here it is in full: D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) library(gtools) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] not supressing leading zeros when reading a table?
Dear R list, I have a dataset with a column which should be read as character, like this: name surname answer 1 xx yyy 00100 2 rrr hhh 01 When reading this dataset with read.table, I get 1 xx yyy 100 2 rrr hhh 1 The string column consists in answers to multiple choice questions, not all having the same number of answers. I could format the answers using formatC but there are over a hundred different questions in there. I tried with quote=\' without any luck. Googling after this take me nowhere either. It should be simple but I seem to miss it... Can anybody point me to the right direction? TIA, Adrian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] not supressing leading zeros when reading a table?
On Sun, 2005-07-10 at 18:13 +, Adrian Dusa wrote: Dear R list, I have a dataset with a column which should be read as character, like this: name surname answer 1 xx yyy 00100 2 rrr hhh 01 When reading this dataset with read.table, I get 1 xx yyy 100 2 rrr hhh 1 The string column consists in answers to multiple choice questions, not all having the same number of answers. I could format the answers using formatC but there are over a hundred different questions in there. I tried with quote=\' without any luck. Googling after this take me nowhere either. It should be simple but I seem to miss it... Can anybody point me to the right direction? TIA, Adrian With your example data saved in a file called test.txt: df - read.table(test.txt, header = TRUE, colClasses = character) df name surname answer 1 xx yyy 00100 2 rrr hhh 01 str(df) `data.frame': 2 obs. of 3 variables: $ name : chr xx rrr $ surname: chr yyy hhh $ answer : chr 00100 01 See the colClasses argument in ?read.table. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] not supressing leading zeros when reading a table?
Adrian Dusa wrote: Dear R list, I have a dataset with a column which should be read as character, like this: name surname answer 1 xx yyy 00100 2 rrr hhh 01 When reading this dataset with read.table, I get 1 xx yyy 100 2 rrr hhh 1 The string column consists in answers to multiple choice questions, not all having the same number of answers. I could format the answers using formatC but there are over a hundred different questions in there. I tried with quote=\' without any luck. Googling after this take me nowhere either. It should be simple but I seem to miss it... Can anybody point me to the right direction? By default, read.table guesses about the column type. Yours looks numeric, even though it is not. Use the colClasses argument of read.table to specify the column type. If you only have the 3 columns above, colClasses=character should work. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] not supressing leading zeros when reading a table?
Adrian, To prevent coercion to numeric, try: mydata - read.table(myfile, colClasses=character) HTH. alejandro On 7/10/05, Adrian Dusa [EMAIL PROTECTED] wrote: Dear R list, I have a dataset with a column which should be read as character, like this: name surname answer 1 xx yyy 00100 2 rrr hhh 01 When reading this dataset with read.table, I get 1 xx yyy 100 2 rrr hhh 1 The string column consists in answers to multiple choice questions, not all having the same number of answers. I could format the answers using formatC but there are over a hundred different questions in there. I tried with quote=\' without any luck. Googling after this take me nowhere either. It should be simple but I seem to miss it... Can anybody point me to the right direction? TIA, Adrian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] not supressing leading zeros when reading a table?
On 7/10/05, alejandro munoz [EMAIL PROTECTED] wrote: Adrian, To prevent coercion to numeric, try: mydata - read.table(myfile, colClasses=character) HTH. alejandro On 7/10/05, Adrian Dusa [EMAIL PROTECTED] wrote: Dear R list, [...snip...] Thank you all, I got it. This is my favourite super fast ever helpful help list (gosh, I didn't even expect an answer Sundays at 10 pm! ). Best, Adrian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] O/T -2 Log Lambda and Chi Square
Hi R People: Sorry about the off topic question. Does anyone know the reference for -2 Log Lambda is approx dist. Chi square, please? It may be Bartlett, but I'm not sure thanks in advance! Sincerely, Laura Holt mailto: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with Mahalanobis
And here is one more simplification using the buildin mahalanobis function: D2Mah3 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M3 = D2Mah3(iris[,1:4], iris[,5]) On 7/10/05, Gabor Grothendieck [EMAIL PROTECTED] wrote: I think you could simplify this by replacing everything after the nObjects = nrow(mds) line with just these two statements. f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) This also eliminates dependence on gtools and the complexity of dealing with triangular matrices. Regards. Here it is in full: D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) library(gtools) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Boxplot in R
I am trying to draw a plot like Matlab does: The upper extreme whisker represents 95% of the data; The upper hinge represents 75% of the data; The median represents 50% of the data; The lower hinge represents 25% of the data; The lower extreme whisker represents 5% of the data. It looks like: --- 95% | | --- 75% | | |-| 50% | | | | --- 25% | --- 5% Anyone can give me some hints as to how to draw a boxplot like that? What function does it? I tried boxplot() but couldn't figure it out. If it's boxplot(), what arguments should I pass to the function? Thank you for your help. I'd appreciate it. Larry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with Mahalanobis
Indeed, it is very nice Gabor (as always)! So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the first function? I think it is useful to posterior analyzes (as cluster, for example). Regards, # A small correction (reference to gtools was eliminated) D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) nObjects = nrow(mds) f = function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b) D2 = outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) Gabor Grothendieck wrote: I think you could simplify this by replacing everything after the nObjects = nrow(mds) line with just these two statements. f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) This also eliminates dependence on gtools and the complexity of dealing with triangular matrices. Regards. Here it is in full: D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) library(gtools) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Esta mensagem foi verificada pelo E-mail Protegido Terra. Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - Dat 4531 Proteja o seu e-mail Terra: http://mail.terra.com.br/ -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with Mahalanobis
This one adds the labels: D2Mah4 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) f - function(a,b) mapply(function(a,b) mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b) seq. - seq(length = nrow(mds)) names(seq.) - levels(x) D2 - outer(seq., seq., f) } # # test # D2M4 = D2Mah4(iris[,1:4], iris[,5]) print(D2M4) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Indeed, it is very nice Gabor (as always)! So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the first function? I think it is useful to posterior analyzes (as cluster, for example). Regards, # A small correction (reference to gtools was eliminated) D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) nObjects = nrow(mds) f = function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b) D2 = outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) Gabor Grothendieck wrote: I think you could simplify this by replacing everything after the nObjects = nrow(mds) line with just these two statements. f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) This also eliminates dependence on gtools and the complexity of dealing with triangular matrices. Regards. Here it is in full: D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) library(gtools) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Esta mensagem foi verificada pelo E-mail Protegido Terra. Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - Dat 4531 Proteja o seu e-mail Terra: http://mail.terra.com.br/ -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __
Re: [R] Help with Mahalanobis
I think we now have a very good R function here. Thanks for all Gabor! JCFaria Gabor Grothendieck wrote: This one adds the labels: D2Mah4 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) f - function(a,b) mapply(function(a,b) mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b) seq. - seq(length = nrow(mds)) names(seq.) - levels(x) D2 - outer(seq., seq., f) } # # test # D2M4 = D2Mah4(iris[,1:4], iris[,5]) print(D2M4) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Indeed, it is very nice Gabor (as always)! So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the first function? I think it is useful to posterior analyzes (as cluster, for example). Regards, # A small correction (reference to gtools was eliminated) D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) nObjects = nrow(mds) f = function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b) D2 = outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) Gabor Grothendieck wrote: I think you could simplify this by replacing everything after the nObjects = nrow(mds) line with just these two statements. f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) This also eliminates dependence on gtools and the complexity of dealing with triangular matrices. Regards. Here it is in full: D2Mah2 = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) library(gtools) nObjects = nrow(mds) ### changed part is next two statements f - function(a,b) mapply(function(a,b) (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b) D2 - outer(seq(nObjects), seq(nObjects), f) } # # test # D2M2 = D2Mah2(iris[,1:4], iris[,5]) print(D2M2) On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote: Well, as I did not get a satisfactory reply to the original question I tried to make a basic function that, I find, solve the question. I think it is not the better function, but it is working. So, perhaps it can be useful to other people. # # Calculate the matrix of Mahalanobis Distances between groups # from data.frames # # by: José Cláudio Faria # date: 10/7/05 13:23:48 # D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) colnames(mds) = names(y) Objects = levels(x) rownames(mds) = Objects library(gtools) nObjects = nrow(mds) comb = combinations(nObjects, 2) tmpD2 = numeric() for (i in 1:dim(comb)[1]){ a = comb[i,1] b = comb[i,2] tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]) } # Thanks Gabor for the below tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects)) tmpMah[lower.tri(tmpMah)] = tmpD2 D2 = tmpMah + t(tmpMah) return(D2) } # # To try # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) Thanks all for the complementary aid (specially to Gabor). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Esta mensagem foi verificada pelo E-mail Protegido Terra. Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - Dat 4531 Proteja o seu e-mail Terra: http://mail.terra.com.br/ -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] tel: 73-3634.2779 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
Re: [R] O/T -2 Log Lambda and Chi Square
There is a huge and growing literature on this, including Crainiceanu, Ruppert and Vogelsang (2003) some properties of likelihood ratio tests in linear mixed models (http://www.orie.cornell.edu/~davidr/papers/zeroprob_rev01.pdf). The nlme package includes a function simulate.lme to evalute the adequacy of alternative distributions for 2*log(likelihood ratio) for the results of lme. Much of the careful work on this rests on asymptotic normality of the maximum likelihood estimates, and this is the same for 2*log(likelihood ratio) as the standard quadratic form in the MLEs. However, the latter is affected by parameter effects, whereas the likelihood ratio statistic is only impacted by the intrinsic curvature of the manifold upon which the log(likelihood) vector is projected to obtain the MLEs. For nonlinear regression, Bates and Watts (1988) Nonlinear Regression and Its Applications (Wiley) computed measures of intrinsic and parameter effects curvature for a number of published nonlinear regression examples. In nearly all their examples, the intrinsic curvature was in negligible, especially when compared to the parameter effects. If this does not answer your question (or lead you to an answer), please try a more specific question. spencer graves Laura Holt wrote: Hi R People: Sorry about the off topic question. Does anyone know the reference for -2 Log Lambda is approx dist. Chi square, please? It may be Bartlett, but I'm not sure thanks in advance! Sincerely, Laura Holt mailto: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help: Mahalanobis distances between 'Species' from iris
Dear Mulholland, I'm sorry, I was not clearly and objective sufficiently. Below you can see what I'm was trying to do: # D2Mah by JCFaria and Gabor Grothiendieck: 10/7/05 20:46:41 D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y= as.matrix(y) x= as.factor(x) man = manova(y ~ x) E= summary(man)$SS[2] #Matrix E S= as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) f = function(a,b) mapply(function(a,b) mahalanobis(mds[a,], mds[b,], InvS, TRUE), a, b) seq. = seq(length = nrow(mds)) names(seq.) = levels(x) D2 = outer(seq., seq., f) } # # test # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) dend = hclust(as.dist(sqrt(D2M)), method='complete') plot(dend) So, Thanks for the reply. Best, JCFaria Mulholland, Tom wrote: Why don't you use mahalanobis in the stats package. The function Returns the Mahalanobis distance of all rows in 'x' and the vector mu='center' with respect to Sigma='cov'. This is (for vector 'x') defined as D^2 = (x - mu)' Sigma^{-1} (x - mu) I don't have any idea what this does but it appears to be talking about the same topic. If it is not suitable package fpc has related functions and adehabitat has a function for Habitat Suitability Mapping with Mahalanobis Distances Tom -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Jose Claudio Faria Sent: Thursday, 7 July 2005 5:29 AM To: r-help@stat.math.ethz.ch Subject: [R] Help: Mahalanobis distances between 'Species' from iris Dear R list, I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data as obtained below: Squared Distance to Species From Species: Setosa Versicolor Virginica Setosa 0 89.86419 179.38471 Versicolor 89.86419 0 17.20107 Virginica 179.38471 17.20107 0 This distances above were obtained with proc 'CANDISC' of SAS, please, see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/ chap21/sect19.htm From this distance my intention is to make a cluster analysis as below, using the package 'mclust': # # --- Begin R script --- # # For units compatibility of 'iris' from R dataset and 'iris' data used in # the SAS example: Measures = iris[,1:4]*10 Species = iris[,5] irisSAS = data.frame(Measures, Species) n = 3 Mah = c(0, 89.86419,0, 179.38471, 17.20107, 0) # My Question is: how to obtain 'Mah' with R from 'irisSAS' data? D = matrix(0, n, n) nam = c('Set', 'Ver', 'Vir') rownames(D) = nam colnames(D) = nam k = 0 for (i in 1:n) { for (j in 1:i) { k = k+1 D[i,j] = Mah[k] D[j,i] = Mah[k] } } D=sqrt(D) #D2 - D library(mclust) dendroS = hclust(as.dist(D), method='single') dendroC = hclust(as.dist(D), method='complete') win.graph(w = 3.5, h = 6) split.screen(c(2, 1)) screen(1) plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue') screen(2) plot(dendroC, main='Complete', sub='', xlab='', col='red') # # --- End R script --- # I always need of this type of analysis and I'm not founding how to make it in the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Boxplot in R
1) The boxplot in R does the 25%, 50% and 75% mark as you want 2) Check out the range argument in boxplot. I think you can redefine your 5% and 95% quantile in terms of IQR for symmetric distribution and hence use this feature. However I find it easier to calculate these numbers manually and feed it into bxp() function. Here is one such function (with lots of room for improvement). matlab.boxplot - function(m){ m - as.matrix(m) bp - boxplot(data.frame(m), plot=FALSE) bp$stats - apply( m, 2, function(x) quantile(x, c(0.05,0.25, 0.5, 0.75, 0.95), na.rm=T) ) tmp - apply( m, 2, function(x){ under - x[ which( x quantile(x, 0.05, na.rm=T) ) ] over - x[ which( x quantile(x, 0.95, na.rm=T) ) ] return( c(under, over) ) }) # always a matrix in this case bp$out - c(tmp) bp$group - rep(1:ncol(tmp), each=nrow(tmp)) bxp(bp) } Some usage examples : matlab.boxplot( rnorm(50) )# a vector my.mat - matrix( rnorm(300), nc=3 ) matlab.boxplot( my.mat ) # a matrix Regards, Adai On Sun, 2005-07-10 at 18:10 -0500, Larry Xie wrote: I am trying to draw a plot like Matlab does: The upper extreme whisker represents 95% of the data; The upper hinge represents 75% of the data; The median represents 50% of the data; The lower hinge represents 25% of the data; The lower extreme whisker represents 5% of the data. It looks like: --- 95% | | --- 75% | | |-| 50% | | | | --- 25% | --- 5% Anyone can give me some hints as to how to draw a boxplot like that? What function does it? I tried boxplot() but couldn't figure it out. If it's boxplot(), what arguments should I pass to the function? Thank you for your help. I'd appreciate it. Larry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Boxplot in R
Just an addendum on the philosophical aspect of doing this. By selecting the 5% and 95% quantiles, you are always going to get 10% of the data as extreme and these points may not necessarily outliers. So when you are comparing information from multiple columns (i.e. boxplots), it is harder to say which column contains more extreme value compared to others etc. Regards, Adai On Sun, 2005-07-10 at 18:10 -0500, Larry Xie wrote: I am trying to draw a plot like Matlab does: The upper extreme whisker represents 95% of the data; The upper hinge represents 75% of the data; The median represents 50% of the data; The lower hinge represents 25% of the data; The lower extreme whisker represents 5% of the data. It looks like: --- 95% | | --- 75% | | |-| 50% | | | | --- 25% | --- 5% Anyone can give me some hints as to how to draw a boxplot like that? What function does it? I tried boxplot() but couldn't figure it out. If it's boxplot(), what arguments should I pass to the function? Thank you for your help. I'd appreciate it. Larry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html