date:20050710

Re: [R] package loading smooth.lf (LOCFIT), couldn't find function smooth.lf

2005-07-10 Thread Uwe Ligges

Y Y wrote:

 After loading locfit, I am unable to access functions within locfit.
 
 following
 http://www.herine.net/locfit/start.html
 
 
library(locfit)
x - 10*runif(100)
y - 5*sin(x)+rnorm(100)
fit - smooth.lf(x,y)
 
 Error: couldn't find function smooth.lf

So it is time to ask the maintainers of the package and the cited URL 
(CCing both) to update at least one of them (package or 
http://www.herine.net/locfit/start.html)

While the web page states the last update is (December 16, 2004 
version), the version on CRAN is locfit_1.1-9.tar.gz dated 14-Sep-2004.

Uwe Ligges



 
 
fit - locfit(y~lp(x))
 
 Error in eval(expr, envir, enclos) : couldn't find function lp
 
 library() or package manager GUI  tells me locfit is loaded
 
 Any ideas on how to fix this ?
 
 SS, running on R 2.1, MacOS 10.3.9
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Garch in a model with explanatory variables

2005-07-10 Thread Tobias Muhlhofer

The Ox interface in fSeries is quite an easy way to accomplish this, 
although it produces some garbage, both in your current environment 
within R, as well as in the directory in which you are running R. You 
have to be careful also if you're on a Linux or other UNIX system, as 
the function has Windows pathnames hard coded into it, which you need to 
alter to the ones where your Ox resides.

Tobias

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Quantile normalization and NA

2005-07-10 Thread Uwe Ligges

Ravi Murthy wrote:

 Hi,
 I am new to R,
 I am doing quantile normalization with a dat matix of
 384X124 and I find that while computing the quantile
 normailzation it introduces 'NA' into some of the
 cells, can someone help me to overcome this problem ?
 
 
 This is the command that goes like upto g62 for 124
 colomns
 
 
g1 - normalize.quantiles(exprs(MSExpr[,1:2]))

Do you mean the function normalize.quantiles() from package affy 
(please always tell the package, if the function is not in base R)?
It's more appropriate to ask on the Bioconductor mailing list if 
Bioconductor packages are the subject of interest.

And you might want to give a simple, reproducible, but 
non-bandwith-wasting example (perhaps by uploadiung data to some web 
site) in order to make the Bioconductor folks able help you.

Uwe Ligges

 
 For a small set of data there is no problem, but for a
 large set of data, it introduces NA in the place
 where it is suppose to geneerate data .
 
 Ravi
 [EMAIL PROTECTED]  
 
 Raviabi
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Using a string as a filter

2005-07-10 Thread Yzhar Toren

Hi ,

I want to be able to filter out results using a string. I'm running an 
automated script that reads a list of filters I get from an external 
source and applys them to my data frame consecutively.

For example I want to get : data[protocol==1], data[protocol==2] ...

If I define
filter1 - protocol==1 (as a string)
filter2 - protocol==2
...
How can I use these variables to choose subsets ?

I managed to find a very awkward method by using a function that calls a 
formula (and using as.formula() for the string I want to get), but I 
would love to find a more efficient way

Thank you !
Yzhar Toren, Tel-Aviv university.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] package loading smooth.lf (LOCFIT), couldn't find functio n smooth.lf

2005-07-10 Thread Liaw, Andy

The version of locfit on the web site mentioned apparently has been revised
by Prof. Loader, and is newer than the CRAN version that I have been
maintaining.  If Prof. Loader is OK with it, I will take a look and see if I
can get the new version into CRAN-conforming form and upload to CRAN.

Meanwhile, make sure you're using the package from Prof. Loader's web page,
instead of the one on CRAN, if you want the new features.

Andy

 From: Uwe Ligges 
 
 Y Y wrote:
 
  After loading locfit, I am unable to access functions within locfit.
  
  following
  http://www.herine.net/locfit/start.html
  
  
 library(locfit)
 x - 10*runif(100)
 y - 5*sin(x)+rnorm(100)
 fit - smooth.lf(x,y)
  
  Error: couldn't find function smooth.lf
 
 So it is time to ask the maintainers of the package and the cited URL 
 (CCing both) to update at least one of them (package or 
 http://www.herine.net/locfit/start.html)
 
 While the web page states the last update is (December 16, 2004 
 version), the version on CRAN is locfit_1.1-9.tar.gz dated 
 14-Sep-2004.
 
 Uwe Ligges
 
 
 
  
  
 fit - locfit(y~lp(x))
  
  Error in eval(expr, envir, enclos) : couldn't find function lp
  
  library() or package manager GUI  tells me locfit is loaded
  
  Any ideas on how to fix this ?
  
  SS, running on R 2.1, MacOS 10.3.9
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Quantile normalization and NA

2005-07-10 Thread Adaikalavan Ramasamy

I agree with Uwe on both points.

Have you checked if your inputs, i.e. exprs(MSExpr[ ,1:2]), contain
missing values to begin with ?

Out of curiosity are you trying to apply this method to two colour
arrays ? You might want to enquire the BioConductor mailing list about
the merits of doing this over standard techniques (i.e. LOESS
normalisation) as I do not think this is widely done.

Regards, Adai



On Sun, 2005-07-10 at 12:34 +0200, Uwe Ligges wrote:
 Ravi Murthy wrote:
 
  Hi,
  I am new to R,
  I am doing quantile normalization with a dat matix of
  384X124 and I find that while computing the quantile
  normailzation it introduces 'NA' into some of the
  cells, can someone help me to overcome this problem ?
  
  
  This is the command that goes like upto g62 for 124
  colomns
  
  
 g1 - normalize.quantiles(exprs(MSExpr[,1:2]))
 
 Do you mean the function normalize.quantiles() from package affy 
 (please always tell the package, if the function is not in base R)?
 It's more appropriate to ask on the Bioconductor mailing list if 
 Bioconductor packages are the subject of interest.
 
 And you might want to give a simple, reproducible, but 
 non-bandwith-wasting example (perhaps by uploadiung data to some web 
 site) in order to make the Bioconductor folks able help you.
 
 Uwe Ligges
 
  
  For a small set of data there is no problem, but for a
  large set of data, it introduces NA in the place
  where it is suppose to geneerate data .
  
  Ravi
  [EMAIL PROTECTED]  
  
  Raviabi
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help to make a specific matrix

2005-07-10 Thread Adaikalavan Ramasamy

In addition to Gabor's solution, you might be interested in the
combinations function from the gtools package.

 library(gtools)
 combinations(5,2)
  [,1] [,2]
 [1,]12
 [2,]13
 [3,]14
 [4,]15
 [5,]23
 [6,]24
 [7,]25
 [8,]34
 [9,]35
[10,]45



On Sat, 2005-07-09 at 21:42 -0300, Jose Claudio Faria wrote:
 Dear R users,
 
 The solution is probably simple but I need someone to point me to it.
 How can I to generate a matrix from a numeric sequence of 1:10 like 'A' or 
 'B' 
 below:
 
 A)
 ||
 |  1  2  3  4  5 |
 ||
 | 1 |  0 |
 | 2 |  1  0  |
 | 3 |  2  5  0   |
 | 4 |  3  6  8  0|
 | 5 |  4  7  9 10  0 |
 ||
 
 B)
 ||
 |  1  2  3  4  5 |
 ||
 | 1 |  0  1  2  3  4 |
 | 2 |  1  0  5  6  7 |
 | 3 |  2  5  0  8  9 |
 | 4 |  3  6  8  0 10 |
 | 5 |  4  7  9 10  0 |
 ||
 
 This question is related with the possible combinations of five objects two 
 the 
 two, i.e, C(5,2).
 
 Any help would be greatly appreciated.
 
 Regards,

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Using a string as a filter

2005-07-10 Thread Gabor Grothendieck

On 7/10/05, Yzhar Toren [EMAIL PROTECTED] wrote:
 Hi ,
 
 I want to be able to filter out results using a string. I'm running an
 automated script that reads a list of filters I get from an external
 source and applys them to my data frame consecutively.
 
 For example I want to get : data[protocol==1], data[protocol==2] ...
 
 If I define
 filter1 - protocol==1 (as a string)
 filter2 - protocol==2
 ...

protocol - 1:5
data - 11:15
filter - protocol==1
data[eval(parse(text=filter))] # 11

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] comparing strength of association instead of strength of evidence?

2005-07-10 Thread Kjetil Brinchmann Halvorsen

Weiwei Shi wrote:

Dear all:
I still need some further help since I think the question itself might
be very interesting (i hope so:) :
the question is on chisq.test, my concern is which criteria should be
used here to evaluate the independence. The reason i use this old
subject of the email is, b/c I think the problem here is about how to
look at p.value, which evaluate the strength of evidence instead of
association. If i am wrong, please correct me.

the result looks like this:
   index   word.comb id in.class0 in.class1  p.value odds.ratio
1  1  TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433
2  2 THEFT|RECOV  52|53 40751   146 0.0004997501 4.17127643
3  3  BOLL|ACCID  10|21 36825  1202 0.0004997501 0.44178546
4  4  LAB|VANDAL   8|55 24192   429 0.0004997501 0.82876099
5  5 VANDAL|CAUS  55|59   80164 0.0004997501 0.18405918
6  6AI|TOTAL   9|54  194945 0.0034982509 0.63766766
7  7AI|RECOV   9|53  238561 0.0004997501 0.57547012
8  8 THEFT|TOTAL  52|54 33651   110 0.0004997501 4.56174408

the target is to look for best subset of word.comb to differentiate
between class0 and class1. p.value is obtained via
p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value)
and B=2 (I increased B and it won't help. the margin here is
class0=2162792
class1=31859
)

So, in conclusion, which one I should use first, odds.ratio or p.value
to find the best subset.

  

If your goal is to discriminate between two different classes, why not 
calculate directly
a measure of discriminative ability, such as probability of correct 
classification?

Kjetil

I read some on feature selection in text categorization (A comparative
study on feature selection in text categorization might be a good
ref.). Anyone here has other suggestions?

thanks,

weiwei


On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  

On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:


Weiwei Shi wrote:

  

Hi,
I asked this question before, which was hidden in a bunch of
questions. I repharse it here and hope I can get some help this time:

I have 2 contingency tables which have the same group variable Y. I
want to compare the strength of association between X1/Y and X2/Y. I
am not sure if comparing p-values IS the way  even though the
probability of seeing such weird observation under H0 defines
p-value and it might relate to the strength of association somehow.
But I read the following statement from Alan Agresti's An
Introduction to Categorical Data Analysis :
Chi-squared tests simply indicate the degree of EVIDENCE for an
associationIt is sensible to decompose chi-squared into
components, study residuals, and estimate parameters such as odds
ratios that describe the STRENGTH OF ASSOCIATION.





Here are some things you can do:

  tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2))

  tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
  library(epitools) # on CRAN
  ?odds.ratio
Help for 'odds.ratio' is shown in the browser
  library(help=epitools) # on CRAN
  tab1
 [,1][,2]
[1,] 11266 2151526
[2,]   125   31734
  odds.ratio(11266, 125, 2151526, 31734)
Error in fisher.test(tab) : FEXACT error 40.
Out of workspace. # so this are evidently for tables
with smaller counts
  library(vcd) # on CRAN

  ?oddsratio
Help for 'oddsratio' is shown in the browser
  oddsratio( tab1)  # really is logodds ratio
[1] 0.2807548
  plot(oddsratio( tab1) )
  library(help=vcd) # on CRAN  Read this for many nice functions.
  fourfoldplot(tab1)
  mosaicplot(tab1) # not really usefull for this table

Also has a look at function Crosstable in package gmodels.

To decompose the chisqure you can program yourselves:

decomp.chi - function(tab) {
   rows -  rowSums(tab)
   cols -   colSums(tab)
   N -   sum(rows)
E - rows %o% cols / N
contrib - (tab-E)^2/E
contrib }


  decomp.chi(tab1)
 [,1] [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218
 

So you can easily see what cell contributes most to the overall chisquared.

Kjetil





  

Can I do this decomposition in R for the following example including
2 contingency tables?





tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
tab1


  

 [,1][,2]
[1,] 11266 2151526
[2,]   125   31734





tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
tab2


  

 [,1][,2]
[1,] 43571 2119221
[2,]52   31807



Here are a few more ways of doing this using chisq.test,
glm and assocplot:



## chisq.test ###
  

tab1.chisq - chisq.test(tab1)
  

# decomposition of chisq
resid(tab1.chisq)^2
  

  [,1] [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218



# same
with(tab1.chisq, (observed - expected)^2/expected)
  

  [,1] [,2]
[1,] 0.1451026

Re: [R] aregImpute: beginner's question

2005-07-10 Thread Frank E Harrell Jr

Anders Schwartz Corr wrote:

. . .
 
 podb2+propdemocracy+avetrade1984dollars+concentration+cycle+polarity+propmid+terrgainer+
 demgainer+ fedgainer+ popdengainer+ urbpopgainer+ tradeopgainer+
 gdppcgainer+ terrloser+ demloser+ fedloser+ popdenloser+ urbpoploser+
 tradeoploser+ gdppcloser, n.impute=100, defaultLinear=TRUE, data=d)
 Iteration:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34
 Error in lm.fit.qr.bare(f$tx, f$ty) :
 NA/NaN/Inf in foreign function call (arg 1)

This is probably a singularity.  Remove one variable at a time from the 
formula and re-run aregImpute.  That may help you find a culprit.  Also, 
you may not need 100 imputations.

Frank

 
par(mfrow=c(2,3))
plot(f, diagnostics=TRUE, maxn=2)
 
 22 fmi - fit.mult.impute(y ~
 podb2+propdemocracy+avetrade1984dollars+concentration+cycle+polarity+propmid+terrgainer+
 demgainer+ fedgainer+ popdengainer+ urbpopgainer+ tradeopgainer+
 gdppcgainer+ terrloser+ demloser+ fedloser+ popdenloser+ urbpoploser+
 tradeoploser+ gdppcloser, lm, f,
 +data=d)
 Error in impute.transcan(xtrans, imputation = i, data = data, list.out =
 TRUE,  :
 inconsistant naming of observations led to differing length
 vectors
 

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] missing data imputation

2005-07-10 Thread Frank E Harrell Jr

(Ted Harding) wrote:
[...]

 
 In many cases people simply treat negative estimates of variables
 which are intrinsically non-negative very crudely: if it comes
 out negative, replaceit with zero. This too is often a quick
 fix where the fact that it is a lie simply has no practical
 importance. But, of course, it may matter! That depends ...
 (see above).

That will result in a strange distribution of imputed values.

. . .

 I've also noted Frank Harrel's comment about aregImpute, and
 will bear it in mind. Note, however, that this does not do
 multiple imputation on the same lines as NORM (or the other
 Shafer-derived MI packages). See ?aregImpute section Details.
 And, specifically, from the Description:

It is different, but aregImpute approximates the full Bayesian 
procedure.  MICE is another approach to approximating it, and aregImpute 
seems to agree well with MICE when you force linearity in aregImpute 
(because like NORM, MICE cannot handle nonlinearity).

Frank

 
   The 'transcan' function creates flexible additive imputation
models but provides only an approximation to true multiple
imputation as the imputation models are fixed before all
multiple imputations are drawn. This ignores variability
caused by having to fit the imputation models. 'aregImpute'
takes all aspects of uncertainty in the imputations into
account by using the bootstrap to approximate the process
of drawing predicted values from a full Bayesian predictive
distribution.
 
 so that the Rubin/Shafer method described above (see paragraph
 about dispersion of imputed values) is not fully implemented.
   
 Best wishes,
 Ted.
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] comparing strength of association instead of strength of evidence?

2005-07-10 Thread Weiwei Shi

For this step, my purpose is feature construction and feature
subsetting. In this project, I am using contrast association rule
mining to build word-combinations; in the previous example, 2-itemset
was created from CAR and tested for their dependency on class for
feature subsetting/selection. Other methods like mutual information
might be used too.  Any methods that won't replicate mechanisms in
my following step (see below) can be tried here.  Could you detail
what you meant by a discriminative measure?

The whole datasets also contain many non-words variables. To combine
both data mining and text categorization is the focus and interests of
this project. Decision tree, Bayesian network or SVM/LSI might be
candidates.

Thanks for further suggestion,

Weiwei



On 7/10/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:
 Weiwei Shi wrote:
 
 Dear all:
 I still need some further help since I think the question itself might
 be very interesting (i hope so:) :
 the question is on chisq.test, my concern is which criteria should be
 used here to evaluate the independence. The reason i use this old
 subject of the email is, b/c I think the problem here is about how to
 look at p.value, which evaluate the strength of evidence instead of
 association. If i am wrong, please correct me.
 
 the result looks like this:
index   word.comb id in.class0 in.class1  p.value odds.ratio
 1  1  TOTAL|LAID 54|241 2 4 0.0004997501 0.00736433
 2  2 THEFT|RECOV  52|53 40751   146 0.0004997501 4.17127643
 3  3  BOLL|ACCID  10|21 36825  1202 0.0004997501 0.44178546
 4  4  LAB|VANDAL   8|55 24192   429 0.0004997501 0.82876099
 5  5 VANDAL|CAUS  55|59   80164 0.0004997501 0.18405918
 6  6AI|TOTAL   9|54  194945 0.0034982509 0.63766766
 7  7AI|RECOV   9|53  238561 0.0004997501 0.57547012
 8  8 THEFT|TOTAL  52|54 33651   110 0.0004997501 4.56174408
 
 the target is to look for best subset of word.comb to differentiate
 between class0 and class1. p.value is obtained via
 p.chisq.sim[i] - as.double(chisq.test(tab, sim=TRUE, B=myB)$p.value)
 and B=2 (I increased B and it won't help. the margin here is
 class0=2162792
 class1=31859
 )
 
 So, in conclusion, which one I should use first, odds.ratio or p.value
 to find the best subset.
 
 
 
 If your goal is to discriminate between two different classes, why not
 calculate directly
 a measure of discriminative ability, such as probability of correct
 classification?
 
 Kjetil
 
 I read some on feature selection in text categorization (A comparative
 study on feature selection in text categorization might be a good
 ref.). Anyone here has other suggestions?
 
 thanks,
 
 weiwei
 
 
 On 6/24/05, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 
 
 On 6/24/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:
 
 
 Weiwei Shi wrote:
 
 
 
 Hi,
 I asked this question before, which was hidden in a bunch of
 questions. I repharse it here and hope I can get some help this time:
 
 I have 2 contingency tables which have the same group variable Y. I
 want to compare the strength of association between X1/Y and X2/Y. I
 am not sure if comparing p-values IS the way  even though the
 probability of seeing such weird observation under H0 defines
 p-value and it might relate to the strength of association somehow.
 But I read the following statement from Alan Agresti's An
 Introduction to Categorical Data Analysis :
 Chi-squared tests simply indicate the degree of EVIDENCE for an
 associationIt is sensible to decompose chi-squared into
 components, study residuals, and estimate parameters such as odds
 ratios that describe the STRENGTH OF ASSOCIATION.
 
 
 
 
 
 Here are some things you can do:
 
   tab1-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
 
   tab2-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
   library(epitools) # on CRAN
   ?odds.ratio
 Help for 'odds.ratio' is shown in the browser
   library(help=epitools) # on CRAN
   tab1
  [,1][,2]
 [1,] 11266 2151526
 [2,]   125   31734
   odds.ratio(11266, 125, 2151526, 31734)
 Error in fisher.test(tab) : FEXACT error 40.
 Out of workspace. # so this are evidently for tables
 with smaller counts
   library(vcd) # on CRAN
 
   ?oddsratio
 Help for 'oddsratio' is shown in the browser
   oddsratio( tab1)  # really is logodds ratio
 [1] 0.2807548
   plot(oddsratio( tab1) )
   library(help=vcd) # on CRAN  Read this for many nice functions.
   fourfoldplot(tab1)
   mosaicplot(tab1) # not really usefull for this table
 
 Also has a look at function Crosstable in package gmodels.
 
 To decompose the chisqure you can program yourselves:
 
 decomp.chi - function(tab) {
rows -  rowSums(tab)
cols -   colSums(tab)
N -   sum(rows)
 E - rows %o% cols / N
 contrib - (tab-E)^2/E
 contrib }
 
 
   decomp.chi(tab1)
  [,1] [,2]
 [1,]

Re: [R] Help with Mahalanobis

2005-07-10 Thread Jose Claudio Faria

Well, as I did not get a satisfactory reply to the original question I tried to 
make a basic function that, I find, solve the question.

I think it is not the better function, but it is working.

So, perhaps it can be useful to other people.

#
# Calculate the matrix of Mahalanobis Distances between groups
# from data.frames
#
# by: José Cláudio Faria
# date: 10/7/05 13:23:48
#

D2Mah = function(y, x) {

   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

   colnames(mds) = names(y)
   Objects   = levels(x)
   rownames(mds) = Objects

   library(gtools)
   nObjects = nrow(mds)
   comb = combinations(nObjects, 2)

   tmpD2 = numeric()
   for (i in 1:dim(comb)[1]){
 a = comb[i,1]
 b = comb[i,2]
 tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
   }

   # Thanks Gabor for the below
   tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
   tmpMah[lower.tri(tmpMah)] = tmpD2
   D2 = tmpMah + t(tmpMah)
   return(D2)
}

#
# To try
#
D2M = D2Mah(iris[,1:4], iris[,5])
print(D2M)

Thanks all for the complementary aid (specially to Gabor).

Regards,
-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
tel: 73-3634.2779

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help with Mahalanobis

2005-07-10 Thread Gabor Grothendieck

I think you could simplify this by replacing everything after the
nObjects = nrow(mds) line with just these two statements.

  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 - outer(seq(nObjects), seq(nObjects), f)

This also eliminates dependence on gtools and the complexity
of dealing with triangular matrices.

Regards.

Here it is in full:

D2Mah2 = function(y, x) {

  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

  library(gtools)
  nObjects = nrow(mds)

  ### changed part is next two statements
  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 - outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M2 = D2Mah2(iris[,1:4], iris[,5])
print(D2M2)




On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
 Well, as I did not get a satisfactory reply to the original question I tried 
 to
 make a basic function that, I find, solve the question.
 
 I think it is not the better function, but it is working.
 
 So, perhaps it can be useful to other people.
 
 #
 # Calculate the matrix of Mahalanobis Distances between groups
 # from data.frames
 #
 # by: José Cláudio Faria
 # date: 10/7/05 13:23:48
 #
 
 D2Mah = function(y, x) {
 
   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
   colnames(mds) = names(y)
   Objects   = levels(x)
   rownames(mds) = Objects
 
   library(gtools)
   nObjects = nrow(mds)
   comb = combinations(nObjects, 2)
 
   tmpD2 = numeric()
   for (i in 1:dim(comb)[1]){
 a = comb[i,1]
 b = comb[i,2]
 tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
   }
 
   # Thanks Gabor for the below
   tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
   tmpMah[lower.tri(tmpMah)] = tmpD2
   D2 = tmpMah + t(tmpMah)
   return(D2)
 }
 
 #
 # To try
 #
 D2M = D2Mah(iris[,1:4], iris[,5])
 print(D2M)
 
 Thanks all for the complementary aid (specially to Gabor).
 
 Regards,
 --
 Jose Claudio Faria
 Brasil/Bahia/UESC/DCET
 Estatistica Experimental/Prof. Adjunto
 mails:
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
 tel: 73-3634.2779
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] not supressing leading zeros when reading a table?

2005-07-10 Thread Adrian Dusa

Dear R list,

I have a dataset with a column which should be read as character, like this:

  name surname answer
1 xx   yyy 00100
2 rrr  hhh 01

When reading this dataset with read.table, I get
1 xx   yyy 100
2 rrr  hhh 1

The string column consists in answers to multiple choice questions, not all 
having the same number of answers. I could format the answers using formatC but 
there are over a hundred different questions in there.

I tried with quote=\' without any luck. Googling after this take me nowhere 
either. It should be simple but I seem to miss it...
Can anybody point me to the right direction?

TIA,
Adrian

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] not supressing leading zeros when reading a table?

2005-07-10 Thread Marc Schwartz

On Sun, 2005-07-10 at 18:13 +, Adrian Dusa wrote:
 Dear R list,
 
 I have a dataset with a column which should be read as character, like this:
 
   name surname answer
 1 xx   yyy 00100
 2 rrr  hhh 01
 
 When reading this dataset with read.table, I get
 1 xx   yyy 100
 2 rrr  hhh 1
 
 The string column consists in answers to multiple choice questions, not all 
 having the same number of answers. I could format the answers using formatC 
 but 
 there are over a hundred different questions in there.
 
 I tried with quote=\' without any luck. Googling after this take me 
 nowhere 
 either. It should be simple but I seem to miss it...
 Can anybody point me to the right direction?
 
 TIA,
 Adrian

With your example data saved in a file called test.txt:

 df - read.table(test.txt, header = TRUE, colClasses = character)

 df
  name surname answer
1   xx yyy  00100
2  rrr hhh 01

 str(df)
`data.frame':   2 obs. of  3 variables:
 $ name   : chr  xx rrr
 $ surname: chr  yyy hhh
 $ answer : chr  00100 01


See the colClasses argument in ?read.table.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] not supressing leading zeros when reading a table?

2005-07-10 Thread Duncan Murdoch

Adrian Dusa wrote:
 Dear R list,
 
 I have a dataset with a column which should be read as character, like this:
 
   name surname answer
 1 xx   yyy 00100
 2 rrr  hhh 01
 
 When reading this dataset with read.table, I get
 1 xx   yyy 100
 2 rrr  hhh 1
 
 The string column consists in answers to multiple choice questions, not all 
 having the same number of answers. I could format the answers using formatC 
 but 
 there are over a hundred different questions in there.
 
 I tried with quote=\' without any luck. Googling after this take me 
 nowhere 
 either. It should be simple but I seem to miss it...
 Can anybody point me to the right direction?

By default, read.table guesses about the column type. Yours looks 
numeric, even though it is not.

Use the colClasses argument of read.table to specify the column type. 
If you only have the 3 columns above, colClasses=character should work.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] not supressing leading zeros when reading a table?

2005-07-10 Thread alejandro munoz

Adrian,

To prevent coercion to numeric, try:

mydata - read.table(myfile, colClasses=character)

HTH.

alejandro
On 7/10/05, Adrian Dusa [EMAIL PROTECTED] wrote:
 Dear R list,
 
 I have a dataset with a column which should be read as character, like this:
 
   name surname answer
 1 xx   yyy 00100
 2 rrr  hhh 01
 
 When reading this dataset with read.table, I get
 1 xx   yyy 100
 2 rrr  hhh 1
 
 The string column consists in answers to multiple choice questions, not all
 having the same number of answers. I could format the answers using formatC 
 but
 there are over a hundred different questions in there.
 
 I tried with quote=\' without any luck. Googling after this take me nowhere
 either. It should be simple but I seem to miss it...
 Can anybody point me to the right direction?
 
 TIA,
 Adrian
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] not supressing leading zeros when reading a table?

2005-07-10 Thread Adrian Dusa

On 7/10/05, alejandro munoz [EMAIL PROTECTED] wrote:
 Adrian,
 
 To prevent coercion to numeric, try:
 
 mydata - read.table(myfile, colClasses=character)
 
 HTH.
 
 alejandro
 On 7/10/05, Adrian Dusa [EMAIL PROTECTED] wrote:
  Dear R list,
 [...snip...]

Thank you all, I got it. 
This is my favourite super fast ever helpful help list (gosh, I didn't
even expect an answer Sundays at 10 pm! ).
Best,
Adrian

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] O/T -2 Log Lambda and Chi Square

2005-07-10 Thread Laura Holt

Hi R People:

Sorry about the off topic question.  Does anyone know the reference
for -2 Log Lambda  is approx dist. Chi square, please?

It may be Bartlett, but I'm not sure

thanks in advance!

Sincerely,
Laura Holt
mailto: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help with Mahalanobis

2005-07-10 Thread Gabor Grothendieck

And here is one more simplification using the buildin mahalanobis
function:

D2Mah3 = function(y, x) {

 stopifnot(is.data.frame(y), !missing(x))
 stopifnot(dim(y)[1] != dim(x)[1])
 y= as.matrix(y)
 x= as.factor(x)
 man  = manova(y ~ x)
 E= summary(man)$SS[2] #Matrix E
 S= as.matrix(E$Residuals)/man$df.residual
 InvS = solve(S)
 mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

 nObjects = nrow(mds)

 ### changed part is next two statements
 f - function(a,b) mapply(function(a,b)
   mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b)

 D2 - outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M3 = D2Mah3(iris[,1:4], iris[,5])


On 7/10/05, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 I think you could simplify this by replacing everything after the
 nObjects = nrow(mds) line with just these two statements.
 
  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
  D2 - outer(seq(nObjects), seq(nObjects), f)
 
 This also eliminates dependence on gtools and the complexity
 of dealing with triangular matrices.
 
 Regards.
 
 Here it is in full:
 
 D2Mah2 = function(y, x) {
 
  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
  library(gtools)
  nObjects = nrow(mds)
 
  ### changed part is next two statements
  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
  D2 - outer(seq(nObjects), seq(nObjects), f)
 }
 
 #
 # test
 #
 D2M2 = D2Mah2(iris[,1:4], iris[,5])
 print(D2M2)
 
 
 
 
 On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
  Well, as I did not get a satisfactory reply to the original question I 
  tried to
  make a basic function that, I find, solve the question.
 
  I think it is not the better function, but it is working.
 
  So, perhaps it can be useful to other people.
 
  #
  # Calculate the matrix of Mahalanobis Distances between groups
  # from data.frames
  #
  # by: José Cláudio Faria
  # date: 10/7/05 13:23:48
  #
 
  D2Mah = function(y, x) {
 
stopifnot(is.data.frame(y), !missing(x))
stopifnot(dim(y)[1] != dim(x)[1])
y= as.matrix(y)
x= as.factor(x)
man  = manova(y ~ x)
E= summary(man)$SS[2] #Matrix E
S= as.matrix(E$Residuals)/man$df.residual
InvS = solve(S)
mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
colnames(mds) = names(y)
Objects   = levels(x)
rownames(mds) = Objects
 
library(gtools)
nObjects = nrow(mds)
comb = combinations(nObjects, 2)
 
tmpD2 = numeric()
for (i in 1:dim(comb)[1]){
  a = comb[i,1]
  b = comb[i,2]
  tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
}
 
# Thanks Gabor for the below
tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
tmpMah[lower.tri(tmpMah)] = tmpD2
D2 = tmpMah + t(tmpMah)
return(D2)
  }
 
  #
  # To try
  #
  D2M = D2Mah(iris[,1:4], iris[,5])
  print(D2M)
 
  Thanks all for the complementary aid (specially to Gabor).
 
  Regards,
  --
  Jose Claudio Faria
  Brasil/Bahia/UESC/DCET
  Estatistica Experimental/Prof. Adjunto
  mails:
   [EMAIL PROTECTED]
   [EMAIL PROTECTED]
   [EMAIL PROTECTED]
  tel: 73-3634.2779
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Boxplot in R

2005-07-10 Thread Larry Xie

I am trying to draw a plot like Matlab does: 

The upper extreme whisker represents 95% of the data;
The upper hinge represents 75% of the data;
The median represents 50% of the data;
The lower hinge represents 25% of the data;
The lower extreme whisker represents 5% of the data.

It looks like:

  --- 95%
   |
   |
---   75%
| |
|-|   50%
| |
| |
---   25%
   |
  --- 5%

Anyone can give me some hints as to how to draw a boxplot like that?
What function does it? I tried boxplot() but couldn't figure it out.
If it's boxplot(), what arguments should I pass to the function? Thank
you for your help. I'd appreciate it.

Larry

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help with Mahalanobis

2005-07-10 Thread Jose Claudio Faria

Indeed, it is very nice Gabor (as always)!

So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the 
first function? I think it is useful to posterior analyzes (as cluster, for 
example).

Regards,

# A small correction (reference to gtools was eliminated)
D2Mah2 = function(y, x) {
   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
   nObjects = nrow(mds)
   f = function(a,b) mapply(function(a,b)
 (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b)
   D2 = outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M2 = D2Mah2(iris[,1:4], iris[,5])
print(D2M2)

Gabor Grothendieck wrote:
 I think you could simplify this by replacing everything after the
 nObjects = nrow(mds) line with just these two statements.
 
   f - function(a,b) mapply(function(a,b)
 (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
   D2 - outer(seq(nObjects), seq(nObjects), f)
 
 This also eliminates dependence on gtools and the complexity
 of dealing with triangular matrices.
 
 Regards.
 
 Here it is in full:
 
 D2Mah2 = function(y, x) {
 
   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
   library(gtools)
   nObjects = nrow(mds)
 
   ### changed part is next two statements
   f - function(a,b) mapply(function(a,b)
 (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
   D2 - outer(seq(nObjects), seq(nObjects), f)
 }
 
 #
 # test
 #
 D2M2 = D2Mah2(iris[,1:4], iris[,5])
 print(D2M2)
 
 
 
 
 On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
 
Well, as I did not get a satisfactory reply to the original question I tried 
to
make a basic function that, I find, solve the question.

I think it is not the better function, but it is working.

So, perhaps it can be useful to other people.

#
# Calculate the matrix of Mahalanobis Distances between groups
# from data.frames
#
# by: José Cláudio Faria
# date: 10/7/05 13:23:48
#

D2Mah = function(y, x) {

  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

  colnames(mds) = names(y)
  Objects   = levels(x)
  rownames(mds) = Objects

  library(gtools)
  nObjects = nrow(mds)
  comb = combinations(nObjects, 2)

  tmpD2 = numeric()
  for (i in 1:dim(comb)[1]){
a = comb[i,1]
b = comb[i,2]
tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
  }

  # Thanks Gabor for the below
  tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
  tmpMah[lower.tri(tmpMah)] = tmpD2
  D2 = tmpMah + t(tmpMah)
  return(D2)
}

#
# To try
#
D2M = D2Mah(iris[,1:4], iris[,5])
print(D2M)

Thanks all for the complementary aid (specially to Gabor).

Regards,
--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
tel: 73-3634.2779

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

 
 
 Esta mensagem foi verificada pelo E-mail Protegido Terra.
 Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - 
 Dat 4531
 Proteja o seu e-mail Terra: http://mail.terra.com.br/
 
 
 


-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
tel: 73-3634.2779

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help with Mahalanobis

2005-07-10 Thread Gabor Grothendieck

This one adds the labels:


D2Mah4 = function(y, x) {

 stopifnot(is.data.frame(y), !missing(x))
 stopifnot(dim(y)[1] != dim(x)[1])
 y= as.matrix(y)
 x= as.factor(x)
 man  = manova(y ~ x)
 E= summary(man)$SS[2] #Matrix E
 S= as.matrix(E$Residuals)/man$df.residual
 InvS = solve(S)
 mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

 f - function(a,b) mapply(function(a,b)
   mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b)
 seq. - seq(length = nrow(mds))
 names(seq.) - levels(x)
 D2 - outer(seq., seq., f)
}

#
# test
#
D2M4 = D2Mah4(iris[,1:4], iris[,5])
print(D2M4)



On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
 Indeed, it is very nice Gabor (as always)!
 
 So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the
 first function? I think it is useful to posterior analyzes (as cluster, for
 example).
 
 Regards,
 
 # A small correction (reference to gtools was eliminated)
 D2Mah2 = function(y, x) {
   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
   nObjects = nrow(mds)
   f = function(a,b) mapply(function(a,b)
 (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b)
   D2 = outer(seq(nObjects), seq(nObjects), f)
 }
 
 #
 # test
 #
 D2M2 = D2Mah2(iris[,1:4], iris[,5])
 print(D2M2)
 
 Gabor Grothendieck wrote:
  I think you could simplify this by replacing everything after the
  nObjects = nrow(mds) line with just these two statements.
 
f - function(a,b) mapply(function(a,b)
  (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
D2 - outer(seq(nObjects), seq(nObjects), f)
 
  This also eliminates dependence on gtools and the complexity
  of dealing with triangular matrices.
 
  Regards.
 
  Here it is in full:
 
  D2Mah2 = function(y, x) {
 
stopifnot(is.data.frame(y), !missing(x))
stopifnot(dim(y)[1] != dim(x)[1])
y= as.matrix(y)
x= as.factor(x)
man  = manova(y ~ x)
E= summary(man)$SS[2] #Matrix E
S= as.matrix(E$Residuals)/man$df.residual
InvS = solve(S)
mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
library(gtools)
nObjects = nrow(mds)
 
### changed part is next two statements
f - function(a,b) mapply(function(a,b)
  (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)
 
D2 - outer(seq(nObjects), seq(nObjects), f)
  }
 
  #
  # test
  #
  D2M2 = D2Mah2(iris[,1:4], iris[,5])
  print(D2M2)
 
 
 
 
  On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
 
 Well, as I did not get a satisfactory reply to the original question I 
 tried to
 make a basic function that, I find, solve the question.
 
 I think it is not the better function, but it is working.
 
 So, perhaps it can be useful to other people.
 
 #
 # Calculate the matrix of Mahalanobis Distances between groups
 # from data.frames
 #
 # by: José Cláudio Faria
 # date: 10/7/05 13:23:48
 #
 
 D2Mah = function(y, x) {
 
   stopifnot(is.data.frame(y), !missing(x))
   stopifnot(dim(y)[1] != dim(x)[1])
   y= as.matrix(y)
   x= as.factor(x)
   man  = manova(y ~ x)
   E= summary(man)$SS[2] #Matrix E
   S= as.matrix(E$Residuals)/man$df.residual
   InvS = solve(S)
   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
   colnames(mds) = names(y)
   Objects   = levels(x)
   rownames(mds) = Objects
 
   library(gtools)
   nObjects = nrow(mds)
   comb = combinations(nObjects, 2)
 
   tmpD2 = numeric()
   for (i in 1:dim(comb)[1]){
 a = comb[i,1]
 b = comb[i,2]
 tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
   }
 
   # Thanks Gabor for the below
   tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
   tmpMah[lower.tri(tmpMah)] = tmpD2
   D2 = tmpMah + t(tmpMah)
   return(D2)
 }
 
 #
 # To try
 #
 D2M = D2Mah(iris[,1:4], iris[,5])
 print(D2M)
 
 Thanks all for the complementary aid (specially to Gabor).
 
 Regards,
 --
 Jose Claudio Faria
 Brasil/Bahia/UESC/DCET
 Estatistica Experimental/Prof. Adjunto
 mails:
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
 tel: 73-3634.2779
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 
 
  Esta mensagem foi verificada pelo E-mail Protegido Terra.
  Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - 
  Dat 4531
  Proteja o seu e-mail Terra: http://mail.terra.com.br/
 
 
 
 
 
 --
 Jose Claudio Faria
 Brasil/Bahia/UESC/DCET
 Estatistica Experimental/Prof. Adjunto
 mails:
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
 tel: 73-3634.2779


__

Re: [R] Help with Mahalanobis

2005-07-10 Thread Jose Claudio Faria

I think we now have a very good R function here.
Thanks for all Gabor!

JCFaria

Gabor Grothendieck wrote:
 This one adds the labels:
 
 
 D2Mah4 = function(y, x) {
 
  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
 
  f - function(a,b) mapply(function(a,b)
mahalanobis(mds[a,],mds[b,],InvS,TRUE), a, b)
  seq. - seq(length = nrow(mds))
  names(seq.) - levels(x)
  D2 - outer(seq., seq., f)
 }
 
 #
 # test
 #
 D2M4 = D2Mah4(iris[,1:4], iris[,5])
 print(D2M4)
 
 
 
 On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:
 
Indeed, it is very nice Gabor (as always)!

So, a doubt: how to preserve the 'rowname' and 'colname' of D2, like in the
first function? I think it is useful to posterior analyzes (as cluster, for
example).

Regards,

# A small correction (reference to gtools was eliminated)
D2Mah2 = function(y, x) {
  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
  nObjects = nrow(mds)
  f = function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a, b)
  D2 = outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M2 = D2Mah2(iris[,1:4], iris[,5])
print(D2M2)

Gabor Grothendieck wrote:

I think you could simplify this by replacing everything after the
nObjects = nrow(mds) line with just these two statements.

  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 - outer(seq(nObjects), seq(nObjects), f)

This also eliminates dependence on gtools and the complexity
of dealing with triangular matrices.

Regards.

Here it is in full:

D2Mah2 = function(y, x) {

  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

  library(gtools)
  nObjects = nrow(mds)

  ### changed part is next two statements
  f - function(a,b) mapply(function(a,b)
(mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 - outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M2 = D2Mah2(iris[,1:4], iris[,5])
print(D2M2)




On 7/10/05, Jose Claudio Faria [EMAIL PROTECTED] wrote:


Well, as I did not get a satisfactory reply to the original question I 
tried to
make a basic function that, I find, solve the question.

I think it is not the better function, but it is working.

So, perhaps it can be useful to other people.

#
# Calculate the matrix of Mahalanobis Distances between groups
# from data.frames
#
# by: José Cláudio Faria
# date: 10/7/05 13:23:48
#

D2Mah = function(y, x) {

 stopifnot(is.data.frame(y), !missing(x))
 stopifnot(dim(y)[1] != dim(x)[1])
 y= as.matrix(y)
 x= as.factor(x)
 man  = manova(y ~ x)
 E= summary(man)$SS[2] #Matrix E
 S= as.matrix(E$Residuals)/man$df.residual
 InvS = solve(S)
 mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

 colnames(mds) = names(y)
 Objects   = levels(x)
 rownames(mds) = Objects

 library(gtools)
 nObjects = nrow(mds)
 comb = combinations(nObjects, 2)

 tmpD2 = numeric()
 for (i in 1:dim(comb)[1]){
   a = comb[i,1]
   b = comb[i,2]
   tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
 }

 # Thanks Gabor for the below
 tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
 tmpMah[lower.tri(tmpMah)] = tmpD2
 D2 = tmpMah + t(tmpMah)
 return(D2)
}

#
# To try
#
D2M = D2Mah(iris[,1:4], iris[,5])
print(D2M)

Thanks all for the complementary aid (specially to Gabor).

Regards,
--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
tel: 73-3634.2779

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html



Esta mensagem foi verificada pelo E-mail Protegido Terra.
Scan engine: McAfee VirusScan / Atualizado em 08/07/2005 / Versão: 4.4.00 - 
Dat 4531
Proteja o seu e-mail Terra: http://mail.terra.com.br/





--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
tel: 73-3634.2779

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting

Re: [R] O/T -2 Log Lambda and Chi Square

2005-07-10 Thread Spencer Graves

  There is a huge and growing literature on this, including 
Crainiceanu, Ruppert and Vogelsang (2003) some properties of likelihood 
ratio tests in linear mixed models 
(http://www.orie.cornell.edu/~davidr/papers/zeroprob_rev01.pdf).  The 
nlme package includes a function simulate.lme to evalute the adequacy 
of alternative distributions for 2*log(likelihood ratio) for the results 
of lme.

  Much of the careful work on this rests on asymptotic normality of the 
maximum likelihood estimates, and this is the same for 2*log(likelihood 
ratio) as the standard quadratic form in the MLEs.  However, the latter 
is affected by parameter effects, whereas the likelihood ratio statistic 
is only impacted by the intrinsic curvature of the manifold upon which 
the log(likelihood) vector is projected to obtain the MLEs.  For 
nonlinear regression, Bates and Watts (1988) Nonlinear Regression and 
Its Applications (Wiley) computed measures of intrinsic and parameter 
effects curvature for a number of published nonlinear regression 
examples.  In nearly all their examples, the intrinsic curvature was in 
negligible, especially when compared to the parameter effects.

  If this does not answer your question (or lead you to an answer), 
please try a more specific question.

  spencer graves

Laura Holt wrote:

 Hi R People:
 
 Sorry about the off topic question.  Does anyone know the reference
 for -2 Log Lambda  is approx dist. Chi square, please?
 
 It may be Bartlett, but I'm not sure
 
 thanks in advance!
 
 Sincerely,
 Laura Holt
 mailto: [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com http://www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help: Mahalanobis distances between 'Species' from iris

2005-07-10 Thread Jose Claudio Faria

Dear Mulholland,

I'm sorry, I was not clearly and objective sufficiently.
Below you can see what I'm was trying to do:

# D2Mah by JCFaria and Gabor Grothiendieck: 10/7/05 20:46:41
D2Mah = function(y, x) {

  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y= as.matrix(y)
  x= as.factor(x)
  man  = manova(y ~ x)
  E= summary(man)$SS[2] #Matrix E
  S= as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

  f = function(a,b) mapply(function(a,b)
mahalanobis(mds[a,], mds[b,], InvS, TRUE), a, b)
  seq. = seq(length = nrow(mds))
  names(seq.) = levels(x)
  D2 = outer(seq., seq., f)
}

#
# test
#
D2M = D2Mah(iris[,1:4], iris[,5])
print(D2M)
dend = hclust(as.dist(sqrt(D2M)), method='complete')
plot(dend)

So, Thanks for the reply.
Best,

JCFaria

Mulholland, Tom wrote:
 Why don't you use mahalanobis in the stats package. The function Returns the 
 Mahalanobis distance of all rows in 'x' and the vector mu='center' with 
 respect to Sigma='cov'. This is (for vector 'x') defined as
 
  D^2 = (x - mu)' Sigma^{-1} (x - mu)
 
 I don't have any idea what this does but it appears to be talking about the 
 same topic. If it is not suitable package fpc has related functions and 
 adehabitat has a function for Habitat Suitability Mapping with Mahalanobis 
 Distances
 
 Tom
 
 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Jose 
Claudio Faria
Sent: Thursday, 7 July 2005 5:29 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Help: Mahalanobis distances between 'Species' from iris


Dear R list,

I'm trying to calculate Mahalanobis distances for 'Species' 
of 'iris' data
as obtained below:

Squared Distance to Species From Species:

   Setosa Versicolor Virginica
Setosa   0   89.86419 179.38471
Versicolor  89.86419  0  17.20107
Virginica  179.38471   17.20107 0

This distances above were obtained with proc 'CANDISC' of SAS, please,
see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from
http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/
chap21/sect19.htm

 From this distance my intention is to make a cluster 
analysis as below, using
the package 'mclust':

#
# --- Begin R script ---
#
# For units compatibility of 'iris' from R dataset and 'iris' 
data used in
# the SAS example:
Measures = iris[,1:4]*10
Species  = iris[,5]
irisSAS  = data.frame(Measures, Species)

n   = 3
Mah = c(0,
  89.86419,0,
 179.38471, 17.20107, 0)

# My Question is: how to obtain 'Mah' with R from 'irisSAS' data?

D = matrix(0, n, n)

nam = c('Set', 'Ver', 'Vir')
rownames(D) = nam
colnames(D) = nam

k = 0
for (i in 1:n) {
for (j in 1:i) {
   k  = k+1
   D[i,j] = Mah[k]
   D[j,i] = Mah[k]
}
}

D=sqrt(D) #D2 - D

library(mclust)
dendroS = hclust(as.dist(D), method='single')
dendroC = hclust(as.dist(D), method='complete')

win.graph(w = 3.5, h = 6)
split.screen(c(2, 1))
screen(1)
plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')

screen(2)
plot(dendroC, main='Complete', sub='', xlab='', col='red')
#
# --- End R script ---
#

I always need of this type of analysis and I'm not founding 
how to make it in 
the CRAN documentation (Archives, packages: mclust, cluster, 
fpc and mva).

Regards,
-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Boxplot in R

2005-07-10 Thread Adaikalavan Ramasamy

1) The boxplot in R does the 25%, 50% and 75% mark as you want
2) Check out the range argument in boxplot. I think you can redefine
your 5% and 95% quantile in terms of IQR for symmetric distribution and
hence use this feature.

However I find it easier to calculate these numbers manually and feed it
into bxp() function. Here is one such function (with lots of room for
improvement).


matlab.boxplot - function(m){
  m  - as.matrix(m)
  bp - boxplot(data.frame(m), plot=FALSE)
  
  bp$stats - apply( m, 2, function(x) 
   quantile(x, c(0.05,0.25, 0.5, 0.75, 0.95), na.rm=T) )
  
  tmp - apply( m, 2, function(x){
under - x[ which( x  quantile(x, 0.05, na.rm=T) ) ]
over  - x[ which( x  quantile(x, 0.95, na.rm=T) ) ]
return( c(under, over) )
  })   # always a matrix in this case
  bp$out   - c(tmp)
  
  bp$group - rep(1:ncol(tmp), each=nrow(tmp))
  
  bxp(bp)
}


Some usage examples : 

 matlab.boxplot( rnorm(50) )# a vector

 my.mat - matrix( rnorm(300), nc=3 )
 matlab.boxplot( my.mat )   # a matrix


Regards, Adai




On Sun, 2005-07-10 at 18:10 -0500, Larry Xie wrote:
 I am trying to draw a plot like Matlab does: 
 
 The upper extreme whisker represents 95% of the data;
 The upper hinge represents 75% of the data;
 The median represents 50% of the data;
 The lower hinge represents 25% of the data;
 The lower extreme whisker represents 5% of the data.
 
 It looks like:
 
   --- 95%
|
|
 ---   75%
 | |
 |-|   50%
 | |
 | |
 ---   25%
|
   --- 5%
 
 Anyone can give me some hints as to how to draw a boxplot like that?
 What function does it? I tried boxplot() but couldn't figure it out.
 If it's boxplot(), what arguments should I pass to the function? Thank
 you for your help. I'd appreciate it.
 
 Larry
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Boxplot in R

2005-07-10 Thread Adaikalavan Ramasamy

Just an addendum on the philosophical aspect of doing this.

By selecting the 5% and 95% quantiles, you are always going to get 10%
of the data as extreme and these points may not necessarily outliers.
So when you are comparing information from multiple columns (i.e.
boxplots), it is harder to say which column contains more extreme value
compared to others etc.

Regards, Adai



On Sun, 2005-07-10 at 18:10 -0500, Larry Xie wrote:
 I am trying to draw a plot like Matlab does: 
 
 The upper extreme whisker represents 95% of the data;
 The upper hinge represents 75% of the data;
 The median represents 50% of the data;
 The lower hinge represents 25% of the data;
 The lower extreme whisker represents 5% of the data.
 
 It looks like:
 
   --- 95%
|
|
 ---   75%
 | |
 |-|   50%
 | |
 | |
 ---   25%
|
   --- 5%
 
 Anyone can give me some hints as to how to draw a boxplot like that?
 What function does it? I tried boxplot() but couldn't figure it out.
 If it's boxplot(), what arguments should I pass to the function? Thank
 you for your help. I'd appreciate it.
 
 Larry
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] package loading smooth.lf (LOCFIT), couldn't find function smooth.lf

Re: [R] Garch in a model with explanatory variables

Re: [R] Quantile normalization and NA

[R] Using a string as a filter

Re: [R] package loading smooth.lf (LOCFIT), couldn't find functio n smooth.lf

Re: [R] Quantile normalization and NA

Re: [R] Help to make a specific matrix

Re: [R] Using a string as a filter

Re: [R] comparing strength of association instead of strength of evidence?

Re: [R] aregImpute: beginner's question

Re: [R] missing data imputation

Re: [R] comparing strength of association instead of strength of evidence?

Re: [R] Help with Mahalanobis

Re: [R] Help with Mahalanobis

[R] not supressing leading zeros when reading a table?

Re: [R] not supressing leading zeros when reading a table?

Re: [R] not supressing leading zeros when reading a table?

Re: [R] not supressing leading zeros when reading a table?

Re: [R] not supressing leading zeros when reading a table?

[R] O/T -2 Log Lambda and Chi Square

Re: [R] Help with Mahalanobis

[R] Boxplot in R

Re: [R] Help with Mahalanobis

Re: [R] Help with Mahalanobis

Re: [R] Help with Mahalanobis

Re: [R] O/T -2 Log Lambda and Chi Square

Re: [R] Help: Mahalanobis distances between 'Species' from iris

Re: [R] Boxplot in R

Re: [R] Boxplot in R

29 matches

Site Navigation

Mail list logo

Footer information