Re: [R] scale or not to scale that is the question - prcomp

Duncan Murdoch Wed, 19 Aug 2009 07:36:19 -0700

On 8/19/2009 10:14 AM, Petr PIKAL wrote:

> Duncan Murdoch <murd...@stats.uwo.ca> wrote on 19.08.2009 15:25:00:
>
>> On 19/08/2009 9:02 AM, Petr PIKAL wrote:
>>> Thank you
>>>
>>> Duncan Murdoch <murd...@stats.uwo.ca> wrote on 19.08.2009 14:49:52:
>>>
>>>> On 19/08/2009 8:31 AM, Petr PIKAL wrote:
>>>>> Dear all
>>>>>
>>>>> <snip>
>>>>
>>>> I would say the answer depends on the meaning of the variables. In the
>>>> unusual case that they are measured in dimensionless units, it might
>>>> make sense not to scale. But if you are using arbitrary units of
>>>> measurement, do you want your answer to depend on them? For example, if
>>>> you change from kg to mg, the numbers will become much larger, the
>>>> variable will contribute much more variance, and it will become a more
>>>> important part of the largest principal component. Is that sensible?
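A minimal R sketch of the units argument, using simulated data (the data frame x and the helper pc1() are made up for illustration). Re-expressing one variable in units a million times smaller makes it dominate the first unscaled component. The scaled analysis, which works on the correlation matrix, gives the same loadings either way.

```r
# Simulated data (hypothetical): three independent variables on similar scales.
set.seed(1)
n <- 50
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))

# The same data with 'a' expressed in units a million times smaller
# (e.g. kg -> mg), so its numbers are a million times larger.
x_mg <- transform(x, a = a * 1e6)

# Absolute loadings of the first principal component (abs() removes the
# arbitrary sign of an eigenvector).
pc1 <- function(d, scale.) abs(prcomp(d, scale. = scale.)$rotation[, 1])

pc1(x, FALSE)
pc1(x_mg, FALSE)  # unscaled: PC1 is now almost exactly the 'a' axis
pc1(x, TRUE)
pc1(x_mg, TRUE)   # scaled: same loadings as the line above
```

With scale. = TRUE every variable is divided by its own standard deviation, so a change of units cancels out before the components are computed.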
>>>
>>> Basically the variables are percentages (all between 0 and 6%), except
>>> dus, which is either present or absent (for the purpose of prcomp it is
>>> transformed to 0/1 by as.numeric :). The only variable which is not a
>>> percentage is iep, which lies roughly in the range 5-8. So the ranges of
>>> all the variables are quite similar.
>>>
>>> What surprises me is that I can interpret the biplot without scaling in
>>> terms of the variables used, while the biplot with scaling is totally
>>> different and the two pictures do not match at all. I would have
>>> expected just a small difference between the results of the two
>>> settings, as all the numbers are quite comparable and do not differ much.
>>
>> If you look at the standard deviations in the two cases, I think you can
>> see why this happens:
>>
>> Scaled:
>>
>> Standard deviations:
>> [1] 1.3335175 1.2311551 1.0583667 0.7258295 0.2429397
>>
>> Not Scaled:
>>
>> Standard deviations:
>> [1] 1.0030048 0.8400923 0.5679976 0.3845088 0.1531582
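The quoted standard deviations can be turned into proportions of variance: each squared sdev divided by their sum, which is the "Proportion of Variance" row that summary() prints for a prcomp fit. A short sketch; prop_var() is a hypothetical helper.

```r
# Standard deviations of the principal components, copied from the message.
sd_scaled   <- c(1.3335175, 1.2311551, 1.0583667, 0.7258295, 0.2429397)
sd_unscaled <- c(1.0030048, 0.8400923, 0.5679976, 0.3845088, 0.1531582)

# Proportion of the total variance carried by each component.
prop_var <- function(sdev) sdev^2 / sum(sdev^2)

round(prop_var(sd_scaled), 3)
round(prop_var(sd_unscaled), 3)

# With scale. = TRUE every variable has variance 1, so the component
# variances add up to the number of variables (close to 5 here).
sum(sd_scaled^2)
```

In the scaled fit the first three components carry roughly 36%, 30% and 22% of the variance. No single direction clearly dominates.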
>>
>> The first two sds are close, so small changes to the data will affect
>> their direction a lot.
>
> I see. But I would expect that the changes made to the data by scaling
> would not alter it in such a way that the unscaled and scaled results are
> completely different.
>
>> Your biplots look at the 2nd and 3rd components.
>
> Yes, because the grouping in the biplot of the 2nd and 3rd components can
> easily be explained by the values of some variables (without scaling). I
> must admit that I do not use prcomp very often, and usually scaling can
> give me an "explainable" result, especially when I use it for "variable
> reduction". Therefore I am reluctant to use it in this case.
>
> When I try the "more standard" way
>
> fit <- lm(iep ~ sio2 + al2o3 + p2o5 + as.numeric(dus), data = rglp)
> summary(fit)
>
> Call:
> lm(formula = iep ~ sio2 + al2o3 + p2o5 + as.numeric(dus), data = rglp)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.41751 -0.15568 -0.03613  0.20124  0.43046
>
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)      7.12085    0.62257  11.438 8.24e-08 ***
> sio2            -0.67250    0.20953  -3.210 0.007498 **
> al2o3            0.40534    0.08641   4.691 0.000522 ***
> p2o5            -0.76909    0.11103  -6.927 1.59e-05 ***
> as.numeric(dus) -0.64020    0.18101  -3.537 0.004094 **
>
> I get quite a plausible result, which can be interpreted without problems.
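For contrast with the units argument for PCA, here is a sketch showing that lm() inference does not depend on the units of a predictor. Rescaling sio2 changes its coefficient but not its t value. The data below are simulated because the real rglp data are not available, and the coefficients used to generate them are made up.

```r
# Simulated stand-in for the poster's data (hypothetical; rglp is not available).
set.seed(42)
n <- 16
d <- data.frame(sio2 = runif(n, 0, 6), al2o3 = runif(n, 0, 6))
d$iep <- 7 - 0.7 * d$sio2 + 0.4 * d$al2o3 + rnorm(n, sd = 0.3)

fit_pct <- lm(iep ~ sio2 + al2o3, data = d)
# Same model, with sio2 multiplied by 10 (per mille instead of percent).
fit_pm <- lm(iep ~ I(sio2 * 10) + al2o3, data = d)

# The sio2 coefficient shrinks tenfold ...
coef(fit_pct)
coef(fit_pm)
# ... but every t value is unchanged.
coef(summary(fit_pct))[, "t value"]
coef(summary(fit_pm))[, "t value"]
```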
>
> My data are the result of a designed experiment (more or less :), and
> therefore all the variables are significant. Is that the reason why
> scaling may be inappropriate in this case?

No, I think it's just that the cloud of points is approximately spherical
in the first 2 or 3 principal components, so the principal component
directions are somewhat arbitrary. You just got lucky that the 2nd and 3rd
components are interpretable: I wouldn't put too much faith in being able
to repeat that if you went out and collected a new set of data using the
same design.
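A two-dimensional simulation of this point. The helpers pc1_angle() and angle_spread() are hypothetical, and the real data have more variables. Many fresh data sets are drawn from the same distribution. The direction of the first principal axis varies widely when the two population standard deviations are nearly equal, but barely moves when they clearly differ.

```r
# Angle in degrees (0 to 90) between the first principal axis and the x axis;
# abs() ignores the arbitrary sign of the loading vector.
pc1_angle <- function(d) {
  v <- prcomp(d)$rotation[, 1]
  atan(abs(v[2] / v[1])) * 180 / pi
}

# Standard deviation of that angle over many fresh data sets drawn from the
# same bivariate normal distribution ("the same design"). The first axis has
# sd 1, the second has sd 'sd2'.
angle_spread <- function(sd2, n = 30, reps = 500) {
  angles <- replicate(reps, {
    d <- cbind(rnorm(n, sd = 1), rnorm(n, sd = sd2))
    pc1_angle(d)
  })
  sd(angles)
}

set.seed(123)
angle_spread(sd2 = 0.95)  # nearly spherical cloud: the angle varies widely
angle_spread(sd2 = 0.30)  # elongated cloud: the angle stays close to 0
```

Drawing new samples rather than bootstrapping one sample mirrors the scenario of collecting a new data set with the same design.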


Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
