Re: [R] Statistical analysis of olive dataset

Michael Dewey Sun, 13 Mar 2016 01:27:08 -0800

Dear Axel

Since you are using princomp (among other things) you might find the biplot function useful on the output of princomp.

I have not studied your code in detail, but you do seem to be doing several things in multiple ways using functions from different sources. I wonder whether it might be better to stick to fewer functions.
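
As a minimal sketch of what that could look like (assuming the tourr package is installed and that columns 3 to 10 of olive hold the fatty acids, as described in your message; the object names olivenum and pca are only illustrative), a single princomp fit can be stored once and reused for the summary, the loadings and the biplot:

```r
library(tourr)                          # provides the olive data set

olivenum <- olive[, 3:10]               # the eight fatty-acid columns
pca <- princomp(olivenum, cor = TRUE)   # PCA on the correlation matrix

summary(pca)                            # proportion of variance per component
loadings(pca)[, 1:3]                    # weight of each fatty acid on components 1 to 3
biplot(pca)                             # observations and variables on components 1 and 2
```

The loadings show how strongly each fatty acid contributes to each component, and the arrows in the biplot display the same information graphically for the first two components.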


On 12/03/2016 17:39, Axel wrote:

Hi to all the members of the list!

I am a novice in statistical analysis and in the use of the R software, so I
am experimenting with the dataset "olive" included in the package "tourr".
This dataset contains the results of the determination of the fatty acids in
572 samples of olive oil from Italy (columns 3 to 10) along with the area and
the region of origin of the oil (column 1 and column 2, respectively).

The main goal of my analysis is to determine which fatty acids characterize
the origin of an oil. As a secondary goal, I would like to insert the results
of the chemical analysis of an oil that I analyzed (I am a Chemistry student)
in order to determine its region of production. I do not know if this last
thing is possible.
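
One possible way to approach the secondary goal is a classifier rather than PCA. The sketch below uses linear discriminant analysis from the MASS package, which is not mentioned anywhere in this thread and is only an assumption about what might suit the problem. It also assumes that column 2 of olive is the grouping of interest, and it uses the first row of the data as a stand-in for a newly measured oil (the names fit and new_oil are hypothetical):

```r
library(tourr)   # provides the olive data set
library(MASS)    # provides lda(); an assumed choice of method

# Fit a linear discriminant model: fatty acids as predictors, column 2 as the class
fit <- lda(olive[, 3:10], grouping = factor(olive[[2]]))

# Stand-in for a new sample: one row with the same eight fatty-acid columns
new_oil <- olive[1, 3:10]

# Predicted class for the new sample
pred <- predict(fit, new_oil)$class
pred
```

A real new sample would need its fatty acids measured in the same units and supplied in the same column order as the training data.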

I am using R 3.2.4 on Mac OS X El Capitan with the packages "tourr" and
"psych" loaded.
Here are the commands I have used up to now:

olivenum <- olive[,3:10]
# named col_means and col_sds so that mean() and sd() are not masked
col_means <- colMeans(olivenum)
col_sds <- sapply(olivenum,sd)
describeBy(olivenum,olive[2])
pairs(olivenum)
R <- cor(olivenum)
eigen(R)
# Since the first three eigenvalues are greater than 1, these are the main
# components (columns 1, 2 and 3). But I can also determine them using a scree
# diagram as follows. Right?

autoval <- eigen(R)$values
autovec <- eigen(R)$vectors
pvarsp <- autoval/ncol(olivenum)
plot(autoval,type="b",main="Scree diagram",xlab="Number of components",ylab="Eigenvalues")
abline(h=1,lwd=3,col="red")
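
The link between the eigenvalues computed by hand and the output of princomp can be checked directly. The following is a minimal sketch, assuming the tourr package is installed; the names ev, pca and prop_var are illustrative only:

```r
library(tourr)                           # provides the olive data set

olivenum <- olive[, 3:10]                # the eight fatty-acid columns
ev <- eigen(cor(olivenum))$values        # eigenvalues of the correlation matrix
pca <- princomp(olivenum, cor = TRUE)    # PCA on the correlation matrix

all.equal(ev, unname(pca$sdev)^2)        # eigenvalues are the component variances
prop_var <- ev / sum(ev)                 # sum(ev) equals ncol(olivenum) for a correlation matrix
cumsum(prop_var)                         # cumulative proportion of variance
sum(ev > 1)                              # number of components kept by the "eigenvalue > 1" rule
```

Because the eigenvalues of a correlation matrix sum to the number of variables, dividing by ncol(olivenum) gives the same proportions that summary() reports for a princomp fit.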

eigen(R)$vectors[,1:3]
olive.scale <- scale(olivenum,TRUE,TRUE)
points <- olive.scale%*%autovec[,1:3]


# Since I selected three main components (three columns), how should I plot the
# dispersion graph? I do not think that what I have done is right:
plot(points,main="Dispersion graph",xlab="Component 1",ylab="Component 2")
princomp(olivenum,cor=TRUE)
# With the following command I obtain a summary of the importance of the
# components. For example, the proportion of variance of component 1 is about
# 0.465, of component 2 is 0.220 and of component 3 is 0.127, with a cumulative
# proportion of 0.812. This means that the values in the first three columns of
# the matrix "olivenum" mostly characterize the differences between the
# observations. Right?
pca <- princomp(olivenum,cor=TRUE)
summary(pca)
screeplot(pca)
# scores on the first two components
plot(pca$scores[,1:2])
abline(h=0,v=0)

I determined that three components can explain a great part of the variability,
but I don't know which these components are. How should I continue?

Thank you for your attention,
Axel

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Michael
http://www.dewey.myzen.co.uk/home.html
