Dear All,
I have been working with paired samples from 3 subjects using the edgeR package
and I am puzzled by the results of my normalization. After
normalization, the data are skewed towards the LS group, and as a result
I get many more up-regulated than down-regulated genes. We have studied this disease
extensively in large samples with microarrays and this is not the case
there, so now I am suspicious of my normalization.
I am including the code and a pdf with the smear plots for each of the
normalization options in edgeR. With all of them the data look worse
after normalization than before.
If someone could look at what I did and point out any mistake, I would
really appreciate it.
I don't know whether the problem is that I am deleting the unmapped reads before
normalization.
I was instructed to do so in the SEQanswers forum.
## Reading Files
files<- dir(pattern="counts\\.txt$")
files.pheno<-data.frame(files=files,
group=factor(substr(files,1,2),levels=c("NL","LS")),
Patient=factor(substr(files,3,4)))
PScounts<-readDGE(files.pheno)
colnames(PScounts)<-paste(PScounts$samples$group,PScounts$samples$Patient,sep='-')
## Delete unmapped reads
unmapped<-c('no_feature','ambiguous','not aligned','too low aQual')
# Logical indexing is safe even if none of these row names are present
PScounts<-PScounts[!(rownames(PScounts$counts)%in%unmapped),]
#Calculate Normalizations
d.PS<- calcNormFactors(PScounts)
pdf('Normalization Plots.pdf',height=10,width=10)
layout(matrix(1:4,2,2,byrow=TRUE))
a<-plotSmear(PScounts,
panel.first=grid(),smooth.scatter=FALSE,main='before normalization')
ma.plot(a$A,a$M,plot.method='add',cex=0)
b<-plotSmear(d.PS, panel.first=grid(),smooth.scatter=FALSE,main='after TMM')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
d.PS.2<- calcNormFactors(PScounts,method='RLE')
# Plot the RLE-normalized object (d.PS.2), not the TMM one (d.PS)
b<-plotSmear(d.PS.2, panel.first=grid(),smooth.scatter=FALSE,main='after RLE')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
d.PS.3<- calcNormFactors(PScounts,method='quantile')
b<-plotSmear(d.PS.3, panel.first=grid(),smooth.scatter=FALSE,main='after quantile')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
dev.off()
d.PS$sample ###(after TMM)
      files            group Patient lib.size norm.factors
LS-25 LS252.counts.txt LS    25      23067191 0.9085
LS-28 LS287.counts.txt LS    28      20684675 0.9056
LS-29 LS292.counts.txt LS    29      19881245 0.9965
NL-25 NL251.counts.txt NL    25      19665929 1.0129
NL-28 NL286.counts.txt NL    28      22938039 1.1554
NL-29 NL291.counts.txt NL    29      20541691 1.0422
d.PS.2$sample ###after RLE
      files            group Patient lib.size norm.factors
LS-25 LS252.counts.txt LS    25      23067191 0.9495
LS-28 LS287.counts.txt LS    28      20684675 0.9898
LS-29 LS292.counts.txt LS    29      19881245 1.0385
NL-25 NL251.counts.txt NL    25      19665929 0.9592
NL-28 NL286.counts.txt NL    28      22938039 1.0572
NL-29 NL291.counts.txt NL    29      20541691 1.0104
d.PS.3$sample ###after quantiles
      files            group Patient lib.size norm.factors
LS-25 LS252.counts.txt LS    25      23067191 0.8659
LS-28 LS287.counts.txt LS    28      20684675 0.9656
LS-29 LS292.counts.txt LS    29      19881245 1.1302
NL-25 NL251.counts.txt NL    25      19665929 0.8887
NL-28 NL286.counts.txt NL    28      22938039 1.0885
NL-29 NL291.counts.txt NL    29      20541691 1.0939
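
To make the TMM factors above easier to read, here is a minimal base-R sketch that recomputes the effective library sizes (lib.size multiplied by norm.factors, which is how edgeR uses the factors) and the mean factor per group. The numbers are copied by hand from the TMM table; the names eff.lib.size, group and mean.factor are only illustrative and are not part of my analysis script.

```r
# Values copied from the TMM output above (d.PS$samples)
lib.size <- c("LS-25" = 23067191, "LS-28" = 20684675, "LS-29" = 19881245,
              "NL-25" = 19665929, "NL-28" = 22938039, "NL-29" = 20541691)
norm.factors <- c(0.9085, 0.9056, 0.9965, 1.0129, 1.1554, 1.0422)

# Effective library size: raw library size scaled by the normalization factor
eff.lib.size <- lib.size * norm.factors

# Group label is the first two characters of each sample name
group <- substr(names(lib.size), 1, 2)

# Mean normalization factor within each group
mean.factor <- tapply(norm.factors, group, mean)

print(round(eff.lib.size))
print(round(mean.factor, 4))
```

In the TMM table all three LS factors are below 1 and all three NL factors are above 1, so the scaling is systematically in opposite directions for the two groups. I cannot tell whether that reflects a real composition difference or a problem in my processing.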
Mayte Suarez-Farinas
Research Associate, The Rockefeller University
Biostatistician, The Rockefeller University Hospital
1230 York Ave, Box 178,
New York, NY, 10065
+1(212) 327-8213
_______________________________________________
Bioc-sig-sequencing mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing