Hi Davis/Gordon:
I am posting my question here again in the hope that you will see it.
I tried edgeR and ran into a problem with the pseudocount totals for each
library after normalization, which should be close to one another. The edgeR
user's guide states several times that "the total counts in each library of
the pseudocounts agrees well with the common library size" (pages 27 and 44),
but my totals differ considerably between treatments, although the replicates
within a treatment give very similar totals. I could not get the totals close
to common.lib.size for each treatment with any of the normalization methods I
tried (TMM, RLE and quantile).
1) Did I miss anything in my edgeR run? How can I make sure the
normalization went well?
2) Does it matter if the normalized library sizes of the conditions differ
from common.lib.size?
3) Is the result still meaningful even if the library sizes of the
pseudocounts are different?
4) What could cause the library sizes of the pseudocounts to be so
different?
5) Should I remove the low-count reads, as some other people do?
After I removed the low counts (<=40 in >=6 out of 14 samples), the
normalized library sizes became very similar.
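For reference, the filter I describe in 5) can be sketched in base R as
follows. This is only a minimal sketch under assumptions: `counts` is a
hypothetical tags-by-samples count matrix (simulated here, not my real data),
and I assume the filter works per tag, dropping a tag when its count is <=40
in 6 or more of the 14 samples.

```r
# Toy stand-in for the real count table: 1000 tags x 14 samples (simulated).
set.seed(1)
counts <- matrix(rnbinom(1000 * 14, mu = 50, size = 2), nrow = 1000, ncol = 14)
colnames(counts) <- paste(rep(c("Zygote", "Octant", "Globular", "Heart",
                                "Torpedo", "Bent", "Mature"), each = 2),
                          1:2, sep = "")

# Number of samples in which each tag has a low count (<= 40).
n_low <- rowSums(counts <= 40)

# Keep a tag only if it is low in fewer than 6 of the 14 samples.
keep <- n_low < 6
filtered <- counts[keep, , drop = FALSE]

# Library sizes before and after filtering, for comparison.
lib_before <- colSums(counts)
lib_after  <- colSums(filtered)
```

The filtered matrix would then be passed to DGEList() in place of the full
table.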
I realize I lack the mathematical background for these packages. Part of my
code and output is attached below.
---------------------------------------------------------------------
> d$samples$lib.size
"Zygote1",   21012147
"Zygote2",   19924212
"Octant1",    9660245
"Octant2",   26002900
"Globular1", 17139388
"Globular2",  7649319
"Heart1",    16430105
"Heart2",    20101956
"Torpedo1",  12920266
"Torpedo2",   6306742
"Bent1",     44241095
"Bent2",     20094409
"Mature1",   15166090
"Mature2",   23203758
> d$common.lib.size
[1] 16554344.47
> colSums(d$pseudo.alt)
Zygote1    21523774.62
Zygote2    21638415.63
Octant1    14533481.82
Octant2    12046955.46
Globular1  18920316.62
Globular2  18439528.30
Heart1     11754608.30
Heart2     12759230.11
Torpedo1   11248245.52
Torpedo2   11410667.92
Bent1      16101723.65
Bent2      17980670.24
Mature1    26785396.02
Mature2    27067289.80
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ALL_1.4.7 Biobase_2.12.1 limma_3.8.2 edgeR_2.2.5
loaded via a namespace (and not attached):
[1] tools_2.13.0
---------------------------------------------------------------------
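To put a number on the disagreement, the pseudocount column sums above can be
divided by common.lib.size. The sketch below simply re-enters the printed
values by hand (the object names are my own, not edgeR's) and averages the
ratio over the two replicates of each stage.

```r
# Values copied from the output above.
common_lib_size <- 16554344.47
pseudo_totals <- c(
  Zygote1   = 21523774.62, Zygote2   = 21638415.63,
  Octant1   = 14533481.82, Octant2   = 12046955.46,
  Globular1 = 18920316.62, Globular2 = 18439528.30,
  Heart1    = 11754608.30, Heart2    = 12759230.11,
  Torpedo1  = 11248245.52, Torpedo2  = 11410667.92,
  Bent1     = 16101723.65, Bent2     = 17980670.24,
  Mature1   = 26785396.02, Mature2   = 27067289.80
)

# Ratio of each library's pseudocount total to the common library size;
# values near 1 would mean good agreement.
ratio <- pseudo_totals / common_lib_size

# Strip the replicate number to get the stage, then average per stage.
stage <- sub("[12]$", "", names(pseudo_totals))
stage_ratio <- tapply(ratio, stage, mean)

print(round(sort(stage_ratio), 2))
```

On these numbers the ratio runs from roughly 0.7 (Torpedo) to roughly 1.6
(Mature), while the two replicates of a stage stay close to each other.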
Yifang
Yifang Tan
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing