Hi everyone,

I thought I would try this list before the general Bioconductor one because my question pertains to NGS counts, although in reality it's a general statistical theory question. I hope someone can help me or point me in the right direction!

Typically, you cannot compare counts from different samples directly; instead you have to adjust by the total number of counts obtained for each sample, correct? This assumes that a change in the counts of any particular sequence will not substantially affect the total count... but what if it might?

I'm helping a colleague with some data where they sequenced the 18-30 nt fraction of RNA to look for miRNAs. They got 1.1 to 2.1 million reads per sample, but these aligned to only 187 miRNAs! Some of the miRNAs account for up to 30% of all reads, which is a really large percentage. Say a miRNA "X" that makes up 30% of the reads doubles its count in another sample, while the counts for all other miRNAs stay the same. The new percentage of "X" in the second sample is not 60% but 46.15% (= 60/130), and the observed proportions of all the other miRNAs are decreased by a factor of 0.77 (= 1/1.3), even though their actual counts did not change.

Is there any way to correct for this? What do you do when the top 5 miRNAs make up 70% of the counts?
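To make the arithmetic above concrete, here is a minimal Python sketch of the example. The read counts (300,000 for "X" and 700,000 for everything else) and the helper name `total_count_shares` are made up purely for illustration; only the 30% share and the doubling of "X" come from the example itself.

```python
def total_count_shares(counts):
    """Scale each count by the sample's total count (simple total-count normalization)."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

# Hypothetical counts: "X" is 30% of sample A; all other miRNAs are lumped together.
sample_a = {"X": 300_000, "others": 700_000}
# In sample B only "X" doubles; the other counts are unchanged.
sample_b = {"X": 600_000, "others": 700_000}

shares_a = total_count_shares(sample_a)
shares_b = total_count_shares(sample_b)

x_share_b = shares_b["X"]                               # share of "X" in sample B
x_ratio = shares_b["X"] / shares_a["X"]                 # apparent fold change of "X"
others_ratio = shares_b["others"] / shares_a["others"]  # apparent fold change of the rest

print(f"X share in B: {x_share_b:.2%}")               # 46.15%
print(f"X apparent fold change: {x_ratio:.2f}")       # 1.54, not 2
print(f"others apparent fold change: {others_ratio:.2f}")  # 0.77, not 1
```

So after scaling by total counts, "X" looks like it went up only about 1.54-fold instead of 2-fold, and every unchanged miRNA looks like it went down by about 23%.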

Thanks,
Jenny

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: [email protected]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
